Tag Archive for apache-spark, lazy-evaluation, bloom-filter

BloomFilter mergeInPlace() producing unexpected behavior

The Spark Scala code snippet below reproduces the behavior I’m trying to understand. At a high level, we construct two tuples, each containing a DataFrame (DF) and a Bloom filter built on the id column of that DF. We then filter b, removing every row whose ID appears in a row of a, and store the union of this filtered result and a as c.
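The original snippet is not part of this excerpt. What follows is only a minimal sketch of the setup as described, not the asker's code. The helper `withBloom`, the object name `BloomFilterSketch`, the sample ID ranges, and the Bloom filter sizing (1000 expected items, 1% false-positive rate) are all assumptions made for illustration; the names a, b, and c follow the description above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.util.sketch.BloomFilter

object BloomFilterSketch {

  // Hypothetical helper: pair a DF with a Bloom filter over its "id" column.
  // The sizing parameters are illustrative assumptions.
  def withBloom(df: DataFrame): (DataFrame, BloomFilter) =
    (df, df.stat.bloomFilter("id", 1000L, 0.01))

  // Builds c = a union (rows of b whose id is not reported as present in a).
  def buildC(spark: SparkSession): DataFrame = {
    // Assumed sample data: a holds ids 0..9, b holds ids 5..14.
    val a = withBloom(spark.range(0, 10).toDF("id"))
    val b = withBloom(spark.range(5, 15).toDF("id"))

    // Capture only the filter in the UDF closure, not the whole tuple.
    val aFilter = a._2
    val notInA = udf((id: Long) => !aFilter.mightContainLong(id))

    // Transformations are lazy: the filter only runs when an action is called on c.
    // Bloom filters can report false positives, so a few rows of b that are
    // not actually in a may also be dropped.
    a._1.union(b._1.filter(notInA(col("id"))))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("bloom-filter-sketch")
      .getOrCreate()
    buildC(spark).show()
    spark.stop()
  }
}
```

Because a Bloom filter never reports a false negative, every row of b whose ID is in a is removed. The exact row count of c can still vary slightly with false positives.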
