Relative Content

Tag Archive for javaapache-beam

Output PCollections to multiple files in Apache Beam

I need to create output to 4 different files in my Apache Beam pipeline. I’ll use a very simplified case to protect the underlying work, but the overall structure here needs to be preserved. What I would like to happen is to take an unbounded collection from a Kafka topic and publish it to 4 different files. The unbounded collection will be windowed by having a minimum gap duration between bursts of data by 30 minutes. For each windowed set of data, I’d like one file to contain all entries greater than 100, we’ll call that file greater.dat, one file to contain all entries less than or equal to 100, we’ll call that file less.dat and then a summary file that for each greater.summ and less.sum that contain the number of entries in each. For the following stream of data, this is what i’d expect

Output PCollections to multiple files in Apache Beam

I need to create output to 4 different files in my Apache Beam pipeline. I’ll use a very simplified case to protect the underlying work, but the overall structure here needs to be preserved. What I would like to happen is to take an unbounded collection from a Kafka topic and publish it to 4 different files. The unbounded collection will be windowed by having a minimum gap duration between bursts of data by 30 minutes. For each windowed set of data, I’d like one file to contain all entries greater than 100, we’ll call that file greater.dat, one file to contain all entries less than or equal to 100, we’ll call that file less.dat and then a summary file that for each greater.summ and less.sum that contain the number of entries in each. For the following stream of data, this is what i’d expect

How to combine the counts from two PCollection into one Object

I have an apache beam pipeline that reads a CSV file and also makes a query to a bigquery table, I need to count how many registers are in each PCollection and based on that create a FinalResult object that has both counts, have not been able to figure out how to combine both result into one object.