Cloud Dataflow environment variable for Python file for Flex Template launch in Dockerfile
For a single pipeline, setting the Python file as an environment variable in the Dockerfile is feasible. What if I want to deploy multiple pipelines?
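For context, a Python Flex Template Dockerfile pins the launcher to one entry point through the documented FLEX_TEMPLATE_PYTHON_PY_FILE variable, which is why the single-pipeline case is straightforward. A minimal sketch, with illustrative file and path names:

```dockerfile
# Illustrative Flex Template launcher image for a Python pipeline.
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base

ARG WORKDIR=/dataflow/template
WORKDIR ${WORKDIR}

COPY requirements.txt pipeline_a.py pipeline_b.py ${WORKDIR}/

# The template launcher reads these variables to locate the pipeline code.
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/pipeline_a.py"

RUN pip install --no-cache-dir -r requirements.txt
```

Because the variable names a single file, the usual approaches for multiple pipelines are either one launcher image per pipeline, or one shared image with a separate template spec per pipeline that overrides FLEX_TEMPLATE_PYTHON_PY_FILE (for example via the --env flag of gcloud dataflow flex-template build).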
How to get data produced by worker_status.py on Dataflow?
I have a pipeline built with the Python SDK running on Dataflow using the Streaming Engine.
Multiple ValueState objects; one is always null
I’m seeing odd behavior when using ValueState objects. I have a streaming Dataflow job that processes records throughout the day; I window them and emit a single file.
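A frequent cause of this symptom is scoping: user state is kept per key and per window, so a cell written under one key or window reads as empty elsewhere, and a cell that is never written always reads as null. As a point of comparison, here is a minimal sketch in the Python SDK, where the ValueState analogue is ReadModifyWriteStateSpec (names and types are illustrative):

```python
import apache_beam as beam
from apache_beam.coders import VarIntCoder
from apache_beam.transforms.userstate import ReadModifyWriteStateSpec

class TwoStateDoFn(beam.DoFn):
    # Two independent value-state cells; each is scoped per key AND per window.
    COUNT = ReadModifyWriteStateSpec('count', VarIntCoder())
    LAST_SEEN = ReadModifyWriteStateSpec('last_seen', VarIntCoder())

    def process(self,
                element,  # stateful DoFns require keyed input: (key, value)
                count=beam.DoFn.StateParam(COUNT),
                last_seen=beam.DoFn.StateParam(LAST_SEEN)):
        key, value = element
        # read() returns None until the cell is written for this key/window.
        current = count.read() or 0
        count.write(current + 1)
        last_seen.write(value)  # assumes integer values, per the coder above
        yield key, current + 1
```

If one of the cells is only written on a code path that never fires for the key and window being read, it will look permanently null even though the other cell is populated.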
Very slow state retrieval and CoGroupByKey in streaming Dataflow
We have an Apache Beam pipeline developed with the Python SDK that does a stateful join of a large number of incoming Kafka records (approximately 2000 requests per second).
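For readers unfamiliar with the pattern, a minimal (batch, non-Kafka) sketch of a CoGroupByKey join looks like this; keys and values are illustrative:

```python
import apache_beam as beam

with beam.Pipeline() as p:
    orders = p | 'Orders' >> beam.Create([('u1', 'order-1'), ('u2', 'order-2')])
    profiles = p | 'Profiles' >> beam.Create([('u1', {'name': 'Alice'})])

    # Emits (key, {'orders': [...], 'profiles': [...]}) per key.
    joined = ({'orders': orders, 'profiles': profiles}
              | beam.CoGroupByKey())
    joined | beam.Map(print)
```

In streaming mode, every value for a key and window is buffered in per-key state until the window's trigger fires, so hot keys and large windows translate directly into slow state reads.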
Understanding very high StartBundle and FinishBundle wall times compared to ProcessElement
While investigating high system latency in a Python Apache Beam pipeline running on Dataflow, I noticed that for some steps the reported StartBundle and/or FinishBundle wall times can totally dominate the ProcessElement time.
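One explanation is that start_bundle and finish_bundle run once per bundle and absorb any work deferred to them; the classic example is a DoFn that batches elements and flushes in finish_bundle, as in this sketch (batch size and flush logic are illustrative):

```python
import apache_beam as beam
from apache_beam.transforms.window import GlobalWindow
from apache_beam.utils.windowed_value import WindowedValue

class BatchingDoFn(beam.DoFn):
    """Batches elements per bundle; flush cost lands on the bundle counters."""

    def start_bundle(self):
        # Anything blocking here is reported as StartBundle wall time.
        self._batch = []

    def process(self, element):
        self._batch.append(element)
        if len(self._batch) >= 500:  # illustrative batch size
            yield from self._flush()

    def finish_bundle(self):
        # A slow remote flush here shows up as FinishBundle wall time,
        # not ProcessElement. finish_bundle must emit WindowedValues.
        window = GlobalWindow()
        for out in self._flush():
            yield WindowedValue(out, window.max_timestamp(), [window])

    def _flush(self):
        batch, self._batch = self._batch, []
        return batch
```

Streaming bundles are often tiny (a handful of elements), so per-bundle costs like these are amortized over very few ProcessElement calls and can dominate the reported wall times.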
Google Dataflow Apache Beam version upgrade fails when updating an existing pipeline
I have a Google Dataflow streaming pipeline running on Apache Beam Java 2.50.0. I wish to upgrade to 2.56.0 (the current latest) via the update pipeline option. However, the update gives an error –
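For reference, the update path is to re-launch the pipeline built against the new SDK with the update flags pointing at the running job, roughly like this, where the jar, project, and job names are illustrative:

```
java -jar my-pipeline-bundled-2.56.0.jar \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --jobName=my-streaming-job \
  --update \
  --transformNameMapping='{"OldStepName":"NewStepName"}'
```

Update rejections come from Dataflow's compatibility check: --transformNameMapping covers renamed steps, but coder or graph changes between SDK versions generally cannot be updated in place and require draining the old job and launching fresh.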
Specifying the Python version to use to run a Dataflow pipeline
Is it possible to force a Dataflow job to run with a specific version of Python?
I have some dependencies that are only supported from Python 3.11 onward.
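There is no dedicated "Python version" pipeline option; the worker-side interpreter is determined by the SDK container image, which must match the minor version of the Python that submits the job. So one approach, sketched here with illustrative project and bucket names, is to launch from a Python 3.11 environment and pin a matching SDK image:

```python
# Launch this from a Python 3.11 interpreter so the submission environment
# matches the worker image below.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner='DataflowRunner',
    project='my-project',                 # illustrative
    region='us-central1',
    temp_location='gs://my-bucket/tmp',   # illustrative
    sdk_container_image='apache/beam_python3.11_sdk:2.56.0',
)

with beam.Pipeline(options=options) as p:
    p | beam.Create([1, 2, 3]) | beam.Map(print)
```

If the submitting interpreter and the container's Python minor version disagree, pickled code can fail in confusing ways on the workers, so pinning both sides is the reliable way to force 3.11.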
Is there a way to configure the billing project when Dataflow (Python) runs a BigQuery job?
A Dataflow job (Python) runs queries against BigQuery. The queries are billed to the same project as the Dataflow job. Is there a way to set the billing project for the BigQuery jobs?
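By default the BigQuery read connector issues its query jobs under the pipeline's --project, so that project pays for the scanned bytes. Since the query itself can reference fully qualified tables in other projects, one workaround is to point --project at the project that should be billed while the data lives elsewhere; a sketch with illustrative names:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner='DataflowRunner',
    project='billing-project',            # illustrative: pays for the query job
    region='us-central1',
    temp_location='gs://my-bucket/tmp',   # illustrative
)

with beam.Pipeline(options=options) as p:
    rows = p | beam.io.ReadFromBigQuery(
        query='SELECT id FROM `data-project.dataset.table`',  # data elsewhere
        use_standard_sql=True)
```

Note this couples the billing project to the project the Dataflow job itself runs in; the Python connector does not obviously expose a query-billing project independent of the pipeline project.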