Spark Streaming is broadcasting the larger target dataset, causing memory issues in the executor
I am using the Delta Lake merge operation to merge incoming data into an existing Delta Lake table. My streaming job fails with:

`org.apache.spark.SparkException: Cannot broadcast the table that is larger than 8.0 GiB: 8.1 GiB`

I have not set the `spark.sql.autoBroadcastJoinThreshold` property, so the default value of 10 MB should apply, but it does not seem to be taking effect.
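For reference, this is roughly how the merge is wired up in the streaming job (a minimal sketch; the paths, table locations, and join key below are placeholders, not my real names):

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def upsert_to_delta(micro_batch_df, batch_id):
    # Existing (large) Delta Lake table that is the MERGE target (placeholder path)
    target = DeltaTable.forPath(spark, "/mnt/delta/target_table")
    (target.alias("t")
        .merge(micro_batch_df.alias("s"), "t.id = s.id")  # placeholder join condition
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream
    .format("delta")
    .load("/mnt/delta/incoming")  # small incoming stream (placeholder path)
    .writeStream
    .foreachBatch(upsert_to_delta)
    .option("checkpointLocation", "/mnt/delta/_checkpoints/target_table")
    .start())
```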
Also, in the Spark UI I can see that the merge command is trying to broadcast the larger target dataset, whereas ideally it should broadcast the smaller incoming dataset.
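This is how I am checking the effective threshold inside the job (a sketch, assuming nothing else overrides it at the cluster level):

```python
# Print the effective broadcast threshold; I have not set this property anywhere myself
print(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))
```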
Can someone help me understand this behaviour?
[Spark UI screenshot of the merge query showing the broadcast](https://i.sstatic.net/XIMr2Zoc.png)