Related Content

Tag Archive for apache-spark, pyspark, apache-spark-sql

Optimizing a complex pyspark join

I have a complex join that I’m trying to optimize.
df1 has columns id, main_key, col1, col1_isnull, col2, col2_isnull, …, col30
df2 has columns id, main_key, col1, col2, …, col_30

Spark job spilling data vs OOM

I am using Spark SQL to run SQL jobs with 10G of executor memory.
When monitoring the jobs in the Spark UI, I can see that data is being spilled to disk and memory (expected, since the jobs do some explode operations).
