Tag Archive for apache-spark, pyspark, apache-iceberg

Spark overwritePartitions always triggering a shuffle of data

I have a Spark job that reads from an Oracle data source and writes to an Iceberg table.
Multiple queries run concurrently in separate threads, and each query touches exactly one partition of the Iceberg table.
The code for the insert looks like this:
df.writeTo("catalog.db.iceberg_table").overwritePartitions()
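
To make the setup concrete, here is a minimal sketch of how the threaded writes might be organized. It is not the actual job: the helper names overwrite_one and overwrite_all, the max_workers value, and the options parameter are assumptions made for illustration. Only writeTo(...), option(...) and overwritePartitions() come from the DataFrameWriterV2 API.

```python
from concurrent.futures import ThreadPoolExecutor


def overwrite_one(df, table, options=None):
    """Overwrite the partitions of `table` that are present in `df`."""
    writer = df.writeTo(table)
    # `options` is a hypothetical parameter; it is empty by default,
    # which reproduces the one-line write shown in the question.
    for key, value in (options or {}).items():
        writer = writer.option(key, value)
    writer.overwritePartitions()
    return table


def overwrite_all(frames, table, max_workers=4, options=None):
    """Run one overwrite per DataFrame, each submitted to a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(overwrite_one, df, table, options) for df in frames]
        # result() re-raises any exception raised inside a worker thread.
        return [f.result() for f in futures]
```

Each DataFrame in frames is assumed to hold the rows for a single Iceberg partition, so a call such as overwrite_all(frames, "catalog.db.iceberg_table") issues one overwrite per partition. The options hook is there because Iceberg documents a distribution-mode write option; whether passing it changes the shuffle seen here is something to verify against the Iceberg version in use, not a confirmed fix.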