Extremely slow MERGE INTO statements
We have an Apache Iceberg data lake. We use Structured Streaming and receive micro-batches of roughly 10 records. When we merge each batch's DataFrame into a table of roughly 600 records, the MERGE takes about 2 minutes of wall-clock time. The job runs on an EMR cluster that is not under load.
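For reference, a minimal sketch of the streaming upsert pattern being described, assuming a foreachBatch MERGE into an Iceberg table; the catalog, table, key column, checkpoint path, and the `stream_df` streaming DataFrame are all placeholders:

```python
# Sketch of a foreachBatch upsert; names and paths are placeholders.
def merge_batch(batch_df, batch_id):
    batch_df.createOrReplaceTempView("updates")
    batch_df.sparkSession.sql("""
        MERGE INTO glue_catalog.db.target AS t
        USING updates AS s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

query = (
    stream_df.writeStream                     # stream_df: the streaming DataFrame (placeholder)
    .foreachBatch(merge_batch)
    .option("checkpointLocation", "s3://bucket/checkpoints/target")  # placeholder path
    .start()
)
```

One thing worth checking in a setup like this is the table's write.merge.mode property: Iceberg's MERGE defaults to copy-on-write, so each micro-batch rewrites the affected data files, which may account for much of the wall-clock time even on a small table.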
Create an Iceberg partition based on an integer value that stores a timestamp in milliseconds
I want to create a table partitioned by date, based on an integer column that stores a timestamp in milliseconds:
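A sketch of one common approach, assuming Spark with an Iceberg catalog (catalog, table, and column names are placeholders): since Iceberg's date transforms apply to timestamp columns rather than integers, the epoch-millisecond value is converted to a timestamp column and the table is partitioned with days() on that column.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Table keeps the raw epoch-millis column plus a derived timestamp column;
# the days() transform provides the date-based partitioning.
spark.sql("""
    CREATE TABLE IF NOT EXISTS my_catalog.db.events (
        id        BIGINT,
        ts_millis BIGINT,
        event_ts  TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(event_ts))
""")

# source_df: incoming DataFrame with id and ts_millis (placeholder).
# Dividing epoch millis by 1000 and casting to timestamp interprets the
# value as seconds since the epoch.
out = source_df.withColumn("event_ts", (F.col("ts_millis") / 1000).cast("timestamp"))
out.writeTo("my_catalog.db.events").append()
```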
Iceberg write fails when writing more than 1 file per partition
I’m having an issue writing more than 1 file per partition to Iceberg.
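If the failure is the writer's clustering check (records for several partitions arriving unsorted within a task), two commonly used workarounds are a local sort on the partition column before writing, or enabling the fanout writer. A sketch, assuming a table partitioned by a `dt` column (table and column names are placeholders):

```python
# Option 1: locally sort by the partition column so each task writes the
# records for each partition contiguously.
(df.sortWithinPartitions("dt")
   .writeTo("my_catalog.db.events")
   .append())

# Option 2: keep one open output file per partition instead of requiring
# clustered input (Iceberg's fanout writer).
(df.writeTo("my_catalog.db.events")
   .option("fanout-enabled", "true")
   .append())
```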
Why is Spark SQL running extremely slowly?
I'm using Spark SQL to run a simple query against my Iceberg table. Some information about the table itself (its state at the time this question was posted) might be useful:
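One way to surface that kind of table state, and to see whether the table has accumulated many small files or snapshots (a common cause of slow scans), is Iceberg's metadata tables. A sketch, assuming the table is reachable as `my_catalog.db.my_table` and that the Iceberg SQL extensions are enabled for the compaction call (all names are placeholders):

```python
# File count and average file size: a very large count with a small average
# size usually points at a small-files problem.
spark.sql("""
    SELECT count(*) AS file_count,
           avg(file_size_in_bytes) AS avg_file_size
    FROM my_catalog.db.my_table.files
""").show()

# Snapshot history for the table.
spark.sql("""
    SELECT committed_at, operation, summary
    FROM my_catalog.db.my_table.snapshots
""").show(truncate=False)

# If small files turn out to be the issue, compaction can help.
spark.sql("CALL my_catalog.system.rewrite_data_files(table => 'db.my_table')")
```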
java.lang.IllegalArgumentException: Cannot initialize HadoopCatalog because warehousePath must not be null or empty
I want to write data in Iceberg format using a directory-based (Hadoop) catalog.
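The error usually means the catalog was configured without a warehouse location. A minimal sketch of wiring up a directory-based (Hadoop) catalog through Spark configuration, with the catalog name `hadoop_cat` and the warehouse path as placeholders (the iceberg-spark-runtime jar is assumed to be on the classpath):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Register an Iceberg catalog backed by a directory (Hadoop) warehouse.
    .config("spark.sql.catalog.hadoop_cat", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.hadoop_cat.type", "hadoop")
    # The warehouse must point at a real directory; leaving it unset is what
    # triggers the IllegalArgumentException above.
    .config("spark.sql.catalog.hadoop_cat.warehouse", "hdfs://namenode:8020/warehouse/iceberg")
    .getOrCreate()
)

spark.sql("CREATE TABLE IF NOT EXISTS hadoop_cat.db.t (id BIGINT) USING iceberg")
```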