Optimal configuration for Spark overhead and Spark off-heap memory [closed]
Closed 2 days ago.
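For reference, here is a minimal sketch of the two settings the title refers to. The values are placeholders, since the question doesn't describe the workload; the config keys themselves (`spark.executor.memoryOverhead`, `spark.memory.offHeap.enabled`, `spark.memory.offHeap.size`) are standard Spark properties:

```python
from pyspark.sql import SparkSession

# Illustrative values only; sensible numbers depend entirely on the
# workload, so treat these as placeholders, not recommendations.
spark = (
    SparkSession.builder
    .appName("memory-config-sketch")
    # Extra off-JVM memory per executor (shuffle buffers, native libs, ...).
    .config("spark.executor.memoryOverhead", "2g")
    # Opt in to Spark-managed off-heap storage and give it a fixed budget.
    .config("spark.memory.offHeap.enabled", "true")
    .config("spark.memory.offHeap.size", "4g")
    .getOrCreate()
)
```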
Cassandra write task failure not retried after upgrading to Apache Spark 3.4.x
We have been using Spark with Cassandra. We upgraded from Spark 3.2 to Spark 3.4, but after the upgrade we noticed an annoying issue: an occasional write timeout failure in a single write task becomes a fatal failure. Read failures are still handled well with retries, but write failures are no longer retried.
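For context, a minimal sketch of the kind of write involved, assuming the spark-cassandra-connector is on the classpath; the keyspace and table names are made up, not from the original question:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cassandra-write-sketch").getOrCreate()

# Toy data standing in for the real payload.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Write through the spark-cassandra-connector data source; the
# "analytics.events" keyspace/table pair is a placeholder.
(
    df.write
    .format("org.apache.spark.sql.cassandra")
    .options(table="events", keyspace="analytics")
    .mode("append")
    .save()
)
```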
What is the behaviour of the orderBy clause when used in an aggregate over a window function?
I'm trying to aggregate some string fields in a Spark DataFrame using collect_set() as the aggregator over a Window, but I'm failing to understand the behaviour when using the orderBy clause (the intent is to control the order of elements in the set):
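A minimal sketch of the behaviour in question, with made-up column names. The key point is that adding orderBy to a window changes the default frame to a running one, and collect_set never guarantees element order inside the set:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("collect-set-window").getOrCreate()

# Made-up columns: grp partitions the window, seq is the intended order.
df = spark.createDataFrame(
    [("a", 1, "x"), ("a", 2, "y"), ("a", 3, "x"), ("b", 1, "z")],
    ["grp", "seq", "val"],
)

# With orderBy, the frame defaults to unboundedPreceding..currentRow,
# so each row sees a *running* set rather than the whole group's set.
w_ordered = Window.partitionBy("grp").orderBy("seq")
df.withColumn("vals", F.collect_set("val").over(w_ordered)).show()

# Without orderBy, the frame is the whole partition, but collect_set
# still makes no promise about element order inside the resulting set.
w_whole = Window.partitionBy("grp")
df.withColumn("vals", F.collect_set("val").over(w_whole)).show()
```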
CLUSTER BY is not available in Spark version 3.5
CLUSTER BY is successfully executed in Spark version 3.4.
When running a notebook, Spark versions 3.4 and 3.5 give different results for CLUSTER BY
CLUSTER BY is successfully executed on runtime Spark version 3.4, but it fails on Spark version 3.5.
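The failing statement isn't shown in the excerpt. As a minimal sketch, assuming the query-level CLUSTER BY clause of Spark SQL (the table name t and its columns are made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cluster-by-sketch").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b"), (1, "c")], ["id", "val"])
df.createOrReplaceTempView("t")

# CLUSTER BY id is shorthand for DISTRIBUTE BY id SORT BY id: rows with
# the same id land in the same partition and are sorted within it.
spark.sql("SELECT id, val FROM t CLUSTER BY id").show()
```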
New DataFrame columns from a column of arrays
I have this DataFrame:
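The original DataFrame isn't shown in the excerpt. As a hedged sketch of one common approach, assuming a column of fixed-length arrays, each position can be pulled out into its own column with getItem:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("array-to-columns").getOrCreate()

# Placeholder data: a single column "arr" holding fixed-length arrays.
df = spark.createDataFrame([(["a", "b", "c"],), (["d", "e", "f"],)], ["arr"])

# Index into the array with getItem to produce one column per position
# (this assumes the array length n is known up front).
n = 3
df.select(*[F.col("arr").getItem(i).alias(f"col{i}") for i in range(n)]).show()
```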