Optimal configuration for Spark overhead and Spark off-heap memory [closed]
Closed 2 days ago.
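For reference, here is a minimal sketch of the two settings the title refers to. The values are placeholders, since the question doesn't describe the workload; the config keys themselves (`spark.executor.memoryOverhead`, `spark.memory.offHeap.enabled`, `spark.memory.offHeap.size`) are standard Spark properties:

```python
from pyspark.sql import SparkSession

# Illustrative values only; sensible numbers depend entirely on the
# workload, so treat these as placeholders, not recommendations.
spark = (
    SparkSession.builder
    .appName("memory-config-sketch")
    # Extra off-JVM memory per executor (shuffle buffers, native libs, ...).
    .config("spark.executor.memoryOverhead", "2g")
    # Opt in to Spark-managed off-heap storage and give it a fixed budget.
    .config("spark.memory.offHeap.enabled", "true")
    .config("spark.memory.offHeap.size", "4g")
    .getOrCreate()
)
```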
Cassandra write task failure not retried after upgrading to Apache Spark 3.4.x
We have been using Spark with Cassandra. We upgraded from Spark 3.2 to Spark 3.4, but after the upgrade we noticed an annoying issue: an occasional write timeout failure in a single write task becomes a fatal failure. Read failures are still handled well with retries, but write failures are no longer retried.
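For context, a minimal sketch of the kind of write involved, assuming the spark-cassandra-connector is on the classpath; the keyspace and table names are made up, not from the original question:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cassandra-write-sketch").getOrCreate()

# Toy data standing in for the real payload.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Write through the spark-cassandra-connector data source; the
# "analytics.events" keyspace/table pair is a placeholder.
(
    df.write
    .format("org.apache.spark.sql.cassandra")
    .options(table="events", keyspace="analytics")
    .mode("append")
    .save()
)
```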
What is the behaviour of the orderBy clause when used in an aggregate over a window function?
I'm trying to aggregate some string fields in a Spark DataFrame using collect_set() as the aggregator over a Window, but I'm failing to understand the behaviour when using the orderBy clause (the intent is to control the order of elements in the set):
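A minimal sketch of the behaviour in question, with made-up column names. The key point is that adding orderBy to a window changes the default frame to a running one, and collect_set never guarantees element order inside the set:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("collect-set-window").getOrCreate()

# Made-up columns: grp partitions the window, seq is the intended order.
df = spark.createDataFrame(
    [("a", 1, "x"), ("a", 2, "y"), ("a", 3, "x"), ("b", 1, "z")],
    ["grp", "seq", "val"],
)

# With orderBy, the frame defaults to unboundedPreceding..currentRow,
# so each row sees a *running* set rather than the whole group's set.
w_ordered = Window.partitionBy("grp").orderBy("seq")
df.withColumn("vals", F.collect_set("val").over(w_ordered)).show()

# Without orderBy, the frame is the whole partition, but collect_set
# still makes no promise about element order inside the resulting set.
w_whole = Window.partitionBy("grp")
df.withColumn("vals", F.collect_set("val").over(w_whole)).show()
```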
CLUSTER BY is not available in Spark version 3.5
CLUSTER BY is successfully executed in Spark version 3.4.
When running a notebook, Spark versions 3.4 and 3.5 give different results for CLUSTER BY
CLUSTER BY is successfully executed on runtime Spark version 3.4, but it fails on Spark version 3.5.
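The failing statement isn't shown in the excerpt. As a minimal sketch, assuming the query-level CLUSTER BY clause of Spark SQL (the table name t and its columns are made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cluster-by-sketch").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b"), (1, "c")], ["id", "val"])
df.createOrReplaceTempView("t")

# CLUSTER BY id is shorthand for DISTRIBUTE BY id SORT BY id: rows with
# the same id land in the same partition and are sorted within it.
spark.sql("SELECT id, val FROM t CLUSTER BY id").show()
```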
New DataFrame columns from a column of arrays
I have this DataFrame:
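The original DataFrame isn't shown in the excerpt. As a hedged sketch of one common approach, assuming a column of fixed-length arrays, each position can be pulled out into its own column with getItem:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("array-to-columns").getOrCreate()

# Placeholder data: a single column "arr" holding fixed-length arrays.
df = spark.createDataFrame([(["a", "b", "c"],), (["d", "e", "f"],)], ["arr"])

# Index into the array with getItem to produce one column per position
# (this assumes the array length n is known up front).
n = 3
df.select(*[F.col("arr").getItem(i).alias(f"col{i}") for i in range(n)]).show()
```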