Tag Archive for apache-spark, pyspark, azure-storage, azure-synapse

Spark fails with error: Line Separator not in initial block of partition

I am running Spark jobs on Azure Synapse Analytics. The notebook reads from and writes to a single Azure Data Lake Storage Gen2 account, with reads and writes going to different paths. It processes a large CSV dataset together with small reference data (Parquet/CSV) and writes the final output in Parquet format. The large CSV dataset is stored as 200 partition files.
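
A minimal sketch of the kind of job described above, to make the setup concrete. It is not the actual notebook code: the container, account, and folder names, the `header` option, and the join on an `id` column are all hypothetical placeholders, since the question does not show them. Only the `abfss://` URI layout and the standard PySpark reader/writer calls are taken as given.

```python
def abfss_path(container: str, account: str, path: str) -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def run_job(spark, container: str, account: str) -> None:
    """Read the large CSV and the reference data, combine them, write Parquet.

    `spark` is the SparkSession that a Synapse notebook already provides.
    All folder names below are hypothetical.
    """
    large_csv = abfss_path(container, account, "input/large_csv/")
    reference = abfss_path(container, account, "input/reference/")
    output = abfss_path(container, account, "output/result/")

    # Large CSV dataset (stored as 200 partition files in the question).
    # Assumption: the files carry a header row.
    big_df = spark.read.option("header", "true").csv(large_csv)

    # Small reference data; read as Parquet here, but it could equally be CSV.
    ref_df = spark.read.parquet(reference)

    # Assumption: the processing step is a join on a shared "id" column.
    result = big_df.join(ref_df, on="id", how="left")

    # Same storage account, different path from the inputs.
    result.write.mode("overwrite").parquet(output)
```

In a Synapse notebook this would be called as `run_job(spark, "<container>", "<account>")`, with the real container and storage account names substituted.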
