PySpark dataframes not matching the headers

I have a set of Parquet files written over a period of 6 months, partitioned by the date and hour when they were created. Over these 6 months the column headers have changed, so the schema of the files created on Jan 1 is different from the schema of the files created on May 1.
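To make the situation concrete, here is a minimal sketch of one common way to deal with this kind of schema drift: Spark's Parquet reader accepts a `mergeSchema` option that asks it to reconcile the schemas of all files it reads. The function names, the `spark` argument (an existing SparkSession), and the `base_path` value are assumptions for illustration and are not from the original post. Schema merging can also fail if the same column was written with incompatible types in different files, so treat this as a starting point, not a guaranteed fix.

```python
def read_with_merged_schema(spark, base_path):
    """Read all partitions under base_path into a single DataFrame.

    `spark` is assumed to be an existing SparkSession and `base_path`
    a hypothetical root directory holding the date/hour partitions.
    """
    # mergeSchema asks the Parquet reader to reconcile the schemas of the
    # files it reads; columns missing from a given file come back as null.
    return spark.read.option("mergeSchema", "true").parquet(base_path)


def column_drift(df_old, df_new):
    """Report which column names were added or removed between two DataFrames."""
    old, new = set(df_old.columns), set(df_new.columns)
    return {"added": sorted(new - old), "removed": sorted(old - new)}
```

A possible usage, with made-up paths: read one early partition and one late partition separately with `spark.read.parquet(...)`, pass both to `column_drift` to see how the headers changed, then call `read_with_merged_schema(spark, "/data/events")` to load everything at once.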