I want to rename a column in a Parquet file from uppercase to lowercase ("EXE_TS" to "exe_ts") and write the result back to the same location. The Parquet file is partitioned by a column named data_as_of_date.
I tried the code below, but it's not working: it adds a column named __index_level_0__.
from pyspark.sql import SparkSession
# Create a SparkSession
spark = SparkSession.builder.getOrCreate()
# Read the partitioned Parquet file into a DataFrame
df = spark.read.option("basePath", "/mnt/dataops/batchlogs/").parquet("/mnt/dataops/batchlogs")
# Rename the column
df = df.withColumnRenamed("EXE_TS", "exe_ts")
# Reset the index
df = df.toPandas().reset_index(drop=True)
# Convert the Pandas DataFrame back to a Spark DataFrame, dropping the index column
df = spark.createDataFrame(df).drop("__index_level_0__")
# Write the modified DataFrame back to the Parquet file, preserving the partitioning
df.write.mode("overwrite").partitionBy("data_as_of_date").parquet("/mnt/dataops/batchlogs")
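For comparison, here is a minimal sketch of a Spark-only variant that skips the pandas round trip. It rests on several assumptions: the __index_level_0__ column is already present in the stored files (pandas/pyarrow can write it when a DataFrame with a non-default index is saved), the helper names `lowercase_name_map` and `rewrite_with_lowercase` are hypothetical, and the output goes to a separate temporary path chosen by the caller, since overwriting a path that is still being read from is generally unsafe in Spark. Swapping the temporary directory into place afterwards is left out of the sketch.

```python
def lowercase_name_map(columns, targets):
    """Return {old_name: new_name} for every column that matches one of
    `targets` case-insensitively and is not already lowercase."""
    wanted = {t.lower() for t in targets}
    return {c: c.lower() for c in columns if c.lower() in wanted and c != c.lower()}


def rewrite_with_lowercase(spark, src, tmp, partition_col, targets):
    """Read partitioned Parquet from `src`, lowercase the target columns,
    and write the result to `tmp` (a hypothetical staging path).

    `spark` is an existing SparkSession supplied by the caller.
    """
    # Reading the root directory lets Spark discover the partition column.
    df = spark.read.parquet(src)

    # Rename only the columns that actually need a case change.
    for old, new in lowercase_name_map(df.columns, targets).items():
        df = df.withColumnRenamed(old, new)

    # Drop the pandas index column if the stored files carry one.
    if "__index_level_0__" in df.columns:
        df = df.drop("__index_level_0__")

    # Write to the staging path, not to `src`, keeping the partitioning.
    df.write.mode("overwrite").partitionBy(partition_col).parquet(tmp)


# Example call (the staging path is an assumption, not from the original post):
# rewrite_with_lowercase(spark, "/mnt/dataops/batchlogs",
#                        "/mnt/dataops/batchlogs_tmp",
#                        "data_as_of_date", ["exe_ts"])
```

The name mapping is kept as a separate plain-Python function so it can be checked without a running Spark session.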