Change a column name from uppercase to lowercase in a Parquet file in PySpark


I want to rename a column in a Parquet file from uppercase to lowercase (from "EXE_TS" to "exe_ts") and write the result back to the same location. The Parquet file is partitioned by a column named data_as_of_date.

I tried the code below, but it's not working: it adds a column named __index_level_0__.

from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder.getOrCreate()

# Read the partitioned Parquet file into a DataFrame
df = spark.read.option("basePath", "/mnt/dataops/batchlogs/").parquet("/mnt/dataops/batchlogs")

# Rename the column
df = df.withColumnRenamed("EXE_TS", "exe_ts")

# Reset the index
df = df.toPandas().reset_index(drop=True)

# Convert the Pandas DataFrame back to a Spark DataFrame, dropping the index column
df = spark.createDataFrame(df).drop("__index_level_0__")

# Write the modified DataFrame back to the Parquet file, preserving the partitioning
df.write.mode("overwrite").partitionBy("data_as_of_date").parquet("/mnt/dataops/batchlogs")
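
For comparison, here is a minimal sketch of one way the rename might be done without going through pandas at all, so the pandas index never enters the picture. It is not a confirmed fix for the setup above. It assumes the renamed data can first be written to a separate staging directory, because Spark generally cannot safely overwrite a path it is still reading from. The helper names rename_columns and rewrite_renamed and the staging path /mnt/dataops/batchlogs_tmp are hypothetical, made up for illustration.

```python
def rename_columns(columns, mapping):
    """Return the column names with `mapping` applied; other names are kept."""
    return [mapping.get(name, name) for name in columns]


def rewrite_renamed(spark, src_path, tmp_path, mapping, partition_col):
    """Read src_path, rename columns, and write the result to tmp_path.

    `spark` is an existing SparkSession. The output goes to a separate
    staging path (an assumption of this sketch), not back onto src_path.
    """
    df = spark.read.option("basePath", src_path).parquet(src_path)
    # toDF(*names) assigns new names by position, staying inside Spark,
    # so there is no pandas round trip and no pandas index to drop.
    df = df.toDF(*rename_columns(df.columns, mapping))
    df.write.mode("overwrite").partitionBy(partition_col).parquet(tmp_path)
    return df.columns


# Hypothetical usage (needs a running Spark environment):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
# rewrite_renamed(
#     spark,
#     "/mnt/dataops/batchlogs",
#     "/mnt/dataops/batchlogs_tmp",  # hypothetical staging directory
#     {"EXE_TS": "exe_ts"},
#     "data_as_of_date",
# )
```

After checking the staging output, the original directory would still have to be replaced with it using whatever file-system tooling the environment provides; that step is left out here because it depends on the storage layer.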
