
Tag Archive for sql, dataframe, scala, apache-spark, partitioning

partitionBy("INSERT_DATE"): I'm doing this, but it is not overwriting the data without creating a partition

// Enable dynamic partitioning
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

Method 1

// Load into History table
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
latestRecordDf.write.mode("overwrite").format("parquet")
  .option("path", "/edx/us/lowes/pro/data/pro_metrics/pro_acctsales_hist")
  .saveAsTable("pro_metrics.pro_acctsales_hist")

// Load data in main table
combinedDf.write.mode("overwrite").format("parquet")
  .option("path", "/edx/us/lowes/pro/data/pro_metrics/pro_acctsales")
  .saveAsTable("pro_metrics.pro_acctsales")

Method 2

// Insert overwrite only the relevant partition
latestRecordDf.write.mode("overwrite").format("parquet").partitionBy("INSERT_DATE")
  .option("path", "/edx/us/lowes/pro/data/pro_metrics/pro_acctsales_hist")
  .saveAsTable("pro_metrics.pro_acctsales_hist")

// Insert […]
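
A possible direction, offered as a sketch rather than a confirmed fix for the post above: as far as I know, saveAsTable in "overwrite" mode replaces the whole table, so the untouched INSERT_DATE partitions are lost as well. One commonly used alternative is to set spark.sql.sources.partitionOverwriteMode to "dynamic" (available since Spark 2.3) and write into the already existing partitioned table with insertInto. The table name demo_acctsales_hist, the columns ACCT_ID and SALES, the sample rows, and the temporary warehouse directory are all hypothetical; only INSERT_DATE comes from the post.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object DynamicPartitionOverwriteSketch {
  // Returns the (ACCT_ID, INSERT_DATE) pairs left in the table after the second write.
  def run(): Seq[(Int, String)] = {
    val spark = SparkSession.builder()
      .appName("dynamic-partition-overwrite-sketch")
      .master("local[*]")
      // Assumption: a throwaway warehouse directory so the sketch can be re-run.
      .config("spark.sql.warehouse.dir", Files.createTempDirectory("warehouse").toString)
      // Overwrite only the partitions present in the incoming DataFrame.
      .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
      .getOrCreate()
    import spark.implicits._

    val table = "demo_acctsales_hist" // hypothetical table name

    // Initial load: creates the partitioned table with two partitions.
    Seq((1, 100.0, "2024-01-01"), (2, 200.0, "2024-01-02"))
      .toDF("ACCT_ID", "SALES", "INSERT_DATE")
      .write.mode("overwrite").format("parquet")
      .partitionBy("INSERT_DATE")
      .saveAsTable(table)

    // Later load: insertInto matches columns by position (partition column last)
    // and must not be combined with partitionBy, since the table already defines it.
    Seq((3, 300.0, "2024-01-02"))
      .toDF("ACCT_ID", "SALES", "INSERT_DATE")
      .write.mode("overwrite")
      .insertInto(table)

    val result = spark.table(table)
      .select($"ACCT_ID", $"INSERT_DATE".cast("string"))
      .orderBy("ACCT_ID")
      .as[(Int, String)]
      .collect()
      .toSeq

    spark.stop()
    result
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

In this sketch the second write is meant to replace only the 2024-01-02 partition and leave 2024-01-01 in place. For a Hive-format table the hive.exec.dynamic.partition settings from the post may still be needed; that has not been checked here.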