How to set upperBound and lowerBound format for reading JDBC with pyspark

  Kiến thức lập trình

Trying to read from JDBC with pyspark. In JDBC there is a column FEC_PART as a date type with the format yyyymmdd. For reading the parameter upperBound or lowerBound doesn’t match the desire format yyyymmdd:

  • With the format yyyymmdd, the desire one, got the error that can’t identify the date format:
    pyspark.sql.utils.IllegalArgumentException: Cannot parse the bound value 20200112 as date
  • With the format yyyy-mm-dd, it doesn’t match the format of the FEC_PART in JDBC. It shows this error:
    java.sql.SQLDataException: ORA-01861: literal does not match format string and
    WHERE "ARQPIB_FEC_PART" < '2020-01-13' or "ARQPIB_FEC_PART" is null , Error Msg = ORA-01861: literal does not match format string

The error appear when read.load() with spark.

input_data = spark.read 
    .format(constants.FORMAT_JDBC) 
    .options(**properties) 
    .option("partitionColumn", "FEC_PART")  # Keep partition column as it is
    .option("lowerBound", "20200112")  # Use the yyyymmdd format for bounds to match partitionColumn
    .option("upperBound", "20200114")  # Use the yyyymmdd format for bounds to match partitionColumn
    .option("numPartitions", "2") 
    .load()

1st approach. Trying adding this options:

.option("oracle.jdbc.mapDateToTimestamp", "false")
.option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYYMMDD'")

2nd approach with this other option:

.option("dateFormat", "yyyyMMdd")

And some more approach without any result.

It works for me with a very similar to your approach 1), but changing the date format, from YYYYMMDD to: YYYY-MM-DD, so:

.option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'")

And then passing the literals as:

--jdbc.bigquery.input.lowerbound="2024-03-15"
--jdbc.bigquery.input.upperbound="2024-03-19"

1

Theme wordpress giá rẻ Theme wordpress giá rẻ Thiết kế website Kho Theme wordpress Kho Theme WP Theme WP

LEAVE A COMMENT