Trying to read from JDBC with PySpark. In the JDBC source there is a column FEC_PART of date type with the format yyyyMMdd. When reading, the upperBound and lowerBound parameters don't accept the desired yyyyMMdd format:
- With the format yyyyMMdd, the desired one, I got an error that the date format can't be parsed:
pyspark.sql.utils.IllegalArgumentException: Cannot parse the bound value 20200112 as date
- With the format yyyy-MM-dd, it doesn't match the format of FEC_PART in the JDBC source. It shows this error:
java.sql.SQLDataException: ORA-01861: literal does not match format string
and
WHERE "ARQPIB_FEC_PART" < '2020-01-13' or "ARQPIB_FEC_PART" is null , Error Msg = ORA-01861: literal does not match format string
The errors appear when read.load() is called with Spark:
input_data = (
    spark.read
    .format(constants.FORMAT_JDBC)
    .options(**properties)
    .option("partitionColumn", "FEC_PART")  # keep the partition column as it is
    .option("lowerBound", "20200112")       # yyyyMMdd, to match the partition column's format
    .option("upperBound", "20200114")       # yyyyMMdd, to match the partition column's format
    .option("numPartitions", "2")
    .load()
)
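For context, here is an illustrative sketch (not Spark's actual code) of how Spark turns lowerBound/upperBound/numPartitions into per-partition WHERE clauses for a date partition column; it mirrors the pushed-down clause visible in the ORA-01861 message above. Spark parses the bounds client-side as ISO yyyy-MM-dd dates, which is why "20200112" is rejected before any query reaches Oracle:

from datetime import date

# Illustrative only: derive the per-partition predicates from the bounds above.
lower, upper, num_partitions = date(2020, 1, 12), date(2020, 1, 14), 2
stride = (upper - lower) / num_partitions  # a timedelta of one day here

cuts = [lower + stride * i for i in range(1, num_partitions)]
clauses = [f'"FEC_PART" < \'{cuts[0]}\' OR "FEC_PART" IS NULL']
clauses += [f'"FEC_PART" >= \'{a}\' AND "FEC_PART" < \'{b}\''
            for a, b in zip(cuts, cuts[1:])]
clauses += [f'"FEC_PART" >= \'{cuts[-1]}\'']
for clause in clauses:
    print(clause)
# "FEC_PART" < '2020-01-13' OR "FEC_PART" IS NULL
# "FEC_PART" >= '2020-01-13'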
1st approach: tried adding these options:
.option("oracle.jdbc.mapDateToTimestamp", "false")
.option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYYMMDD'")
2nd approach, with this other option:
.option("dateFormat", "yyyyMMdd")
And some more approaches, without any result.
It works for me with something very similar to your 1st approach, but changing the date format from YYYYMMDD to YYYY-MM-DD, so:
.option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'")
And then passing the literals as:
--jdbc.bigquery.input.lowerbound="2024-03-15"
--jdbc.bigquery.input.upperbound="2024-03-19"
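Put together, the working combination is ISO-formatted bounds on the Spark side plus a matching NLS_DATE_FORMAT on the Oracle session. A sketch assembled from the snippets above, with the question's example dates rewritten in ISO form (constants.FORMAT_JDBC and properties as in the question):

input_data = (
    spark.read
    .format(constants.FORMAT_JDBC)
    .options(**properties)
    .option("sessionInitStatement",
            "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'")
    .option("partitionColumn", "FEC_PART")
    .option("lowerBound", "2020-01-12")  # yyyy-MM-dd: the format Spark's bound parser accepts
    .option("upperBound", "2020-01-14")
    .option("numPartitions", "2")
    .load()
)

With that, Spark can parse the bounds, and the string literals it pushes down (e.g. '2020-01-12') match what the Oracle session now expects for its implicit string-to-DATE conversion.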