Data Factory Parquet Incorrectly Ingesting Decimals
I am working on an Azure Data Factory pipeline and noticed that when I use a Parquet sink to ADLS Gen2, certain decimal values are coming through corrupted and are not consistent with the data source. From ADLS this data is being ingested into Databricks for analytics, which is where the bug was first noticed.

Example:
Original data source: 861.099901397946075
In Parquet: 86199901397946075
In Databricks: 86.199901397946075

The datatype in the data source is Decimal(35,15), and when writing the data to Parquet, it appears to drop the leading “0” in the fractional part, causing the decimal portion of the number to be offset.
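
If I understand Parquet's decimal encoding correctly, a DECIMAL(35,15) is stored as an unscaled integer plus a fixed scale, so losing that one leading zero shifts every remaining digit by a power of ten. Here is a minimal Python sketch of what I think is happening, using the values from my example above:

    from decimal import Decimal

    # Parquet stores a DECIMAL(35,15) as an unscaled integer plus a scale of 15.
    original = Decimal("861.099901397946075")
    unscaled = int(original.scaleb(15))          # 861099901397946075

    # If the leading zero of the fractional part is dropped somewhere in the
    # copy (e.g. a string round-trip), the remaining digits collapse together:
    corrupted = Decimal("861" + "99901397946075").scaleb(-15)

    print(unscaled)   # 861099901397946075  <- the digits I see in the file
    print(corrupted)  # 86.199901397946075  <- exactly what Databricks shows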

I have also noticed that this does not occur with every decimal entry I am ingesting, only the ones with a leading zero immediately after the decimal point.

Has anyone experienced this, and know of a fix? Thanks.

Tried – loading the data in via Parquet, then loading that data into Databricks. Expected results consistent with the data source. I have also tried Parquet with no compression and with different compression codecs, with no success.
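
To narrow down whether the file is already wrong at write time (ADF) or only misread later (Databricks), I have been inspecting the Parquet output directly with pyarrow. This is just my diagnostic sketch; amount_col and the file name are placeholders for my actual column and file:

    import pyarrow.parquet as pq

    # A sample file ADF wrote to ADLS, downloaded locally for inspection.
    table = pq.read_table("sample_from_adls.parquet")

    # Confirm the declared logical type survived the copy.
    print(table.schema)  # expect something like: amount_col: decimal128(35, 15)

    # Print the first few values; if the shifted digits show up here too,
    # the file is already corrupted at write time, not at read time.
    print(table.column("amount_col").slice(0, 5))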

I have used a CSV sink instead of Parquet, and the data did populate correctly, but I would prefer Parquet for my use case.
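
One workaround I am considering, to keep Parquet, is casting the column to a string in the ADF source query so the decimal encoding never happens, then casting back on the Databricks side. A sketch of the Databricks half, assuming a column named amount_col (the abfss path is a placeholder):

    from pyspark.sql.functions import col
    from pyspark.sql.types import DecimalType

    # `spark` is the session Databricks provides in a notebook.
    df = spark.read.parquet("abfss://container@account.dfs.core.windows.net/path/")

    # The column arrives as a string, so no decimal re-encoding could have
    # dropped the leading zero; restore the source type here.
    df = df.withColumn("amount_col", col("amount_col").cast(DecimalType(35, 15)))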

