Reading Parquet Files From S3 With Apache Spark Slows Down at Later Stages
I have millions of Parquet files on S3 with the directory structure code/day=xx/hour=/*.parquets.
At most, an hour folder contains 2,000 Parquet files with an average size of 100 KB.