Python awswrangler performance with a large number of partitions

I need to store and fetch data using two hierarchy levels, date and class. When I upload data to S3 as part of the ETL pipeline, I'm using awswrangler's to_parquet function with partition_cols=["date", "class"]. To fetch data from the S3 bucket, I'm using the read_parquet function with partition_filter=filter_func, where filter_func is similar to lambda x: x["date"] in date_list and x["class"] in class_list.
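
For concreteness, the setup described above might look roughly like the sketch below. The helper names (make_partition_filter, write_partitioned, read_filtered) and the S3 path argument are hypothetical and not taken from my actual pipeline; the lookup lists are converted to sets here, which is a small variation on the lambda above. It relies on awswrangler passing partition values to the filter as strings.

```python
from typing import Callable, Dict, Iterable


def make_partition_filter(
    dates: Iterable[str], classes: Iterable[str]
) -> Callable[[Dict[str, str]], bool]:
    """Build a predicate over partition values.

    awswrangler calls the predicate with a dict that maps each partition
    column name to its value as a string, so the wanted values are
    normalized to strings as well.
    """
    date_set = {str(d) for d in dates}
    class_set = {str(c) for c in classes}

    def filter_func(partition: Dict[str, str]) -> bool:
        # Keep a partition only if both of its levels were requested.
        return partition["date"] in date_set and partition["class"] in class_set

    return filter_func


def write_partitioned(df, path: str) -> None:
    """Write df as a Parquet dataset partitioned by date, then class."""
    import awswrangler as wr  # imported lazily so the filter is usable without AWS

    wr.s3.to_parquet(df=df, path=path, dataset=True, partition_cols=["date", "class"])


def read_filtered(path: str, dates: Iterable[str], classes: Iterable[str]):
    """Read only the partitions whose date and class were requested."""
    import awswrangler as wr

    return wr.s3.read_parquet(
        path=path,
        dataset=True,
        partition_filter=make_partition_filter(dates, classes),
    )
```

A call such as read_filtered("s3://my-bucket/my-dataset/", date_list, class_list) (the bucket name is a placeholder) would then correspond to the read described above.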