UserWarning while writing a large HDF file in Dask


I’m trying to write a large dataset (a tuple of dicts; each dict maps a key to a Dask array) to disk as an HDF5 file using the function below.

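import h5py

def write_to_hdf5(file_path, data):
    # data: a tuple of dicts, each mapping a dataset name to a Dask array
    with h5py.File(file_path, 'w') as f:
        for result in data:
            if result:  # skip empty or None entries
                for key, value in result.items():
                    # gzip level 9 is the maximum compression level (smallest file, slowest write)
                    f.create_dataset(key, data=value, compression="gzip", compression_opts=9)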

This produces the warning "UserWarning: Sending large graph of size 47.61 MiB. This may cause some slowdown. Consider scattering data ahead of time and using futures." and the write to disk (~10 GB) takes a long time. Is there a better way to write an HDF5 file of key/value data?

According to the Dask documentation, da.to_hdf5('myfile.hdf5', '/x', x, compression='lzf', shuffle=True) writes a single array to HDF5, but I’m not sure how to give each array its own label/key so that I can visualize the file in Panoply.
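For reference, da.to_hdf5 also accepts a dict of {'/dataset/path': array}, so each array can be stored under its own key in one call. Below is a minimal sketch of that idea, assuming every value in the dicts is a Dask array and every key is a valid HDF5 dataset name. The helper name write_results_to_hdf5 is made up for illustration, and I have not checked how this behaves on a distributed cluster.

```python
import dask.array as da


def write_results_to_hdf5(file_path, results, **dataset_kwargs):
    """Write every Dask array found in a sequence of dicts to one HDF5 file.

    Hypothetical helper: `results` is assumed to be a tuple of dicts that
    map a dataset name to a Dask array; empty or None entries are skipped.
    """
    datasets = {}
    for result in results:
        if not result:
            continue
        for key, value in result.items():
            # Use the dict key as the dataset path inside the file.
            datasets["/" + key.lstrip("/")] = value

    # da.to_hdf5 takes a {path: array} dict and stores the arrays block by
    # block; extra keyword arguments (e.g. compression) are passed on to
    # h5py when the datasets are created.
    da.to_hdf5(file_path, datasets, **dataset_kwargs)
    return sorted(datasets)
```

Called as write_results_to_hdf5('myfile.hdf5', data, compression='lzf', shuffle=True), this would create one named dataset per key. Note that da.to_hdf5 opens the file in append mode, so an existing file is extended rather than overwritten.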