How to apply the .map() function to a Hugging Face Dataset and keep it as an iterator (streaming mode) without loading it into memory?

I’m currently working with the Hugging Face datasets library and need to apply transformations to multiple datasets (such as ds_khan and ds_mathematica) with the .map() function. I want to do this in a way that mimics streaming, i.e., without loading any entire dataset into memory. In particular, I want to interleave the transformed datasets while keeping the processing as lazy as possible, similar to loading them with streaming=True.
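For context: in the datasets library, load_dataset(..., streaming=True) returns an IterableDataset. Its .map() is not run up front; the function is applied to each example only as you iterate. interleave_datasets() also accepts IterableDataset objects and returns another lazy IterableDataset. The sketch below is plain Python, not the datasets API. It uses generators to illustrate the same "lazy map, then interleave" pattern. The names fake_stream, lazy_map, add_prefix and interleave_first_exhausted are hypothetical, and the round-robin interleaving is an assumption chosen for simplicity.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
Example = Dict[str, str]

# Counts how many raw examples each hypothetical source has actually produced.
pulled: Dict[str, int] = {"khan": 0, "mathematica": 0}


def fake_stream(name: str, n: int) -> Iterator[Example]:
    """Hypothetical stand-in for a streamed dataset: yields one example at a time."""
    for i in range(n):
        pulled[name] += 1
        yield {"source": name, "text": f"{name} example {i}"}


def lazy_map(source: Iterable[Example], fn: Callable[[Example], Example]) -> Iterator[Example]:
    """Apply fn to each example only when the consumer asks for it."""
    for example in source:
        yield fn(example)


def interleave_first_exhausted(sources: List[Iterable[T]]) -> Iterator[T]:
    """Take one item from each source in turn; stop the first time any source is empty."""
    iterators = [iter(s) for s in sources]
    if not iterators:
        return
    while True:
        for it in iterators:
            try:
                yield next(it)
            except StopIteration:
                return


def add_prefix(example: Example) -> Example:
    """Hypothetical per-example transformation (the kind of function passed to .map())."""
    return {**example, "text": "Q: " + example["text"]}


# Nothing is generated yet: these are only chained generators.
ds_khan = lazy_map(fake_stream("khan", 1_000_000), add_prefix)
ds_mathematica = lazy_map(fake_stream("mathematica", 1_000_000), add_prefix)
mixed = interleave_first_exhausted([ds_khan, ds_mathematica])

# Pull only four examples; only those four are ever produced and transformed.
first_four = list(islice(mixed, 4))
print([ex["text"] for ex in first_four])
# ['Q: khan example 0', 'Q: mathematica example 0', 'Q: khan example 1', 'Q: mathematica example 1']
print(pulled)  # {'khan': 2, 'mathematica': 2}
```

Each source is declared with a million examples, yet only two are read from each. This is the property you want from a streaming pipeline. With the real library, the analogous calls would be .map() on each streamed IterableDataset followed by interleave_datasets([...]). interleave_datasets() also takes a probabilities argument and a stopping_strategy (for example "all_exhausted"), which this fixed round-robin sketch does not model.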