
Tag Archive for python, apache-spark, pyspark

How to get metadata of files present in a zip in PySpark?

I have a .zip file on an ADLS path that contains multiple files in different formats. I want to read metadata for the files inside it, such as file name and modification time, without unzipping it. My current code works for small zip files but runs into memory issues on large ones, which causes the job to fail. Is there a way to handle this within PySpark itself?
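
For illustration, here is a minimal sketch of one possible direction, not a confirmed fix. It assumes the memory pressure comes from loading the whole archive as bytes, and that a seekable binary file handle to the ADLS path is available (for example through a filesystem library such as fsspec with adlfs, which is an assumption and not shown here). Python's standard `zipfile` module reads only the central directory at the end of the archive when listing entries, so member data is not decompressed. The helper name `list_zip_metadata` and the in-memory sample archive are hypothetical stand-ins.

```python
import io
import zipfile
from datetime import datetime


def list_zip_metadata(fileobj):
    """Return name, sizes, and modification time for every entry in a zip.

    fileobj must be a seekable binary file-like object. Only the archive's
    central directory is parsed; member contents are not decompressed.
    """
    rows = []
    with zipfile.ZipFile(fileobj) as zf:
        for info in zf.infolist():
            rows.append({
                "file_name": info.filename,
                # date_time is a (year, month, day, hour, minute, second) tuple
                "modified": datetime(*info.date_time),
                "compressed_size": info.compress_size,
                "uncompressed_size": info.file_size,
            })
    return rows


# Hypothetical sample: a small in-memory archive standing in for the ADLS file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(zipfile.ZipInfo("a.csv", date_time=(2024, 1, 2, 3, 4, 6)), "x,y\n1,2\n")
    zf.writestr(zipfile.ZipInfo("b.json", date_time=(2023, 5, 6, 7, 8, 10)), "{}")
buf.seek(0)

metadata = list_zip_metadata(buf)
for row in metadata:
    print(row["file_name"], row["modified"], row["uncompressed_size"])
```

The resulting list of dictionaries is small, so it could be passed to `spark.createDataFrame` if a DataFrame is needed. Note that this runs as plain Python on whichever process opens the file; it does not distribute the work across the cluster.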