I need to unzip a multi-part zip file using a Python function, and it must be cross-platform compatible (Linux, Windows and macOS).
The folder structure is the following:
project-root/
├── data.z01
├── ...
├── data.z22
└── data.zip
I tried the following Python code:
import shutil
import zipfile

from tqdm import tqdm

def _combine_split_zip_files(split_files, combined_file):
    # Concatenate the raw bytes of every part, in order, into one file
    with open(combined_file, 'wb') as wfd:
        for f in tqdm(split_files, file=TqdmLogger(logger)):
            with open(f, 'rb') as fd:
                shutil.copyfileobj(fd, wfd)

def _unzip_protected_zip(zip_file, extract_to, password):
    with zipfile.ZipFile(zip_file, 'r') as zf:
        zf.extractall(path=extract_to, pwd=password.encode())

# root, ZIP_PW, TqdmLogger and logger are defined elsewhere in my project
split_files = [root / f"data.z{i:02d}" for i in range(1, 23)] + [root / "data.zip"]
combined_file = root / "combined.zip"
_combine_split_zip_files(split_files, combined_file)
_unzip_protected_zip(combined_file, root / "data", ZIP_PW)
The _combine_split_zip_files() function seems to work, since it generates a combined.zip file that the Linux zip extractor can extract. Unfortunately, zipfile does not recognize the combined file and raises the following error: zipfile.BadZipFile: zipfiles that span multiple disks are not supported.
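To see why a byte-level concatenation is still treated as a split archive, here is a minimal sketch (the helper names read_tail and read_eocd_disk_fields are hypothetical) that reads the disk-number fields of the standard end-of-central-directory (EOCD) record, using the record layout from the ZIP format specification. Concatenating the parts leaves these fields, and the per-segment offsets stored in the central directory, exactly as they were in the split archive; a tool that rebuilds the archive (as zip -F did in this case) has to rewrite them. For Zip64 archives these fields may hold 0xFFFF placeholders, with the real values in the Zip64 records, which this sketch does not parse.

```python
import os
import struct

# Standard EOCD record (22 bytes, little-endian): signature, number of this disk,
# disk where the central directory starts, entries on this disk, total entries,
# central directory size, central directory offset, comment length.
EOCD_STRUCT = struct.Struct("<4s4H2LH")
EOCD_SIGNATURE = b"PK\x05\x06"


def read_tail(path: str, n: int = EOCD_STRUCT.size + 0xFFFF) -> bytes:
    """Read only the end of the file: the EOCD plus the largest possible comment."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - n))
        return f.read()


def read_eocd_disk_fields(data: bytes) -> dict[str, int]:
    """Return the disk-related fields of the last EOCD record found in `data`."""
    # Simplification: use the last occurrence of the signature. An archive
    # comment that happens to contain these four bytes would fool this search.
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_STRUCT.size > len(data):
        raise ValueError("no end-of-central-directory record found")
    (_sig, this_disk, cd_start_disk, entries_this_disk,
     total_entries, _cd_size, _cd_offset, _comment_len) = EOCD_STRUCT.unpack_from(data, pos)
    return {
        "this_disk": this_disk,
        "cd_start_disk": cd_start_disk,
        "entries_this_disk": entries_this_disk,
        "total_entries": total_entries,
    }
```

For example, read_eocd_disk_fields(read_tail("combined.zip")) on a concatenated split archive would be expected to show a nonzero this_disk (unless the archive is Zip64 and the field is a placeholder), whereas an ordinary single-file archive reports 0.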
Just for testing, I tried combining the zip parts with the following bash command:
zip -F data.zip --out combined.zip
Combining the multi-part zip file this way produces a combined.zip file that zipfile also recognizes correctly.
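If a pure-Python join is not a hard requirement, the working bash command can be wrapped in a function. This is a minimal sketch under the assumption that Info-ZIP's zip executable is on PATH; the function name join_split_zip_with_cli is hypothetical. Windows does not ship Info-ZIP zip by default, so this is only as cross-platform as the binary it finds.

```python
import shutil
import subprocess
from pathlib import Path


def join_split_zip_with_cli(last_segment: Path, out_file: Path, zip_exe: str = "zip") -> Path:
    """Rebuild a split archive into a single file by running Info-ZIP's zip.

    Runs the same command as in the question: zip -F <last_segment> --out <out_file>.
    zip itself locates the other segments (data.z01, ...) next to last_segment.
    """
    exe = shutil.which(zip_exe)  # None when the executable is not on PATH
    if exe is None:
        raise FileNotFoundError(f"{zip_exe!r} was not found on PATH")
    # check=True raises CalledProcessError if zip exits with a nonzero status
    subprocess.run([exe, "-F", str(last_segment), "--out", str(out_file)], check=True)
    return out_file
```

The resulting file can then be passed to _unzip_protected_zip() unchanged.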
Is there a way to merge the parts of the zip correctly using only Python code?
You can read the files into memory as an io.BytesIO object, which can then be passed directly to a ZipFile object for extraction.
from io import BytesIO
from pathlib import Path
from zipfile import ZipFile

def multipart_file(files: list[Path]) -> BytesIO:
    # Concatenate every part, in order, into one in-memory buffer
    fp = BytesIO()
    for f in files:
        fp.write(Path(f).read_bytes())
    fp.seek(0)  # rewind so ZipFile reads from the start of the buffer
    return fp

def unzip_multipart(files: list[Path], extract_to: Path, password: str):
    mpf = multipart_file(files)
    with ZipFile(mpf) as zf:
        zf.extractall(path=extract_to, pwd=password.encode())
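For the files you listed, data.zip is the last segment of the split archive (it holds the central directory), so it has to be included at the end of the list. You would run this using:
unzip_multipart([root / f"data.z{i:02d}" for i in range(1, 23)] + [root / "data.zip"], root / "data", ZIP_PW)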
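Rather than hard-coding range(1, 23), the segments can be collected with pathlib, which handles path separators the same way on Linux, Windows and macOS. This is a minimal sketch assuming Info-ZIP style naming (<stem>.z01, <stem>.z02, ..., <stem>.zip, where the .zip file is the final segment); the helper name split_zip_parts is hypothetical.

```python
import re
from pathlib import Path


def split_zip_parts(root: Path, stem: str) -> list[Path]:
    """Return <stem>.z01 ... <stem>.zNN followed by <stem>.zip, in segment order."""
    pattern = re.compile(rf"{re.escape(stem)}\.z(\d+)", re.IGNORECASE)
    numbered = []
    for p in root.iterdir():
        m = pattern.fullmatch(p.name)
        if m and p.is_file():
            numbered.append((int(m.group(1)), p))
    numbered.sort()  # numeric sort, so .z100 would come after .z99

    # Refuse to continue if a segment is missing (e.g. .z01 and .z03 but no .z02)
    numbers = [n for n, _ in numbered]
    if numbers != list(range(1, len(numbers) + 1)):
        raise ValueError(f"missing or unexpected segment numbers: {numbers}")

    # The .zip file is the final segment and carries the central directory
    last = root / f"{stem}.zip"
    if not last.is_file():
        raise FileNotFoundError(last)
    return [p for _, p in numbered] + [last]
```

With the folder structure from the question, split_zip_parts(root, "data") would return data.z01 through data.z22 followed by data.zip, ready to pass as the files argument.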