Problem with unzipping multi part zip files using Python


I need to unzip a multi-part zip file using a Python function, ensuring it is cross-platform compatible (Linux, Windows and macOS).

The folder structure is the following:

project-root/
├── data.z01
├── ...
├── data.z22
└── data.zip

I tried the following Python code:

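import shutil
import zipfile

from tqdm import tqdm

# root (a pathlib.Path), logger, TqdmLogger and ZIP_PW are defined elsewhere in the project.

def _combine_split_zip_files(split_files, combined_file):
    # Concatenate the parts byte for byte, in the given order, into one file.
    with open(combined_file, 'wb') as wfd:
        for f in tqdm(split_files, file=TqdmLogger(logger)):
            with open(f, 'rb') as fd:
                shutil.copyfileobj(fd, wfd)

def _unzip_protected_zip(zip_file, extract_to, password):
    # zipfile expects the password as bytes.
    with zipfile.ZipFile(zip_file, 'r') as zf:
        zf.extractall(path=extract_to, pwd=password.encode())

# Numbered parts first, then data.zip as the last part.
split_files = [root / f"data.z{i:02d}" for i in range(1, 23)] + [root / "data.zip"]
combined_file = root / "combined.zip"

_combine_split_zip_files(split_files, combined_file)
_unzip_protected_zip(combined_file, root / "data", ZIP_PW)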

The _combine_split_zip_files() function seems to work, since it generates a combined.zip file that can be extracted by the Linux zip extractor. Unfortunately, zipfile does not recognize the combined file and raises the following error: zipfile.BadZipFile: zipfiles that span multiple disks are not supported.
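A note on where that error most likely comes from: a zip archive records disk numbers in its end-of-central-directory (EOCD) record and, for zip64 archives, in the zip64 EOCD locator placed just before it. Plain concatenation leaves those fields as the split tool wrote them. As far as I know, CPython's zipfile raises this message when the zip64 locator reports a non-zero disk number or more than one disk. The following is a minimal sketch for inspecting those fields; read_disk_fields is a hypothetical helper name, and it assumes the EOCD signature does not also appear inside the archive comment.

```python
import struct

EOCD_SIG = b"PK\x05\x06"            # end-of-central-directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"   # zip64 EOCD locator
ZIP64_LOCATOR_SIZE = 20             # fixed size of the locator in bytes

def read_disk_fields(data: bytes) -> dict:
    """Return the disk-number fields stored at the tail of a zip archive."""
    pos = data.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    # EOCD layout starts with: signature, this disk number,
    # number of the disk on which the central directory starts.
    _, disk_no, cd_start_disk = struct.unpack_from("<4sHH", data, pos)
    fields = {"disk_no": disk_no, "cd_start_disk": cd_start_disk}
    # When present, the zip64 locator sits directly before the EOCD record.
    loc = pos - ZIP64_LOCATOR_SIZE
    if loc >= 0 and data[loc:loc + 4] == ZIP64_LOCATOR_SIG:
        _, z64_disk, _, total_disks = struct.unpack_from("<4sIQI", data, loc)
        fields["zip64_eocd_disk"] = z64_disk
        fields["total_disks"] = total_disks
    return fields
```

Calling read_disk_fields(Path("combined.zip").read_bytes()) on both versions of combined.zip should show whether the concatenated file still carries multi-disk values that the zip -F output no longer has. For very large archives, reading only the last few kilobytes of the file is enough.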

Just for testing, I tried combining the zip parts using the following bash command:

zip -F data.zip --out combined.zip

Combining the multi-part zip file this way produces a combined.zip file that zipfile also recognizes correctly.

Is there a way to merge the parts of the zip correctly using just Python code?


You can read the files into memory as an io.BytesIO object, which can then be passed directly to a ZipFile object for extraction.

from pathlib import Path
from io import BytesIO
from zipfile import ZipFile

def multipart_file(files: list[str]) -> BytesIO:
    fp = BytesIO()
    for f in files:
        fp.write(Path(f).read_bytes())
    fp.seek(0)
    return fp

def unzip_multipart(files: list[str], extract_to: str, password: str):
    mpf = multipart_file(files)
    with ZipFile(mpf) as zf:
        zf.extractall(path=extract_to, pwd=password.encode())

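For the files you listed, multi-part zip sets typically do not include the data.zip file, so you would run this using only the numbered parts. If data.zip turns out to be the final segment, which is where zip -s writes the central directory, append root / "data.zip" to the end of the list.

unzip_multipart([root / f"data.z{i:02d}" for i in range(1, 23)], root / "data", ZIP_PW)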
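To see what the in-memory approach does and does not cover, here is a small self-contained sketch. It assumes the parts are plain byte slices of an ordinary single-disk archive (the kind produced by a tool like split), which is not the same as a true split archive written by zip -s. The names join_parts and demo are hypothetical.

```python
import io
import tempfile
import zipfile
from pathlib import Path

def join_parts(parts):
    """Concatenate the part files, in order, into one in-memory buffer."""
    buf = io.BytesIO()
    for part in parts:
        buf.write(Path(part).read_bytes())
    buf.seek(0)
    return buf

def demo(chunk_size=64):
    """Build a small archive, slice it into part files, rejoin and read it."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # Build an ordinary (single-disk) archive in memory.
        raw = io.BytesIO()
        with zipfile.ZipFile(raw, "w") as zf:
            zf.writestr("hello.txt", "hello world")
        data = raw.getvalue()
        # Slice it into fixed-size byte chunks, one file per chunk.
        parts = []
        for n, start in enumerate(range(0, len(data), chunk_size)):
            part = tmp / f"demo.part{n:02d}"
            part.write_bytes(data[start:start + chunk_size])
            parts.append(part)
        # Rejoin the chunks in order and read the member back.
        with zipfile.ZipFile(join_parts(parts)) as zf:
            return zf.read("hello.txt")
```

This works because the rejoined bytes are identical to the original archive. For parts written by zip -s, the disk-number fields inside the archive stay as they were, so zipfile may still raise the "span multiple disks" error after concatenation, as the question reports for the on-disk version of the same idea.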
