
Python PDF searcher overflows the RAM

As part of my program, I’m using the third-party pdfminer library in Python to open and read PDF pages, then searching the extracted text for specific patterns with regular expressions. Because I have a large number of PDFs to analyze, I’m parallelizing this with multiprocessing, with each process handling a single PDF.
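Below is a minimal sketch of the setup described above, for reference. It is not the asker's code. It assumes pdfminer.six, the maintained fork of pdfminer that provides `pdfminer.high_level.extract_pages`. `PATTERN`, the helper names, and the script name are hypothetical.

The sketch makes three choices intended to keep memory bounded:

- Pages are extracted lazily, one at a time, through a generator.
- Each worker sends back only its list of matches, never the full extracted text.
- `maxtasksperchild=1` makes the pool replace each worker process after one PDF. Whatever the parser allocated is therefore released when that process exits.

```python
import re
from multiprocessing import Pool
from typing import Iterable, Iterator

# Hypothetical pattern; substitute the expressions you actually search for.
PATTERN = re.compile(r"Invoice\s+No\.?\s*(\d+)")


def find_matches(page_texts: Iterable[str], pattern: re.Pattern) -> list[str]:
    """Scan page texts one at a time and collect every full match of `pattern`."""
    hits: list[str] = []
    for text in page_texts:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits


def iter_page_texts(path: str) -> Iterator[str]:
    """Yield the text of one page at a time (assumes pdfminer.six is installed)."""
    # Imported inside the function so the regex helper works without pdfminer.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    for page_layout in extract_pages(path):
        yield "".join(
            element.get_text()
            for element in page_layout
            if isinstance(element, LTTextContainer)
        )


def search_pdf(path: str) -> tuple[str, list[str]]:
    """Worker task: search a single PDF and return only its (small) match list."""
    return path, find_matches(iter_page_texts(path), PATTERN)


def search_all(paths: list[str], workers: int = 4) -> dict[str, list[str]]:
    """Search many PDFs in parallel, one PDF per task."""
    results: dict[str, list[str]] = {}
    # With chunksize=1 each task is one PDF, and maxtasksperchild=1 retires
    # each worker after one task, so its memory is freed when it exits.
    with Pool(processes=workers, maxtasksperchild=1) as pool:
        for path, hits in pool.imap_unordered(search_pdf, paths, chunksize=1):
            results[path] = hits
    return results


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python search_pdfs.py a.pdf b.pdf ...
    if len(sys.argv) > 1:
        for path, hits in search_all(sys.argv[1:]).items():
            print(f"{path}: {len(hits)} match(es)")
```

The `if __name__ == "__main__":` guard matters under the spawn and forkserver start methods. Those methods re-import the main module in every worker, so without the guard the pool-creating code would run again in each child.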