Easiest way to batch process multiple PDF documents and combine them into one text document (preferably Word)?


I will try to keep this short and sweet. I work for an organization that is effectively broke, so I am limited to products in the Microsoft Office suite. I might be able to install Power Automate if my IT department approves it; I have already started that process.

I have approximately 30,000 PDF documents, and combining them into one text document would greatly speed up my work. To be specific, the 30k documents would be split into three separate documents, one per collection: one collection is approx. 2k docs, another 8k, and another 20k. I have about 5 months to catalog all of them and create an index.

I know I can perform OCR on PDFs individually, and I could theoretically combine the PDFs and then OCR them together. I just want to speed this up as much as possible, since doing it by hand would be tedious and time-consuming, and I need to focus on compiling the index from the content of these documents. I'm really in a bit of a bind here and would greatly appreciate any assistance. Thank you all in advance for any help!
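To make the goal concrete, the batch step described above could be sketched roughly as follows. This is only an illustration and assumes Python is available, which the Office-only constraint may rule out. The function name `combine_pdfs_to_text` and the `extract_text` parameter are hypothetical: `extract_text` is a placeholder for whatever OCR or text-extraction tool ends up being approved, and the sketch deliberately does not perform OCR itself.

```python
from pathlib import Path
from typing import Callable


def combine_pdfs_to_text(
    source_dir: Path,
    output_file: Path,
    extract_text: Callable[[Path], str],
) -> int:
    """Write the text of every PDF under source_dir into one file.

    extract_text is a placeholder (an assumption of this sketch) for the
    real OCR/extraction step. Returns the number of PDFs processed.
    """
    # Collect PDFs recursively, matching the extension case-insensitively,
    # and sort them so the combined document has a stable order.
    pdf_paths = sorted(
        p for p in source_dir.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    with output_file.open("w", encoding="utf-8") as out:
        for pdf_path in pdf_paths:
            # Header line so each passage can be traced back to its source
            # file when building the index.
            out.write(f"===== {pdf_path.relative_to(source_dir)} =====\n")
            try:
                out.write(extract_text(pdf_path).strip() + "\n\n")
            except Exception as exc:
                # Record the failure and keep going; one bad file should
                # not stop a run over thousands of documents.
                out.write(f"[extraction failed: {exc}]\n\n")
    return len(pdf_paths)
```

Run once per collection, this would give three combined text files (roughly 2k, 8k, and 20k documents each), and the header line before each document keeps the source filename available for the index.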
