Tag Archive for python, python-3.x, web-scraping

Extracting a specific keyword from .htm or .txt URLs with Python

I have a dataset of SEC EDGAR DEF 14A reports (1994-2022) that includes each company’s symbol (such as AAPL for Apple Inc.), the date the report was published (usually once a year), and the URL of the report. I want to use Python to extract the chief executive officer’s information from every report_url, each of which points to a text (.htm or .txt) document. Here is a simple example of the text I found when I searched for “chief executive officer”:
Dr. Scangos has served as Chief Executive Officer and a director of Vir Biotechnology, Inc. since January 2017. From July 2010 to January 2017, Dr. Scangos served as the Chief Executive Officer and a director of Biogen Inc., a biopharmaceutical company. From 1996 to July 2010, Dr. Scangos served as the President and Chief Executive Officer of Exelixis, Inc.

There are many other patterns in which this information is written.
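
One possible starting point, as a rough sketch rather than a complete solution: reduce each report to plain text, split it into sentences, and keep the sentences that mention the title. The helper names (strip_tags, ceo_sentences), the abbreviation list, and the assumption that the report body has already been downloaded into a string are all illustrative and not part of the question. Given how many phrasings exist, a regular expression like this will miss some cases.

```python
import re
from html import unescape

# Case-insensitive match that tolerates line breaks between the words.
CEO_PATTERN = re.compile(r"chief\s+executive\s+officer", re.IGNORECASE)

# Split on whitespace that follows ., ! or ? and precedes a capital letter,
# but not right after a few common abbreviations (hypothetical, partial list).
SENTENCE_SPLIT = re.compile(
    r"(?<!\bDr\.)(?<!\bMr\.)(?<!\bMs\.)(?<!\bMrs\.)(?<!\bInc\.)"
    r"(?<=[.!?])\s+(?=[A-Z])"
)


def strip_tags(html_text):
    """Crude tag removal for .htm reports; not a full HTML parser."""
    text = re.sub(r"(?is)<(script|style).*?</\1>", " ", html_text)
    text = re.sub(r"<[^>]+>", " ", text)
    # Decode entities such as &amp; and collapse runs of whitespace.
    return re.sub(r"\s+", " ", unescape(text)).strip()


def ceo_sentences(text):
    """Return the sentences in `text` that mention the CEO title."""
    sentences = SENTENCE_SPLIT.split(text)
    return [s.strip() for s in sentences if CEO_PATTERN.search(s)]
```

For a .htm report, the text would first go through strip_tags and then ceo_sentences; a .txt report can be passed to ceo_sentences directly. The matched sentences could then be handed to a second step (for example, a name pattern or a named-entity tagger) to pull out the person and the dates.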

Unable to download an image using the tls_client library

I’m trying to download an image using the tls_client library, but the script I’ve created seems to have downloaded something that I can’t open. Just to inform you, I can download the image using the requests library without any issues.
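
The script itself is not shown, so the cause is unknown. A quick diagnostic, sketched below under that caveat, is to check whether the downloaded bytes actually start with a known image signature before saving them. If they do not, the server may have returned something else (such as an HTML error page), or the body may have been written as decoded text instead of raw bytes. The function names sniff_image_type and save_if_image are hypothetical, and `data` stands for whatever raw response bytes the HTTP client returned.

```python
def sniff_image_type(data):
    """Guess the image format from the leading magic bytes, or return None."""
    signatures = {
        b"\x89PNG\r\n\x1a\n": "png",
        b"\xff\xd8\xff": "jpeg",
        b"GIF87a": "gif",
        b"GIF89a": "gif",
    }
    for magic, name in signatures.items():
        if data.startswith(magic):
            return name
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def save_if_image(data, path):
    """Write `data` to `path` in binary mode only if it looks like an image."""
    kind = sniff_image_type(data)
    if kind is None:
        # Showing the first bytes usually reveals an HTML or JSON body.
        raise ValueError(f"not a known image format; first bytes: {data[:16]!r}")
    with open(path, "wb") as f:
        f.write(data)
    return kind
```

Running the same check on the bytes returned by requests and on the bytes returned by tls_client would show whether the two responses differ at all, which narrows the problem to either the request or the way the file is written.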