Relative Content

Tag Archive for pythonscrapy

Scrapy: return a list of relative URLs where a certain word appears

On this website, there are many cards that are accessible in multiple areas of the site. I am attempting to scrape the site to return every instance of a card (in the form of its URL) based on its title. For example, a card titled "AZ School Safety program" is found at https://lawforkids.org/officers and https://lawforkids.org/educators. My goal is for Scrapy to do this for me instead of manually finding every instance.
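The matching step described here (deciding whether a page mentions a given card title) can be sketched with the standard library alone. This is only an illustration under assumptions: the helper names `TextExtractor`, `page_mentions`, and `urls_with_title` are made up for this example, and the match is a plain case-insensitive substring test on the visible text rather than anything specific to the site's card markup.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML document, skipping scripts and styles."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def page_mentions(html, title):
    """Return True if `title` appears in the page's visible text (case-insensitive)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so a title split across lines or inline tags still matches.
    text = " ".join(" ".join(parser.chunks).split())
    return title.casefold() in text.casefold()


def urls_with_title(pages, title):
    """Given a {url: html} mapping, return the sorted URLs whose page mentions `title`."""
    return sorted(url for url, html in pages.items() if page_mentions(html, title))
```

Inside a Scrapy spider, the same check could presumably be applied in the `parse` callback by passing `response.text` to `page_mentions` and yielding `response.url` when it returns True, while following the site's internal links to reach the other pages.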

I want to scrape links of companies hidden under _doPostBack links

I have this website (https://www.nfrc.co.uk/search-members). I want to click on “Search for a roofing contractor now”, then “For a Domestic Property”, and then “Search”. Many pages of results appear; I just need the links hidden under all the company names, and I need to navigate through all the result pages. I have no coding background. I would appreciate it if someone could provide Python code to scrape these links.
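For context, links of this kind usually come from ASP.NET Web Forms: the `href` calls `__doPostBack(target, argument)`, which copies the two values into the hidden fields `__EVENTTARGET` and `__EVENTARGUMENT` and re-submits the page's form together with its other hidden inputs such as `__VIEWSTATE`. A minimal sketch of reproducing that POST body follows. It assumes the site really uses this mechanism (only the question title suggests so), and the function names and the control name in the usage line are hypothetical.

```python
import re

# Matches href values such as: javascript:__doPostBack('ctl00$Main$lnkCompany','')
POSTBACK_RE = re.compile(r"__doPostBack\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)")


def parse_postback(href):
    """Return (event_target, event_argument) from a __doPostBack href, or None."""
    match = POSTBACK_RE.search(href)
    return match.groups() if match else None


def postback_formdata(href, hidden_fields):
    """Build the form data that clicking the link would submit.

    `hidden_fields` holds the page's hidden inputs (for example __VIEWSTATE),
    which are copied through unchanged.
    """
    parsed = parse_postback(href)
    if parsed is None:
        raise ValueError(f"not a __doPostBack link: {href!r}")
    target, argument = parsed
    formdata = dict(hidden_fields)
    formdata["__EVENTTARGET"] = target
    formdata["__EVENTARGUMENT"] = argument
    return formdata


# Hypothetical usage: the control name below is invented for illustration.
example = postback_formdata(
    "javascript:__doPostBack('ctl00$Main$Pager$lnkNext','')",
    {"__VIEWSTATE": "abc"},
)
```

In Scrapy, the resulting target and argument could be passed as `formdata` to `FormRequest.from_response(response, formdata=..., dont_click=True)`, which fills in the form's existing hidden fields from the response; whether that is enough for this particular site is untested.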

Can’t log in to LinkedIn with Scrapy

I keep getting a 303 redirect to some checkpoint page. I’ve checked the docs and looked for similar problems but haven’t found much.

How can I set a global dict for a spider in Python Scrapy

I am new to Python Scrapy. I am following a YouTube video and I am stuck here. I want to get data from a news website: some data from its content page (which contains multiple news items) and some more data from each news page. Since I have to yield data from different web pages and I can only use yield once, I planned to set a global dict variable “data” that I can add things to. At the end I only need to yield “data” once.