I am having a problem with scrapy-playwright. I am trying to take a screenshot of Google News to practice crawling.
In my case, I added cd_min and cd_max (inside the tbs URL parameter) to restrict results to a specific period, but the date filter doesn't seem to work as expected.
I suspect that JavaScript rendering in my scrapy-playwright setup isn't functioning properly. Here is the code I'm working with:
import scrapy


class ScreenshotSpider(scrapy.Spider):
    name = "test"
    start_urls = ["https://www.google.com/search?q=google&tbm=nws&lr=lang_en&start=0&tbs=cdr:1,cd_min:01/01/2023,cd_max:12/31/2023"]
    custom_settings = {
        "PLAYWRIGHT_BROWSER_TYPE": "firefox",
        "PLAYWRIGHT_LAUNCH_OPTIONS": {
            "headless": False,
        },
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
    }

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                meta=dict(
                    playwright=True,
                    playwright_include_page=True,  # include the page object
                    java_script_enabled=True,
                ),
                callback=self.take_screenshot,
            )

    async def take_screenshot(self, response):
        page = response.meta["playwright_page"]
        await page.screenshot(path="screenasdasdsshot.png", full_page=True)
        await page.close()
        self.log(f"Screenshot saved for {response.url}")
I tried to take a screenshot with Selenium, and it worked perfectly!
However, I still can’t figure out why my scrapy-playwright setup isn’t working.
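One thing I can rule out separately from the rendering question is URL encoding, since the tbs value contains commas, colons, and slashes. Below is a minimal sketch that builds the same URL with the standard library so those characters are percent-encoded. The helper name build_news_url is hypothetical, and I have not confirmed that encoding is the cause of the problem.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_news_url(query, date_min, date_max, lang="lang_en", start=0):
    """Hypothetical helper: build the search URL with a percent-encoded tbs value."""
    # Same date-range value as in start_urls, assembled from its parts.
    tbs = f"cdr:1,cd_min:{date_min},cd_max:{date_max}"
    params = {"q": query, "tbm": "nws", "lr": lang, "start": start, "tbs": tbs}
    # urlencode percent-encodes ':', ',' and '/' inside each value.
    return "https://www.google.com/search?" + urlencode(params)


url = build_news_url("google", "01/01/2023", "12/31/2023")

# Round-trip check: decoding the query string gives back the original tbs value.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["tbs"][0])
```

The resulting url could replace the hard-coded entry in start_urls to see whether the screenshot changes.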
Here is the failing result: [image]
And this is what I expected: [image]