Issue with JS Rendering in scrapy-playwright While Taking Google News Screenshots


I am having a problem with scrapy-playwright. I am trying to take a screenshot of Google News to practice crawling.

In my case, I added the cd_min and cd_max values (inside the tbs URL parameter) to select a specific period, but the filter doesn’t seem to work as expected.

I suspect that JavaScript rendering in my scrapy-playwright setup isn’t functioning properly. Here is the code I’m working with:
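For clarity, this is how the date-range URL is put together. The following is only a minimal sketch: `build_news_url` is a hypothetical helper name, and the MM/DD/YYYY date format is assumed from the URL used in the spider rather than taken from any Google documentation.

```python
from datetime import date
from urllib.parse import urlencode


def build_news_url(query, start_date, end_date, lang="lang_en", offset=0):
    """Hypothetical helper that assembles a Google News search URL."""
    # cd_min / cd_max live inside the single "tbs" parameter.
    # The MM/DD/YYYY format is an assumption based on the URL in this post.
    tbs = "cdr:1,cd_min:{},cd_max:{}".format(
        start_date.strftime("%m/%d/%Y"),
        end_date.strftime("%m/%d/%Y"),
    )
    params = {"q": query, "tbm": "nws", "lr": lang, "start": offset, "tbs": tbs}
    # Leave ":", "," and "/" unescaped so the result matches the hand-written URL.
    return "https://www.google.com/search?" + urlencode(params, safe=":,/")


url = build_news_url("google", date(2023, 1, 1), date(2023, 12, 31))
print(url)
```

Building the URL this way makes it easier to confirm that the request really carries the intended period before looking at the rendering side.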

import scrapy


class ScreenshotSpider(scrapy.Spider):
    name = "test"
    start_urls = ["https://www.google.com/search?q=google&tbm=nws&lr=lang_en&start=0&tbs=cdr:1,cd_min:01/01/2023,cd_max:12/31/2023"]
    custom_settings = {
        "PLAYWRIGHT_BROWSER_TYPE": "firefox",
        "PLAYWRIGHT_LAUNCH_OPTIONS": {
            "headless": False,
        },
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
    }

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                meta=dict(
                    playwright=True,
                    playwright_include_page=True,  # include the Playwright page object
                    java_script_enabled=True,
                ),
                callback=self.take_screenshot,
            )

    async def take_screenshot(self, response):
        page = response.meta["playwright_page"]
        await page.screenshot(path="screenasdasdsshot.png", full_page=True)
        await page.close()
        self.log(f"Screenshot saved for {response.url}")

I also tried taking the same screenshot with Selenium, and it worked perfectly!

However, I still can’t figure out why my scrapy-playwright setup isn’t working.

Here is the failing result: (image)

And this is what I expected: (image)
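One thing worth noting about the request above: `java_script_enabled` is not one of the meta keys scrapy-playwright reads, so it has no effect when placed directly in `meta`. It is an option of a Playwright browser context, where it already defaults to enabled. The sketch below shows where such options could go instead, using the `playwright_context` and `playwright_context_kwargs` meta keys. The helper name `build_playwright_meta`, the context name `"news"`, and the `en-US` locale are assumptions made for illustration, and whether this changes the date-filter behaviour is not established here.

```python
def build_playwright_meta(context_name="news", locale="en-US"):
    """Hypothetical helper: returns a meta dict for a scrapy.Request."""
    return {
        "playwright": True,
        "playwright_include_page": True,
        # Name of the browser context to use (created if it does not exist yet).
        "playwright_context": context_name,
        # Keyword arguments used when that context is created.
        "playwright_context_kwargs": {
            "java_script_enabled": True,  # already the Playwright default
            "locale": locale,             # assumed value, for illustration only
        },
    }


meta = build_playwright_meta()
print(meta)
```

In the spider this would be passed as `scrapy.Request(url, meta=build_playwright_meta(), callback=self.take_screenshot)`.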

