How to extract all the data from the filtered page?


I’m trying to extract a table from a website. I filter the table so that it meets my requirements (date, zones, etc.), and after filtering I want to scrape it, but all I get is the original table from when the page first loaded. The URL is no help at all. The original URL is ‘https://www.advenueplatform.com/smartpublisher/report/’ and the URL after I filter the table is ‘https://www.advenueplatform.com/smartpublisher/report/#’. There are no parameters, only a hash. Why didn’t they put parameters in their URL?! I also tried to get the data from the XHR request, but its URL is just ‘https://www.advenueplatform.com/smartpublisher/report/ajax-generate-report/’, again with no parameters to work with.
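A URL without a query string does not necessarily mean the request carries no parameters: XHR calls like this one are often POST requests with the filter values in the request body. That is an assumption about this site, not something confirmed here. A minimal sketch of a helper that decodes whatever body a request carries follows; `decode_body` and `summarize_request` are hypothetical names, and the field names in the stand-in request are invented purely for illustration:

```python
import json
from types import SimpleNamespace
from urllib.parse import parse_qs


def decode_body(body):
    """Decode a request body as JSON if possible, else as urlencoded form data."""
    try:
        return json.loads(body)
    except ValueError:
        # Not JSON: treat it as "a=1&b=2" style form data.
        return parse_qs(body, keep_blank_values=True)


def summarize_request(request):
    """Return method, URL and decoded body fields of a request-like object.

    `request` only needs `.method`, `.url` and `.post_data` attributes,
    which Playwright's Request objects provide.
    """
    body = request.post_data or ""
    return {
        "method": request.method,
        "url": request.url,
        "fields": decode_body(body),
    }


# With a live Playwright page, this could be attached before clicking "apply":
#   page.on("request", lambda r: print(summarize_request(r))
#           if "ajax-generate-report" in r.url else None)

# Stand-in request with made-up field names, only to show the output shape.
fake_request = SimpleNamespace(
    method="POST",
    url="https://www.advenueplatform.com/smartpublisher/report/ajax-generate-report/",
    post_data="zone=2&site=12583",
)
print(summarize_request(fake_request))
```

If the printed fields turn out to match the filter choices made on the page, the same values are what the server uses to build the filtered report.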

Is there any way to get the data after I have filtered the page? I’m using Playwright and Python to automate and scrape it.

```python
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

# Placeholders: replace with real credentials.
username = 'YOUR_USERNAME'
password = 'YOUR_PASSWORD'

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False, slow_mo=50)
    page = browser.new_page()
    page.goto('https://www.advenueplatform.com/auth/login')  # 1st URL
    page.fill('input[name=username]', username)
    page.fill('input[name=password]', password)
    page.click('button[type=submit]')
    page.goto('https://www.advenueplatform.com/smartpublisher/report/')  # 2nd URL (the page with the original table)
    page.click('span[id=span_ReportZone]')
    page.click('input[id=rad_Zone_2]')

    # Mingguan Wanita
    page.check('#chk_Site_12583')
    page.click('button[id=btn_DisplayReport]')  # clicking "apply" updates the page with the filtered table

    # This returned the correct table on the first run, but after that
    # it returned the original table again.
    page.wait_for_load_state('networkidle')

    html = page.inner_html('#div_Table > section > table > tbody')  # I assumed this reads from the 2nd URL
    soup = BeautifulSoup(html, 'html.parser')

    print(soup)  # this prints the original table

    browser.close()
```

I did try page.wait_for_load_state(‘networkidle’), but it only returned the correct table on the first run; after that it returned the original table again.
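One possible approach to the stale-table problem is to wait for the specific report request instead of `networkidle`. The sketch below wraps the click in Playwright's `page.expect_response`, so the listener is registered before the request is sent. `click_and_read_report` is a hypothetical helper name, and it is an assumption that the filtered data comes back in the `ajax-generate-report` response:

```python
def click_and_read_report(page, button_selector, url_fragment):
    """Click the filter button and return the body of the matching response.

    `page` is expected to be a Playwright sync-API Page. `url_fragment` is a
    substring of the XHR URL to wait for (assumed: "ajax-generate-report").
    """
    # expect_response starts listening before the click happens,
    # so a fast response cannot slip past unnoticed.
    with page.expect_response(lambda r: url_fragment in r.url) as response_info:
        page.click(button_selector)

    response = response_info.value  # the first response that matched
    if not response.ok:
        raise RuntimeError(f"report request failed with HTTP {response.status}")

    # The body may be JSON or an HTML fragment; its format is not known here.
    return response.text()


# Possible usage inside the existing `with sync_playwright() as p:` block,
# replacing the click on the apply button and the networkidle wait:
#   body = click_and_read_report(
#       page, 'button[id=btn_DisplayReport]', 'ajax-generate-report')
#   print(body)
```

If the rendered table is still wanted, `page.inner_html('#div_Table > section > table > tbody')` can be read after this call, though the page may update the DOM a moment after the response arrives, so a further wait on the table itself may be needed.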
