I'm trying to download a .csv file from this GitHub repository: link.
So far I have tried scraping it with:
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs4
url = 'https://github.com/hiring-lab/job_postings_tracker/blob/master/US/aggregate_job_postings_US.csv'
response = requests.request("GET", url)
soup = bs4(response.content, "html.parser")
But I could not find the table inside the soup.
I also found out that I could simply use:
link = 'https://github.com/hiring-lab/job_postings_tracker/blob/master/US/aggregate_job_postings_US.csv'
dados = pd.read_csv(link, sep=',', index_col='date')
But that also didn't work. Output: ParserError: Error tokenizing data. C error: Expected 1 fields in line 40, saw 25
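Use the raw file URL (raw.githubusercontent.com) instead of the blob page URL. The blob URL returns GitHub's HTML page rather than the CSV itself, which is why pandas fails to tokenize it: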
import pandas as pd
url = 'https://raw.githubusercontent.com/hiring-lab/job_postings_tracker/master/US/aggregate_job_postings_US.csv'
try:
    data = pd.read_csv(url, sep=',', index_col='date')
    print(data.head())
except Exception as e:
    print(f"An error occurred: {e}")
Output
jobcountry ... variable
date ...
2020-02-01 US ... total postings
2020-02-02 US ... total postings
2020-02-03 US ... total postings
2020-02-04 US ... total postings
2020-02-05 US ... total postings
[5 rows x 4 columns]
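If you often start from the blob link copied out of the browser, a small helper can do the rewrite for you. This is only a sketch: `blob_to_raw` is a hypothetical name of my own, and it assumes the URL has the usual `github.com/<owner>/<repo>/blob/<ref>/<path>` shape shown in the question.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_raw(url: str) -> str:
    """Turn a github.com 'blob' page URL into its raw.githubusercontent.com form.

    Hypothetical helper: assumes the path looks like
    <owner>/<repo>/blob/<ref>/<path to file>.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Reject anything that is not a github.com blob link with a file path.
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"Not a GitHub blob URL: {url}")
    # Drop the 'blob' segment; the ref and file path are kept unchanged.
    owner, repo, _, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


blob_url = (
    "https://github.com/hiring-lab/job_postings_tracker/"
    "blob/master/US/aggregate_job_postings_US.csv"
)
raw_url = blob_to_raw(blob_url)
print(raw_url)
```

The resulting `raw_url` can then be passed to `pd.read_csv` exactly as in the snippet above.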