I am currently making a newsletter website that will send you an email with the top ten manga for that week. I currently have a web scraper that retrieves the top ten along with descriptions and other fun stuff, then puts that into a pandas DataFrame, which I was hoping to use along with a Heroku-hosted Postgres database. This being my first project, the goal is to learn a lot of different elements such as frontend development, backend development, and databases. I’m not too sure if the way I’m structuring this program makes any sense or if I’m adding too many unnecessary things.
My plan is:
- Have a website where visitors can enter their email to join the newsletter, or view a table of that week’s top ten.
- The website will interact with the database to pull info for the table/store emails.
- Run a script every week that populates the database, sends an email to whoever signed up, and updates the website’s table/chart.
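To make the pipeline concrete, here is roughly what I imagine the weekly job doing. This is only a sketch: the table and column names are placeholders, SQLite stands in for the Heroku-hosted Postgres database, and the email is just built, not sent.

```python
import sqlite3
import pandas as pd
from email.message import EmailMessage

# Placeholder for what the scraper produces each week.
top_ten = pd.DataFrame({
    "rank": range(1, 11),
    "title": [f"Manga {n}" for n in range(1, 11)],
})

# SQLite stands in for Postgres here; in production you'd pass a
# SQLAlchemy engine pointed at the Heroku database instead.
conn = sqlite3.connect(":memory:")
top_ten.to_sql("weekly_top_ten", conn, if_exists="replace", index=False)

# Build the newsletter email; smtplib.SMTP(...).send_message(msg)
# would actually send it to each subscriber.
msg = EmailMessage()
msg["Subject"] = "This week's top ten manga"
msg.set_content(top_ten.to_string(index=False))
```

The website would then read `weekly_top_ten` to render its table, so the script and the site never talk to each other directly, only through the database.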
Does this sound right to you? How would you go about it? Any tips/advice?
This sounds like a great first project – lots of variety, and centered on a subject that interests you.
A few things I’d recommend:
Break down the requirements
Since your project involves so many different types of components (scraper, database, UI, etc.) and spans multiple development specialties (frontend vs. backend), I would break the project’s requirements down into individual sub-projects that you can build and test separately, then recombine. Building each piece as a package/library or microservice lets them support each other while staying reusable.
Try to make the spec/requirements for each of those components clear and specific, e.g.
- define exactly what the scraper will need to do, and code until it meets all those requirements.
- define exactly how you want the signup form to work, then build and test that until it meets all those requirements.
Bonus points if you use Test-Driven Development, since you’ll be building software in a way that encourages quality and follows standards common across the industry. But the most important thing for a first project is to love it and stick with it.
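As a sketch of what TDD might look like for your scraper, here is a test written before the real code exists. `parse_top_ten` and the HTML shape it expects are made up for illustration, not an existing API:

```python
# test_scraper.py -- a hypothetical TDD-style test for the scraper.
import re


def parse_top_ten(html: str) -> list[dict]:
    """Toy stand-in for the real scraper: pull titles out of <li> items."""
    titles = re.findall(r"<li>(.*?)</li>", html)
    return [{"rank": i + 1, "title": t} for i, t in enumerate(titles)]


def test_returns_ten_ranked_entries():
    html = "".join(f"<li>Manga {n}</li>" for n in range(1, 11))
    result = parse_top_ten(html)
    assert len(result) == 10
    assert result[0] == {"rank": 1, "title": "Manga 1"}
```

With `pytest test_scraper.py` you’d watch this fail first, then make it pass – and the test doubles as a precise statement of the scraper’s requirements.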
Research some options for resources
You mentioned Python and PostgreSQL in your tags. Python is a great choice for web scrapers, and for personal projects in general. Since you’re building a web application, I’d recommend spending a little time searching on YouTube and/or Google to find a Python framework that will work well with PostgreSQL and provide a nice web UI without much fuss.
As you progress, you can keep checking https://pypi.org/ and similar resources to see if there’s an available solution for the problem you’re trying to solve or, since this is a personal project, try to code your own solution. Coding your own will increase your skill with the language faster than using a package, while using a package saves time and lets you focus on your primary mission.
I use Flask when a Python web app needs to be simple, plain, and fast: https://flask.palletsprojects.com/en/2.2.x/ In your case, though, Django might be more what you need: https://www.djangoproject.com/ Since your goal is to have the app available to the public, you want it to look good and have a bit more functionality covered by the framework.
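To give a feel for how little code the Flask end of this takes, here is a minimal signup endpoint. The route name and the in-memory list are placeholders – a real version would INSERT the address into your Postgres table instead:

```python
from flask import Flask, request

app = Flask(__name__)
signups = []  # placeholder; a real app would write to the Postgres table


@app.route("/signup", methods=["POST"])
def signup():
    email = request.form.get("email", "").strip()
    if "@" not in email:  # real validation would be stricter
        return "Please enter a valid email address.", 400
    signups.append(email)
    return "Thanks for subscribing!", 200
```

Django would give you the same thing plus forms, an ORM, and an admin site out of the box, which is the trade-off between the two frameworks in a nutshell.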
First make it work, then make it pretty
Sounds like you have a pretty good vision for how you want this web app to work and what you want it to do. I imagine you can see what you want the UI and styling to look like as well, but remember that it’s fine to make it completely functional first and then add/edit the style and layout afterward. It’s not like making a physical piece of art where the visual aspects have to be built in from the start. As software developers, we get to build something, put it out there in the world, and change how it looks later if we want.
Not that you shouldn’t focus on the visual/UI aspects as well! There’s just no need to do it all at once. You can take time to work on the visuals separately from the core functionality.
- Be clear in the project specifications about what you mean by “top 10”. Most popular? Most purchased? That will help you narrow down exactly where to get the data from, and exactly what to collect.
- Any time you’re adding scraping to a web app, think about whether the scraping you’re doing falls within reasonable public use of the scraped resource. If it’s a public web page and you’re pulling non-copyrighted content for fair use? Great. If you’re running the scraper hundreds of times a minute and republishing someone else’s work without asking? Then it starts to get iffy.
Good luck 🙂