Scaling Heavy Writes When Fetching Data From an External API


I have a service that fetches financial data from an external API, parses it, and saves it in a Postgres DB. The data can be quite large, and our containers have crashed quite a lot due to its sheer size. For example, if we have N merchants and each merchant is onboarded on M platforms, we have to fetch the data N*M times. We have a job that runs once a day for each merchant and each provider and calls the third-party API to fetch the data.

Currently it's a simple service: it fetches the data in bulk and writes it to the DB once the fetch completes. Since it's financial data, consistency is required (though we can live with eventual consistency in this case). And because we are dealing with archaic third-party APIs that are not always paginated, we have no choice but to fetch the data in bulk.
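For context, the current flow looks roughly like the sketch below (illustrative only; I'm showing it with requests and psycopg2, and the endpoint, table, and column names are made up):

```python
import json
import requests
import psycopg2

def sync_merchant(merchant_id, provider):
    # The API is not paginated, so the whole payload lands in memory at once;
    # this is where the container memory blows up for large merchants.
    resp = requests.get(
        f"https://{provider}.example.com/v1/transactions",
        params={"merchant": merchant_id},
        timeout=300,
    )
    resp.raise_for_status()
    records = resp.json()  # full response parsed into one big list

    # One big write after the fetch completes, all in a single transaction.
    conn = psycopg2.connect("dbname=finance")
    try:
        with conn, conn.cursor() as cur:
            cur.executemany(
                "INSERT INTO transactions (merchant_id, provider, payload) "
                "VALUES (%s, %s, %s)",
                [(merchant_id, provider, json.dumps(r)) for r in records],
            )
    finally:
        conn.close()
```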

How can I scale this up to avoid crashing the service and to minimize database load while still meeting the consistency requirements? Currently the data fetching takes hours to complete because we have to stagger the jobs to avoid overloading the database; how can I improve the run time as well?

I was thinking of using something like Kafka to control the DB writes and break them into chunks, but maintaining consistency that way is a lot of work.
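Roughly the shape I had in mind on the producer side is something like this (just a sketch, assuming kafka-python; the topic and field names are hypothetical):

```python
import json
from kafka import KafkaProducer  # kafka-python

CHUNK_SIZE = 500  # rows per message, tuned so writer transactions stay small

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_in_chunks(merchant_id, provider, records):
    # Split the bulk payload into fixed-size chunks so a downstream consumer
    # can write each chunk in its own small transaction instead of one huge one.
    total_chunks = (len(records) + CHUNK_SIZE - 1) // CHUNK_SIZE
    for i in range(0, len(records), CHUNK_SIZE):
        producer.send("merchant-writes", {
            "merchant_id": merchant_id,
            "provider": provider,
            "chunk_index": i // CHUNK_SIZE,
            "total_chunks": total_chunks,
            "records": records[i:i + CHUNK_SIZE],
        })
    producer.flush()
```

The part I'm unsure about is consistency: until every chunk for a given merchant/provider run has been consumed and committed, the table holds partial data, so I'd need something like run IDs and a "mark complete" step, which is the extra work I mentioned.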

Any suggestions are welcome. Thanks 🙂

