I’m considering using Firebase and its Cloud Firestore database. I have data in a Delta Lake on AWS S3, a subset of which needs to be exported into Cloud Firestore on a daily basis. The data may need to be converted into a form appropriate for Cloud Firestore, e.g. from a single Delta Lake table to one or more Firestore collections/subcollections. The load will also need to behave like an upsert, since some records will already exist and need updating while others will be new. I estimate approximately 100,000 records from my Delta Lake table will be exported to Cloud Firestore each day. Once loaded, the data in Cloud Firestore will be read-only.
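To illustrate the kind of reshaping I mean, here is a hypothetical example (the column names, collection names, and values below are made up, not my actual schema): one wide Delta Lake row would become a document in a subcollection keyed by its IDs.

```python
# Hypothetical reshaping example: a single Delta Lake row ...
delta_row = {
    "customer_id": "c123",
    "order_id": "o456",
    "total": 99.95,
    "updated_date": "2023-05-01",
}

# ... becomes a Firestore document at customers/{customer_id}/orders/{order_id}
firestore_path = f"customers/{delta_row['customer_id']}/orders/{delta_row['order_id']}"
firestore_doc = {
    "total": delta_row["total"],
    "updated_date": delta_row["updated_date"],
}
```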
What is the most cost-efficient approach for doing this on a daily basis? I’m not too concerned about performance, as long as the data are imported into Cloud Firestore within a few minutes. Some ideas include:
- Copying the AWS S3 Delta Lake data to BigLake? I don’t want to manage another Delta Lake, though.
- Writing a PySpark script that calls the Cloud Firestore API to insert/update the data (rough sketch below).
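For the PySpark idea, what I have in mind is roughly the sketch below (untested). It reads the daily subset from the Delta table and upserts it into Firestore using batched writes from each partition. The table path, date filter, collection name, and `id` column are all placeholders, and it assumes the Spark executors have Google Cloud credentials available plus the delta-spark and google-cloud-firestore packages installed.

```python
# Rough sketch (untested): read the daily subset from Delta Lake with PySpark,
# then upsert it into Cloud Firestore in batches from each partition.
# Assumptions/placeholders: table path, date filter, "items" collection,
# and a unique "id" column used as the Firestore document ID.
from pyspark.sql import SparkSession
from google.cloud import firestore

spark = SparkSession.builder.appName("delta-to-firestore").getOrCreate()

df = (
    spark.read.format("delta")
    .load("s3://my-bucket/my-table")          # placeholder table path
    .where("updated_date = current_date()")   # placeholder filter for the daily subset
)

def upsert_partition(rows):
    # One Firestore client per partition; executors need GCP credentials.
    client = firestore.Client()
    batch = client.batch()
    count = 0
    for row in rows:
        doc_ref = client.collection("items").document(str(row["id"]))
        # merge=True makes set() behave like an upsert instead of a full overwrite.
        # Non-Firestore-native types (e.g. Decimal, date) may need converting first.
        batch.set(doc_ref, row.asDict(), merge=True)
        count += 1
        if count == 500:   # Firestore batched writes are limited to 500 operations
            batch.commit()
            batch = client.batch()
            count = 0
    if count:
        batch.commit()

df.foreachPartition(upsert_partition)
```

This keeps everything in one job, but I’m unsure whether per-document writes at ~100,000 documents/day is the cheapest route compared to some managed import path.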
Are there any other options?