Question

I have a use case where, to mark a create-entity operation as successful in my service, I have to write some metadata to Database 1 (to which my service has direct access), internally call another API of the same service that persists some relations in Database 2 (a graph database to which my service also has direct access), and then mark the entity state in Database 3 as ACTIVE, which marks the create-entity operation as successful. All three databases are NoSQL databases. I am looking for strategies to handle partial failures and rollbacks. My clients can call the create-entity API synchronously as well as asynchronously.

My current thinking for handling partial failure is this: once all retries are exhausted, move the original create-entity request to a DLQ (dead-letter queue), and set alarms on the number of messages in this DLQ so that on-call engineers are notified and can look into the request for debugging. If the root cause points to transient issues in the underlying systems, the on-call engineer can re-drive the original request to the source queue to retry it and recover from the partial failure. Otherwise, further debugging, system corrections, and backfills would be the way to recover.
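The retry-then-DLQ flow described above could look roughly like the following minimal sketch. Everything in it is an assumption made for illustration: the in-memory `deque` objects stand in for a real source queue and DLQ, and `process_with_dlq`, `redrive`, and `max_attempts` are hypothetical names, not part of any queueing product's API.

```python
from collections import deque
from typing import Callable


def process_with_dlq(request: dict,
                     handler: Callable[[dict], None],
                     dlq: deque,
                     max_attempts: int = 3) -> bool:
    """Run handler up to max_attempts times; park the request in dlq if all fail."""
    last_error = None
    for _ in range(max_attempts):
        try:
            handler(request)
            return True
        except Exception as exc:  # a real handler would likely retry only transient errors
            last_error = exc
    # Retries exhausted: keep the original request plus some context for debugging.
    dlq.append({
        "request": request,
        "attempts": max_attempts,
        "error": str(last_error),
    })
    return False


def redrive(dlq: deque, source_queue: deque) -> int:
    """Move every parked request back to the source queue; return how many were moved."""
    moved = 0
    while dlq:
        source_queue.append(dlq.popleft()["request"])
        moved += 1
    return moved
```

An alarm would watch `len(dlq)` (or the equivalent queue-depth metric), and `redrive` is what an on-call engineer would trigger once the root cause turns out to be transient. Because a re-driven request runs the whole create flow again, each of the three writes would need to be idempotent for the re-drive to be safe.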

This seems fine to me for the asynchronous flow. For synchronous calls, I am aware that we should throw an exception to the client and let them decide whether to retry the request, but I want to understand how to handle the partially written data that has already been persisted in the databases:

Should I implement a custom rollback on the fly before throwing the exception to the client? How do people implement custom rollbacks in such situations? Any references would help a lot.
Or should I have an offline system or scheduled job that routinely (for example, every 24 hours) cleans up data left behind by partial failures in the databases?
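To make the first option concrete, here is a minimal sketch of an on-the-fly rollback built from compensating actions: each step is paired with an "undo", and when a step fails, the undos of the steps that already completed run in reverse order before the exception reaches the client. This is one common way to approach it, not the only one. The names `run_with_compensation`, `CreateEntityFailed`, and `Step` are hypothetical, and the actions are plain callables standing in for the real writes to the three databases.

```python
from typing import Callable, List, Tuple

# (step name, action, compensation that undoes the action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


class CreateEntityFailed(Exception):
    """Raised to the caller after compensations have been attempted."""

    def __init__(self, failed_step: str, uncompensated: List[str]):
        super().__init__(f"create failed at step '{failed_step}'")
        self.failed_step = failed_step
        # Steps whose undo also failed; these still hold partial data.
        self.uncompensated = uncompensated


def run_with_compensation(steps: List[Step]) -> None:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            completed.append(step)
        except Exception as exc:
            uncompensated: List[str] = []
            for done_name, _, compensate in reversed(completed):
                try:
                    compensate()
                except Exception:
                    # The undo itself failed; record it instead of hiding it.
                    uncompensated.append(done_name)
            raise CreateEntityFailed(name, uncompensated) from exc
```

In this sketch the two options are not mutually exclusive: anything listed in `uncompensated` is exactly the leftover data that an offline cleanup job would still have to handle, since a compensation can fail for the same reason the original step did.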

Handling custom transaction failures
