In March I switched jobs and became the Tech Lead of two teams that work on a shared codebase plus a separate codebase for region-specific implementations.

The team works with NestJS, which encourages a sort of Onion / Hexagonal Architecture plus DDD. At first this looked overengineered to me, but I preferred to keep quiet because, never having worked that way before, I wanted to keep an open mind.

That lasted until I started monitoring problems and found that the microservices architecture was implemented as a fully synchronous flow, with no resilience or decoupling mechanisms. If a flow required calling several microservices and one of them failed, there was no fallback.

I encouraged the team to move to an event-driven / async architecture, and our application became more resilient.

Then I hit a new problem: we have the domain objects and we need to persist them to the database.

When saving a domain object, we serialize the whole object into a SQL statement (using TypeORM repositories), unnecessarily persisting the entire object rather than just its “diff”.

This caused a lot of issues in the async flow, because we are now facing concurrent writes due to how the persistence layer is implemented.

And here is where I am confused about what to do: since we don’t use the ORM’s entities to assign changes and then save them, I cannot figure out how to fix this. In some scenarios, one of the quick fixes I came up with was (see the sketch after this list):

  • Receive a domain object to save
  • Load the record from the database
  • Reassign the record’s attributes from the domain object
  • Only the diff is stored
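
Roughly, that quick fix looks like this with TypeORM; the AccountEntity/AccountDomain shapes are illustrative, and preload() is what loads the row and merges the given fields into it:

    import { Repository } from 'typeorm';

    // Illustrative shapes; the real entity carries proper TypeORM decorators.
    interface AccountEntity { id: string; balance: number; status: string; }
    interface AccountDomain { id: string; balance: number; status: string; }

    async function persistAccount(repo: Repository<AccountEntity>, domain: AccountDomain) {
      // preload() fetches the existing row by id and merges the given fields into it,
      // so only the attributes copied from the domain object can differ from the DB state.
      const entity = await repo.preload({
        id: domain.id,
        balance: domain.balance,
        status: domain.status,
      });
      if (!entity) {
        throw new Error(`Account ${domain.id} not found`);
      }
      // save() then writes the merged entity back; TypeORM generally diffs against the
      // loaded state and updates only the columns that actually changed.
      return repo.save(entity);
    }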

But, to me, this seems wrong, not only from a performance perspective but also because it can still lead to unintended concurrent writes. In some scenarios where this kept happening, we started using an optimistic versioning approach, so the flow is cancelled and retried later if a newer version had already been saved in the database.
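
Roughly, that optimistic versioning amounts to a conditional UPDATE; here it is sketched with TypeORM’s repository.update rather than any dedicated locking feature (AccountRow and StaleAccountError are illustrative names):

    import { Repository } from 'typeorm';

    interface AccountRow { id: string; version: number; balance: number; }

    // Hypothetical error type used to signal "cancel now, retry later".
    class StaleAccountError extends Error {}

    async function saveWithVersionCheck(repo: Repository<AccountRow>, account: AccountRow) {
      // Only touch the row if it still carries the version we originally read.
      const result = await repo.update(
        { id: account.id, version: account.version },
        { balance: account.balance, version: account.version + 1 },
      );
      if (!result.affected) {
        // Someone saved a newer version in the meantime.
        throw new StaleAccountError(`account ${account.id} is stale`);
      }
    }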

What I am considering right now is to remove part of the domain layer and mix the ORM entities into the domain, so we can track changes through the ORM, but this seems to violate the whole point of decoupled layers.

The team understands the source of these bugs, but they keep saying “oh, our previous tech lead said the ORM takes care of that”, and then in a flow where we have an:

  • account object
  • deposits from an account object
  • withdrawals from an account object

we run into the craziness where updating a single account attribute updates the whole account record and its associated collection, just for the lulz?


Edit: Added a more specific use case

Let me explain the scenario:

  • We receive a withdrawal instruction
  • The account with its withdrawals is loaded from the database
  • The new withdrawal is added to the Account
  • We call save on the account
  • The Account repository does “AccountTable.save(account)”, which ends up updating the whole account plus its already-loaded withdrawals plus the new withdrawal (a sketch of the kind of mapping that produces this follows the list)
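
For context, the kind of mapping that produces this behaviour looks roughly like the following; I am assuming a cascading one-to-many here, the real entities are more involved:

    import { Column, Entity, ManyToOne, OneToMany, PrimaryGeneratedColumn } from 'typeorm';

    @Entity()
    export class AccountEntity {
      @PrimaryGeneratedColumn('uuid')
      id: string;

      @Column('decimal')
      balance: number;

      // With cascade enabled, AccountTable.save(account) also walks account.withdrawals,
      // which is how the already-loaded children end up in the write path.
      @OneToMany(() => WithdrawalEntity, (w) => w.account, { cascade: true })
      withdrawals: WithdrawalEntity[];
    }

    @Entity()
    export class WithdrawalEntity {
      @PrimaryGeneratedColumn('uuid')
      id: string;

      @Column('decimal')
      amount: number;

      @ManyToOne(() => AccountEntity, (a) => a.withdrawals)
      account: AccountEntity;
    }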

So, here my questions are:

  • Why do we update the whole withdrawals collection? It doesn’t make sense.
  • The same goes for the account: no change was made to the row, yet we update the whole row, while another flow could be modifying the account row and have its update overwritten.

In my mind, I would just do something like WithdrawalRepository.save(newWithdrawal).
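
Something like this, reusing the illustrative entities from the sketch above; with no cascade configured on the ManyToOne side, saving the withdrawal should only touch the withdrawals table and its foreign key:

    import { Repository } from 'typeorm';

    async function addWithdrawal(
      withdrawalRepo: Repository<WithdrawalEntity>,
      accountId: string,
      amount: number,
    ) {
      // Reference the parent only by id; nothing cascades back to the account row
      // or to the sibling withdrawals already loaded in memory.
      const withdrawal = withdrawalRepo.create({ amount, account: { id: accountId } });
      await withdrawalRepo.save(withdrawal);
    }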

In this scenario it is obvious what has to be written to the database, but I have more complex cases where that is not so easy to tell.

For example, we have Charges and Transactions. Transactions receive Events. These Events are processed through a state machine, so the Charge, the Transaction and the TransactionEvents all end up modified.

Rather than attempting to save only what changed, we save everything, and that has caused us issues. For example, we have an async flow that calculates the fees for a Charge at the same time an event is being processed. The event processing can end up overwriting the fees with NULL, because at the moment it loaded the Charge it had no fees yet. We solved this with optimistic concurrency, but I am not sure whether I am just hacking around the architecture; I have mostly worked with MVC and ActiveRecord and rarely ran into these issues.
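
For contrast, “saving only what changed” in the event flow would look roughly like this: the state-machine column is written explicitly, and the fee columns are never part of the statement (the ChargeEntity shape is illustrative):

    import { Repository } from 'typeorm';

    interface ChargeEntity { id: string; status: string; fees: number | null; }

    async function applyChargeEvent(
      chargeRepo: Repository<ChargeEntity>,
      chargeId: string,
      newStatus: string,
    ) {
      // Only the column this flow owns is updated, so a concurrent fee calculation
      // can never be clobbered with NULL by this write.
      await chargeRepo.update(chargeId, { status: newStatus });
    }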

I think this is going to be hard to answer without knowing your full solution, which is probably going to be way too long to explain. But:

When saving a domain object, we serialize the whole object into a SQL statement (using TypeORM repositories), unnecessarily persisting the entire object rather than just its “diff”.

This rings alarm bells for me. I think it’s super common to save/overwrite the whole object, and using a diff is fraught with problems.

I’m trying to imagine how you get into difficulty with the “save the whole thing” approach, and in my mind I see this:

  1. You have switched to events
  2. But you only send commands and ids
  3. Each command processor loads the object from the db, changes it, saves it.
  4. Because you have come from a sync setup, your events all fire at once.
  5. This means you have multiple processes operating on the same object at the same time.

Generally I would advise a combination of three solutions.

One: Have events fire events

Instead of firing all the events from the initial command, have it trigger just the first event, then have that event trigger the next and so on.

This forces each event in the group to fire in turn rather than all at once. You can still branch out where you know there will be no conflicts, but you can also keep it synchronous where you know it would require a lock in any case.
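
A minimal sketch of that chaining with @nestjs/event-emitter (assuming EventEmitterModule is registered; the event names and the saga-like class are illustrative):

    import { Injectable } from '@nestjs/common';
    import { EventEmitter2, OnEvent } from '@nestjs/event-emitter';

    interface WithdrawalPayload { accountId: string; amount: number; }

    @Injectable()
    export class WithdrawalFlow {
      constructor(private readonly events: EventEmitter2) {}

      // The initial command emits only the first event...
      requestWithdrawal(payload: WithdrawalPayload) {
        this.events.emit('withdrawal.requested', payload);
      }

      // ...and each handler emits the next one when it is done, so the steps run
      // one after another instead of all at once.
      @OnEvent('withdrawal.requested')
      async reserveFunds(payload: WithdrawalPayload) {
        // ... reserve the funds ...
        this.events.emit('withdrawal.reserved', payload);
      }

      @OnEvent('withdrawal.reserved')
      async settle(payload: WithdrawalPayload) {
        // ... record the withdrawal ...
        this.events.emit('withdrawal.settled', payload);
      }
    }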

Two: Big Messages. Send the whole object, not just the ID

This removes the DB lookup. You send all the information required for a command and then process that command. You’re no longer passing information via a save and load from a central DB.
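
As a sketch, the message carries a snapshot of everything the next step needs instead of just an id (field names are illustrative):

    // Instead of { accountId: string }, the event carries the data itself.
    interface AccountSnapshot {
      id: string;
      balance: number;
      status: 'active' | 'blocked';
    }

    interface WithdrawalRequested {
      account: AccountSnapshot;   // full snapshot, not just the id
      amount: number;
      requestedAt: string;        // ISO timestamp supplied by the client
    }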

Three: Smaller objects

You mention your objects are an account with transactions. To me it seems like you should never be changing any of those database rows, only adding new transactions to an account.

If you can refactor your objects to be immutable, then you remove all your UPDATE SQL and only have INSERTs, which is much more scalable and deals better with concurrent processing.
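
A minimal sketch of that insert-only style, reusing the illustrative withdrawal mapping from earlier: rows are only ever inserted, and aggregates such as the total withdrawn are derived by query:

    import { Repository } from 'typeorm';

    async function recordWithdrawal(
      repo: Repository<WithdrawalEntity>,
      accountId: string,
      amount: number,
    ) {
      // A single INSERT; existing rows are never updated.
      await repo.insert({ amount, account: { id: accountId } });
    }

    async function totalWithdrawn(
      repo: Repository<WithdrawalEntity>,
      accountId: string,
    ): Promise<number> {
      const row = await repo
        .createQueryBuilder('w')
        .innerJoin('w.account', 'a')
        .select('COALESCE(SUM(w.amount), 0)', 'total')
        .where('a.id = :accountId', { accountId })
        .getRawOne<{ total: string }>();
      return Number(row?.total ?? 0);
    }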


Disclaimer: please take what I say here with a grain of salt. Many of the things I mention depend a lot on the business you are modelling, not on the stack or the technique. The latter must serve the former, not the other way around.


Why do we update the whole withdrawal collection? It doesn’t make sense

To avoid the need for merging or syncing changes, and also to preserve the order and status of the items exactly as they are at runtime.

The domain component is supposed to always be consistent and valid at runtime, so a simple way to preserve this state in the database is to insert or update it as is, here and now.

The problem with this approach is that it ignores concurrency. Concurrency control is expected to happen somewhere outside the domain.

The same for the account. No change was made to the row, yet we are updating the whole row.

Repositories are often implemented as dumb collections (add, update, remove, clean). They are coded to be agnostic to the scope of the changes. Then, ORMs become mere row/statement mappers.

For this reason, when it comes to more complex scenarios (like yours), I find repositories fall short. I think patterns like Unit of Work are more suitable for the job.

In a nutshell, a UoW tracks (in memory) every change applied to the given components: essentially an in-memory, short-lived op log. Each “log entry” is mapped or translated into a single, concrete statement. When the UoW is committed, the statements are executed one by one, in entry order, within a single transaction [1].

A single error during the UoW execution results in a rollback, leaving no changes in the data source. The components in runtime are then discarded and forgotten.

The “difficulty” for those implementing dumb repositories with the ORM as a mapper is how to reconcile the framework or the ORM with the UoW, because it is now the UoW that is responsible for transaction management.
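
A minimal, hand-rolled sketch of that idea on top of TypeORM’s QueryRunner; the op shapes and method names are illustrative, not a library API:

    import { DataSource, EntityTarget, ObjectLiteral } from 'typeorm';

    type Op =
      | { kind: 'insert'; target: EntityTarget<ObjectLiteral>; values: ObjectLiteral }
      | { kind: 'update'; target: EntityTarget<ObjectLiteral>; id: string; changes: ObjectLiteral };

    export class UnitOfWork {
      // The in-memory, short-lived op log.
      private readonly ops: Op[] = [];

      constructor(private readonly dataSource: DataSource) {}

      registerInsert(target: EntityTarget<ObjectLiteral>, values: ObjectLiteral) {
        this.ops.push({ kind: 'insert', target, values });
      }

      registerUpdate(target: EntityTarget<ObjectLiteral>, id: string, changes: ObjectLiteral) {
        this.ops.push({ kind: 'update', target, id, changes });
      }

      async commit(): Promise<void> {
        const runner = this.dataSource.createQueryRunner();
        await runner.connect();
        await runner.startTransaction();
        try {
          // Statements run one by one, in entry order, inside one transaction.
          for (const op of this.ops) {
            if (op.kind === 'insert') {
              await runner.manager.insert(op.target, op.values);
            } else {
              await runner.manager.update(op.target, op.id, op.changes);
            }
          }
          await runner.commitTransaction();
        } catch (err) {
          // A single failing statement rolls everything back.
          await runner.rollbackTransaction();
          throw err;
        } finally {
          await runner.release();
        }
      }
    }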

When another flow could be modifying the account row and have its updated value overwritten

One possible solution could be segregating reads and writes. But fear not, you don’t have to build two different code bases. Instead, you could add a read-only / write mode feature flag, deploy as many read-only instances as you need but only one for writes, and then route “writing” requests to the single writer [2].

However, a single writer won’t do much good if it can’t enqueue and process requests in the right order. The writer needs to handle one request at a time and/or be able to reorder requests as they arrive.

I don’t dare to say which one is better for you, but you could achieve both by:

  • Limiting the number of threads/workers handling commands. Not a problem for you, since Node.js is single-threaded, so keep it that way.
  • Implementing a priority queue (a min-heap, perhaps); a sketch follows this list. A possible premise for ordering the queue is dates. Regarding dates, it is worth recording when the command was sent (by the client), not only when it was handled by the writer (server); the idea is to be able to reconcile sent times across different time zones, since the server will always be in the same timezone (most likely UTC). This alone won’t solve the case where N requests end up with the same priority because their dates match, but a strict duplicate policy can help you there: the first to land in the queue wins, and the rest go to a dead letter queue so you can handle the rejection gracefully.
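
A minimal sketch of such a writer queue; the Command shape, the naive sort standing in for a real min-heap, and the dead-letter hook are all illustrative:

    interface Command {
      id: string;        // idempotency key, used by the duplicate policy
      sentAt: number;    // client-side send time, normalised to UTC milliseconds
      payload: unknown;
    }

    export class WriterQueue {
      private readonly pending: Command[] = [];
      private readonly seen = new Set<string>();
      private draining = false;

      constructor(
        private readonly handle: (cmd: Command) => Promise<void>,
        private readonly deadLetter: (cmd: Command) => void,
      ) {}

      enqueue(cmd: Command) {
        if (this.seen.has(cmd.id)) {
          this.deadLetter(cmd);   // the first to land wins, the rest are rejected
          return;
        }
        this.seen.add(cmd.id);
        this.pending.push(cmd);
        // A real implementation would use a min-heap; sorting keeps the sketch short.
        this.pending.sort((a, b) => a.sentAt - b.sentAt);
        void this.drain();
      }

      private async drain() {
        if (this.draining) return;   // single consumer: one command at a time
        this.draining = true;
        while (this.pending.length > 0) {
          const next = this.pending.shift()!;
          await this.handle(next);
        }
        this.draining = false;
      }
    }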

Notice that some conflicts can’t be resolved by code. Two transactions competing for the same balance happening at the very same time might result in an irresolvable conflict. In those cases, you might need compensation transactions to resolve the “fight”.

Also bear in mind that compensation transactions are not necessarily automatic and/or programmable; they may require someone to intervene.


[1]: Additionally, we could save the op log as well, for traceability.

[2]: Say, via infrastructure: load balancers, gateways, DNS, …