Is DynamoDB optimistic locking with one version attribute for each field in an item a valid design pattern?


It’s common to implement optimistic concurrency control in DynamoDB by giving each item in the database a top-level “version” attribute and only allowing an update of an item to succeed if the current version matches the expected version (see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBMapper.OptimisticLocking.html).
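
For concreteness, a minimal sketch of that standard single-version pattern in Python with boto3 (the table name, key, and attribute names here are placeholders):

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Items")  # placeholder table name

def update_if_version_matches(key, new_value, expected_version):
    # The write succeeds only if nobody bumped the version since our read.
    try:
        table.update_item(
            Key=key,
            UpdateExpression="SET field1 = :val, #v = #v + :one",
            ConditionExpression="#v = :expected",
            ExpressionAttributeNames={"#v": "version"},
            ExpressionAttributeValues={
                ":val": new_value,
                ":one": 1,
                ":expected": expected_version,
            },
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # stale version: re-read and retry
        raise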

Suppose I want more fine-grained control, so that if user A updates fields 1, 2, and 3 and user B simultaneously updates field 4, 5, and 6, neither user has to retry. Are there known implementations or discussions of DynamoDB optimistic locking in which each attribute has its own version field?


What you’re trying to do is in principle possible, but there are limitations:

  • You will have to write the necessary conditional-update queries yourself; SDK conveniences such as the DynamoDBMapper’s optimistic-locking support only handle a single version attribute per item, not one per field.

  • It becomes a lot more difficult to track per-field dependencies: you might want to perform a field update only if certain other fields are unmodified.

Examples of potential read-write conflicts

Your example potentially runs into this second limitation. Suppose each of the two processes/users wants to apply a transaction that has a read-set and a write-set. The read-set describes which fields were accessed to determine the output of the transaction; the write-set contains the actual updates.

In your example, if the two processes have empty read-sets, then they do not conflict:

process   read   write
A         {}     {1 → A1, 2 → A2, 3 → A3}
B         {}     {4 → B4, 5 → B5, 6 → B6}

They can be applied in any order and will not conflict. The resulting update will be {1 → A1, 2 → A2, 3 → A3, 4 → B4, 5 → B5, 6 → B6}.
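
In DynamoDB this case needs no conditions at all, because an UpdateExpression only modifies the attributes it names. A minimal sketch of the two commuting writes (boto3; table, key, and attribute names are placeholders):

import boto3

table = boto3.resource("dynamodb").Table("Items")  # placeholder table name

# Process A writes fields 1-3; its read-set is empty, so no condition.
table.update_item(
    Key={"pk": "item-1"},
    UpdateExpression="SET field1 = :a1, field2 = :a2, field3 = :a3",
    ExpressionAttributeValues={":a1": "A1", ":a2": "A2", ":a3": "A3"},
)

# Process B writes fields 4-6, also unconditionally; the order of the
# two calls does not matter.
table.update_item(
    Key={"pk": "item-1"},
    UpdateExpression="SET field4 = :b4, field5 = :b5, field6 = :b6",
    ExpressionAttributeValues={":b4": "B4", ":b5": "B5", ":b6": "B6"},
)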

If one process has read elements of the other process’s write-set, then they may or may not conflict depending on order:

process   read   write
A         {}     {1 → A1, 2 → A2, 3 → A3}
B         {1}    {4 → B4, 5 → B5, 6 → B6}
  • If A runs first, then B will fail and the update is {1 → A1, 2 → A2, 3 → A3}.

  • If B runs first, then A can run afterwards and the entire update is as in the previous scenario.

Depending on the consistency mode used for the reads (eventually consistent or strongly consistent), the transactions might also fail to apply even when they don’t logically conflict: an eventually consistent read can return stale data, so a condition that checks the read-set may fail spuriously.

If both processes have read fields that the other process writes, then whichever transaction is applied first will succeed and the other will fail. For example:

process   read   write
A         {4}    {1 → A1, 2 → A2, 3 → A3}
B         {1}    {4 → B4, 5 → B5, 6 → B6}

Checking field version vs checking old value

On an implementation level, you probably wouldn’t add a separate version attribute for each scalar field. That is, you would probably not create an item like this:

{
  "field1": "some value",
  "field1version": 1
}

Instead, you would typically assert that the read fields still contain the value that you read. Thus, your update queries would look like the following (showing the last example for process B, in an SQL-like syntax):

UPDATE collection
SET field4 = 'B4', field5 = 'B5', field6 = 'B6'
WHERE field1 = 'old value for field 1'
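
In DynamoDB, that corresponds to a conditional update, roughly like this (boto3; table, key, and attribute names are placeholders):

import boto3

table = boto3.resource("dynamodb").Table("Items")  # placeholder table name

table.update_item(
    Key={"pk": "item-1"},
    UpdateExpression="SET field4 = :b4, field5 = :b5, field6 = :b6",
    # Assert that the read field still holds the value we read; if not,
    # the call raises a ConditionalCheckFailedException.
    ConditionExpression="field1 = :old1",
    ExpressionAttributeValues={
        ":b4": "B4", ":b5": "B5", ":b6": "B6",
        ":old1": "old value for field 1",
    },
)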

But a version number is a more general approach, and a safer one. For example, it may be relevant to your application whether a field was touched at all, even if it later contains the original value again (the classic ABA problem). Consider the following sequence of reads R(value, version) and writes W(value, version):

A: R(1, v1)                 R(1, v3)
B:         W(2, v2) W(1, v3)
   ----------------------------------> time

Here, process A reads value 1 from a field. Another process overwrites that field with 2, then with 1 again. A later read by process A returns 1 again. For applications that only care about the current value, checking the old value is sufficient. Other applications may care about how often that field was written, and would not consider those two reads to be equivalent; only a version check can tell them apart.
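
With a per-field version, every write bumps the version even when it restores the old value, which lets a later transaction detect the intervening writes. A sketch under the same placeholder names, using integer versions (v1 = 1 and so on):

import boto3

table = boto3.resource("dynamodb").Table("Items")  # placeholder table name

# Each of B's writes bumps field1's version, even the one that restores
# the original value (v1 -> v2 -> v3 in the diagram above).
table.update_item(
    Key={"pk": "item-1"},
    UpdateExpression="SET field1 = :val, field1version = field1version + :one",
    ExpressionAttributeValues={":val": 1, ":one": 1},
)

# A's later conditional write asserts the version it originally read (v1),
# so it fails even though field1 contains the value 1 again.
table.update_item(
    Key={"pk": "item-1"},
    UpdateExpression="SET field2 = :new2",
    ConditionExpression="field1version = :read_version",
    ExpressionAttributeValues={":new2": "A2", ":read_version": 1},
)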

Dependency tracking vs actual locks

That said: while thinking about reducing conflicts can be great, sometimes such fine-grained dependency tracking just isn’t worth it. This depends very much on your specific data model and on your scale.

Optimistic locking is great because it can give you high throughput, but only if you expect few conflicts, and failures to apply an update can be handled gracefully, such as by restarting the unit of work (as sketched below).
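
Such graceful handling is typically a retry loop around the whole unit of work, for example (a sketch; unit_of_work is assumed to be a callable that re-reads its inputs and re-applies its conditional writes):

from botocore.exceptions import ClientError

def run_with_retries(unit_of_work, max_attempts=5):
    # Re-run the whole read/compute/write cycle whenever an optimistic
    # check fails; propagate any other error.
    for _ in range(max_attempts):
        try:
            return unit_of_work()
        except ClientError as e:
            if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise
            # Another writer won the race; re-read and try again.
    raise RuntimeError("too many conflicting writers, giving up")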

If you expect a lot of conflicts, using a database that supports actual locks might be preferable (e.g. an RDBMS where you can BEGIN TRANSACTION).
