Know when a resource has been fully consumed


I’ve got a race condition problem I need to solve with a distributed API/service and would appreciate a vetting of my plan, plus any fresh ideas I may have missed.

Problem:

I have a finite amount of each of many (hundreds of) different resources and need to know when each has been used up. The resources are consumed via API calls to a distributed service. Each call should decrement the appropriate resource counter by 1. When an API call attempts to decrement a counter past 0, it needs to return an error and keep the counter at 0. I need to avoid race conditions.

My Solution
Here’s what I’m thinking, but I’d appreciate any feedback….

  • Each API request is routed to a queue.
  • Another service (the resource service), with only a single instance, picks up events from the queue one at a time.
  • For each queue event, it checks whether the given resource’s counter in Redis is > 0 and, if so, decrements it (sketched after this list).
  • The resource service calls back to the calling API to tell it whether it was able to consume the resource or not.
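
A minimal sketch of that check-and-decrement step, assuming the counters live in Redis and that this single resource-service instance is the only writer (Python with redis-py; key and function names are illustrative):

```python
import redis

r = redis.Redis()

def consume(resource_id: str) -> bool:
    """Decrement the counter for resource_id, never going below zero.

    Safe without extra locking only because a single resource-service
    instance processes queue events one at a time.
    """
    key = f"resource:{resource_id}"
    count = int(r.get(key) or 0)
    if count <= 0:
        return False  # resource exhausted; the calling API gets an error
    r.decr(key)
    return True
```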

I feel there must be a simpler way, but I’m not seeing it. I can expect to have ~500–1000 different resources at any given time, so I’m thinking a queue per resource isn’t feasible.


I see basically two types of approaches.

One is to implement (what essentially amounts to) transactions. Have the distributed services contact (or go through) a central authority to check availability, whether that is a traditional relational database, Redis, ES, some single-instance service sitting on a queue, whatever.
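
For example, if Redis is the central authority, the check-and-decrement can be done in one atomic step with a small server-side Lua script, which the distributed services can call directly (a sketch, assuming redis-py; key names are illustrative):

```python
import redis

r = redis.Redis()

# The script runs atomically inside Redis, so no two callers can both
# observe "1 left" and both decrement past zero.
decrement_if_available = r.register_script("""
local count = tonumber(redis.call('GET', KEYS[1]) or '0')
if count > 0 then
    redis.call('DECR', KEYS[1])
    return 1
end
return 0
""")

def consume(resource_id: str) -> bool:
    return decrement_if_available(keys=[f"resource:{resource_id}"]) == 1
```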

The drawback is that this synchronizes the distributed services, which kind of defeats the purpose of a distributed system.

The other type of solution is to partition the resources and have each resource handled by a single service instance. Then, on the one service responsible for that particular item, you can manage its inventory safely. This is basically what Kafka does with its partitions.
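
A minimal sketch of the routing side of this, assuming each partition is owned by exactly one service instance (names and URLs are purely illustrative):

```python
import hashlib

# Each partition is owned by exactly one instance, so all decrements for a
# given resource land on the same instance and cannot race with each other.
PARTITION_OWNERS = [
    "http://resource-service-0:8080",
    "http://resource-service-1:8080",
    "http://resource-service-2:8080",
]

def owner_for(resource_id: str) -> str:
    """Deterministically map a resource to the single instance that manages it."""
    digest = hashlib.sha256(resource_id.encode()).digest()
    partition = int.from_bytes(digest[:4], "big") % len(PARTITION_OWNERS)
    return PARTITION_OWNERS[partition]
```

Every caller routes requests for a given resource to `owner_for(resource_id)`, and that owner can then keep the counter in local memory or in its own store.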

The drawback is that it is slightly more involved to do, and it only works well if all items have a similar processing cost and are requested somewhat uniformly. Otherwise the load does not get properly distributed.

Oh, and of course there is “apology-based computing”. Just do what you can and deal with errors another way. Basically “apologizing” for items you can not deliver 🙂 This is actually used quite often, even at banks with real money on the line. 🙂


This is a common problem that has a really large set of potential solutions. Essentially there are two basic classifications of approaches: optimistic locking and pessimistic locking. In this context, the optimistic solution is that you assume that there is enough stock available and attempt to decrement the counter and error out if there isn’t. The pessimistic solution is to allow items to be reserved, and once confirmed it is available, you allow it to be purchased. Both approaches have pros and cons. Based on your description it seems you are going down the optimistic path.
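
To make the optimistic path concrete: with Redis, WATCH/MULTI gives a compare-and-set style decrement that fails and retries if another writer touched the counter in between (a sketch using redis-py; names are illustrative):

```python
import redis

r = redis.Redis()

def consume_optimistic(resource_id: str, max_retries: int = 5) -> bool:
    key = f"resource:{resource_id}"
    for _ in range(max_retries):
        with r.pipeline() as pipe:
            try:
                pipe.watch(key)                  # fail the transaction if the key changes
                count = int(pipe.get(key) or 0)
                if count <= 0:
                    pipe.unwatch()
                    return False                 # out of stock
                pipe.multi()
                pipe.decr(key)
                pipe.execute()                   # raises WatchError on a concurrent write
                return True
            except redis.WatchError:
                continue                         # someone else decremented first; re-check
    raise RuntimeError("gave up after repeated write conflicts")
```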

I’m not terribly familiar with Redis but as I understand it, it implements eventual consistency as described here. This might be problematic if you are distributing this across multiple instances. For example, if two queue consumers pick up a message to buy a MacBook at nearly the same time, they could both see one available in separate Redis nodes, both decrement that value, and see it at zero after they are done. It’s not clear to me whether Redis would see this as inconsistent or not later.

This thread talks about ways to make Redis implement strong consistency across nodes. Another option is to use a datastore with built-in support for strong consistency across instances.

I would suggest an alternative solution.

Keep the central counters, yes, but have the distributed API instances pull blocks of the resource for use, say 500 at a time. This allows each instance to maintain its own count while the block lasts and reduces the calls to the central counter (see the sketch after the flow below).

The flow would be:

  • User calls the API.
  • API: local count > 0 ? return a resource and decrement the local count.
  • Otherwise the API calls central for a new block.
  • Central: count > 0 ? send a block : send a “run out” message.
  • API updates its local count and returns a resource or an error.
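
A minimal sketch of that flow on the API-instance side, assuming the central counters live in Redis and a block size of 500 (Python with redis-py; all names are illustrative):

```python
import redis

r = redis.Redis()
BLOCK_SIZE = 500

class BlockAllocator:
    """Serves resources from a locally held block; refills from the central counter."""

    def __init__(self):
        self.local = {}  # resource_id -> remaining count in the current block

    def consume(self, resource_id: str) -> bool:
        if self.local.get(resource_id, 0) == 0:
            self.local[resource_id] = self._pull_block(resource_id)
        if self.local.get(resource_id, 0) == 0:
            return False                         # central counter has run out
        self.local[resource_id] -= 1
        return True

    def _pull_block(self, resource_id: str) -> int:
        # DECRBY is atomic; if we overshoot past zero, give the surplus back
        # so the central counter never stays negative.
        key = f"resource:{resource_id}"
        remaining_after = r.decrby(key, BLOCK_SIZE)
        if remaining_after >= 0:
            return BLOCK_SIZE
        granted = BLOCK_SIZE + remaining_after   # remaining_after is negative here
        r.incrby(key, -remaining_after)          # restore the counter to zero
        return max(granted, 0)
```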

The downside is that when you run low, some instances might still be returning resources while others have run out. You could mitigate this by having the central service keep track of which API instances have run out and putting their requests back on a queue.
