I’m learning Clojure to see if it’s something I can leverage at my current job and, more importantly, how I can convince my bosses that Clojure has a ‘killer feature’ over Java that makes it worth the investment¹.
The feature that I’m guessing most Clojure evangelists would tout is state management and concurrency.
The example that I see often in blogs and books is an account balance that’s being updated by multiple threads. Clojure’s state management can always cleanly ensure that the balance will be accurate regardless of the number of threads reading and writing that value.
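For reference, the textbook version of that example looks roughly like the sketch below. The names `balance` and `deposit!` are made up for illustration, and the state lives purely in memory:

```clojure
;; Hypothetical example: `balance` and `deposit!` are illustrative names.
(def balance (ref 0))

(defn deposit!
  "Adds amount to the balance inside an STM transaction."
  [amount]
  (dosync (alter balance + amount)))

;; 100 futures (each on a pool thread) deposit 10 concurrently.
(let [workers (doall (repeatedly 100 #(future (deposit! 10))))]
  (doseq [w workers] @w))   ; wait for every deposit to finish

@balance                    ; no update is lost, whatever the interleaving
```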
In practice, though, an application would never allow an account balance to live only in memory; it would have to be persisted outside of the JVM (probably in a database) for two hopefully obvious reasons:
- If the app goes down, the database will need to accurately reflect
the latest state for recovery. For this reason the database cannot be asynchronously updated, and all reads would have to block until the database update is complete.
- If other applications are reading or
manipulating the account balance (maybe because we’ve scaled our app
out to several servers), that balance would need to be kept in sync
between all instances.
Can Clojure’s state management handle situations like this elegantly? For instance, given scenario 2 above, all reads of the account balance would first need to check the database to get the value, and if the database is locked, it would have to block until the correct value is available, no?
It’s great that Clojure handles my in-memory concurrency very elegantly but if that elegance can’t be extended to external state then do I really gain anything?
¹ (Note that things like expressiveness, elegance, etc. are rarely considered killer features by management.)
Clojure’s in-memory STM doesn’t handle durable persistence directly (by definition, it is designed for in-memory use cases).
However the principles apply equally well to persisted storage. For example take a look at Datomic, which is effectively a scalable ACID database management system based on Clojure’s data structures and state management philosophy.
The common principles are roughly:
- Immutable persistent data structures
- State transitions managed by pure functions
- Unlimited read scalability (non-transactional reads never need locks)
- “pluggable” transaction semantics to suit your application
- Separation of identity and state
- Emphasis on data separated from functions that process it (i.e. distinct from the OOP concept of encapsulating data and code)
As for the classic bank transfer example, you can still get durability: just send-off to an agent inside the STM transaction. Although a transaction can be retried, sends made inside it are held until it commits, so the agent action runs only once, for the last, successful try. The agent would then persist the transaction info to a DB.
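A minimal sketch of that idea follows. All names here (`checking`, `savings`, `journal`, `persist!`, `transfer!`) are invented for illustration, and the atom `fake-db` merely stands in for a real database write:

```clojure
;; Hypothetical names throughout; `fake-db` is a placeholder for durable storage.
(def checking (ref 100))
(def savings  (ref 0))
(def fake-db  (atom []))
(def journal  (agent nil))        ; serializes the persistence actions

(defn persist!
  "Agent action: records one transfer. A real version would do blocking DB I/O here."
  [_ entry]
  (swap! fake-db conj entry)
  entry)

(defn transfer!
  "Moves amount between two refs and queues a persistence action."
  [from to amount]
  (dosync
    (alter from - amount)
    (alter to + amount)
    ;; Sends made inside a transaction are held until it commits,
    ;; so a retried transaction does not dispatch this more than once.
    (send-off journal persist! {:amount amount})))

(transfer! checking savings 30)
(await journal)                   ; block until the agent has run the action
```

Note that the write is dispatched after the in-memory commit, so this gives persistence that follows the transaction rather than a database-style atomic commit; a crash between the two would lose the record.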
I reckon that at times, RDBMSs’ better-understood approach of mixing transactions and persistence can be the best bet. So I’d like to introduce a common use case:
Web server apps often feature a global cache (implemented as some data
structure) in order to avoid DB abuse. Each HTTP request (and hence each
thread) could read from or write to it, so access needs to be
concurrency-managed somehow. A locking approach can turn this ‘cache’
into an actual bottleneck!
It’s in this and other similar scenarios where STM really shines.
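As a rough sketch of such a cache, assuming a single ref holding an immutable map (the names `cache`, `lookup`, and `load-from-db` are hypothetical, and `load-from-db` is a stub for a real query):

```clojure
;; Hypothetical names: `cache`, `lookup`, and `load-from-db` are illustrative.
(def cache (ref {}))

(defn load-from-db
  "Stub standing in for a slow database query."
  [k]
  (str "value-for-" k))

(defn lookup
  "Returns the cached value for k, loading and caching it on a miss."
  [k]
  (if-let [hit (get @cache k)]       ; plain deref: readers take no locks
    hit
    (let [v (load-from-db k)]        ; do the slow load outside the transaction
      (dosync (alter cache assoc k v))
      v)))

(lookup :user-42)
```

Two requests that miss on the same key at the same time may both hit the DB in this version; that trade-off is usually acceptable for a cache, and readers are never blocked by writers.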
Actually, this looks like a dupe of a question on Stack Overflow. I’m keeping the question here as well, though, since it seems more appropriate for this site.
The long and short of this is that for my purposes, since STM does not support the ‘D’ in ACID, Clojure’s state and concurrency management doesn’t buy me anything. Unfortunately.