Kind of a functional programming newbie question here:
I’ve been reading the transcripts of some of Rich Hickey’s talks, and in several of his more well-known ones, he recommends using queues as an alternative to having functions call one another. (E.g. in Design, Composition and Performance and in Simple Made Easy.)
I don’t quite understand this, in a number of respects:
-
Is he talking about putting data in a queue and then having each function use it? So instead of function A calling function B to carry out its own computation, we just have function B slap its output on a queue and then have function A grab it? Or, alternatively, are we talking about putting functions on a queue and then successively applying them to data (surely not, because that would involve massive mutation, right? And also multiplication of queues for multiple-arity functions, or like trees or something?)
-
How does that make things simpler? My intuition would be that this strategy would create more complexity, because the queue would be a kind of state, and then you have to worry “what if some other function snuck in and put some data on top of the queue?”
One answer to an implementation question on SO suggests that the idea is creating a bunch of different queues. So each function puts its output in its own queue(??). But that also confuses me, because if you’re running a function once, then why does it need a queue for its output when you could just take that output and slap a name on it as a (var, atom, entry in a big hash table, whatever). By contrast, if a function is running multiple times, and you stick its output onto a queue, then you’ve inflicted state on yourself again, and you have to worry about the order in which everything is called, downstream functions get less pure, etc.
Clearly I’m not understanding the point here. Can someone explain a little?
5
It’s more of a design exercise than a general recommendation. You aren’t usually going to put a queue between all your direct function calls. That would be ridiculous. However, if you don’t design your functions as if a queue might be inserted between any of the direct function calls, you cannot justifiably claim you have written reusable and composable code. That’s the point Rich Hickey is making.
This is a major reason behind the success of Apache Spark, for example. You write code that looks like it’s making direct function calls on local collections, and the framework translates that code into passing messages on queues between cluster nodes. The kind of simple, composable, reusable coding style Rich Hickey advocates makes that possible.
4
One thing to note is that functional programming allows you to connect functions to each other indirectly through mediator objects that take care of procuring arguments to feed into the functions and flexibly routing their results to recipients that want their results. So suppose you have some straightforward direct calling code that looks like this example, in Haskell:
myThing :: A -> B -> C
myThing a b = f a (g a b)
Well, using Haskell’s Applicative
class and its <$>
and <*>
operators we can mechanically rewrite that code to this:
myThing :: Applicative f => f A -> f B -> f C
myThing a b = f <$> a <*> (g <$> a <*> b)
…where now myThing
is no longer directly calling f
and g
, but rather connecting them through some mediators of type f
. So for example, f
could be some Stream
type provided by a library that provides an interface to a queueing system, in which case we’d have this type:
myThing :: Stream A -> Stream B -> Stream C
myThing a b = f <$> a <*> (g <$> a <*> b)
Systems like this do exist. In fact, you can look at Java 8 streams as a version of this paradigm. You get code like this:
List<Integer> transactionsIds =
transactions.parallelStream()
.filter(t -> t.getType() == Transaction.GROCERY)
.sorted(comparing(Transaction::getValue).reversed())
.map(Transaction::getId)
.collect(toList());
Here you are using the following functions:
t -> t.getType() == Transaction.GROCERY
comparing(Transaction::getValue).reversed()
Transaction::getId
toList()
…and instead of having them call each other directly, you’re using the Stream
system to mediate between them. This code example isn’t calling the Transaction::getId
function directly—the Stream
is calling it with the transactions that survived the earlier filter
. You can think of the Stream
as a very minimal sort of queue that couples functions indirectly and routes values between them.