What About Today's Gigacore Applications?

As should be clear from the previus post, I'm not a big fan of the threads + locks model. An alternative approach is the one offered by message-passing, which is used every day in distributed applications and with enormous success. The world-wide web is one giant concurrent application, if you will: millions of processors are concurrently processing data all across the world, some trivial and some enormously important. The success of this unbelievably complex and humungous application lies in its simple composition model, built around the very simple message-passing protocol we all know as HTTP.

We should ask ourselves whether there are lessons to be learned from this unparalleled success story, lessons applicable to the smaller end of the concurrency scale.

Would You Pass Me the Data, Please!

Rather than mingling the data that is shared between concurrent computations and trying to protect it with locks, the idea in message-passing is to bring the shared data to a computation that needs it, which is responsible for passing it on after it's done with it. A computation may just need once piece of data, (e.g. a bank account record), or multiple pieces of data, (e.g. an account record and a credit report). Either way, the computation does not start until the necessary data is available, but once it is made available in a message, it is not accessible to any other computation.

Some systems, such as the CCR, which I mentioned earlier, take this approach: concurrency is expressed in terms of data flow (DF), where buffers transport data between tasks which are short-lived and produce their results as messages that are sent to other tasks. Each task is started only when all its data is available, which is very efficient from a runtime perspective. Consider, for example, a naively coded control-flow-based application that is waiting for data from three sources (src1, src2, src3):

 int x1 = receive(src1);
int x2 = receive(src2);
int x3 = receive(src3);

(Here, 'receive' is used as a placeholder for a generic operation waiting for data to be available -- it could be something as simple as a file I/O read() call).

In the example, the thread would go to sleep during the first call, keeping a stack to retain the calling context for the return. To be fair, the code is very easy to read and understand, but it does suffer from overhead issues. If the data isn't immediately available and comes in in the order listed, the thread will block three times and unblock three times, for a total of six context switches.

On the other hand, in a DF-based system, the same code may look something like:

 Join.Create(new Source [] { src1, src2, src3 },
delegate (int x, int y, int z) { ... } );

This code simply attaches a delegate to the availability of the data and then moves on without waiting for it. If there is no more work to be done before the data is available, the task containing this statement returns and is done. The continuation of the task is dependent on the data being available. This is very efficient there are only two context switches involved and since no stack is necessary while waiting (there is no calling context to retain) and stacks are typically very costly. Furthermore, the context switches themselves are far less expensive as they don't have to involve a trip into the OS kernel (this doesn't follow from anything I've said, but it's true).

On the other hand, such code is hard to construct and read unless you spend a lot of time learning it. It's a fundamentally different way of looking at the encoding of an algorithm, a pivoting of the pattern, if you will, and it may not be right for your coding style. A compromise is to wait synchronously as in the first example, but to wait for all data at once and only context switch twice (once when blocking, once when unblocking):

 Join.WaitFor( new Source [] { src1, src2, src3 } );

This is less efficient than the callback method above, since it holds on to the stack, but it is still better than the three waits in sequence. If you're running on a Windows GUI thread, though, it's bad for another reason: application responsiveness suffers, which is much worse than having a few extraneous context switches.

Message-passing is per se not sufficient to replicate the WWW scalability success story -- we need a couple of other elements, too: isolation and loose coupling. More about this later.