Coding in Marble

I wish I could remember where I first read it, because perhaps it deserves attribution.  But many years ago I read about the two world views of physicists, and they resonated with me.  One world view is the one prescribed by things like General Relativity and Maxwell's Equations.  These have, in some sense, a great mathematical beauty to them.  This is the "universe is made out of marble" viewpoint.  Nice clean fields.  Very solid.  Then there's the other perspective, the one you have in systems like Quantum Electrodynamics.  There are lots of particles flying all over the place, and probabilities.  Lots of statistics.  The universe is a messy, organic place.  It might have been tempting to take those people and lock them away or something, but darn it if they didn't keep getting great results.  So, Marble and Wood: the highly regular vs. the messier but powerful.  I soon started using this analogy in computer science, and that's what I'm writing about today.

One place this comes up is when programming with promises, or really with any notification system, even ad hoc ones.  In the case of a promise, the situation is that you have some work that needs to be done "later," after some asynchronous operation has completed.  Let's say you are waiting for some data, and when it arrives you're going to validate it and extract some values from it; the details don't much matter.  The natural way to do this with a promise would look something like this:

p.then(() => { /* do your work */ });

or, in JavaScript written before arrow functions arrived in ES2015, you might write

p.then(function () { /* do your work */ });

In fact, you can keep chaining these.  If you have more work that has to be done after the first batch completes asynchronously, you might end up writing something like this:

p.then(() => { /* do your work */ }).then(() => { /* do more work */ }).then(() => { /* even more work */ });

Now the next thing that's going to happen is you're going to put this in a loop for your various 'p' read operations that need to happen and presto magico your application is done.  And you've fallen into the wood trap.

The reason I try to avoid this pattern is that on every iteration you're now creating new delegate objects, plus new promise objects that connect the various stages, for each operation.  But the thing is, they're all the same!  Even though every item is being handled identically (which is typical), we get no savings: we are using the most general dispatch mechanism to dispatch the very same code, and creating far more garbage than is needed.  Of course, each one of these promise chains encourages the next guy to keep doing the same thing.  You could even add stages that merge, select, batch, whatever you need: more stages, more wood.
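To make that cost concrete, here is a minimal JavaScript sketch of the wood pattern inside a loop.  The `readAll` helper and the stage bodies are hypothetical stand-ins for real work, and `Promise.resolve` stands in for a real asynchronous read.  All it does is count the function and promise objects the loop creates.

```javascript
// "Wood": each loop iteration rebuilds the same chain of closures and promises.
// readAll and the stage bodies are hypothetical stand-ins for real work, and
// Promise.resolve stands in for an asynchronous read.
function readAll(items) {
  const closures = []; // every callback handed to .then()
  const promises = []; // every promise object the chain creates
  for (const item of items) {
    const p = Promise.resolve(item);
    // Each evaluation of an arrow function expression makes a new function object.
    const s1 = (x) => x + 1; // "do your work"
    const s2 = (x) => x * 2; // "do more work"
    const s3 = (x) => x - 3; // "even more work"
    // Each .then() call allocates another promise linking the stages.
    const a = p.then(s1);
    const b = a.then(s2);
    const c = b.then(s3);
    closures.push(s1, s2, s3);
    promises.push(p, a, b, c);
  }
  return { closures, promises };
}

const wood = readAll([1, 2, 3, 4]);
// Three distinct pieces of code, but twelve function objects and sixteen promises.
console.log(new Set(wood.closures).size, new Set(wood.promises).size); // 12 16
```

Four items and three stages yield twelve separate function objects that all contain one of just three pieces of code.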

The hallmark of the wood pattern is that the same chain of dependencies/computations is re-represented in data over and over, achieving no savings from its sameness.  It is the most flexible choice, since each datum could be handled separately in a unique fashion, but in practice they probably are not.

How can you make it better?

Well, of course, a different pattern altogether might be the right choice, something that looks more like this:

datasource.handler += () => { /* do your work */ };

or maybe

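var w1 = new StageOneWorker();   // hypothetical stage-one handler type
var w2 = new StageTwoWorker();   // hypothetical stage-two handler type

datasource.handler += w1;   // wire stage one to the data source, once
w1.stage2 += w2;            // wire stage two to stage one, once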

Now, this isn't as flexible, but it is much more economical.  In one fell swoop we're saying that all the data is going to be handled symmetrically, and we pay no allocation cost per datum.
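Here is a minimal JavaScript sketch of that wired-once shape, assuming a simple hand-rolled event source rather than any particular library.  `DataSource`, `Stage1Worker`, `Stage2Worker` and `onData` are hypothetical names; `onData` plays the role of `handler +=`.

```javascript
// "Marble": the processing rules are wired up once, not rebuilt per datum.
// DataSource, Stage1Worker and Stage2Worker are hypothetical illustrations.
class DataSource {
  constructor() { this.handlers = []; }
  onData(handler) { this.handlers.push(handler); } // like `handler +=`
  deliver(datum) { for (const h of this.handlers) h(datum); }
}

class Stage2Worker {
  constructor() { this.results = []; }
  handle(value) { this.results.push(value * 2); } // "do more work"
}

class Stage1Worker {
  constructor() { this.stage2 = []; }
  handle(datum) {                                    // "do your work"
    if (typeof datum !== "number") return;           // validate
    for (const next of this.stage2) next(datum + 1); // extract and pass on
  }
}

// Wiring happens exactly once, before any data arrives.
const source = new DataSource();
const w1 = new Stage1Worker();
const w2 = new Stage2Worker();
source.onData((d) => w1.handle(d));
w1.stage2.push((v) => w2.handle(v));

// Each datum flows through the existing objects; no closures or promises
// are created per item.
for (const d of [1, 2, "bad", 3]) source.deliver(d);
console.log(w2.results); // [ 4, 6, 8 ]
```

The two arrow functions passed to `onData` and `stage2.push` are created once, during wiring, no matter how many items arrive afterwards.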

Even if that kind of refactoring isn't possible, this simple change might be very helpful:

var d1 = () => {do your work};
var d2 = () => {do more work};
var d3 = () => {even more work};

p.then(d1).then(d2).then(d3);  // this in your loop

This second form avoids creating a whole new delegate/closure for each stage on every loop iteration (the promise objects are still created per item), so in some sense it's at least partly marblized.
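A minimal JavaScript sketch of that hoisted form, using the same hypothetical stage bodies as before, shows both halves of the claim: the functions are shared across iterations, but each item still gets its own promise chain.

```javascript
// "Partly marblized": the stage functions are created once, outside the loop.
// The stage bodies are hypothetical stand-ins; only the structure matters.
const d1 = (x) => x + 1; // do your work
const d2 = (x) => x * 2; // do more work
const d3 = (x) => x - 3; // even more work

function readAllHoisted(items) {
  const closures = [];
  const chains = [];
  for (const item of items) {
    const p = Promise.resolve(item);           // stand-in for an async read
    chains.push(p.then(d1).then(d2).then(d3)); // still new promises per item
    closures.push(d1, d2, d3);                 // but the same three functions
  }
  return { closures, chains };
}

const hoisted = readAllHoisted([1, 2, 3, 4]);
console.log(new Set(hoisted.closures).size); // 3
Promise.all(hoisted.chains).then((v) => console.log(v)); // [ 1, 3, 5, 7 ]
```

This only works while the stages need nothing from the enclosing loop; as soon as a stage captures per-item state such as `item`, a fresh closure per iteration comes back.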

Now, of course, this kind of transform isn't universally possible.  But if you start by asking "what are the rules for all my data?" and then encode those rules, you tend to end up in a place where you have low-to-zero overhead per item and you just do the work.  If you don't, you can easily end up in a place where the most flexible pattern is being used in a very regular way, at great cost.

I've often admonished people to use the simplest programming technique that will do the job in order to get the best performance.  Specifically: don't use virtual methods if non-virtual will do; don't use interfaces if virtuals will do; don't use delegates if interfaces will do.  In some ways this discussion has been a rehash of that point.  But I've always found that grounding a concept in an example is helpful, so hopefully you all found it worth the read.

P.S.  I'm putting the "wood" programming technique in a bad light here, but it has its place.  And that doesn't mean I'm down on "wood" physics.  I happen to think QED is every bit as cool as relativity :)