Lost in Translation

A lot of the work that goes on inside a compiler is a series of translations that take high-level conceptual idioms and turn them into stepwise instructions to be executed by the machine.  Of course, this is often theoretical, as many of these high-level constructs are nothing more than syntactic sugar sprinkled over their low-level counterparts.  The water gets even muddier when you start to consider managed languages running atop the CLR.  The compilers for the runtime don’t generate machine instructions at all.  They generate intermediate language (IL) instructions, meant to represent a virtualized, idealized processor.  The runtime then does the final transformation when the code is first executed, in a process we lovingly refer to as the JIT.
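To make the "sugar over low-level counterparts" point concrete, here is a hedged sketch in Python rather than C# and IL (the function names are invented for illustration): a high-level `for` loop is really shorthand for explicit iterator-protocol steps that look much more like what actually executes.

```python
# The high-level idiom: a for loop.
def total_highlevel(items):
    total = 0
    for x in items:
        total += x
    return total

# Roughly the stepwise form it desugars to: get an iterator,
# advance it, and branch out of the loop when it is exhausted.
def total_lowlevel(items):
    total = 0
    it = iter(items)
    while True:
        try:
            x = next(it)
        except StopIteration:
            break  # the loop's exit branch
        total += x
    return total
```

Both functions compute the same result; only the first hides the machinery.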


The purpose of this one-two punch is to leave the nitty-gritty details of the actual execution environment to the last minute, so the same binaries can execute on a variety of different (though similar) platforms.  This choice carries a plethora of potential benefits that would take too much time to dive into here.  One of the numerous things done during the final translation step is to optimize the machine instructions against the actual hardware.  Of course, this also leaves open the possibility that the code can be optimized based on a variety of factors, including past usage patterns; all of this is possible because the instructions are not fully baked by the compiler.  The IL instructions are little more than semantic suggestions, which leaves the runtime with the freedom to do its magic.


This is all well and good, but why should it stop there?  If it is better to keep more of the semantics lodged in the binaries, why do we go to the trouble of translating the language idioms at compile time at all?  Instead of turning a single meta-concept into a series of primitive steps, why don’t we just encode that original semantic and be done with it?  Instead of being lost during translation, that information should become part of the code itself.  The runtime environment could then handle this last step as well.  If we are willing to take the runtime cost of translation (including optimization) in the first place, what are the extra few cycles it would take to convert a switch statement into a series of tests and branches?
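The switch example can be sketched in a few lines of Python (a hedged illustration with invented names, not the CLR's actual lowering): the semantic form is a single lookup, while the lowered form is the linear chain of tests and branches a compiler would emit.

```python
def dispatch_switch(value, cases, default):
    """The high-level semantic: one lookup against the whole case table."""
    return cases.get(value, default)

def dispatch_lowered(value, cases, default):
    """The lowered form: a linear series of compare-and-branch steps."""
    for case_value, result in cases.items():
        if value == case_value:   # test
            return result         # branch taken
    return default                # fell through every test

cases = {200: "ok", 404: "missing"}
print(dispatch_switch(404, cases, "unknown"))   # missing
print(dispatch_lowered(404, cases, "unknown"))  # missing
```

Keeping the first form around until the last minute would let the runtime pick the lowering (chained tests, a jump table, and so on) to suit the hardware.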


Too much baggage for the runtime to handle?  Too many concepts to burden the loader with?  Not so.  Just make these translations dynamic, like any other runtime component.  Semantic notions, like method bodies, can be bound at compile time, loaded at runtime, and employed during final translation.  That’s right, you could engineer code that manipulates code, just like a compiler does, but piecemeal, just as methods are written to focus on only a small part of an overall process.
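For a small taste of code that manipulates code, here is a hedged Python sketch using the standard `ast` module (the rewrite itself, swapping addition for multiplication, is an arbitrary example): a transformation applied when the code is loaded, just before it runs.

```python
import ast

source = """
def scale(x):
    return x + x
"""

class AddToMult(ast.NodeTransformer):
    """A tiny 'semantic rewrite': replace every addition with a multiplication."""
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node

# Parse, rewrite the tree, then let the "backend" (compile/exec) finish.
tree = AddToMult().visit(ast.parse(source))
ast.fix_missing_locations(tree)

namespace = {}
exec(compile(tree, "<rewritten>", "exec"), namespace)
print(namespace["scale"](4))  # 4 + 4 became 4 * 4, so this prints 16
```

The transformation is ordinary runtime code, loaded and applied like any other component, which is exactly the piecemeal, compiler-like role described above.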


This is probably what the Intentional Programming folks were trying to do here at Microsoft, though that project never made it out the door.  And it’s probably similar to what this Aspect Oriented programming thang is, though to tell you the truth I just don’t grok that one at all.


Of course, this would turn compilers into nothing more than syntax parsers and reference binders, the pieces that take text and turn it into code.  Code would therefore be more than mere steps; it would become the ultimate expression of the idea.


Real code, not pseudo code.



Comments (1)

  1. I wouldn’t say it’s like AOP, but it does remind me of a GNU project from once upon a time. The basic idea was portability between systems (like bytecode), but instead of "compiled code" it was merely a (serialized) Abstract Syntax Tree.

    From memory, they just output the AST after parsing and type checking on the "source" system, and the "destination" system fed that into the compiler backend to produce the final executable.
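    The scheme described in this comment can be sketched with Python's standard `ast` module standing in for the GNU toolchain (the `ast.dump` text is used as the serialized form purely for illustration; it is not a supported wire format):

    ```python
    import ast

    # "Source" machine: parse to an AST and serialize it.
    tree = ast.parse("result = 6 * 7")
    shipped = ast.dump(tree)

    # "Destination" machine: rebuild the AST and feed it to the backend.
    node_namespace = {name: getattr(ast, name) for name in dir(ast)}
    rebuilt = ast.fix_missing_locations(eval(shipped, node_namespace))
    destination = {}
    exec(compile(rebuilt, "<shipped-ast>", "exec"), destination)
    print(destination["result"])  # 42
    ```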