The CLR x86 JIT, an overview

I'll be doing a series of articles on how the x86 CLR JIT compiler works and the different optimizations it does for you.

 

Whenever a method is about to be executed for the first time, the VM realizes it doesn't have native code for it, and invokes the JIT to generate it (if you are curious about how it works and have the Rotor source code, go to CallCompileMethodWithSEHWrapper in jitinterface.cpp and backtrack from there).
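To make the "compile on first call" idea concrete, here is a minimal sketch in Python. It is not how the CLR implements it: `MethodTable`, `prestub` and `jit_compile` are hypothetical names invented for this illustration, and `eval` merely stands in for real code generation. It only shows the shape of the mechanism: a slot initially points at a stub, the stub asks the JIT for code, patches the slot, and then redirects control to the new code.

```python
from typing import Callable, Dict


class MethodTable:
    """Toy method table: each slot holds either a stub or 'native' code."""

    def __init__(self) -> None:
        self.slots: Dict[str, Callable[..., int]] = {}
        self.compile_count: Dict[str, int] = {}

    def register(self, name: str, source: str) -> None:
        """Install a stub for a method that has not been compiled yet."""

        def prestub(*args: int) -> int:
            # First call: there is no native code yet, so invoke the "JIT".
            native = self.jit_compile(name, source)
            # Patch the slot so later calls bypass the stub entirely.
            self.slots[name] = native
            # Redirect control to the freshly generated code.
            return native(*args)

        self.slots[name] = prestub
        self.compile_count[name] = 0

    def jit_compile(self, name: str, source: str) -> Callable[..., int]:
        """Stand-in for the real compiler: turn source text into a callable."""
        self.compile_count[name] += 1
        return eval(source)  # assumption: source is a trusted lambda expression

    def call(self, name: str, *args: int) -> int:
        return self.slots[name](*args)


vm = MethodTable()
vm.register("add", "lambda a, b: a + b")
first = vm.call("add", 1, 2)   # goes through the stub and triggers compilation
second = vm.call("add", 3, 4)  # runs the already compiled code directly
```

After both calls, `vm.compile_count["add"]` is 1: the method was compiled once, on its first invocation.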

 

Although it is designed to be a fast compiler (compilation happens at runtime, and you can't keep the user waiting), and a lot of trade-offs, design decisions and heuristics are in place to make this happen, the JIT really doesn't look that different from a 'normal' compiler (where 'normal' means as per the Dragon Book ;)

 

Work is roughly divided into the following stages:

 

1. Importing: In this stage, the input IL is transformed into the JIT's internal representation. Verification (if required) also happens here. A lot of communication goes back and forth between the JIT and the VM, as the JIT has to ask the VM a lot of questions. For example, if the IL has an LDFLD instruction, the JIT has to make sure the VM loads the class, and ask how the field can be accessed: directly, or through a helper (e.g. for MarshalByRef classes)? And if a helper is needed, which one?

 

2. Morphing: Here, the compiler applies a set of transformations to our internal structures in order to simplify or optimize the code. Things that happen here include detecting JIT intrinsics (functions the JIT has special knowledge about, such as Math.Sin(), which will end up being an fsin x86 instruction), constant propagation, inlining, etc.

 

3. Flowgraph analysis: The JIT performs a traditional flowgraph analysis to compute gen/kill sets and the liveness of variables, dominator information, loop detection, etc. This information is used in all subsequent stages.

 

4. Optimization phase: In this stage, the heavyweight optimizations happen: common subexpression elimination, range check elimination, loop hoisting, etc.

 

5. Register allocation: Registers are a scarce resource on x86. Operations performed on registers are generally faster than those done on memory, hence it's important to make variables live in registers. This stage tries to select the best placement for each variable. To do this, it takes into account the number of uses of each variable in the method. It also decides what type of frame the method will use (EBP or ESP based, whether the frame will be double aligned, etc.).

 

6. Code generation: We finally get to generate the x86 code. Here we take into account what processor we're generating code for, in order to choose the best idioms for it. GC and debugger information is also recorded here.

 

7. Emit: This is the last stage. At this point we have the x86 code the method will use and the data needed to generate the GC and debugger information. Everything is packaged together and returned to the VM, which then redirects control to the newly generated code.
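The stages above can be illustrated with a toy pipeline in Python. This is only a sketch under heavy assumptions, not the JIT's actual data structures or algorithms: the tuple-based IL and tree format, the function names (`import_il`, `morph`, `allocate`) and the single-register setup are all made up for illustration. It covers three of the stages: importing stack-based IL into expression trees, a morphing step that folds constants, and a register allocator that ranks variables by their number of uses.

```python
from collections import Counter

# Toy stack-based "IL" for: return x * x + y + 2 * 3
IL = [
    ("ldarg", "x"), ("ldarg", "x"), ("mul",),
    ("ldarg", "y"), ("add",),
    ("ldc", 2), ("ldc", 3), ("mul",),
    ("add",), ("ret",),
]


def import_il(il):
    """Importing: simulate the IL evaluation stack to build an expression tree."""
    stack = []
    for op, *operand in il:
        if op == "ldc":
            stack.append(("const", operand[0]))
        elif op == "ldarg":
            stack.append(("arg", operand[0]))
        elif op in ("add", "mul"):
            right = stack.pop()
            left = stack.pop()
            stack.append((op, left, right))
        elif op == "ret":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise ValueError("IL ended without a ret")


def morph(node):
    """Morphing: fold operations whose operands are both constants."""
    if node[0] in ("const", "arg"):
        return node
    op, left, right = node
    left, right = morph(left), morph(right)
    if left[0] == "const" and right[0] == "const":
        value = left[1] + right[1] if op == "add" else left[1] * right[1]
        return ("const", value)
    return (op, left, right)


def count_uses(node, counts):
    """Count how many times each argument is referenced in the tree."""
    if node[0] == "arg":
        counts[node[1]] += 1
    elif node[0] != "const":
        count_uses(node[1], counts)
        count_uses(node[2], counts)


def allocate(node, registers):
    """Register allocation: the most used variables get registers first."""
    counts = Counter()
    count_uses(node, counts)
    ranked = [name for name, _ in counts.most_common()]
    return {
        name: registers[i] if i < len(registers) else "stack"
        for i, name in enumerate(ranked)
    }


tree = import_il(IL)
morphed = morph(tree)                     # the 2 * 3 subtree becomes ("const", 6)
placement = allocate(morphed, ("ESI",))   # pretend only one register is free
```

With only one register available, `x` (used twice) is placed in it and `y` (used once) stays on the stack, which mirrors the use-count heuristic described in stage 5.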