The CLR x86 JIT, an overview


I’ll be doing a series of articles on how the x86 CLR JIT compiler works and the different optimizations it does for you.

 

Whenever a method is about to be executed for the first time, the VM realizes it doesn’t have native code for it, and invokes the JIT to generate it (if you are curious about how it works and have the Rotor source code, go to CallCompileMethodWithSEHWrapper in jitinterface.cpp and backtrack from there).

 

Although it is designed to be a fast compiler (compilation happens at runtime, so you can’t keep the user waiting), and a lot of trade-offs, design decisions and heuristics are in place to make that happen, the JIT really doesn’t look that different from a ‘normal’ compiler (where ‘normal’ means as per the Dragon Book 😉).

 

Work is roughly divided into the following stages:

 

1. Importing: In this stage, the input IL is transformed into the JIT’s internal representation. Verification (if required) also happens here. A lot of communication goes back and forth between the JIT and the VM, as the JIT has to ask the VM many questions. For example, if the IL has an LDFLD instruction, the JIT has to make sure the VM loads the class, and then ask how it can access the field: directly? Through a helper (e.g. for MarshalByRef classes)? Which helper?
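To make the importing step concrete, here is a toy sketch of my own (Python pseudocode, not Rotor source; the field token, lookup table, and function names are all invented) of turning a stack-based IL sequence into an expression tree while consulting the "VM" side of the interface:

```python
# Toy sketch of an IL importer. A stand-in table plays the VM's role.
FIELD_TABLE = {0x04000001: {"name": "x", "offset": 8, "needs_helper": False}}

def vm_resolve_field(token):
    """The JIT asks the VM how to access a field (offset? helper call?)."""
    return FIELD_TABLE[token]

def import_il(il):
    """Convert (opcode, operand) pairs into nested tuples (the 'IR')."""
    stack = []
    for op, arg in il:
        if op == "ldarg":
            stack.append(("arg", arg))
        elif op == "ldfld":
            info = vm_resolve_field(arg)   # JIT -> VM question
            obj = stack.pop()
            stack.append(("field", obj, info["offset"]))
        elif op == "ret":
            return ("return", stack.pop())

# IL for: return this.x;
tree = import_il([("ldarg", 0), ("ldfld", 0x04000001), ("ret", None)])
print(tree)  # ('return', ('field', ('arg', 0), 8))
```

The real interface is far richer, of course, but the shape is the same: the importer walks the IL, simulates the evaluation stack, and calls back into the VM whenever it hits a metadata token.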

 

2. Morphing: Here, the compiler applies transformations to our internal structures in order to simplify or optimize the code. Things that happen here include detecting JIT intrinsics (functions the JIT has special knowledge about, such as Math.Sin(), which will end up being an fsin x86 instruction), constant propagation, inlining, etc.
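Constant folding, the simplest of the transformations mentioned above, can be sketched like this (an illustrative toy over made-up tuples, not the JIT’s actual tree representation):

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}

def fold(node):
    """Recursively fold ('add'/'mul', left, right) trees.
    Leaves are ints (constants) or strings (variables)."""
    if not isinstance(node, tuple):
        return node
    op, l, r = node
    l, r = fold(l), fold(r)
    if isinstance(l, int) and isinstance(r, int):
        return OPS[op](l, r)          # evaluate at compile time
    return (op, l, r)

# (2 * 3) + x  becomes  6 + x
print(fold(("add", ("mul", 2, 3), "x")))  # ('add', 6, 'x')
```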

 

3. Flowgraph analysis: The JIT performs a traditional flowgraph analysis to determine the liveness of variables and gen/kill sets, dominator information, loop detection, etc. This information is used in all subsequent stages.
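The liveness part of this is the standard backward dataflow computation from the Dragon Book. A minimal sketch (textbook algorithm, not JIT code):

```python
def liveness(blocks, succ):
    """blocks: name -> (gen, kill) sets; succ: name -> successor names.
    Iterates live_in/live_out to a fixed point, backward-flow style."""
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b, (gen, kill) in blocks.items():
            out = set().union(*(live_in[s] for s in succ[b])) if succ[b] else set()
            new_in = gen | (out - kill)
            if new_in != live_in[b] or out != live_out[b]:
                live_in[b], live_out[b] = new_in, out
                changed = True
    return live_in

# B1: x = 1       (kills x)
# B2: y = x + 1   (uses x, kills y)  ->  x is live on entry to B2
blocks = {"B1": (set(), {"x"}), "B2": ({"x"}, {"y"})}
succ = {"B1": ["B2"], "B2": []}
print(liveness(blocks, succ))  # {'B1': set(), 'B2': {'x'}}
```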

 

4. Optimization phase: In this stage, the heavyweight optimizations happen: common subexpression elimination, range check elimination, loop hoisting, etc.
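Common subexpression elimination can be illustrated with a toy value-numbering pass over straight-line code. (This simplified version deliberately ignores redefinitions of the operands, which a real pass must track via the kill sets from the previous stage.)

```python
def cse(instructions):
    """instructions: list of (dest, op, a, b). Rewrites a repeated
    expression into a copy of the register that already holds it."""
    seen = {}        # (op, a, b) -> dest that already computed it
    out = []
    for dest, op, a, b in instructions:
        key = (op, a, b)
        if key in seen:
            out.append((dest, "copy", seen[key], None))  # reuse result
        else:
            seen[key] = dest
            out.append((dest, op, a, b))
    return out

code = [("t1", "add", "x", "y"),
        ("t2", "add", "x", "y")]   # same expression computed twice
print(cse(code))  # second add becomes ('t2', 'copy', 't1', None)
```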

 

5. Register allocation: Registers are a scarce resource on x86. Operations performed on registers are generally faster than those done on memory, so it’s important to make variables live in registers. This stage tries to select the best placement for each variable; to do this it takes into account the number of uses of each variable in the method. It also makes decisions about what type of frame the method will use (EBP- or ESP-based? will the frame be double aligned?), etc.
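The “count the uses” heuristic can be sketched as follows (a deliberately naive toy of mine; a real allocator also weighs loop nesting, lifetimes, and interference between variables):

```python
from collections import Counter

def allocate(uses, registers):
    """uses: list of variable names, one entry per use site.
    The most frequently used variables get registers; the rest
    live in the stack frame."""
    counts = Counter(uses)
    ranked = [v for v, _ in counts.most_common()]
    in_regs = dict(zip(ranked, registers))      # top N get a register
    spilled = [v for v in ranked if v not in in_regs]
    return in_regs, spilled

regs, spilled = allocate(["i", "i", "i", "sum", "sum", "tmp"],
                         ["ESI", "EDI"])
print(regs, spilled)  # {'i': 'ESI', 'sum': 'EDI'} ['tmp']
```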

 

6. Code generation: We finally get to generate the x86 code. Here we take into account which processor we’re generating code for, in order to choose the best idioms for it. GC and debugger information is also recorded here.
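As an illustration of idiom selection, here is a toy routine choosing cheap x86 sequences for multiplying a register by a constant. (These are well-known x86 idioms, not taken from the JIT source; which one is actually fastest varies by processor, which is exactly why the code generator asks what CPU it is targeting.)

```python
def emit_mul_by(reg, k):
    """Pick a cheap x86 idiom for reg = reg * k."""
    if k == 0:
        return [f"xor {reg}, {reg}"]               # zeroing idiom
    if k & (k - 1) == 0:                           # power of two
        return [f"shl {reg}, {k.bit_length() - 1}"]
    if k in (3, 5, 9):                             # lea scaled-index trick
        return [f"lea {reg}, [{reg} + {reg}*{k - 1}]"]
    return [f"imul {reg}, {reg}, {k}"]             # general case

print(emit_mul_by("eax", 8))   # ['shl eax, 3']
print(emit_mul_by("eax", 5))   # ['lea eax, [eax + eax*4]']
```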

 

7. Emit: This is the last stage. At this point we have the x86 code the method will use, plus the information needed to generate the GC and debugger tables. Everything is packaged together and returned to the VM, which then redirects control to the newly generated code.

 

 


Comments (16)

  1. Lorenzo Dematte says:

Very interesting. What algorithm did you choose for register allocation? Why is control flow analysis necessary? I think this phase could be done by the compiler that emits MSIL.

  2. Andrew Shuttlewood says:

    How long does all this take?

    I imagine that some bits are faster than others, and you’re only working with a single method, but I imagine it still has an impact of a kind.

    Is there a good guide to .NET IL? I’m (probably far too much) familiar with Java bytecode, and I’m curious as to how .NET differs radically.

    Another question, do you process method by method, class by class, or assembly by assembly?

    Sorry, I’m just very curious 🙂

  3. Panos Theofanopoulos says:

    Some background before the actual question:

    Constants (const in C#) are used only by the compiler, with the value they have at compile time, so a change in a constant leaves us with a versioning problem (we must recompile all dependent assemblies).

    Default parameters have the same versioning problem, as the caller actually uses the compile-time values.

    The solution the C# team is providing is overloads. This is really a semi-workaround, since overloads don’t capture the developer’s intention (which is just some default values) and they are a maintenance problem (we must synchronize the attribute definitions and XML documentation of the method).

    Now the question 🙂

    How difficult would it be to add constant resolution in the first steps of JITting? That would solve problem #1, and if default parameter values could be encoded as internal constants (so the caller uses them instead of the actual values) it would solve #2 as well.

    I think the JITter already does much more complicated things than a simple find-and-replace.

    Regards

  4. Dmitriy Zaslavskiy says:

David, could you comment on the differences between JIT and NGEN in v2?

  5. Eric Wilson says:

    To Panos Theofanopoulos: Why not just use public read-only fields instead of public constants? With read-only fields, I am pretty sure that the compiler does not just hoist the value out at compile time.

  6. Eric Wilson says:

    I’m confused why the JIT does heavy-weight optimizations like common sub-expression elimination. Shouldn’t those be done by the compiler? I can understand the range check optimization because the JIT and CLR already have special understanding of these types.

  7. David Notario says:

    Lorenzo:

The IL doesn’t have information about control flow and liveness. Certainly, some compilers compute it to remove dead code, but because IL doesn’t carry it, we need to recompute it.

    Andrew: Jitting is per method

  8. Luc Cluitmans says:

    David, can you comment on differences between JITters for different CPUs, and differences between JIT in different framework versions?

I have noticed a few times, when doing performance tests (in C#) to decide what architectural principles to use for a problem and trying two different approaches, that one approach may be faster on the Intel (P4) processor in the desktop at my office, while the other approach is faster on the AMD (Athlon XP+) processor in my computer at home.

    Describing the whole test I happened to do this morning leads a bit too far here, but I tried two approaches to a signal processing problem. Both versions were about equally intensive on floating point calculations, but one focussed on employing virtual method calls to get a ‘cleaner’ architecture, while the other approach tried to trick the JIT into optimizing the code as much as possible by using nonvirtual methods (called from a large code-generated method instead of a simple loop) and using method arguments (backed by local variables) instead of explicit state variables.

    I was surprised that on 3 out of 4 test situations (AMD vs Intel and .Net 1.1 vs. .Net 2.0.40607) the ‘clean’ solution was actually (much) faster than the ‘optimized’ one.

    Can you give guidelines on what things to do or not to do to let the JIT do its work optimally? I am thinking about things like the effect of ‘sealing’ a class, virtual vs. nonvirtual methods, interfaces vs. delegates, passing data to a method as fields in its class vs passing the data as method arguments, the implications of ‘ref’ and ‘out’ parameters, etc…

  9. Simple, quick question: Given the time it takes for the JIT engine to emit a method, what is better: many small methods (these stages occur more often and the optimizations are scarce) or a few large methods (these stages occur fewer times, but the optimizations are heavier)?

  10. Eric: And what’s the use of const then? 🙂

    The answer is that a static field’s value (even if it’s readonly) cannot be "inlined" by the JITter.

    By inlining I mean the following:

    class AsmA {
        public const int myConst = 4;
    }

    The statement

    return 4 / AsmA.myConst;

    will generate

    MOV EAX,1

    and the statement

    return a / AsmA.myConst;

    will generate

    test eax,eax
    jns positive
    add eax,3
    positive:
    sar eax,2
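For readers wondering why the second sequence isn’t just a plain sar: the add eax,3 compensates for the fact that an arithmetic right shift rounds toward negative infinity, while C#/IL integer division truncates toward zero. A quick Python check of the equivalence (a simulation of the emitted sequence, not the JIT itself):

```python
def jit_div4(a):
    """Mimic the emitted x86: test/jns; add eax,3; positive: sar eax,2."""
    if a < 0:
        a += 3                    # bias negative inputs before the shift
    return a >> 2                 # arithmetic shift right by 2

def truncating_div4(a):
    """Division semantics of the original C# expression a / 4."""
    q = abs(a) // 4
    return -q if a < 0 else q

for a in range(-100, 101):
    assert jit_div4(a) == truncating_div4(a)
print("sequences agree")
```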

  11. Daniel Moth says:

    Great Stuff! I know you have stated upfront the x86 focus, but I would be interested in any info you have with regards to JITing and the Compact Framework.

  12. ASPInsiders says:

    Just a reminder: Release builds are not Debug builds. Seems obvious, but it’s worth saying again. Release