Jit Optimizations: Inlining (I)

Inlining is an optimization that can happen when you have calls. It consists of replacing the call with the code of the callee.

What do you gain from doing this?

- Speed: Reduce the overhead of the call. When a call is not inlined, getting the work done takes all of the following steps, and only one of them is the actual work:

  - Set up arguments (in registers and on the stack).
  - Execute the actual call instruction.
  - Callee prolog
  - Callee work
  - Callee epilog
  - Return

- Size: sometimes you get smaller code, because in some situations (property getters, for example) inlining generates fewer instructions than an actual call would.
 
- Expose more optimizations (for both speed and size): the code that replaced the call can now be the target of other optimizations.
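
To make the size point a bit more concrete, here is a minimal sketch of the kind of property getter mentioned above. The Point and GetterDemo classes are hypothetical names made up for this illustration, and whether a given JIT version actually inlines the getter is an assumption, not a guarantee.

```csharp
using System;

// Hypothetical class, used only for illustration.
class Point
{
    private int x;

    public Point(int x) { this.x = x; }

    // Trivial getter: the whole body is a single field load.
    public int X
    {
        get { return x; }
    }
}

class GetterDemo
{
    public static int ReadX(Point p)
    {
        // In IL this is a call to get_X(). If the JIT inlines it, what is
        // left is roughly one load from the field, which can take fewer
        // instructions than setting up and making the call.
        return p.X;
    }

    static void Main()
    {
        Console.WriteLine(ReadX(new Point(42))); // prints 42
    }
}
```

The result is the same either way; only the code the JIT generates for ReadX() differs.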

Let's take a look at a simple example:

    class Test
    {
        static int And(int i1, int i2)
        {
            return i1 & i2;
        }

        static int i;

        static public void Main()
        {
            i = And(i, 0);
        }
    }

If we don't inline And(), the generated code for Main() is:

    mov  ECX, dword ptr [classVar[0x2ca0d24]]   ; setup first argument (i)
    xor  EDX, EDX                               ; setup second argument (0)
    call [Test.And(int,int):int]                ; do the call
    mov  dword ptr [classVar[0x2ca0d24]], EAX   ; assign result to static
    ret                                         ; return

And the code for And() is:

    and  ECX, EDX   ; perform AND
    mov  EAX, ECX   ; setup return register
    ret             ; return to caller

Note that And() is really simple, and we're not paying anything for prolog and epilog. Even with this, we still get a win when inlining:

Main() with inlining turned on:

    xor  EAX, EAX                               ; generate final result ;)
    mov  dword ptr [classVar[0x2ca0d24]], EAX   ; move result to static
    ret                                         ; return

For this really simple code, we got the following from the inlining optimization:

- The codepath we have to execute went from 8 instructions (5 in Main() plus 3 in And()) to 3, and the call and its return are gone, so we'll be faster.
- For Main(), we went from 20 bytes to 8 bytes of code.

Note how inlining enabled a further optimization: once And() was inlined, the JIT could fold away the AND operation entirely, because it realized one side of the AND was a 0.
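
If you want to compare the two versions yourself, one possible approach (a sketch, not necessarily how the listings above were produced) is to mark a copy of the method with MethodImplOptions.NoInlining from System.Runtime.CompilerServices. The InlineDemo class and the AndNoInline() name below are made up for this illustration, and whether the unmarked And() really gets inlined is still up to the JIT.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical class, used only for illustration.
class InlineDemo
{
    // No attribute: the JIT is free to inline this at its discretion.
    public static int And(int i1, int i2)
    {
        return i1 & i2;
    }

    // NoInlining tells the JIT to always emit a real call to this method.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int AndNoInline(int i1, int i2)
    {
        return i1 & i2;
    }

    static int i = 123;

    static void Main()
    {
        // If And() is inlined, this statement is conceptually "i & 0",
        // which the JIT can then fold down to the constant 0.
        int a = And(i, 0);

        // Here the JIT cannot see into the callee at the call site, so it
        // has to set up both arguments and make the call.
        int b = AndNoInline(i, 0);

        Console.WriteLine(a == b); // prints True: same result, different code
    }
}
```

Both calls return the same value; the difference only shows up in the generated code and in how long it takes to run.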

In my next post, I will go into more details of our current implementation, such as some limitations we currently have.

PS: I'm saving all the questions you guys are asking, and I'll eventually answer them (most of them, I hope) on the blog. Just have a bit of patience, we're really busy right now trying to get Beta 2 done!