Jit Optimizations: Inlining (II)

In a JIT compiler, inlining can become an expensive optimization in terms of compile time: it can involve loading other classes or assemblies, doing security checks, etc. What's worse, even after doing all this expensive work, you may find out that the candidate for inlining wasn't really worth it, so you have to throw away the work you did, wasting not only time but possibly affecting other things such as working set (you loaded a class that you didn't really need).

Another reason the JIT has to be smart about what it inlines has to do with how the JIT compiler works. To give you an idea, consider a function f() that generates an optimal solution for a problem in O(N^2) steps and a function g() that solves the same problem, but not optimally, in O(N) steps. If you have limited time to find the solution, a good approach could be to use the optimal solver for small values of N and fall back to the non-optimal, but fast, solver for larger ones. In our case, the problem is generating good code, and N is a measure of the complexity of our input (code size, complexity of the flowgraph, number of variables, etc.). What does this have to do with inlining? Well, with inlining you are making N bigger, which can result in us crossing the line (which in practice is not as well defined as in my example) past which we generate less optimal code. This is a problem the VC team encountered with their IL generation: they do a lot of optimizations at the IL level, inlining among them, and they found out that very aggressive inlining was hurting the quality of the code our JIT generates.
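To make the f()/g() trade-off concrete, here is a minimal sketch. Everything in it is an assumption made up for illustration: the function names, the threshold value and the way complexity is added up do not come from any real JIT, and a real compiler has no single crisp cutoff.

```python
# Hypothetical sketch: none of these names or numbers come from a real JIT.

BUDGET_THRESHOLD = 100  # assumed cutoff; in practice the line is much fuzzier


def optimize_thoroughly(n):
    """Stand-in for f(): best result, cost grows as N^2."""
    return {"quality": "optimal", "steps": n * n}


def optimize_quickly(n):
    """Stand-in for g(): weaker result, cost grows as N."""
    return {"quality": "non-optimal", "steps": n}


def compile_method(complexity):
    """Use the expensive optimizer only while the input is small enough."""
    if complexity <= BUDGET_THRESHOLD:
        return optimize_thoroughly(complexity)
    return optimize_quickly(complexity)


def complexity_after_inlining(caller, inlinees):
    """Inlining folds each callee's complexity into the caller's N."""
    return caller + sum(inlinees)


caller = 80
without_inlining = compile_method(caller)
with_inlining = compile_method(complexity_after_inlining(caller, [15, 15]))

print(without_inlining["quality"])  # optimal
print(with_inlining["quality"])     # non-optimal
```

In this toy model the caller alone stays under the threshold, but inlining two callees pushes N to 110 and the whole method drops to the fast, less thorough path.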

From an engineering point of view, it makes sense to approach this optimization with a 'Best Bang for your Buck' attitude, which means spending your compile time resources (time and space) and developer resources on the most common cases where inlining brings a benefit without a big risk of making things worse. A typical example of a really good candidate for inlining is a property getter/setter. These are usually very small methods that just do a memory fetch or store, so inlining them is typically both a size and a speed win.

These are some of the reasons for which we won't inline a method:

- Method is marked as no-inline with the CompilerServices.MethodImpl attribute (MethodImplOptions.NoInlining).

- Size of inlinee is limited to 32 bytes of IL: This is a heuristic. The rationale behind it is that usually, when you have methods bigger than that, the overhead of the call is not as significant compared to the work the method does. Of course, as a heuristic, it fails in some situations. There have been suggestions that we add an attribute to control this threshold. For Whidbey, that attribute has not been added (it has some very bad properties: it's x86 JIT specific, and its long-term value, as compilers get smarter, is dubious).

- Virtual calls: We don't inline across virtual calls, because we don't know the final target of the call. We could potentially do better here (for example, if 99% of calls end up in the same target, you can generate code that checks the method table of the object the virtual call is going to execute on; if it matches the 99% case, you just execute the inlined code, otherwise you do a call), but unlike the J language, most of the calls in the primary languages we support are not virtual, so we're not forced to be so aggressive about optimizing this case.

- Value types: We have several limitations regarding value types and inlining. We take the blame here: this is a limitation of our JIT, we could do better and we know it. Unfortunately, when we stack ranked this against other features of Whidbey, after getting some statistics on how frequently methods cannot be inlined for this reason and considering the cost of making this area of the JIT significantly better, we decided that it made more sense for our customers to spend our time working on other optimizations or CLR features. Whidbey is better than previous versions in one case: value types that have only a pointer-sized int as a member. This was (relatively) inexpensive to improve, and it helped a lot in common value types such as pointer wrappers (IntPtr, etc.).

- MarshalByRef: Call targets that are in MarshalByRef classes won't be inlined (the call has to be intercepted and dispatched). We've gotten better in Whidbey for this scenario.

- VM restrictions: These are mostly security related. The JIT must ask the VM for permission to inline a method (see CEEInfo::canInline in the Rotor source to get an idea of what kind of things the VM checks for).

- Complicated flowgraph: We don't inline methods that contain loops, exception handling regions, etc.

- Cold call sites: If the basic block that contains the call is deemed unlikely to execute frequently (for example, a basic block that has a throw, or a static class constructor), inlining is much less aggressive (as the only real win we can make is code size).

- Other: Exotic IL instructions, security checks that need a method frame, etc...
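As a rough illustration of how some of the checks listed above could combine, and of the guarded inlining idea mentioned under virtual calls, here is a small sketch. It is not the JIT's code: the `Callee` record, its fields, the order of the checks and the reading of the size limit as "more than 32 bytes is rejected" are all assumptions. The value type, cold call site and exotic IL cases are left out to keep it short, and the guarded path shows the potential improvement described above, not something the JIT is stated to do.

```python
from dataclasses import dataclass

MAX_INLINEE_IL_BYTES = 32  # the heuristic size limit from the list above


@dataclass
class Callee:
    """Hypothetical summary of an inline candidate; not a real JIT structure."""
    il_size: int
    no_inlining: bool = False          # marked via the MethodImpl attribute
    is_virtual: bool = False
    in_marshal_by_ref: bool = False
    has_loops: bool = False
    has_exception_handling: bool = False
    vm_allows: bool = True             # stand-in for the VM's permission check


def rejection_reason(callee):
    """Return the first reason the callee is rejected, or None if it passes."""
    checks = [
        (callee.no_inlining, "marked as no-inline"),
        (callee.il_size > MAX_INLINEE_IL_BYTES, "IL too large"),
        (callee.is_virtual, "virtual call"),
        (callee.in_marshal_by_ref, "MarshalByRef target"),
        (not callee.vm_allows, "VM restriction"),
        (callee.has_loops or callee.has_exception_handling,
         "complicated flowgraph"),
    ]
    for failed, reason in checks:
        if failed:
            return reason
    return None


# Guarded inlining for a virtual call, written out by hand.
class Shape:
    def area(self):
        return 0.0


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


def area_with_guard(shape):
    """Test for the dominant type first; otherwise make the normal call."""
    if type(shape) is Square:            # analogous to the method table check
        return shape.side * shape.side   # inlined body of Square.area
    return shape.area()                  # uncommon case: regular virtual call


print(rejection_reason(Callee(il_size=8)))    # None
print(rejection_reason(Callee(il_size=64)))   # IL too large
print(area_with_guard(Square(3)))             # 9
```

A small getter-like callee passes every check, while the guard in `area_with_guard` only pays for a real call when the object is not the expected type.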