My colleague Vance Morrison wrote an internal paper on code quality issues in our current system. I thought it discussed some excellent points, so with his kind permission I’ve edited and summarized it for a general audience. Thank you, Vance.
Qualitative Code Differences in Managed/Unmanaged Code
If you were to compare the assembly code of an equivalent managed and unmanaged program, you would find the differences break down into three broad categories: Intrinsic Features, Optional Features, and JIT Compiler Limitations.
In contrast, local variable access, argument access, flow of control, method calls, instance field accesses, and all primitive arithmetic are largely unchanged in managed code. This matters because these operations are the heart of most performance-sensitive programs, so managed code starts from a strong base for raw computation. See Jan Gray’s paper “Writing Faster Managed Code: Know What Things Cost.”
Intrinsic Runtime Features
These are the things that don’t exist in the unmanaged world, such as garbage collection (GC), AppDomains, and code-access security. This is the most worrisome set of differences between managed and unmanaged code because you really can’t “opt out” of these features; they represent the intrinsic cost of using the runtime.
- GC Information – To do a garbage collection, all pointers to the GC heap must be identified (and possibly updated). This includes all pointers on the execution stack (local variables, arguments, register spills) for every thread in the system, as well as any pointers in CPU registers themselves. This requires the JIT compiler to generate GC tracking information sufficient to walk the stack at (roughly) arbitrary times. This extra information is the most fundamental difference between unmanaged and managed code. While the GC tracking requirement does not affect code quality at all, it does mean that every method has an associated table that is typically 15% of the size of the code (on x86). Luckily, this table is accessed only for methods that are active during a GC, so it generally has a small effect on working set. There is also a small working-set overhead (about 1 DWORD per method) to link each method to its GC information. In short, GC information costs nothing in code quality and only a little in working set.
- Write Barriers – The runtime uses a “generational” GC, which improves GC performance by collecting only part of the heap most of the time. To implement this, every write of a GC pointer into a location in the GC heap must be logged as a potential root of a partial GC. This bookkeeping adds 4-10 cycles to every such write in the common case (see “Garbage Collector Basics and Performance Hints”). Write barriers are a concern, but the overhead is not huge: a pointer write to the GC heap goes from about 1 cycle to 6 or 7 cycles on average, and the hottest pointers are typically on the stack, where there is no penalty at all. The effect of write barriers is often measurable (a few percent or so) and can be more significant in certain tight loops.
- Static Field Access – The runtime supports a lightweight, process-like environment called an AppDomain. Each AppDomain has its own copy of all static variables. Because of this, domain-neutral code (code shared across AppDomains) must use 5-10 instructions to access a static field instead of just 1. The JIT can optimize many cases (allowing one fetch of the AppDomain’s statics to serve many static field fetches in the same method), but in some cases no optimization is possible. Domain-neutral code is more common in Whidbey (.NET Framework 2.0). In the worst case, static field access overhead is actually worse than that of write barriers: a static field access goes from one cycle to roughly ten cycles. However, because the lookups for several static field fetches can be combined (and pulled out of loops), the impact of slower static field fetches is generally less than that of write barriers. It has no measurable impact at all in many scenarios (for instance, framework code tends not to use statics much).
- Interop with existing unmanaged code – At a minimum, transitions to unmanaged code must be marked on the stack so that garbage collections can happen correctly, and there may also be security checks and/or argument conversions (if the types don’t exactly match the operating system types). In the best case (no security concerns, simplest kind of call) the overhead is 10-20 instructions. Costs can increase dramatically if argument conversion is needed.
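To make the GC tracking information described above more concrete, here is a minimal conceptual model in Python (used only as a modeling language), assuming a simplified table that maps ranges of code offsets in one method to the stack slots holding live GC references at those offsets. The names `GcInfo` and `report_roots` are hypothetical, and the real x86 encoding is compressed and considerably more intricate. The point of the sketch is that the table is consulted only when a GC walks a suspended thread's stack, which is why it costs space but not code quality.

```python
from bisect import bisect_right


class GcInfo:
    """Toy GC table for one method: which stack slots hold live GC refs at each code offset."""

    def __init__(self, ranges):
        # ranges: (start_offset, live_slots) pairs sorted by start_offset; each entry
        # applies from its start offset up to the next entry's start offset.
        self.starts = [start for start, _ in ranges]
        self.live = [frozenset(slots) for _, slots in ranges]

    def live_slots(self, offset):
        """Slots the GC must report (and possibly update) if the thread stopped at offset."""
        i = bisect_right(self.starts, offset) - 1
        return self.live[i] if i >= 0 else frozenset()


def report_roots(frames):
    """Walk a suspended thread's frames and collect (frame_id, slot) GC roots.

    frames: (gc_info, stopped_offset, frame_id) tuples, innermost frame first.
    """
    roots = []
    for gc_info, offset, frame_id in frames:
        for slot in sorted(gc_info.live_slots(offset)):
            roots.append((frame_id, slot))
    return roots
```

Ordinary execution never touches `GcInfo`; only the stack walk in `report_roots` does, mirroring why the real table rarely enters the working set.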
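The write-barrier bookkeeping described above can be sketched with a card table, a common technique in generational collectors. This is a toy Python model, not the CLR's barrier: the card size, the `Heap` class, and the rule "mark the card when the stored reference points into the young generation" are simplifying assumptions. Writes to locals on the stack never go through `write_ref`, which mirrors why stack pointers pay no barrier cost.

```python
CARD_SIZE = 256  # bytes of heap covered by one card (an assumed value)


class Heap:
    """Toy heap: addresses are ints; addresses >= ephemeral_start belong to the young generation."""

    def __init__(self, size, ephemeral_start):
        self.slots = {}                              # slot address -> stored reference (address or None)
        self.ephemeral_start = ephemeral_start
        self.cards = bytearray(size // CARD_SIZE)    # one "dirty" byte per card

    def write_ref(self, slot_addr, target_addr):
        """Store a reference into a heap slot, then run the write barrier."""
        self.slots[slot_addr] = target_addr
        # Barrier: a reference into the young generation may be a root for a
        # partial GC, so record the card containing the slot that was written.
        if target_addr is not None and target_addr >= self.ephemeral_start:
            self.cards[slot_addr // CARD_SIZE] = 1

    def dirty_cards(self):
        """Card indices a young-generation GC must scan for extra roots."""
        return [i for i, flag in enumerate(self.cards) if flag]
```

For example, on `Heap(size=4096, ephemeral_start=2048)`, storing a pointer to address 3000 into slot 100 dirties card 0, while storing a pointer to the old address 50 into slot 600 dirties nothing. The extra compare and store in `write_ref` is the per-write cost the text refers to.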
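A rough model of why domain-neutral static field access costs more: shared code cannot embed the address of a static field, so each access must first locate the current AppDomain's copy of the class's statics. In this hypothetical Python sketch, `AppDomain.statics_for` stands in for that lookup (including a lazy class-initialization check), and `bump_hoisted` shows the optimization the text describes, where one lookup serves every access in a loop. All names here are illustrative, not runtime APIs.

```python
class AppDomain:
    """Toy AppDomain: each domain owns a separate copy of every class's static fields."""

    def __init__(self, name):
        self.name = name
        self.statics = {}   # class name -> dict of that class's static fields
        self.lookups = 0    # how many times shared code had to locate a statics block

    def statics_for(self, class_name, initializer):
        """Find this domain's statics for class_name, creating them on first use."""
        self.lookups += 1
        if class_name not in self.statics:          # stands in for the class-init check
            self.statics[class_name] = initializer()
        return self.statics[class_name]


def counter_statics():
    """Initial values for the statics of a hypothetical Counter class."""
    return {"count": 0}


def bump_naive(domain, n):
    """Domain-neutral code with no optimization: one statics lookup per access."""
    for _ in range(n):
        domain.statics_for("Counter", counter_statics)["count"] += 1


def bump_hoisted(domain, n):
    """One lookup hoisted out of the loop, then plain field accesses."""
    statics = domain.statics_for("Counter", counter_statics)
    for _ in range(n):
        statics["count"] += 1
```

Both functions produce the same counts, and each AppDomain keeps its own `count`; only the number of lookups differs, which is the overhead the JIT tries to combine away.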
Optional Runtime Features
These are features, such as array bounds checks and runtime casts, that developers can avoid in particular cases if needed (e.g. by using “unsafe” code), though for the most part we encourage developers to use them universally.
Ease of use, safety, and simplicity weigh heavily in the design decisions of most managed code users, including our framework, so most code takes advantage of these “optional” features as a matter of course. Where these costs are hard to bear because the code is highly performance-critical, you can opt out, but our normal recommendation is to do so only with due caution.
- Managed code strongly encourages code to be verifiably type-safe (which means the CLR can prove that all references point to instances of their statically declared types). This leads to several small overheads that can add up.
- Bounds checks on many array accesses (by default, every access has a length check at a cost of 2 instructions). You can opt out by using unsafe code.
- Type checks on every store into an array of object references, to ensure that the value being stored is compatible with the array’s actual element type (arrays are covariant, so a string[] can be passed where an object[] is expected). You can opt out by using unsafe code.
- Type checks when extracting data from type-neutral containers and APIs. You can opt out by using unsafe code.
- Boxing (wrapping a primitive type in a GC heap object) when inserting primitive types into type-neutral containers and APIs. You can opt out by using generics or by writing a container for the specific primitive type.
- Immutable strings. The basic string type is immutable, which often means more data copying (but sometimes less, because strings can be shared without defensive copies). You can opt out of this by manipulating character arrays or special classes like StringBuilder, but when you interface with APIs that expect strings, you need to make a copy.
- Delegates. Managed code has a type-safe notion of a function pointer called a delegate. Delegates are more powerful than C function pointers because they carry state and can dispatch to multiple targets, and this generality adds overhead. You can opt out by using unsafe function pointers.
- The runtime has an extensive set of reflection APIs that allow code to introspect on the running code. It is relatively easy to probe for types at runtime, traverse inheritance hierarchies, set fields by string name, call methods by string name, and even generate new methods on the fly. These are powerful features (really not available at all in the unmanaged world), but they have a significant cost compared to precompiled code. Users of these features must make a careful engineering trade-off to ensure that the benefit of this introspection is worth the cost.
- Managed code tends to have more extensibility points than its unmanaged counterpart. Developers achieve this with object-oriented techniques such as virtual functions and interfaces, and with the reflection APIs. These extensibility points can cost significant amounts of performance and have to be carefully weighed by framework designers.
- Managed code supports custom attributes on IL entities (types, methods, fields, etc.). This has been valuable for adding new features to the system (e.g. hosting, interop, security, or reliability information), but attributes are relatively expensive to access at run time. This expense has to be factored into the cost of these new features.
- Managed code tends to allocate more heap objects (i.e. more methods tend to return new objects rather than modify one that was passed in). Reusing objects in place can cut down on allocation overhead, but, even more importantly, the locality benefits of compact allocations sometimes trump other considerations, and managed allocation is closer in speed to a custom unmanaged allocator than to a raw malloc(). Allocation trade-offs are therefore a subtle topic.
- Compilers can make expensive features very easy or even implicit (e.g. transitioning to unmanaged code, anonymous delegates), which greatly magnifies their use.
- Managed libraries often do extensive precondition checking to give detailed errors on API misuse, for example checking for null object references and throwing an ArgumentNullException. This is great for developers but hurts performance. It is a choice made by the library designers: end users can’t opt out except by reimplementing the functionality, but library designers can.
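The bounds checks and array-store type checks described above can be modeled roughly as follows. `ObjectArray` is a hypothetical Python class standing in for a managed array of object references; the exception names in the messages are the .NET ones, but everything else is a simplification, since Python is used here only as a modeling language. In JIT-generated x86 code the bounds check is typically a single unsigned compare of the index against the length followed by a conditional branch, which accounts for the two-instruction figure.

```python
class ObjectArray:
    """Toy model of a managed array of object references and its runtime checks."""

    def __init__(self, element_type, length):
        self.element_type = element_type   # the array's actual element type
        self.items = [None] * length

    def _check_bounds(self, index):
        # JIT code typically does this as one unsigned compare plus a branch.
        if not 0 <= index < len(self.items):
            raise IndexError("IndexOutOfRangeException")

    def load(self, index):
        self._check_bounds(index)
        return self.items[index]

    def store(self, index, value):
        self._check_bounds(index)
        # Arrays are covariant: a Dog[] can be referenced as an Animal[], so the
        # stored value is checked against the array's *actual* element type.
        if value is not None and not isinstance(value, self.element_type):
            raise TypeError("ArrayTypeMismatchException")
        self.items[index] = value
```

Storing a `Cat` into an `ObjectArray(Dog, 2)` fails even when the caller only knows it as an array of animals; this is the check that every store into an object array pays for, and loads pay only the bounds check.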
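Boxing and its opt-out can be contrasted with a small Python model. `Box`, `ObjectList`, and `IntList` are hypothetical names: `ObjectList` stands in for a type-neutral container such as the pre-generics ArrayList, allocating one wrapper object per value and type-checking on every read, while `IntList` stores raw 64-bit integers in a typed `array`, which corresponds to writing a container for the specific primitive type.

```python
from array import array


class Box:
    """A heap object wrapping one primitive value: what boxing allocates."""
    __slots__ = ("type_name", "value")

    def __init__(self, value):
        self.type_name = type(value).__name__
        self.value = value


class ObjectList:
    """Type-neutral container: every element is a reference to a heap object."""

    def __init__(self):
        self.items = []
        self.boxes_allocated = 0

    def add(self, value):
        self.items.append(Box(value))   # boxing: one allocation per value
        self.boxes_allocated += 1

    def get_int(self, i):
        obj = self.items[i]
        if obj.type_name != "int":      # type check on every extraction (unboxing)
            raise TypeError("InvalidCastException")
        return obj.value


class IntList:
    """Container specialized for one primitive type: values are stored unboxed."""

    def __init__(self):
        self.items = array("q")         # contiguous signed 64-bit integers

    def add(self, value):
        self.items.append(value)

    def get_int(self, i):
        return self.items[i]
```

Both containers return the same values, but `ObjectList` pays an allocation per insert and a type check per read, which is the overhead generics or a specialized container avoid.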
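To put rough numbers on the copying cost of immutable strings, the following Python sketch counts characters copied when building a string piece by piece. The growable buffer that doubles its capacity is an assumed, classic strategy chosen for illustration; StringBuilder's actual internal growth policy may differ. The final `+ length` models the copy made when the buffer is turned into an immutable string for an API that expects one.

```python
def copies_with_concat(pieces):
    """Characters copied when each append creates a new immutable string (s = s + piece)."""
    copied = length = 0
    for piece in pieces:
        length += len(piece)
        copied += length              # the entire new string is written out
    return copied


def copies_with_builder(pieces, initial_capacity=16):
    """Characters copied by a growable buffer that doubles its capacity when full."""
    copied = length = 0
    capacity = max(1, initial_capacity)
    for piece in pieces:
        needed = length + len(piece)
        if needed > capacity:
            while needed > capacity:
                capacity *= 2
            copied += length          # move existing contents to the bigger buffer
        copied += len(piece)          # write the new characters
        length = needed
    return copied + length            # final copy into an immutable string
```

For 100 two-character pieces, repeated concatenation copies 10,100 characters (quadratic growth), while the doubling buffer copies 640, including the final copy into a string.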
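What a delegate carries beyond a bare function pointer can be sketched as follows. `Delegate` is a hypothetical Python class modeling a multicast delegate as an invocation list of (target, function) pairs; as with .NET delegates, combining returns a new delegate, and invoking a multicast delegate calls each entry in order and returns the last entry's result. The loop and per-entry target handling are the generality that a raw C function pointer does not pay for.

```python
class Delegate:
    """Toy multicast delegate: an invocation list of (target, function) pairs."""

    def __init__(self, *entries):
        self.invocation_list = list(entries)

    def combine(self, other):
        """Return a new delegate; the originals are left unchanged."""
        return Delegate(*self.invocation_list, *other.invocation_list)

    def __call__(self, *args):
        result = None
        for target, fn in self.invocation_list:
            # An instance entry passes its captured target; a static entry has none.
            result = fn(target, *args) if target is not None else fn(*args)
        return result   # a multicast invocation returns the last entry's result
```

A single-entry delegate with a `None` target behaves like a plain function pointer; everything else in the class is the extra state and dispatch the text describes.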
JIT Compiler Limitations
The final category of code generation differences consists of artifacts of the current JIT compiler rather than inherent trade-offs in the managed system.
The current just-in-time (JIT) compiler is more limited than a typical commercial-quality unmanaged compiler, partly because it needs to be smaller and faster and partly because it just isn’t as mature. Some of the larger issues include:
- 64-bit arithmetic – Since the initial implementation, we have invested heavily in code quality for 32-bit integers, but not for 64-bit integers.
- Inlining – The inlining subsystem could use additional work to handle larger inlining candidates. This is becoming more important as more complex properties become common and require inlining for performance.
- Analysis caps – For the sake of compilation speed, the JIT places arbitrary caps on the size of its analysis data. For large methods, the JIT therefore lacks the information it needs to do a really good job.
- Value Types (structs) – Value types are not handled as well as reference types. For example, the inliner does not inline functions with value-type parameters.
- Exception Handling – The code generated for exceptions is based on the assumption that exception handling is rare. This assumption is turning out to be false as users write code with increasingly rich exception semantics.
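One reason 64-bit integer code needs separate attention on x86 is that each 64-bit value occupies a pair of 32-bit registers, so even an addition becomes two instructions (ADD on the low halves, then ADC to propagate the carry into the high halves), and each value needs twice as many registers. The Python functions below only model that instruction sequence; they are not JIT output, and `split64` and `add64_on_32bit` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def split64(x):
    """Split an unsigned 64-bit value into (low, high) 32-bit halves, like an x86 register pair."""
    return x & MASK32, (x >> 32) & MASK32


def add64_on_32bit(a, b):
    """Model of a 64-bit add on 32-bit x86: ADD the low halves, then ADC the high halves."""
    a_lo, a_hi = split64(a)
    b_lo, b_hi = split64(b)
    lo = a_lo + b_lo
    carry = lo >> 32                        # the carry flag set by ADD (0 or 1)
    lo &= MASK32
    hi = (a_hi + b_hi + carry) & MASK32     # ADC adds the carry into the high halves
    return (hi << 32) | lo                  # wraps modulo 2**64, like the hardware
```

Multiplication, division, and shifts decompose less neatly than addition, which is part of why 64-bit integer code quality lags when optimization effort has gone into 32-bit operations.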