Whidbey Performance Opportunities


I’m often asked “What’s new in Whidbey?” and so I thought I’d put together this (very) brief list of some of the more important items that got attention during this product cycle. This is by no means exhaustive, but it’s a taste of some of the nice improvements you’ll see performance-wise.


Ngen for improved code sharing


Possibly the largest single investment area for Whidbey performance has been the drive to decrease the number of private bytes associated with precompiled managed modules.  Everett had roughly 40% of the bytes in any given ngen’d module as private.  Whidbey reduces that level to the 10% neighborhood while at the same time reducing the overall working set of similar modules with better layout.  Overall you may see the private bytes in any given module fall to as low as 1/8th of what they used to be.


The benefits of this are fairly immediate: fewer private bytes mean less I/O and less memory pressure, both of which tend to translate directly into improved startup times, both warm and cold.


Note: beware the Private Bytes counter; it does not tell the full story.  Details to come in another entry, “When is a private byte not a private byte?”


See also: Jason Zander’s Ngen tutorial


Advanced Ngen features


Hard binding


You can use System.Runtime.CompilerServices.DependencyAttribute to tell ngen that one assembly always requires another to be loaded.  Opting into this hard link between the assemblies lets ngen encode certain offsets from the always-loaded assembly directly into the image it is creating, thereby reducing or eliminating the need for fixups.
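To make this concrete, here is a minimal sketch of opting in; the assembly name “MyCoreLibrary” is hypothetical:

```csharp
// Hard binding sketch: declare that this assembly always loads
// "MyCoreLibrary" (a made-up name), so ngen can bake offsets from it
// directly into this image instead of emitting fixups for them.
using System.Runtime.CompilerServices;

[assembly: Dependency("MyCoreLibrary", LoadHint.Always)]
```

Note that this is a promise, not a hint you can ignore later: the hard-bound dependency will always be loaded along with your assembly.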


See also: the MSDN article on Native Image Service


String Freezing


Use System.Runtime.CompilerServices.StringFreezingAttribute to indicate that ngen should include a GC segment in your assembly that contains all of your string literals pre-created.  This further reduces the need for fixups and thereby reduces private pages; however, an assembly with frozen strings cannot be unloaded, so it is a bad idea to set this attribute on assemblies that are “transient” in nature.
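The attribute is applied at the assembly level; a minimal sketch:

```csharp
// String freezing sketch: ask ngen to pre-create this assembly's string
// literals in a frozen GC segment. Only appropriate for assemblies that
// stay loaded for the life of the process, since freezing prevents unload.
using System.Runtime.CompilerServices;

[assembly: StringFreezing]
```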

Generics


Generics are a double-edged sword, so I won’t come out and say they are the solution to all your performance woes, but they are a great tool to have in your arsenal.  In Whidbey you can easily create strongly typed, value-type-based collections that eradicate boxing in many important cases.  Just have a care not to go crazy creating collection types or you may find that you have added far too much code to your project.
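For example, compare an ArrayList holding ints with a List&lt;int&gt;:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class BoxingDemo
{
    static void Main()
    {
        // Everett style: each int is boxed on Add and unboxed on read.
        ArrayList oldList = new ArrayList();
        oldList.Add(42);              // boxes 42 into a heap object
        int a = (int)oldList[0];      // unboxes, and can fail at runtime

        // Whidbey style: no boxing, no casts, mistakes caught at compile time.
        List<int> newList = new List<int>();
        newList.Add(42);
        int b = newList[0];

        Console.WriteLine(a + b);     // prints 84
    }
}
```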


See also: Six Questions about Generics and Performance


Enumerators and Foreach


The Generic collections such as List<T> and Dictionary<X,Y> use a superior approach to implementing enumeration than did ArrayList and Hashtable.  If enumeration costs are significant in your application you may get significant savings by using the new collections even if you don’t need the strong typing. 
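You can see the difference directly: List&lt;T&gt;.GetEnumerator returns the struct List&lt;T&gt;.Enumerator, while ArrayList.GetEnumerator hands back a heap-allocated reference type. A quick check:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class EnumeratorDemo
{
    static void Main()
    {
        // ArrayList's enumerator is a reference type: one heap allocation
        // for every foreach over the collection.
        IEnumerator oldStyle = new ArrayList().GetEnumerator();
        Console.WriteLine(oldStyle.GetType().IsValueType);    // False

        // List<T>'s enumerator is a struct: it lives on the stack, and
        // because its exact type is known the calls can be inlined.
        List<int>.Enumerator newStyle = new List<int>().GetEnumerator();
        Console.WriteLine(newStyle.GetType().IsValueType);    // True
    }
}
```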


GC Improvements


A great deal of tuning, bug fixing, and the addition of important new heuristics makes the Whidbey GC the best we have ever released.  In addition, it is much easier to choose the particular GC mode you want (server vs. workstation, concurrent or not): you no longer have to use the hosting API to get that flexibility.
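For instance, you can now select the GC flavor from your application’s configuration file; a minimal app.config sketch:

```xml
<configuration>
  <runtime>
    <!-- use the server GC instead of the workstation GC -->
    <gcServer enabled="true"/>
    <!-- turn off the concurrent collector -->
    <gcConcurrent enabled="false"/>
  </runtime>
</configuration>
```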


See also: Maoni’s articles on the new GC 


Exceptions Improvements


A variety of improvements in the cost of thrown exceptions have helped some cases, especially those with fairly deep stacks.  Our guidance is still to avoid exceptions on the main path, but you might find them somewhat less deadly in recursive algorithms.


Exception Avoidance


Many classes now support exception-free variations such as TryParse, which report failure via a return value instead of throwing.  These are highly recommended in cases where parsing failures can reasonably be expected.
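For example, with int.TryParse the failure case costs a branch instead of a thrown exception:

```csharp
using System;

class TryParseDemo
{
    static void Main()
    {
        string input = "not a number";

        // int.Parse(input) would throw FormatException here, which is
        // expensive when bad input is a normal, expected case.
        int value;
        if (int.TryParse(input, out value))
            Console.WriteLine("parsed: " + value);
        else
            Console.WriteLine("not a valid integer");   // prints this
    }
}
```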


Security


A great deal of work went into the security system as a whole, from basic things like reducing the cost of startup by improving the security system’s XML parser to making declarative security more efficient.  In Whidbey the most common demands (for things like unmanaged code access and full trust) are greatly reduced in cost, as is the cost of asserts.  Generally “full trust” performance is excellent, with very low throughput overhead.  These overhead reductions especially pay dividends in interop cases that are very chatty.



Strings


There are new string overloads available that make it possible to specify the comparison mode.  See Dave Fetterman’s excellent article describing when to use which type of comparison, and be sure to take advantage of ordinal-based comparisons (where appropriate) for both speed and security.
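For example:

```csharp
using System;

class ComparisonDemo
{
    static void Main()
    {
        // Culture-sensitive comparison: slower, and results can vary by
        // culture (the classic surprise is "i" vs "I" under Turkish rules).
        bool cultural = string.Equals("FILE", "file",
            StringComparison.CurrentCultureIgnoreCase);

        // Ordinal comparison: a straight character-by-character comparison;
        // fast and predictable, the right choice for identifiers, paths,
        // and other non-linguistic strings.
        bool ordinal = string.Equals("FILE", "file",
            StringComparison.OrdinalIgnoreCase);

        Console.WriteLine(cultural + " " + ordinal);   // True True
    }
}
```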


Reflection


Whidbey includes a major overhaul of the caching policy in the reflection system, which results in much more economical overall behavior.  This, combined with some useful new APIs for getting just what you need, can mean great improvements.  See Joel Pobar’s excellent article on the costs of reflection and his insights into best practices, as well as his blog generally.
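One easy win in any version is to ask reflection for exactly the member you need rather than enumerating everything and filtering; a small sketch:

```csharp
using System;
using System.Reflection;

class ReflectionDemo
{
    static void Main()
    {
        // Request the one overload we care about: string.Substring(int).
        MethodInfo substring = typeof(string).GetMethod(
            "Substring", new Type[] { typeof(int) });

        string result = (string)substring.Invoke("Whidbey", new object[] { 4 });
        Console.WriteLine(result);   // prints "bey"
    }
}
```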


Cross App Domain calls


Marshalling of simple data types (e.g. strings, ints, copyable structs, etc.) between application domains was vastly improved.  In some cases we observed as high as 10x throughput improvements. 


AddMemoryPressure


You can use GC.AddMemoryPressure and GC.RemoveMemoryPressure to inform the collector that you are allocating and freeing unmanaged memory.  This helps the collector better understand the true memory pressure in your application and collect appropriately.  It’s important to remember that using this API improperly can do a lot of damage because it so directly affects GC behavior.  Be sure to verify improvements with measurements.
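A sketch of the intended pattern (the UnmanagedBuffer wrapper is hypothetical): pair every AddMemoryPressure with a matching RemoveMemoryPressure when the unmanaged memory is actually freed.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: a small managed object that owns a large
// unmanaged allocation.
class UnmanagedBuffer : IDisposable
{
    private readonly long size;
    private IntPtr buffer;

    public UnmanagedBuffer(long bytes)
    {
        size = bytes;
        buffer = Marshal.AllocHGlobal((IntPtr)bytes);
        // Tell the GC that this tiny managed object is holding down a
        // lot of real memory so collections are scheduled accordingly.
        GC.AddMemoryPressure(size);
    }

    public void Dispose()
    {
        if (buffer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(buffer);
            buffer = IntPtr.Zero;
            // Always pair Add with Remove or the GC's picture of memory
            // pressure drifts further and further from reality.
            GC.RemoveMemoryPressure(size);
        }
    }
}
```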



Profiling API


Many additional callbacks have been added to make it possible to completely track the lifetime of objects without ad hoc heuristics regarding object promotion and so forth.  As a result, finding sources of mid-life crisis, undisposed objects, and memory leaks generally is greatly improved.  A new CLRProfiler (Beta 2 version) supports these features, and of course you can write your own custom profilers that do likewise with comparative ease.



Threadpool


The thread pool now uses adaptive thread injection based on throughput, and it coordinates better with the GC.

Comments (20)

  1. Ryan Lamansky (Kardax) says:

    Awesome 🙂

    -Ryan / Kardax

  2. Ivan Stoev says:

    About "Enumerators and Foreach":

    By "superior approach" you probably mean a specific C# compiler recognizable construct. I like it, but in general what you said isn’t a real true statement, right?

  3. ricom says:

    >>By "superior approach" you probably mean a specific C# compiler recognizable construct. I like it, but in general what you said isn’t a real true statement, right?

    I’m not sure what that means but let me be more specific.

    I wasn’t very happy with how ArrayList implemented GetEnumerator. It required a heap allocation because the enumerator returned by GetEnumerator on its IEnumerable interface was a reference type (an interface, actually).

    In List<T> the enumerator returned by GetEnumerator is a value type. That means it can go on the stack and since the exact type is known the calls can be inlined.

    As a result foreach over generic types is substantially better, largely eliminating the only reason to avoid foreach in your programs.

    This isn’t a compiler thing. The compiler was always doing the right thing. The problem was that the classes were defined in such a way as to make foreach significantly slower than say a for loop indexing the elements. Adding to the pain is the fact that this could not be changed because it would break lots of otherwise working programs.

    So instead we left ArrayList the way it is and did it better in List<T>.

    Again foreach itself has not been changed. The compiler does what it always did. The problem was in the collection classes.

  4. Ivan Stoev says:

    About "Enumerators and Foreach"

    Best regards Rico, you are a great guy and my favorite blogger, but I have to disagree. Actually what you are explaining is exactly what I meant. Only the C# compiler analyzes the concrete collection type and the type returned by the public GetEnumerator method. Even before Whidbey, a C# developer could create a collection type with a PUBLIC method CALLED GetEnumerator that returns another type (either class or struct) having a PUBLIC method CALLED MoveNext and a PUBLIC property CALLED Current returning a CONCRETE type. The collection type does not implement IEnumerable and the enumerator type does not implement IEnumerator. Finally, C# will allow using foreach on that collection type, but other languages will not.

    Anyway, the same is true in Whidbey. All other languages will use the IEnumerable<T>.GetEnumerator method, which returns IEnumerator<T>, i.e. a reference type, so the actual List<T>.Enumerator value type will be boxed.

    That’s why I continue thinking that this is a C#-specific optimization, isn’t it?

    (Btw, we are using C# only, and have no backward compatibility, CLS compliance, or multi-language development requirements, so this is not a problem (actually we use a similar approach in our own collection types). Just for clarity 🙂

  5. ricom says:

    >>Best regards Rico, you are a great guy and my favorite blogger,

    Aww shucks 🙂

    >>but I have to disagree

    Oh no, our first fight, and we were just getting to know each other 🙂 🙂 🙂

    OK, seriously, the meat of the disagreement is below, and I actually think we even agree there; we just have a slightly different perspective.

    >>I continue thinking that this is a C#-specific optimization, isn’t it?

    I’m happy to say that from your description of the mechanics I can see that we are 100% in agreement on What Is Actually Happening, which is the most important part.

    Now as for whether or not it’s a C#-specific optimization, here’s how I feel about it:

    We didn’t actually do anything new or different in the C# compiler to take advantage of this. As you illustrate above that wasn’t necessary.

    Other compilers could take advantage of this if they choose to. It’s just a question of how they want to implement their version of foreach.

    Even if other compilers don’t take advantage of this pattern programmers could use it themselves, in other languages, and get the same benefit. With ArrayList the heap allocation was not avoidable if you wanted to use the enumerator.

    But, and I think this is your point, C# is going to be the language that just automatically does the right thing. [Maybe VB does it the same, I’d have to check.]

    So despite the fact that any language *could* do this, I’m not sure any language but C# *will* do this with no help from the programmer.

    But, after all, all I said was "The Generic collections such as List<T> and Dictionary<X,Y> use a superior approach to implementing enumeration than did ArrayList and Hashtable [and this might be helpful]" and I think that is 100% true.

  6. Ivan Stoev says:

    >>our first fight

    You are the last MS person I would ever think to fight with 🙂 But.. 🙂 🙂

    >>But, after all, all I said was "The Generic collections such as List<T> and Dictionary<X,Y> use a superior approach to implementing enumeration than did ArrayList and Hashtable [and this might be helpful]" and I think that is 100% true.

    Actually, all you said was only two sentences. And although I was referring to the first one, I was actually arguing against the second 🙂

    Seriously, what really makes me frustrated (which has nothing to do with your writings) is that Whidbey does not solve the problem of hidden heap allocations in general. Moreover, many of the new "cool" features (iterator functions, anonymous delegates, methods like List<T>.ForEach(Action<T> action), etc.) introduce new hidden allocations. For instance, since the value type enumerator is obviously a good improvement, I was wondering why the new enumerator functions are implemented by compiler-generated reference types.

    Second, this approach is applicable if and only if we program against the concrete type. If we put into play even a minimal abstraction, the improvement goes away. But who in practice programs against arrays and List<T>? Many useful generic algorithms could be written (and they are, in fact, in Peter Golde’s PowerCollections.Algorithms) but they by nature will operate on interfaces (otherwise they would not be generic, right? :)).

    Also, MS design guidelines discourage exposing arrays and List<T> in public APIs. If we are creating a public object model for our application (so it will do all enumerations on publicly exposed APIs) and we follow that rule, we should base our collection types on System.Collections.ObjectModel.Collection<T> (or similar, but let’s analyze this). Let’s ignore the fact that when allocating our collection type, we will actually do a minimum of 3 heap allocations (the collection type itself, the internally contained List<T> or similar container instance, and finally a minimum of one array instance contained inside the internal container) instead of the one we would normally expect. Since a small abstraction is used (the internal container is of type IList<T>), the final enumerator will always be a reference type even if the real internal container is List<T>, as it is by default. Which means that the array and generic collection improvements make no practical sense in real application development 🙂

    Anyway, I could probably write tons of similar considerations, but I think I should stop here because I’m sure you know all these things very well.

    Thanks for your time and responses! It was a real pleasure for me 🙂

  7. ricom says:

    >>Seriously, what really makes me frustrated (which has nothing to do with your writings) is that Whidbey does not solve the problem of hidden heap allocations in general. Moreover, many of the new "cool" features (iterator functions, anonymous delegates, methods like List<T>.ForEach(Action<T> action), etc.) introduce new hidden allocations. For instance, since the value type enumerator is obviously a good improvement, I was wondering why the new enumerator functions are implemented by compiler-generated reference types.

    I’m sure you can guess how I feel about any kind of hidden overheads. But one battle at a time 🙂

  8. Daniel Moth says:

    Blog link of the week 30

  9. Jeffrey Sax says:

    Lots of great sounding stuff. Any improvements on the ‘inlining methods with struct parameters’ issue? (As far as I know, this is the top performance issue on LadyBug.)

  10. ricom says:

    >>Any improvements on the ‘inlining methods with struct parameters’ issue?

    Sadly no, at least none that I’m aware of. I think I’ve been very clear with the JIT team that inlining issues are #1 on my performance hit parade at this point. But alas, only 24 hours in a day.

  11. A while back, someone asked me whether Windows Presentation Foundation calls any private APIs…

  12. raptor says:

    This sounds great, but what about other features like:

    -Reduced overhead for delegate function calls?

    -Inlining of functions receiving value type parameters. I learned that CLR 1.1 only inlines functions with reference type parameters.

    -An interface to implement mathematical operators and operations on generic parameters.

  13. Rüdiger Klaehn says:

    I second the comments made by raptor.

    The inlining of functions using value type parameters is really essential, especially for graphics and scientific programming. For example, a method taking a Point or PointF as a parameter will never be inlined, even if it is really trivial, and subsequent optimizations cannot happen. So this single issue pretty much destroys all chances to use .NET for serious vector graphics or even scientific computing.

    From the response to my suggestion at http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=fb7b3c93-a9e9-418b-85b3-b67a195c7e1a I think that this will not make it into Whidbey. But this needs to be addressed ASAP.

  14. ricom says:

    >>Reduced overhead for delegate function calls?

    Definitely improvements there.

    >>Inlining of functions receiving value type parameters. I learned that CLR 1.1 only inlines functions with reference type parameters.

    I believe we made some improvements that help with wrapper types (i.e. there is only one field), but what people really want is to be able to inline things like a Point, and we’re not there yet.

    >>An interface to implement mathematical operators and operations on generic parameters.

    Perhaps you could use http://lab.msdn.microsoft.com/ to enter a specific request on that?

    I always wish we could have done more… it’s never enough 🙂

  15. Rüdiger Klaehn says:

    >An interface to implement mathematical operators and operations on generic parameters.

    I think raptor means something like this suggestion:

    http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=3048848f-d1e1-42ec-aa47-5b8e2ab544e7

    However, there are some ways to get around this, like using a separate generic parameter for the calculations, as in this article:

    http://www.codeproject.com/csharp/genericnumerics.asp

    But I don’t want to sound negative. On the whole you did a great job with Whidbey. It took way longer than I had hoped for, but at least .NET will not be haunted by a broken generics implementation like Java’s.

    regards,

    Rüdiger

  16. Attended the lunchtime presentation by Rico Mariani on writing faster managed code.

    The cool thing I…