Narrowing Down Performance Problems in Managed Code

My last entry was some generic advice about how to do a good performance investigation.  I think actually it’s too generic to be really useful — in fact I think it fails my Peanut Butter Sandwich Test.

Digression to discuss the Peanut Butter Sandwich Test

I review a lot of documents and sometimes they say things that are so obvious as to be uninteresting.  The little quip I have for this situation is, “Yes what you are saying is true of [the system] but it’s also true of peanut butter sandwiches.” Consider a snippet like this one, “Use a cache where it provides benefits,” and compare with, “Use a peanut butter sandwich where it provides benefits.”  Both seem to work… that’s a bad sign. 

You certainly don’t want to get an F on the Peanut Butter Sandwich Test but hopefully you won’t settle for just a C-.

Back on topic

I thought it would be good to follow up the generic advice with some specific suggestions for things to look at. These are things I look at in step 2 or 3 of the investigation.

Under .NET CLR Memory, check “% Time in GC”. If it’s getting near 10% or higher, you may have some memory issues; consider these secondary tests:

  • is the raw allocation rate “Allocated Bytes/sec” too high? -> reduce total allocations
  • is the promotion rate “Promoted Memory from Gen 1” too high? -> be careful about object lifetimes, avoid “mid-life crisis”
  • is the finalization rate “Finalization Survivors” too high? -> make sure you are disposing the key objects (see the Dispose sketch after this list)
  • is the heap growing when it shouldn’t (“# Bytes in all Heaps”)? -> check for reference leaks
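
By way of example on that last point, here is a minimal sketch of the standard Dispose pattern (the type and member names are just for illustration). Objects that are disposed properly call GC.SuppressFinalize, so they skip the finalization queue entirely and never show up in “Finalization Survivors”:

    using System;

    class ResourceHolder : IDisposable
    {
        IntPtr handle;        // stands in for some unmanaged resource
        bool disposed;

        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);   // skip the finalization queue
        }

        protected virtual void Dispose(bool disposing)
        {
            if (disposed) return;
            handle = IntPtr.Zero;        // release the resource here
            disposed = true;
        }

        // Runs only if Dispose was forgotten; these stragglers are what
        // inflate "Finalization Survivors".
        ~ResourceHolder()
        {
            Dispose(false);
        }
    }

With a using block, as in using (ResourceHolder r = new ResourceHolder()) { … }, Dispose runs deterministically at the end of the block and the finalizer never fires.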

Is the CPU not saturated when it should be?  Look under .NET CLR LocksAndThreads

  • is the “Contention Rate / sec” counter high compared to your throughput rate? -> you should re-examine your locking strategy
  • is the “# of current physical Threads” too low for the problem? -> (amended) more parallelism may be helpful; consider using the ThreadPool if it’s not already in use, and possibly adjust ThreadPool parameters to get more threads (not usually needed)
  • in the “Thread” category examine “Context Switches / sec”; is this high compared to your throughput rate? -> perhaps the work item you are giving threads in the thread pool is too small, consider something chunkier (see the sketch below)
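
To illustrate “chunkier”: rather than queueing one thread pool work item per element, let each work item process a whole slice of the data, so the scheduling and context-switch cost is paid once per slice. A hedged sketch (the type name and chunk size are made up):

    using System;
    using System.Threading;

    class ChunkyWork
    {
        static void Main()
        {
            int[] data = new int[100000];
            int chunkSize = 25000;                  // tune for your workload
            int pending = (data.Length + chunkSize - 1) / chunkSize;
            ManualResetEvent done = new ManualResetEvent(false);

            for (int start = 0; start < data.Length; start += chunkSize)
            {
                int s = start;                      // capture per iteration
                int e = Math.Min(s + chunkSize, data.Length);
                ThreadPool.QueueUserWorkItem(delegate
                {
                    for (int i = s; i < e; i++)
                        data[i] = i * i;            // stand-in for real work
                    if (Interlocked.Decrement(ref pending) == 0)
                        done.Set();                 // last chunk signals
                });
            }

            done.WaitOne();                         // wait for all chunks
        }
    }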

Is the throughput rate low even though the CPU is saturated?

  • look under “.NET CLR Exceptions”, is “# of Exceps Thrown / sec” high compared to your throughput? -> consider reducing use of exceptions in common paths (see the TryParse sketch after this list)
  • look under “.NET CLR Interop”, is “# of marshalling” growing too fast?  -> consider simplifying the arguments passed in interop cases so that marshalling is cheaper
  • look under “.NET CLR Security”, is “% Time in RT checks” significant?  -> consider simplifying the demands being placed on the security system to lower the cost of security checks
  • look under “.NET CLR Jit”, is “% Time in Jit” significant? This counter shouldn’t stay high because jitting should settle out; if it remains high then perhaps there is dynamic code generation via reflection going on -> simplify dynamic code cases
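
As an example of keeping exceptions off the common path: Int32.TryParse (new in .NET 2.0) reports failure through its return value instead of throwing, so a high rate of bad input no longer drives “# of Exceps Thrown / sec”. A small sketch (the helper name is just for illustration):

    using System;

    class ParseDemo
    {
        // Bad input is expected here, so it must not cost an exception.
        static int ParseOrDefault(string text, int fallback)
        {
            int value;
            if (int.TryParse(text, out value))    // no throw on bad input
                return value;
            return fallback;
        }

        static void Main()
        {
            Console.WriteLine(ParseOrDefault("123", 0));    // prints 123
            Console.WriteLine(ParseOrDefault("oops", 0));   // prints 0, no exception
        }
    }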

This is just a taste of course, and each of these items would likely lead to further investigation with a profiling tool suitable for drilling into that particular kind of problem, but these are examples of the leading indicators that I use.

For more information on the GC performance counters specifically, see Maoni’s blog entry on that subject.  Her most recent article, on using the GC efficiently, is also very interesting; lots of good details there.

Comments (16)

  1. Chris Sells says:

    That set of guidance sounds just prescriptive enough to be encodable, Rico. Do I smell a PerfCop to watch running .NET apps in the future?

  2. ricom says:

    They’ve been threatening to build "Rico in a box" for years but I’m not aware of any successes yet 🙂

    Seriously though, I’ve been floating "perf cop" notions for about as long as I’ve been on the CLR. Hopefully those ideas will find their way into VSTS at some point.

    But then I’ll have to learn new tricks… maybe this isn’t such a good plan 🙂

  3. Marcus Stade says:

    Awesome post, thanks a bundle!

  4. Thank you for the informative post. I think managed code performance hasn’t been discussed enough as far as monitoring is concerned, so thank you for your contribution.

    I have a hard time making use of some of your guidelines, mainly because I am not aware of the ‘normal’. For example:

    "is the "Contention Rate / sec" counter high compared to your throughput rate"

    "look under ".NET CLR Interop", is "# of marshalling" growing too fast?"

    How can I tell unless I know the ‘normal’ for the type of application I am investigating?

    I think that these guidelines would be a lot more useful if you provided some figures to go with them. Where appropriate you could break down the figures to types of applications, for example:

    Normal for context switches per second

    Using multi-threading: 10000

    Without using multi-threading: 100

    Using multi-threading and Windows Forms: 20000


    I know that what I am asking is a lot, but if it is a lot for people with as much experience in performance as you, imagine how confusing it is for people like me who are looking at performance seriously for the first time.

  5. Excellent addition to your advice! A couple quick "most likely" checks for CLRProfiler users would also be useful.

  6. ricom says:

    It’s hard to give "normal" values for many of these counters. So what I like to do is this: compare those counters to a counter that represents "actual work" — like requests per second on a server, or lines of text processed in a parser, or something like that — then divide them so that you can see the cost per item. At that point, with your knowledge of how it’s *supposed* to work you can decide "Is that high for the work being done?"

    Also remember the default scale (in perfmon) on these items already gives you an idea of what’s a lot and what’s not. But a lot for one application might be fine for another.
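
    A quick sketch of what I mean (the "MyApp Counters"/"Requests/sec" counter is hypothetical; substitute whatever counter represents actual work in your application):

        using System;
        using System.Diagnostics;
        using System.Threading;

        class CostPerItem
        {
            static void Main()
            {
                PerformanceCounter alloc = new PerformanceCounter(
                    ".NET CLR Memory", "Allocated Bytes/sec", "MyApp");
                PerformanceCounter work = new PerformanceCounter(
                    "MyApp Counters", "Requests/sec", "MyApp");  // hypothetical

                alloc.NextValue();      // first read primes a rate counter
                work.NextValue();
                Thread.Sleep(1000);     // let one sampling interval elapse

                float bytesPerSec = alloc.NextValue();
                float requestsPerSec = work.NextValue();
                if (requestsPerSec > 0)
                    Console.WriteLine("Allocated bytes per request: {0}",
                        bytesPerSec / requestsPerSec);
            }
        }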

  7. Robin Maffeo says:

    With respect to "is the "# of current physical Threads" too low for the problem?" — in general the CLR threadpool knobs should be left alone unless you really know what you’re doing, as the pool does a reasonable job of injecting threads when necessary.

    I added the SetMaxThreads and SetMinThreads APIs for special cases where applications have a specific need to tweak the behavior of the pool. As a general rule, I advise against it, and have seen very few applications where tweaking them is necessary.
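
    For the rare case where tuning really is justified, the calls look like this (a sketch only; the numbers are made up and the defaults are usually right):

        using System.Threading;

        class PoolTuning
        {
            static void Main()
            {
                int worker, io;
                ThreadPool.GetMinThreads(out worker, out io);

                // Raise the floor so the pool ramps up faster after a burst.
                ThreadPool.SetMinThreads(worker + 4, io);

                // Cap the pool if unbounded growth hurts (rarely needed).
                ThreadPool.SetMaxThreads(50, 50);
            }
        }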


  8. ricom says:

    Robin’s right on. I hardly ever have to touch those settings even though I often check the level of parallelism — which is why it made my list in the first place.

    Actually the most common reason I find for not enough parallelism is that the Thread Pool was *not* used when it could have been. So I think I’d like to amend my advice to be: if parallelism seems low -> consider using our thread pool if it’s not already in use

  9. Adam Weigert says:

    So, that info is very handy. I have a question though.

    Where do you learn, and what do you do or read, that you are able to put all of this together?

    I understand in a minor fashion how I learn what I learn, by just reading things I find, and then sometimes original thoughts come about. What have you been learning and from where that would enable someone like myself to become a perf junkie / guru like yourself? 🙂

  10. Visual Studio Team System

    Bill Sheldon from InterKnowlogy has an item in the June 3rd edition of…

  11. Dmitry Sazonov says:

    Thanks for the great blog and this post. It seems to me this is the most interesting blog about .NET performance on the net.

    I have small comment and question 🙂

    Very often I feel like all performance advice is given for ASP-like applications. Like "use the thread pool". It is a great idea to use the thread pool if you are a web server and have multiple clients. But I found out it is faster to have my own task queue and my own worker thread(s) than to use the thread pool in my case: tons of small computation tasks.

    I had to learn the hard way how to make .NET perform faster.

    For example, I found out that I should avoid using interfaces, because they involve double virtual calls and are never inlined. Or, I should not use Math.Min or Math.Max for double – they are 3-4 times slower than writing my own Min/Max function. Surprise!
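
    The hand-written version I mean is just this (note that unlike Math.Min it does not propagate NaN):

        // A simple comparison the JIT can inline.
        static double Min(double a, double b)
        {
            return a < b ? a : b;
        }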

    Or, reading PerformanceCounter.NextValue() is apparently very slow 🙁

    There is no information about details like this, which is crucial for me, but I found hundreds of pieces of advice about using the thread pool 🙂

    What should I do if my application has no contention, few context switches, no exceptions, almost no security checks, no JIT, no Gen 2 or 1 collections, and GC usage less than 1%?

    But I still have CPU maxed out…

  12. ricom says:

    >>What should I do if my application has no contention, few context switches, no exceptions, almost no security checks, no JIT, no Gen 2 or 1 collections, and GC usage less than 1%? But I still have CPU maxed out…

    Hmmm… Sounds like a good article 🙂