ASP.NET Case Study: High CPU in GC – Large objects and high allocation rates

Rather than spending a lot of time on explaining the details of the garbage collector, I'll refer you to Maoni's blog for some very interesting reading, but in
this case study I want to show you how to get to the bottom of a problem with high CPU in the garbage collector.

Problem description:

At times the application will be very slow to respond and the CPU usage is very high (70-80%).

Troubleshooting steps:

I have already tipped you off in the title that the issue will be High CPU in GC, but how do we even get to that conclusion?  The answer is performance monitor...

A high CPU issue is generally one of 3 things

·         An infinite loop

·         Too much load (i.e. too many requests doing lots of small things, so no one single thing is a culprit)

·         Too much churning in the Garbage Collector.

The first one is pretty easy to spot in a memory dump if you get it at the right time, and is usually caused by some kind of logic problem. The second type is really tough and often requires adding hardware and new machines along with some serious profiling to optimize the code that is running on each request.  For the purposes of this post, I'll skip both of those.

To find out if your performance issue might be GC related, your friend is the .NET CLR Memory performance counters

Let's first take a look at one of the most important performance counters used for these types of issues, .net CLR Memory / % Time in GC.

Paraphrasing from Maoni's blog and my favorite white paper on ASP.NET performance counters, the % Time in GC counter shows how much of the CPU time spent since the last GC was spent on the current GC.  In other words, if we have spent 1 million cycles since the last GC, and we spent 0.3 million cycles in the current GC, this number will be 30%.  

That means for short periods of times you can see huge spikes, but it is the average that is important for this counter.

In different articles people will declare the optimal average value to be 5% or 30% or 20%, so what is it?  Short answer: there is really no optimal value, or rather, the optimal value would be close to 0, but reducing this value is very hard, so I wouldn't even bother if you don't have an average value above 30% or so. There is simply too much pain and too little gain.

The common cause for most high CPU in GC issues is a very high allocation rate (.net CLR Memory / allocated bytes/sec) but really, if you spend all your GC time doing Gen 0 collections that is not going to cause a high CPU in GC since Gen 0 collections are very inexpensive. So basically what hurts you is a lot of Gen 2 collections, i.e. a lot of .net objects making it into Generation 2 and then getting released. The other big thing that causes a high number of Gen 2 collections is excessive use of the Large Object Heap (LOH).  The reason for this is because every time we fill up a LOH segment and need to create a new one to satisfy a request we need to garbage collect, and a large object heap collection triggers a Gen 2 collection, which in turn triggers a Gen 1 and a Gen 0 collection.

This brings us to the third group of counters that I usually look at (.net CLR Memory / # Gen 0 Collections, # Gen 1 Collections and # Gen 2 Collections).  Again, there is no hard and fast rule about what the rates should be between these. Some recommend a 100/10/1 ratio, i.e. for each Gen 2 garbage collection you should have about 10 Gen 1 collections and 100 Gen 0 collections if you have a good allocation pattern.  I think that is a good rule of thumb but deviances from this are common and a fairly small deviance wouldn't concern me too much, however if have the same value for Gen 0, Gen 1 and Gen 2 collections you can start worrying a little bit:)

Oh, while I remember, another thing (apart from the blocked Finalizer that I posted about earlier) that can cause a large number of Gen 2 collections is of course the human element, in other words if you call GC.Collect(3) or GC.Collect(2) manually in your application on a regular and frequent bases, or if you call GC.GetTotalMemory(true).  To see if this can be an issue, watch the (.net CLR Memory / # Induced GC). Ideally this one should be 0, and it is in most cases, but if you are closing in on the memory limit you might see this one go up since ASP.NET itself will induce a GC at that point, but only 1 or so...


For this specific issue I have the following values:

% Time in GC                        ~40 % average
allocated bytes / sec              400 MB average
# Induced GC                        0
# Gen 0 Collections                28.379
# Gen 1 Collections                28.378
# Gen 2 Collections                28.378

WOW! Now that is a massive amount of allocated bytes / sec 🙂 but I definitely know that I am not allocating anything that huge in my application, especially not every second.

The other interesting point here is that our Gen 0, Gen 1 and Gen 2 are close to identical, which is a sure clue that we are either stressing the LOH pretty hard or we have a lot of objects that make it into the 2nd generation and then die immediately.  My personal vote goes to the LOH.

In a perfect world we know exactly what page or small set of pages are responsible for the high allocation rate and we can use the CLR profiler ( to catalogue the allocations and figure out a way to reduce them. Oh, and did I mention, of course in a perfect world we can repro the problem on a test environment.   But… need I say it? We don’t live in a perfect world…

However if you can repro the issue in test, I would highly recommend using the CLR profiler to look at your allocation pattern.

In this particular case, we have no clue what is causing the problem so we will resort to my favourite tool windbg

Debugging the problem:

GC problems are pretty hard to debug for a couple of reasons.

1. If you catch a -hang dump and it happens to be in the middle of a GC, that is no absolute indication that you have a high CPU in GC problem.  In a process with fairly high load and a lot of allocations, GC's happen frequently so it is common to catch it in a GC.
2. Even if you know you have a GC problem (based on performance counters), and you catch the process in a GC, it is hard to tell what actually caused it.

The best way to catch the offender with the hands in the cookie jar, is to take multiple dumps on different occasions when the CPU is high and if they all look pretty much the same, there is a good possibility that you found your CPU thief.

Step 1 - Determining if we are in a GC:

As I've mentioned before we have two different versions of the GC (the server version and the workstation version).  In the server version we have a dedicated GC thread per processor (or 2 if it is hyper threaded), in the workstation version we GC on the same thread that initiates the GC.  The dump output I am going to show you is from a dual-core machine with .net 2.0.

If we are not in a GC, we will have 2 threads (one per core) looking something like this

  18  Id: 134c.918 Suspend: 1 Teb: 7ffa8000 Unfrozen
ChildEBP RetAddr  Args to Child             
01e1fe68 7c822124 77e6baa8 00000398 00000000 ntdll!KiFastSystemCallRet
01e1fe6c 77e6baa8 00000398 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
01e1fedc 79e77fd1 00000398 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
01e1ff20 79e77f9a 00000398 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
01e1ff70 79e77f50 ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
01e1ff80 79f3549b ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
01e1ffa8 79f6ece3 00000000 b28c067c 01e1ffec mscorwks!SVR::gc_heap::gc_thread_function+0x2e
01e1ffb8 77e66063 000e7660 00000000 00000000 mscorwks!SVR::gc_heap::gc_thread_stub+0x9b
01e1ffec 00000000 79f6ec79 000e7660 00000000 kernel32!BaseThreadStart+0x34

So this is a GC thread waiting for work...

In our dumps from the problem our GC threads look like this…

  16  Id: f28.1150 Suspend: 1 Teb: fff82000 Unfrozen
ChildEBP RetAddr  Args to Child             
0248fd28 7d4d8c46 000002dc 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15
0248fd98 79e77fd1 000002dc ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
0248fddc 79e77f9a 000002dc ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
0248fe2c 79e77f50 ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
0248fe3c 79f35e1a ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
0248fe54 7a0d5a3b 001afbf0 0000000d 00000002 mscorwks!SVR::t_join::join+0x61
0248ff54 79f391bf 00000002 001afbf0 00000000 mscorwks!SVR::gc_heap::plan_phase+0xd78
0248ff70 79f39954 00000002 ffffffff 001afbf0 mscorwks!SVR::gc_heap::gc1+0x57
0248ff88 79f35422 00000000 00000000 001afbf0 mscorwks!SVR::gc_heap::garbage_collect+0x37c
0248ffa8 79f6ece3 00000000 914b1904 0248ffec mscorwks!SVR::gc_heap::gc_thread_function+0x68
0248ffb8 7d4e0729 001afbf0 00000000 00000000 mscorwks!SVR::gc_heap::gc_thread_stub+0x9b
0248ffec 00000000 79f6ec79 001afbf0 00000000 kernel32!BaseThreadStart+0x34

…which indicates that we are in the middle of a garbage collection.

So what triggered this particular garbage collection? To find that out we have to look for a thread that has GarbageCollectGeneration on the stack...

  21  Id: f28.1244 Suspend: 1 Teb: fff0f000 Unfrozen
ChildEBP RetAddr  Args to Child             
029eeeb4 7d4d8c46 000002f0 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15
029eef24 79e77fd1 000002f0 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
029eef68 79e77f9a 000002f0 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
029eefb8 79e77f50 ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
029eefc8 79f3a3f3 ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
029eefec 79f3a2f6 ffffffff 001afcf8 029ef0e4 mscorwks!SVR::gc_heap::wait_for_gc_done+0x99
029ef010 79f3a0c7 00000000 00000000 0004e210 mscorwks!SVR::GCHeap::GarbageCollectGeneration+0x267
029ef0a0 79f2d0aa 029ef0e4 0004e210 00000003 mscorwks!SVR::gc_heap::try_allocate_more_space+0x1bc
029ef0c0 79f32b14 029ef0e4 0004e210 00000003 mscorwks!SVR::gc_heap::allocate_more_space+0x2f
029ef100 7a0d5c9a 0004e20c 00000000 00000000 mscorwks!SVR::gc_heap::allocate_large_object+0x8c
029ef11c 79e74c1c 00211be8 0004e20c 00000000 mscorwks!SVR::GCHeap::Alloc+0xbc
029ef138 79e7e940 0004e20c 00000000 00000000 mscorwks!Alloc+0x60
029ef18c 79e7e9a0 791240f0 00013880 00000000 mscorwks!FastAllocatePrimitiveArray+0xbd
029ef24c 02920df3 00009c40 00002247 d768bc75 mscorwks!JIT_NewArr1+0x148
WARNING: Frame IP not in any known module. Following frames may be wrong.
029ef2a8 6881bdc6 02a60824 06a77b84 0aa3a218 0x2920df3
029ef2c0 6881bfbc 06a76df0 0aa3a218 06a7cb04 System_Web_ni+0x25bdc6
00000000 00000000 00000000 00000000 00000000 System_Web_ni+0x25bfbc

In SOS for 1.1 the threads output for this thread would have a (GC) in the exception column like this...

0:029> !threads
ThreadCount: 7
UnstartedThread: 0
BackgroundThread: 7
PendingThread: 0
DeadThread: 0
Hosted Runtime: no

                                      PreEmptive   GC Alloc           Lock
       ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
  14    1 11b8 001aba50   1808220 Enabled  00000000:00000000 001a8868     0 Ukn (Threadpool Worker)
  18    2 11f8 001b1ef0      b220 Enabled  00000000:00000000 001a8868     0 MTA (Finalizer)
  19    3 1174 001c75e8    80a220 Enabled  00000000:00000000 001a8868     0 MTA (Threadpool Completion Port)
  20    4 133c 001ca250      1220 Enabled  00000000:00000000 001a8868     0 Ukn
  12    6 1178 001dcae8   880a220 Enabled  00000000:00000000 001a8868     0 MTA (Threadpool Completion Port)
  21    7 1244 00211ba8   180b220 Enabled  00000000:00000000 001ca990     1 MTA (GC)
  22    5  6f8 001dc450   880b220 Enabled  00000000:00000000 001a8868     0 MTA (Threadpool Completion Port)

…but that is not always true for sos.dll for 2.0, so then we have to resort to finding the thread that calls garbage collect generation...

Step 2 - Why are we garbage collecting:

In this case we can see that the GC happened because we tried to allocate a large object (allocate_large_object) and we needed more space on the LOH to do this...
now that is a pretty good indication that we hit the spot... particularly since we saw that the number of Gen 2 collections and Gen 1 collections and Gen 0 collections were pretty much the same, indicating that there is a lot of Gen 2, or LOH collections.

The other thing we can see from this stack before moving on to the managed stack is that we are trying to allocate an object of size 0x0004e20c (the first parameter to allocate_large_object) which equates to 320.012 bytes.

We can also see if we run !runaway (which gives us combined user mode CPU time for different threads) that the threads that have used up most CPU since the start of this process are the following:

0:029> !runaway
 User Mode Time
  Thread       Time
  14:11b8      0 days 0:00:55.687
  16:1150      0 days 0:00:45.500
  17:7a8       0 days 0:00:43.875
  21:1244      0 days 0:00:23.140
  0:ea0       0 days 0:00:00.046
  29:fc8       0 days 0:00:00.000
  28:11f0      0 days 0:00:00.000

The total up-time for the process is around 17 minutes so it has been idle for most of its lifetime.

0:021> .time
Debug session time: Tue Jun 20 09:46:04.000 2006 (GMT+2)
System Uptime: 4 days 6:36:51.031
Process Uptime: 0 days 0:16:53.000
  Kernel time: 0 days 0:00:45.000
  User time: 0 days 0:02:48.000

Threads 16 and 17 are our GC threads and they will pretty much always be on top of the list since a lot of the other threads (worker threads) will spend a lot of their time being idle waiting for requests.

Then we have thread 14 and 21 that are worker threads (based on the threads output above), and if we start with thread 14 we can see based on the stack output that this thread is currently idle…

  14  Id: f28.11b8 Suspend: 1 Teb: fff88000 Unfrozen
ChildEBP RetAddr  Args to Child             
020afc8c 7d4d8c46 00000224 00000000 020afcd0 ntdll!ZwWaitForSingleObject+0x15
020afcfc 79e77fd1 00000224 00009c40 00000000 kernel32!WaitForSingleObjectEx+0xac
020afd40 79e77f9a 00000224 00009

Comments (24)
  1. stephen patten says:

    Glad you’re back!

  2. Fernando Tubio says:

    As always, a very enlightening article.

    Just a minor quibble about the "Hello world" string concatenation example you give towards the end. In that case the compiler takes care of the concatenation of the string literals and only a single call to String.Concat( string, string ) is actually performed. Even if the concatenation involved variables, as in str += s1 + s2 + s3 + s4, the compiler first builds an array to hold the individual variables and then calls String.Concat( string [] ).

  3. AlexP says:

    Hi, Tess.

    It’s great article. Very interesting for me. Look forward for another new 🙂

  4. James Bolles says:

    Thanks for this post, Tess.  I am in a similar situation where I have large amount of memory being allocated during web service processing.  Have you ever experienced cases where a multi-threaded application running on a server with multiple CPUs continues to allocate memory even when profilers are stating the objects have been collected?  We have some value types in our logic that are referenced by many threads and it seems that some of those value types are still in memory even after they should be deleted.  

    Keep posting your findings.  Your blog has helped me to learn how powerful and useful WinDbg can be for production support.

  5. snaveen says:


              Cool post. Another way to identify the cause of the objects in the LOH is to have a “bp on mscorwks!gc_heap::allocate_large_object”  something like this bp 791c4fcd "!clrstack;.echo *********Allocation of large object heap***********;g". This should help identifying the objects that are getting allocated in the LOH.


  6. Tess says:

    Thanks for the note on the Hello world sample Fernando, that was news to me, but good news nevertheless:)

  7. Tess says:

    HI James,

    I am not sure exactly why this is happening in your case and how the particular profilers determine that it is deleted.  I would look at if it is actually stating that it is deleted or just un-rooted, in which case it might not really be garbage collected yet?  Also perhaps it is that you are pinning objects and leaving a lot of free space.  

  8. Tess says:

    Cool comment Naveen,

    that is definitely worth running in a test environment if you are looking at high usage of the LOH.

  9. Howard says:

    I’ve got an application which uses 2 app pools on the same machine, so I’ve got 2 w3wp processes. Ok, nothing wrong with that except then when I try to look at the .NET counters as per Tess’s excellent (again) article they show up as w3wp and w3wp#1.

    Does anybody know a way to correlate the PID, or App Pool or anything to the instance name shown in Perf Mon so I can tell which ones which?


  10. Tess says:

    If you look at the process counter it has an id correspoding to each process that you can correlate the other counters with.

  11. I have talked about a number of different hang/performance issues in my posts. This post is a generic

  12. I have talked about a number of different hang/performance issues in my posts. This post is a generic

  13. I get enough issues relating to bad perf caused by large viewstate that I felt like it is time to dedicate

  14. philippe says:

    Excellent article!

    We are also in a similar situation. Our code utlise some wrappers to call some native C code and if we build the wrappers in C# ie in manaed code then the server hangs within a few minutes as they are too many alloc for the GC to keep up. If we build the wrappers using __pin and Marshall then the web servers are ok but %Time in GC over around 80% with a CPUI load at 75% in .NET2.0.

    The exact same app in .NET1.1 does not experience any of this! Any clues here?

    Your help would be greatly appreciated.

  15. Chris says:

    Ah, to have your understanding…  Currently our production app is experiencing almost exactly this issue.  Our GC 0,1, and 2 are all in lock step after all allocation seems to be done.  We’re allocating a bunch (emphasis on bunch) of small things (around 400,000 a second) at different memory sizes but jumping to 30MB/sec at one point.  Having the most impossible time breaking down what you have here to steps we can take because you’re so much smarter than me and assume I know things I don’t.  I hope your enjoying your ambrosia.

  16. Tess says:

    Hi Chris,

    It’s a bit hard to tell just from that description what is going on, but it does sound like one of 2 things if GC 0,1 and 2 increase by pretty much exactly the same numbers.

    1. you do have some large objects that you are perhaps not aware of

    2. someone is calling GC collect

    I would suggest that you check out the large object heap size in perfmon (under .net clr memory) and see if it goes up, if it does, look at the dump and run !dumpheap -min 85000,  perhaps all these small objects are stored in a large object array or similar that ends up on the LOH.

    The other option with the GC.Collect you can verify or disspell by checking out the number of induced gc’s (also under .net clr memory).  

    Having said this, GC perf issues like this are super tricky to figure out.  I am working on one right now that I was planning on blogging on later (things have just been terribly busy lately), but my issue is caused by something really rare that i dont think applies to your situation.  It is caused by a library used in the app calling SetProcessorAffinityMask limiting the process to run on one process:) and thus causing all the gc threads to spin on that one process in turn causing major cpu usage…  a very ugly story…

  17. I thought I’d share a support story with you from a very interesting case I have. My customer is running

  18. roy ashbrook says:

    Ah. Garbage Collection… how I love and hate thee. =P I think one sad thing about programming in .net

  19. roy ashbrook says:

    These are the articles (in no particular order) that I felt best showed a thorough use of the WinDbg

  20. Webmaster says:

    Thanks, this really helped me find one of the problems leading to high cpu usage on our server!

Comments are closed.

Skip to main content