GC Performance Counters


There are many .NET Memory performance counters, and this post is meant to give you some guidelines for interpreting the counter data and correlating the counters with each other. It assumes you have a basic understanding of how the GC works.

 

The first thing you may want to look at is “% Time in GC”. This is the percentage of time spent in GC since the end of the last GC. For example, if it’s been 1 million cycles since the last GC ended and we spent 0.3 million cycles in the current GC, this counter will show 30%.
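
To make the arithmetic concrete, here is that example written out as a tiny sketch (the numbers are the made-up ones from above, not an API for reading the counter):

    long cyclesSinceLastGCEnded = 1000000;   // elapsed since the end of the last GC
    long cyclesSpentInCurrentGC = 300000;    // spent performing the current GC
    double percentTimeInGC = 100.0 * cyclesSpentInCurrentGC / cyclesSinceLastGCEnded;
    // percentTimeInGC == 30.0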

 

What is a healthy value for this counter? It’s hard to say – it depends on what your app does. But if you are seeing a really high value (like 50% or more), then it’s a reasonable time to look at what’s going on inside of the managed heap. If this number is 10%, it’s probably better to look elsewhere in your app, because even if you could get rid of half of that, you would only be saving 5% – most likely not very worthwhile.

 

If you think you are spending too much time in GC, it’s a good time to look at “Allocated Bytes/sec”, which shows you the allocation rate. This counter is not exactly accurate when the allocation rate is low – that is, when the sampling frequency is higher than the GC frequency – because the counter is only updated at the beginning of each GC.

 

When a GC begins, there are two possible cases:

 

1) Gen0 is basically full (meaning it wasn’t large enough to satisfy the last small object allocation request);

2) The LOH (Large Object Heap) is basically full (meaning it wasn’t large enough to satisfy the last large object allocation request).

 

When the allocation request can’t be satisfied, it triggers a GC. So when this GC begins we update the value this counter uses by adding the number of allocated bytes in gen0 and LOH to it, and since it’s a rate counter the actual number you see is the difference between the last 2 values divided by the time interval.

Let’s say you are sampling every second (which is the default in PerfMon), and during the 1st second a Gen0 GC occurred, at which point 250k had been allocated. So at the end of the 1st second PerfMon samples the counter value, which will be (250k - 0k) / 1 second = 250k/sec. Then no GCs happen during the 2nd and the 3rd second, so we don’t change the value we recorded – it’s still 250k – and you get (250k - 250k) / 1 second = 0k/sec. Then let’s say a Gen0 GC happens during the 4th second and we record that 505k total has been allocated, so at the end of the 4th second PerfMon will show you (505k - 250k) / 1 second = 255k/sec.

This means that when GC doesn’t happen very frequently, you get these 0k/sec counter values. But if, say, there’s always at least one GC happening during each second, you will see an accurate value for the allocation rate (when you are sampling every second, that is).
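
Written out as a small sketch, with the made-up raw values from the example above, the value PerfMon shows is simply (current raw value - previous raw value) / sampling interval:

    long previousRaw = 0;           // raw counter value at the previous sample
    long currentRaw = 250000;       // raw value now; it only changes when a GC starts
    double intervalSeconds = 1.0;   // PerfMon's default sampling interval
    double allocatedBytesPerSec = (currentRaw - previousRaw) / intervalSeconds;
    // 1st second:          (250k - 0k)   / 1s = 250k/sec
    // 2nd and 3rd seconds: (250k - 250k) / 1s = 0k/sec (no GC, so the raw value didn't change)
    // 4th second:          (505k - 250k) / 1s = 255k/sec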

We also have a counter that’s just for large objects – “Large Object Heap size”. This is updated at the same time “Allocated Bytes/sec” is updated, and it just counts the bytes allocated in the LOH. If you suspect that you are allocating a lot of large objects (in Whidbey this means 85,000 bytes or more; note that the runtime itself also uses the LOH, so if you see an LOH size of less than 85,000 bytes, that’s due to runtime allocations [Edited on 12/31/2004]), you can look at this counter along with “Allocated Bytes/sec” to verify it.
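
For example (a hedged sketch – the 85,000-byte threshold is the Whidbey behavior described above):

    byte[] smallObject = new byte[84000];   // below the threshold: allocated in the small object heap (Gen0)
    byte[] largeObject = new byte[85000];   // 85,000 bytes or more: goes directly on the LOH
    // Allocations like largeObject show up in both "Large Object Heap size" and "Allocated Bytes/sec".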

 

Allocating at a high rate is definitely a key factor in making the GC do a lot of work. But if your objects usually die young – i.e., you mostly see Gen0 GCs – you shouldn’t observe a high percentage of time spent in GC. Ideally, if they all die in Gen0, you could be doing a lot of Gen0 GCs but not spending much time in GC, since Gen0 GCs take little time.
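
For instance, a loop like this one (purely illustrative) allocates at a high rate, but because every buffer is garbage by the next iteration, the collections it triggers are cheap Gen0 GCs:

    for (int i = 0; i < 1000000; i++)
    {
        byte[] temp = new byte[100];   // becomes unreachable as soon as the iteration ends
        temp[0] = (byte)i;             // stands in for real per-iteration work
    }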

 

A Gen2 GC requires a full collection (Gen0, Gen1, Gen2 and the LOH – large objects are collected at every Gen2 GC, even when the GC was not triggered by lack of space in the LOH; note that there isn’t a GC that only collects large objects), which takes much longer than a younger-generation collection. A healthy ratio is one Gen1 GC for every 10 Gen0 GCs, and one Gen2 GC for every 10 Gen1 GCs. If you are seeing a lot of time spent in GC, it could be that you are doing Gen2 GCs too often. So look at the collection counters:

 

“# Gen 0 Collections”

“# Gen 1 Collections”

“# Gen 2 Collections”

 

They show the number of collections for the respective generation since the process started. Note that a Gen1 collection collects both Gen0 and Gen1 in one pass – we don’t do a Gen0 GC first, find that there still isn’t enough space, and then go do a Gen1 GC. Which generation to collect is decided at the beginning of each GC.
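
If you’d rather check these numbers from code than from PerfMon, here is a minimal sketch (it assumes Whidbey, where GC.CollectionCount is available; in Everett you can read the “# Gen N Collections” counters instead):

    int gen0 = GC.CollectionCount(0);   // number of times Gen0 has been collected
    int gen1 = GC.CollectionCount(1);
    int gen2 = GC.CollectionCount(2);
    Console.WriteLine("Gen0: {0}  Gen1: {1}  Gen2: {2}", gen0, gen1, gen2);
    // Rule of thumb from above: roughly 10 Gen0 GCs per Gen1 GC, and 10 Gen1 GCs per Gen2 GC.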

 

If you are seeing a lot of Gen2 GCs, it means you have many objects that live for too long, but not long enough for them to always stay in Gen2. When you are spending a lot of time in GC but the allocation rate is not very high, it might very well be the case that many of the objects you allocate survive garbage collection, i.e., they get promoted to the next generation. Looking at the promotion counters should give you some idea, especially the Gen1 counter:

 

“Promoted Memory from Gen 0”

“Promoted Memory from Gen 1”

 

Note that these values don’t include the objects promoted due to finalization, for which we have this counter:

 

“Promoted Finalization-Memory from Gen 0”

 

It gets updated at the end of each GC. Note that in Everett there’s also a “Promoted Finalization-Memory from Gen 1” counter, which was removed in Whidbey because it was not useful – the “Promoted Finalization-Memory from Gen 0” counter already includes the memory promoted from both Gen 0 and Gen 1. We should really rename it to just “Promoted Finalization-Memory”.

 

One of the worst situations is when you have objects that live long enough to be promoted to Gen2 and then die quickly in Gen2 (the “midlife crisis” scenario). When this happens you will see high values for “Promoted Memory from Gen 1” and lots of Gen2 GCs. You can then use the CLRProfiler to find out which objects they are.
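
A made-up illustration of the pattern (the cache class and its size are invented for this example): entries are held across enough requests to survive a couple of GCs and reach Gen2, and only then evicted and left to die there.

    using System.Collections.Generic;

    class RecentRequestCache
    {
        private static readonly Queue<byte[]> entries = new Queue<byte[]>();

        public static void Remember(byte[] payload)
        {
            entries.Enqueue(payload);
            if (entries.Count > 10000)
                entries.Dequeue();   // dropped only after the entry has likely been promoted to Gen2
        }
    }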

 

Note that when a finalizable object survives, all the objects it refers to also survive, and the values for “Promoted Finalization-Memory from Gen X” include these objects too. The CLRProfiler is also a convenient way to find out which objects got finalized.
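
For example (illustrative type name only – the point is that the 1 MB buffer is promoted along with its finalizable owner):

    class NativeResourceWrapper
    {
        private byte[] buffer = new byte[1024 * 1024];   // survives (and is promoted) whenever the wrapper does

        ~NativeResourceWrapper()
        {
            // Having a finalizer means the object - and everything it references, like buffer -
            // survives at least one extra GC before its memory can be reclaimed.
        }
    }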

 

When you observe high values for these promotion counters, you will most likely observe high values for the Gen1 and Gen2 heap sizes as well. These sizes are indicated by these counters:

 

“Gen 1 heap size”

“Gen 2 heap size”

 

We also have a counter for the Gen 0 heap size, but it reports the budget we allocated for the next GC (i.e., the number of allocated bytes that would trigger the next GC), not the actual Gen0 size – which is either 0 or, if there’s pinning in Gen0, a small number.

 

The heap size counters are updated at the end of each GC and indicate values for that GC. Gen0 and Gen1 heap sizes are usually fairly small (from ~256k to a few MBs) but Gen2 can get arbitrarily big.

 

If you want to get an idea of how much memory is allocated in total on the GC heap, you can look at these two counters:

 

“# Total committed Bytes”

“# Total reserved Bytes”

 

They are updated at the end of each GC and indicate the total committed/reserved memory on the GC heap at that time. The value of the total committed bytes is a bit bigger than

 

“Gen 0 heap size” + “Gen 1 heap size” + “Gen 2 heap size” + “Large Object Heap size”

 

Note that the “Gen 0 heap size” in this sum means the actual Gen0 size – not the Gen0 budget shown by the PerfMon counter. [Edited on 12/31/2004]
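
If you want to sanity-check this relationship from code, here is a minimal sketch; it assumes the process is the only .NET instance with that name (real instance names can carry a #N suffix) and uses the counter names listed in this post:

    using System;
    using System.Diagnostics;

    class CommittedVsGenSizes
    {
        static void Main()
        {
            string instance = Process.GetCurrentProcess().ProcessName;
            float committed = new PerformanceCounter(".NET CLR Memory", "# Total committed Bytes", instance).NextValue();
            float gen1 = new PerformanceCounter(".NET CLR Memory", "Gen 1 heap size", instance).NextValue();
            float gen2 = new PerformanceCounter(".NET CLR Memory", "Gen 2 heap size", instance).NextValue();
            float loh = new PerformanceCounter(".NET CLR Memory", "Large Object Heap size", instance).NextValue();

            // "Gen 0 heap size" reports the Gen0 budget rather than the actual Gen0 size,
            // so it is left out of the sum here.
            Console.WriteLine("committed: {0}  gen1 + gen2 + LOH: {1}", committed, gen1 + gen2 + loh);
        }
    }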

 

When we allocate a new heap segment, the memory is reserved for that segment and will only be committed when needed at an appropriate granularity. So the total reserved bytes could be quite a bit bigger than the total committed bytes.

 

If you see a high value for “# Induced GC”, it’s usually a bad sign – someone is calling GC.Collect too often, and that’s usually a bug in the app.
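
The kinds of calls that bump this counter look like the following (GC.GetTotalMemory(true) is mentioned in the comments below as another source besides GC.Collect):

    GC.Collect();                            // explicitly induces a collection
    long bytes = GC.GetTotalMemory(true);    // passing true lets the GC collect before the value is returned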

 

[Edited on 12/31/2004]

Comments (50)

  1. Paul Wilson says:

    Great information. Now if only you guys can get all the "total/global" bytes counters to actually be for all .net processes as documented, instead of just for the very last particular sample of one of the .net processes!

  2. Brad Abrams says:
  3. Matt says:

    Might be an idea to talk a little about PerfMon counters and Multi-CPU machines – in particular .NET V1.1 and %Time in GC…

  4. Thanks for this. I was a bit concerned about high allocation rate in one of my apps, but they’re all Gen0, so I’m getting from this that it shouldn’t be a problem. Good stuff!

  5. Maoni says:

    Paul, we are hoping to address that in Whidbey.

    Matt, if you meant the problem where occasionally you see a really large value (negative value) for the %Time in GC in V1.1, that is fixed in Whidbey. If that’s not what you meant, let me know.

  6. Scott Sorensehn says:

    Thanks for the good info on GC Perf Counters. I have several different servers that are running memory intensive applications. These servers are dedicated to the one .Net app running on the server. In some cases it is an ASP.Net application and in other cases it is a .Net application running as a Windows Service. In both cases I have installed 2 gig of RAM on the machines but there seems to be a limit to the amount of RAM that will be used by the CLR because the app is never able to use more than about 600 Meg of RAM before it begins using virtual memory. How can I set the w3wp.exe or CLR in the case of the service to use more of the servers RAM? Also is there a limit to the amount of RAM that can be used by a single ASP.Net application and is it configurable in the case where it is the only ASP.Net app running on the box?

  7. Maoni says:

    Scott, how did you get the 600 MB number? ASP.net does have a setting for memory – memoryLimit. Search for memoryLimit and asp.net for an explanation of this setting.

  8. Mark Tucker says:

    Is it possible to never call GC.Collect() in your code and still get values for "# Induced GC"? Does code in the .NET Framework 1.0 or 1.1 make calls that update this counter? What are all the places that would be called?

  9. Maoni says:

    Mark, yes it is possible to get values for "# Induced GC" even when you are not calling GC.Collect yourself. For example, if you call GC.GetTotalMemory with true. In Whidbey we also added more places in the BCL that call GC.Collect. But usually this is very very rare – we recommend never calling GC.Collect in apps in general, and at very limited places in the BCL.

  10. I previously blogged about a set must-read garbage collection articles and issues around directly…

  11. Pierre Schexneider says:

    When trying to create a perfmon counter we get the following error:

    InvalidOperationException: Cannot read Instance :MonitorUseResou

    namespace Apress.ExpertDotNet.MonitorUseResources
    {
        // This is the beginning; below is the code where the problem starts.
        InitializeComponent();

        this.pcGen0Collections = new PerformanceCounter(".NET CLR Memory",
            "# Gen 0 Collections", "MonitorUseResou");
        this.pcGen0Collections.BeginInit();

        this.pcGen1Collections = new PerformanceCounter(".NET CLR Memory",
            "# Gen 1 Collections", "MonitorUseResou");
        this.pcGen1Collections.BeginInit();

        // Here is what we found: when we rename "MonitorUseResou" to "MonitorUseReso"
        // (exactly 14 characters), it works. Can you explain this?
        // Thank You

  12. I was writing an internal wiki page on performance and thought this info is useful to many external readers

  13. Managed Heap Size We have both .NET CLR Memory perf counters and SoS extensions that report manged heap

  14. You’ve been kicked (a good thing) – Trackback from DotNetKicks.com

  15. Rajesh Patel says:

    What performance counter should be used to debug the OutOfMemoryException?

  16. Ashok Srinivasan says:

    The multi-threaded server app that we ported from VJ++ to J# is showing a high "# induced GC". We are not calling GC directly from our code. What is the best way to find who is calling the GC?

    Thanks!

    -Ashok

  17. maoni says:

    Ashok, you can set a breakpoint on the GC.Collect method and see who’s calling it.

  18. Ashok Srinivasan says:

    Maoni,

    Below is the stack trace I obtained from VS'05 Profiler which captures the bottleneck call sequence – it accounts for over 50% of the sampled time!

    Please see my question at the bottom of this post.

    WKS::GCHeap::FinalizerThreadCreate(void)

    ManagedThreadBase::FinalizerBase(void (*)(void *))

    ManagedThreadBase_NoADTransition(void (*)(void *),…)

    Thread::SetStackLimits(enum Thread::SetStackLimitScope)

    Thread::HasStarted(void)

    Thread::DoADCallBack(struct ADID,void (*)…)

    CustomAttributeParser::UnpackValue(unsigned char…)

    Object::GetAppDomain(void)

    WKS::CreateGCHeap(void)

    HostExecutionContextManager::SetHostRestrictedContext

    WKS::CallFinalizer(class Object *)

    MethodTable::CallFinalizer(class Object *)

    com.ms.vjsharp.lang.ThreadEnd.Finalize()

    FastAllocatePrimitiveArray(class MethodTable *,…)

    GCInterface::CollectGeneration(int)

    WKS::gc_heap::mark_phase(int,int)

    WKS::GCHeap::GarbageCollectGeneration(unsigned int…)

    WKS::gc_heap::garbage_collect(int,int)

    Please note that the last line above represents top of the stack. It appears that ThreadEnd.Finalize is inducing the garbage collection? I am stuck here. What do you think could be going on?

    Thanks!

  19. maoni says:

    Ashok, the calls you showed – they can’t be from the same callstack.

  20. Ashok Srinivasan says:

    Maoni, this is as reported by VS'05 Team Profiler while sampling our server exe over a half hour period. I pulled this out of the call stack view. The execution thread in question is calling Finalize on various objects, all of which happen very fast EXCEPT for ThreadEnd.Finalize. The profiler clearly shows:

    1. FastAllocatePrimitiveArray to be a descendent of ThreadEnd.Finalize

    2. GCInterface.CollectGeneration to be the descendent of FastAllocatePrimitiveArray().

    I have run the profiler several times now. The results are consistent – always the same.

    What am I missing?

    Is there a way I can send you the .VPS (Profiler report) file?

    Thanks!

  21. maoni says:

    Ashok, I dunno how to interpret what VS profiler shows ’cause I don’t use it. But I can tell you for a fact that FastAllocatePrimitiveArray doesn’t call GCInterface.CollectGeneration.

    The advice I can give you is to set a bp on the GC.Collect call (you can set it on GCInterface.CollectGeneration if you want…the former calls the latter).

  22. Rather than spending a lot of time on explaining the details of the garbage collector, I’ll refer you

  23. Alex Feinberg says:

    ThreadEnd.Finalize definitely calls GC.Collect()

  24. Problem We have had a few customers run into this issue where they are using the J# ThreadEnd objects

  25. Problem We have had a few customers run into this issue where they are using the J# ThreadEnd objects

  26. Aish says:

    Hi,

    I have been having trouble understanding very large values in % Time in GC? Aren’t the values supposed to be in the range 0-100%? What would a value like 2018.185129 mean? Any help would be greatly appreciated!

    Thank you!

    Regards,

    Aishwariya

  27. Alik Levin's says:

         To achieve best performance you need to make decisions based on trade-off between

  28. I often hear from X++ developers, that the .Net garbage collector ( GC ) does not work correctly or that

  29. I often hear from X++ developers, that the .Net garbage collector ( GC ) does not work correctly or that

  30. Podlipensky says:

    Developing web sites, as with any other software constructions, has always stood at the crossroads of three

  31. Are you trying to improve the memory usage of your .net application? I’ve spent some time recently trying

  32. Sev Zaslavsky says:

    I need clarification on the phrase "during the 1st second a Gen0 GC occured which allocated 250k." – You imply that a GC can *Allocate* memory?  Doesn't a GC *free* memory?  If the GC actually freed 250k, then isn't the "Allocated Bytes/second" performance counter actually counting the amount of memory *freed* per second?  Of course the amount of memory freed should be fairly close to the amount of memory allocated (unless there is a serious leak).

    I also have a question:  is it possible to calculate the "Allocated Bytes/second" in the actual process being monitored?  I would have to trap the completion of all GCs and find out the amount of memory freed.

  33. Sarma Pisapati says:

    How do I increase the Heap size > default (1.5 GB)? Is there a config file for .NET framework?

  34. Raja Moparthi says:

    Hi Stephen,

    Very nice and informative article. In my application, there are 5 Gen2 collections, 1500 Gen1 collections and 2,15,000 Gen0 collections. All the time % GC time is more than 60%. Here we have very few Gen2 GCs – just 5, as I mentioned. What could be the reason for the high % GC time?