Tracking down managed memory leaks (how to find a GC leak)


If you think you’ve got memory leaks, or if you’re just wondering what kind of stuff is on your heap, you can follow the very same steps that I do and get fabulous results your friends will envy. OK, well maybe not, but they’re handy anyway.

These steps will help you to go from a suspected memory leak to the specific object references that are keeping your objects alive.  See the Resources at the end for the location of all these tools.

Step 1:  Run your process and put it in the state you are curious about

Be sure to choose a scenario you can reproduce if that’s at all possible; otherwise you’ll never know whether you’re making headway in clearing out the memory leaks.

Step 2:  Use tasklist to find its process ID

C:\>tasklist

Image Name                   PID Session Name     Session#    Mem Usage
========================= ====== ================ ======== ============
System Idle Process            0 RDP-Tcp#9               0         16 K
System                         4 RDP-Tcp#9               0        112 K
smss.exe                     624 RDP-Tcp#9               0        252 K
 …etc…
ShowFormComplex.exe         4496 RDP-Tcp#9               0     20,708 K
tasklist.exe                3636 RDP-Tcp#9               0      4,180 K

From here we can see that my process is ID #4496.
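
If you already know the image name you can also ask tasklist to filter for you rather than scanning the whole list by eye. This is just a sketch using the built-in /FI filter (check tasklist /? for the exact syntax on your version of Windows):

C:\>tasklist /FI "IMAGENAME eq ShowFormComplex.exe"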

Step 3:  Use VADump to get a summary of the process

C:\>vadump -sop 4496
Category                        Total        Private Shareable    Shared
                           Pages    KBytes    KBytes    KBytes    KBytes
      Page Table Pages        35       140       140         0         0
      Other System            15        60        60         0         0
      Code/StaticData       4596     18384      4020      3376     10988
      Heap                   215       860       860         0         0
      Stack                   30       120       120         0         0
      Teb                      4        16        16         0         0
      Mapped Data            129       516         0        24       492
      Other Data             157       628       624         4         0

      Total Modules         4596     18384      4020      3376     10988
      Total Dynamic Data     535      2140      1620        28       492
      Total System            50       200       200         0         0
Grand Total Working Set     5181     20724      5840      3404     11480

Here we can see that the process is mostly code (18384k).

The vast majority of the resources that the CLR uses are under “Other Data”; this is because the GC heap is allocated directly with VirtualAlloc and doesn’t go through a regular Windows heap. The same is true for the so-called “loader heaps” which hold type information and jitted code. Most of the conventional “Heap” allocations are from whatever unmanaged code is running. In this case it’s a WinForms application with piles of controls, so there’s storage associated with those things.

There isn’t much “Other Data” here, so the heap situation is probably pretty good, but let’s see where we stand on detailed CLR memory usage.

Step 4:  Attach Windbg and load SOS

C:\> windbg -p 4496

Once the debugger loads use this command to load our extension DLL

0:004> .loadby sos mscorwks

This tells the debugger to load the extension “sos.dll” from the same place that mscorwks.dll was loaded. That ensures that you get the right version of SOS (it should be the one that matches the mscorwks you are using).
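
If .loadby complains that it can’t find the module, the usual reason is that the runtime hasn’t actually been loaded into the target process yet (or you attached before it was). A quick sanity check is the regular module-list command, shown here without its output:

0:004> lm m mscorwks

If mscorwks shows up in the list, .loadby sos mscorwks should work; if it doesn’t, the process isn’t running managed code yet.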

Step 5:  Get the CLR memory summary

This command gives you a summary of what we’ve allocated. The output will be a little different depending on the version of the runtime you are using, but for a simple application you get the loader heaps for the two base domain structures (they just hold objects that can be shared in assorted ways), plus the storage for the first real appdomain (Domain 1), and of course the jitted code.

0:004> !EEHeap
Loader Heap:
----------------------------------------
System Domain: 5e093770
…etc…
Total size: 0x8000(32768)bytes
----------------------------------------
Shared Domain: 5e093fa8
…etc…
Total size: 0xa000(40960)bytes
----------------------------------------
Domain 1: 14f0d0
…etc…
Total size: 0x18000(98304)bytes
----------------------------------------
Jit code heap:
LoaderCodeHeap: 02ef0000(10000:7000) Size: 0x7000(28672)bytes.
Total size: 0x7000(28672)bytes
----------------------------------------
Module Thunk heaps:
…etc…
Total size: 0x0(0)bytes
----------------------------------------
Module Lookup Table heaps:
…etc…
Total size: 0x0(0)bytes
----------------------------------------
Total LoaderHeap size: 0x31000(200704)bytes

So here we’ve got 200k of stuff associated with what has been loaded, about 28k of which is jitted code. 

Next in the output (same command) is the summary of the GC Heap.

=======================================
Number of GC Heaps: 1
generation 0 starts at 0x00a61018
generation 1 starts at 0x00a6100c
generation 2 starts at 0x00a61000
ephemeral segment allocation context: none
 segment    begin allocated     size
001b8630 7a8d0bbc  7a8f08d8 0x0001fd1c(130332)
001b4ac8 7b4f77e0  7b50dcc8 0x000164e8(91368)
00157690 02c10004  02c10010 0x0000000c(12)
00157610 5ba35728  5ba7c4a0 0x00046d78(290168)
00a60000 00a61000  00aac000 0x0004b000(307200)
Large object heap starts at 0x01a61000
 segment    begin allocated     size
01a60000 01a61000  01a66d90 0x00005d90(23952)
Total Size   0xcdd18(843032)
------------------------------
GC Heap Size   0xcdd18(843032)

You will likely have many fewer small-heap segments than I did because I did this test on an internal debug build, so there are funny things like a 12 byte segment in the dump. But you’ll see what segments there are and how big they are, and you can see the current boundaries of the generations, from which you can compute their current exact size. (Note that this is likely different from what the performance counters report, as you can see in Maoni’s blog; those counters are budgeted from the last GC, not the instantaneous value, because it would be too expensive to keep updating the instantaneous value.)
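
If you do want the exact generation sizes, it’s just pointer arithmetic on the ephemeral segment: each generation runs from its start address to the start of the next younger generation, and gen 0 runs up to the segment’s current allocation pointer. For the dump above that works out to roughly this (my arithmetic, not more SOS output):

gen 2: 0x00a6100c - 0x00a61000 = 0xc bytes (just the ephemeral piece)
gen 1: 0x00a61018 - 0x00a6100c = 0xc bytes
gen 0: 0x00aac000 - 0x00a61018 = 0x4afe8 bytes (about 300k)

Any older, non-ephemeral segments belong to generation 2 as well, so add those in if you want the full gen 2 number.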

So in this case we can see that there is about 843k of GC heap. Comparing that to the “Other Data” category, there was about 2M of other data in total. The CLR accounts for about 1M of that (the 843k of GC heap plus the loader heaps we saw above). The rest is likely bitmaps allocated by my WinForms application’s controls, but whatever it is, it isn’t CLR stuff…

Step 6: Dump the GC Heap statistics

Next we’ll want to know, by type, what’s on the heap at this exact instant.

0:004> !DumpHeap -stat
… sorted from smallest to biggest … etc. etc…

7b586c7c      436     10464 System.Internal.Gdi.WindowsGraphics
5ba867ac      208     11648 System.Reflection.RuntimeMethodInfo
7b586898      627     12540 System.Internal.Gdi.DeviceContext
5baa4954      677     39992 System.Object[]
5ba25c9c     8593    561496 System.String
Total 17427 objects

Note that this dump includes both reachable and unreachable objects, so unless you know that the GC just ran before you did this command you’ll see some dead stuff in this report as well. Sometimes it’s interesting and useful to force a GC to run before you do this so that you can get a summary of just the live stuff. Sometimes it’s useful to do dumps before and after forcing a GC so that you can see what sort of things are dying. This may be a way to gather evidence that a forced GC is necessary. See my blog on When to Call GC.Collect().
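
If you own the app, one blunt way to force that collection for a diagnostic run is a debug-only helper along these lines (just the standard GC API, nothing SOS-specific, and not something to leave in shipping code):

using System;

static class GcDiagnostics
{
    // Diagnostic helper: force a full collection so a subsequent
    // !DumpHeap -stat shows (mostly) live objects only.
    public static void ForceFullCollection()
    {
        GC.Collect();                   // collect all generations
        GC.WaitForPendingFinalizers();  // let pending finalizers run
        GC.Collect();                   // collect anything the finalizers released
    }
}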

So let’s suppose that there weren’t supposed to be 208 System.Reflection.RuntimeMethodInfo objects allocated here and that we thought that was a leak.  One of the things we’ll want to do is to use CLR Profiler to see where those objects are being allocated — that will give us half the picture.  But we can get the other half of the picture right here in the debugger.

Step 7: Dump type-specific information

We can dump each object whose type name includes a given string with a simple command:

0:004> !DumpHeap -type System.Reflection.RuntimeMethodInfo
 Address       MT     Size
00a63da4 5baa62c0       32    
00a63e04 5baa6174       20    
00a63e2c 5ba867ac       56    
00a63e64 5baa5fa8       16    
00a63e88 5baa5fa8       16    
00a63f24 5baa6174       20    
00a63f4c 5ba867ac       56    
00a63f84 5baa5fa8       16    
etc. etc. etc.
total 630 objects

Statistics:
      MT    Count TotalSize Class Name
5baa62c0        3        96 System.RuntimeType+RuntimeTypeCache+MemberInfoCache`1[[System.Reflection.RuntimeMethodInfo, mscorlib]]
5baa5fa8      211      3376 System.Reflection.CerArrayList`1[[System.Reflection.RuntimeMethodInfo, mscorlib]]
5baa6174      208      4160 System.Collections.Generic.List`1[[System.Reflection.RuntimeMethodInfo, mscorlib]]
5ba867ac      208     11648 System.Reflection.RuntimeMethodInfo
Total 630 objects

Note that the type we wanted was System.Reflection.RuntimeMethodInfo, and we can see that its method table is 5ba867ac; those are the 56 byte objects. Now we can investigate some of these and see what is causing them to stay alive.
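
If you want to look inside one of those objects before chasing its roots, the SOS !DumpObj command (alias !do) will show its method table, size, and fields; for example (output omitted here):

0:004> !DumpObj 00a63e2c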

Step 8: Identify the roots of suspected leaks

One of the lines in the dump was

00a63e2c 5ba867ac       56    

So that tells us there is an object of the type we want at address 00a63e2c. Let’s see what’s keeping it alive:

0:004> !gcroot 00a63e2c
Scan Thread 0 OSTHread 1598
Scan Thread 2 OSTHread 103c

DOMAIN(0014F0D0):
HANDLE(WeakLn):3f10f0:
Root:00a63d20(System.RuntimeType+RuntimeTypeCache)
->00a63da4(System.RuntimeType+RuntimeTypeCache+MemberInfoCache`1[[System.Reflection.RuntimeMethodInfo,mscorlib]])
->00a63e88(System.Reflection.CerArrayList`1[[System.Reflection.RuntimeMethodInfo, mscorlib]])
->00a63e98(System.Object[])
->00a63e2c(System.Reflection.RuntimeMethodInfo)

DOMAIN(0014F0D0):
HANDLE(Pinned):3f13ec:
Root:01a64b50(System.Object[])
->00a62f20(System.ComponentModel.WeakEventHandlerList)
->00a63fb4(System.ComponentModel.WeakEventHandlerList+ListEntry)
->00a63ec4(System.ComponentModel.WeakEventHandlerList+ListEntry)
->00aa5f6c(System.ComponentModel.WeakDelegateHolder)
->00a63e2c(System.Reflection.RuntimeMethodInfo)

I’ve added some extra line breaks to the output above to make it easier to read but otherwise it’s the raw output.

The gcroot command is trying to tell you whether the object is reachable and, if so, how it is reached from each root. The dump won’t include all the ways the object is reachable, but you do get at least one way to find the object, and usually that’s enough. If multiple paths are dumped they often have a common tail. However the object is reachable (here it looks like maybe only weak references are left, so this guy might go away on the next collect), that should give you a hint about (some of) the remaining references. From there you can decide what pointers to null so that the object is properly released.
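
As a purely hypothetical sketch (the class names below are made up for illustration, not taken from the application above): the most common culprit is an event subscription that was never removed, and the usual fix is to unsubscribe, for example in Dispose, rather than hoping the publisher dies first.

using System;

// Hypothetical example: a long-lived publisher keeps the subscriber
// reachable through its event's delegate list until we unsubscribe.
class Publisher
{
    public event EventHandler Updated;

    public void RaiseUpdated()
    {
        if (Updated != null)
            Updated(this, EventArgs.Empty);
    }
}

class Subscriber : IDisposable
{
    private readonly Publisher _publisher;

    public Subscriber(Publisher publisher)
    {
        _publisher = publisher;
        _publisher.Updated += OnUpdated;   // the publisher now references us
    }

    private void OnUpdated(object sender, EventArgs e)
    {
        // react to the event
    }

    public void Dispose()
    {
        // Break the reference chain so the GC can reclaim this object
        // even though the publisher is still alive.
        _publisher.Updated -= OnUpdated;
    }
}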

Resources

You can get information on windbg at this location.
Vadump has usage information and a download from msdn and microsoft.com respectively.

If those links break, searching for windbg and vadump on the microsoft.com home page gave good results; that’s how I got those links in the first place.

CLR Profiler is available here.

It comes with documentation but there is additional material available in the Performance PAG in Chapter 13.

Comments (70)

  1. J. Marsch says:

    Great post, Rico. I would love to know more about doing this type of low-level debugging. My current experience is limited to source code level debuggers.

    Also, I’m a little bit curious about that last dump in your post — the output of the gcroot command.

    Does this imply that there will be events/delegates in .Net 2.0 that use weak refs?

  2. Rico Mariani says:

    You know I’m often surprised by what I see when I dump the heap. This guy System.ComponentModel.WeakEventHandlerList seems like he could be quite interesting and I don’t know thing one about him. It might be generally useful but it might also be a private type with an unfortunate name. Even internal types appear in the low level dumps like this.

    I’ll see if I can’t find out something for my own curiosity if nothing else.

  3. Excellent post Rico. What I would like to see someone work on is dumping this information into a file directly from CLRProfiler. That way you can view the dump and the heap and object allocations knowing they all are generated under the same circumstances.

  4. Pavel Lebedinsky says:

    For 1.1 it’s probably better to use sos.dll that comes with the latest debuggers. It’s loaded automatically when you attach to a process that has CLR dlls loaded, and it has some fixes and new commands/shortcuts that the original Everett version of sos doesn’t.

  5. Rico Mariani says:

    >That way you can view the dump and the heap and object allocations knowing they all are generated under the same circumstances.

    It is often very useful to start the process under CLRProfiler and then also attach with the debugger so you can do both on the same process. This works just fine. In fact it’s extra handy because CLR Profiler’s "Dump Heap Now" forces a garbage collection so you can use it to see what dead things are going away in the debugger dump and visually in CLRProfiler.

    >>For 1.1 it’s probably better to use sos.dll that comes with the latest debuggers. It’s loaded automatically when you attach to a process that has CLR dlls loaded, and it has some fixes and new commands/shortcuts that the original Everett version of sos doesn’t.

    Nothing is ever easy 🙂

    It turns out there is this *other* thing that is also called SOS which isn’t quite the same thing as the SOS that we build along with the runtime even though it has many of the same commands and common heritage. That one seems to be auto-loaded (you can find it in a subdirectory under wherever you install windbg) and I think it works on v1.0 and v1.1 of the runtime. I think it has a few features not present in the original SOS we deployed so it can be useful.

    It may be that there will be an enhanced SOS for version 2.0 of the runtime some time after it ships. So basically you can try the commands without loading an SOS explicitly and just get whatever is there or you can go with the "golden" version and take what was originally shipped.

    For myself, I always use .loadby sos mscorwks, but of course I use the runtime build of the moment so anything else would be lunacy. You, gentle reader, may find that you like some of the enhanced features in the other SOS and that it works fine on your runtime.

    Either way the leak tracking instructions are the same.

  6. Ivan Peev says:

    Rico,

    Why the VS team doesn’t include some functionality, which can automate or help tracing those kinds of managed memory leaks? Those steps can be automated, can’t they ?

  7. Daniel Moth says:

    Blog link of the week 50

  8. Rico Mariani says:

    Ivan Peev asks: "Why the VS team doesn’t include some functionality, which can automate or help tracing those kinds of managed memory leaks? Those steps can be automated, can’t they?"

    There was very little in terms of memory analysis features in Visual Studio.NET — I think that reflects two things: first that there were lots of problems to solve and we couldn’t solve them all in one release and second is that some problems we didn’t really know how to solve anyway. I think some of both is going on in this case.

    Which brings me to the second point: Can this all be automated? Well, sort of, the tricky bits are in Step 6 — how do you automatically know which types are the ones that *should* have gone away and which types were supposed to be living (because they are in a cache or something) — and in Step 8 — how do you automatically know which instances of the type are the problematic ones?

    It’s rather tricky.

  9. rx says:

    this is a good post.. thanks

  10. .Net Adventures says:
  11. Tim Bond says:

    Great blog well worth bookmarking.

  12. Mark Levison says:

    Great post – but why not use a tool to do all this hard work? During our release cycle this fall I found MemProfiler (www.scitech.se/memprofiler/) and blogged a bit about my experience (http://dotnetjunkies.com/WebLog/mlevison/archive/2004/09/30/27265.aspx).

    For $100, I’m too lazy to work as hard as you.

  13. Ollie Riches says:

    Great article will make a great set of interview questions 🙂

  14. Ollie Riches says:

    Your blog rocks….

  15. Corbin Dunn says:

    I’ve found that CLRProfiler provides the same information, but in a much easier-to-use interface.

    Just my 2 cents!

  16. Rico Mariani says:

    You *can* get similar/related information from CLRProfiler and even better information from some 3rd party tools.

    Advantages to the approach given above:

    1) You can do this after the fact if you witness a problem; you don’t have to start under the profiler since you can attach the debugger

    2) You can get per-object information about objects and their reachability (!gcroot) which is much trickier to get from say the heap dump

    3) You can get valuable summary information about the overall memory usage of the CLR (!EEHeap)

    Plus it’s all free 🙂

  17. Hi Rico,

    For all those who don’t like to use command line tools…

    I’ve written a small .NET application that drives the debugger, using the techniques in this article, and displays it in a user interface. The article and source (with screenshot) is on my blog. If you can think of any enhancements that would be useful, please drop me a comment.

  19. Rico Mariani says:

    The tool looks really neat. I’ll give it a whirl when I’m back from vacation.

    Happy Holidays everyone!

  20. James Alt says:

    Will this work on the ASP.NET worker process? (aspnet_wp.exe) I’ve been having some memory issues where the size of the worker process just keeps growing and growing until I have to manually kill and restart the process before I can continue on with my work.

  21. Seth Hodgson says:

    Thanks for the excellent article Rico. I have one (somewhat related) question about GC memory management/leaks and the CLR. Beyond the steps you outline to identify memory leaks, is there any way to control the min and max managed heap size used by the CLR? Most JVMs allow a min and max managed heap size to be specified as start-up params, and I haven’t seen any mention of something similar for the CLR. I’d ideally like to be able to tell a .NET windows service app I’m developing a maximum allowed heap size, and then let the GC do it’s thing within that constraint. Any thoughts?

  22. Rico Mariani says:

    Some quick responses sort of in order:

    Q: Will this work on the ASP.NET worker process?

    A: I don’t see why it wouldn’t. It’s not magic or anything, and you can attach to it with the debugger same as any other.

    Re: http://www.scitech.se/memprofiler/

    It looks pretty cool, I’ll have to play with it some more. I wonder if it works on the daily build 🙂

    Q: Is there any way to control the min and max managed heap size used by the CLR?

    I don’t think there are environment variables for that but you can do this and more with the hosting api (the CLR calls you to get memory and so forth so that it can be hosted in more exotic processes like say SQL Server where you don’t want us to go and get memory directly)

    http://www.gotdotnet.com/team/clr/about_clr_Hosting.aspx

  23. Seth Hodgson says:

    Hi Rico – thanks for the pointer to the CLR Hosting articles. From what I can tell, the hosting APIs don’t provide a way to limit the amount of memory used by the CLR beyond ICorRuntimeHost’s Start() and Stop() methods. That seems like a strange way to manage CLR resource use – hard stopping it which unloads it from the current process and then restarting it in a new process. Is stopping the CLR the only way to release resources back to the system, and is the CLR team considering any enhancement to the CLR startup shim to allow the min/max managed heap size to be defined at application start?

    Thanks again for all your excellent posts over this past year.

  24. Seth Hodgson says:

    Scratch that – I had overlooked ICorConfiguration’s SetGCHostControl method :)

  25. .Net Adventures says:
  26. Tracking down managed memory leaks (how to find a GC leak)

    A number of resource for locating GC leaks:
    You might find this blog entry worth reading:
    http://weblogs.asp.net/ricom/archive/2004/12/10/279612.aspx

    SciTek’s…

  27.   How to track managed memory leak, also how to use windbg and sos extension for managed debugging in…

  28. I was just going through some memory leak information and I stumbled across a newish posting from Tess:…

  29. In recent builds, we have been having an awful memory leak in our system. Silvio was debugging it and…

  31. Managed code makes memory management much easier, but it’s still possible to have unintended memory leaks.

  32. This problem actually comes up pretty often so I thought I’d write a little article about it, and a couple

  33. You’ve been kicked (a good thing) – Trackback from DotNetKicks.com

  34. Here is a little interchange I had a few days ago, "Nick From Chicago" graciously allowed me to share

  37. There are numbers of blogs that folks wrote about memory leaks in Microsoft .Net Framework managed code

  38. Last week one of my customers called me to help him resolve a big problem on an asp.net application: a memory leak. During the application stress test, the w3wp process memory increased abnormally as a result of an application crash (application pool

  39. Tracking down managed memory leaks (how to find a GC leak)…

  40. Delay's Blog says:

    In my last post, I explained how it was possible for "hidden" event handlers to introduce memory leaks

  41. It’s been a while since the last post was online. We have been very busy in working on one of the very