Tracking down managed memory leaks (how to find a GC leak)

If you think you've got memory leaks, or if you're just wondering what kind of stuff is on your heap, you can follow the very same steps that I do and get fabulous results your friends will envy. OK, well maybe not, but they're handy anyway.

These steps will help you to go from a suspected memory leak to the specific object references that are keeping your objects alive.  See the Resources at the end for the location of all these tools.

Step 1: Run your process and put it in the state you are curious about

Be sure to choose a scenario you can reproduce if that's at all possible, otherwise you'll never know if you're making headway in clearing out the memory leaks.

Step 2: Use tasklist to find its process ID

C:\>tasklist

Image Name                   PID Session Name     Session#    Mem Usage
========================= ====== ================ ======== ============
System Idle Process            0 RDP-Tcp#9               0         16 K
System                         4 RDP-Tcp#9               0        112 K
smss.exe                     624 RDP-Tcp#9               0        252 K
...etc...
ShowFormComplex.exe         4496 RDP-Tcp#9               0     20,708 K
tasklist.exe                3636 RDP-Tcp#9               0      4,180 K

From here we can see that my process is ID #4496.
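
Aside: if the full list is too noisy, tasklist can filter for you. Something like this (using the /FI image-name filter) shows just the process you care about:

C:\>tasklist /FI "IMAGENAME eq ShowFormComplex.exe"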

Step 3: Use VADump to get a summary of the process

C:\>vadump -sop 4496
Category                           Total       Private Shareable    Shared
                             Pages    KBytes    KBytes    KBytes    KBytes
Page Table Pages                35       140       140         0         0
Other System                    15        60        60         0         0
Code/StaticData               4596     18384      4020      3376     10988
Heap                           215       860       860         0         0
Stack                           30       120       120         0         0
Teb                              4        16        16         0         0
Mapped Data                    129       516         0        24       492
Other Data                     157       628       624         4         0

Total Modules                 4596     18384      4020      3376     10988
Total Dynamic Data             535      2140      1620        28       492
Total System                    50       200       200         0         0
Grand Total Working Set       5181     20724      5840      3404     11480

Here we can see that the process is mostly code (18,384K of Code/StaticData).

The vast majority of the resources that the CLR uses show up under "Other Data" -- this is because the GC heap is allocated directly with VirtualAlloc -- it doesn't go through a regular Windows heap. The same goes for the so-called "loader heaps" which hold type information and jitted code. Most of the conventional "Heap" allocations come from whatever unmanaged code is running. In this case it's a WinForms application with piles of controls, so there's storage associated with those things.

There isn't much "Other Data" here, so the heap situation is probably pretty good, but let's see where we stand on detailed CLR memory usage.

Step 4: Attach Windbg and load SOS

C:\> windbg -p 4496

Once the debugger loads, use this command to load our extension DLL:

0:004> .loadby sos mscorwks

This tells the debugger to load the extension "sos.dll" from the same place that mscorwks.dll was loaded. That ensures that you get the right version of SOS (it should be the one that matches the mscorwks you are using).
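
If .loadby can't find things for some reason, you can also load the extension by explicit path with .load. The path below is just where it lives on my machine -- adjust it for whatever framework version and directory you actually have:

0:004> .load C:\Windows\Microsoft.NET\Framework\v2.0.50727\sos.dll

Either way, the .chain command will list the loaded extensions so you can confirm SOS is actually there.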

Step 5: Get the CLR memory summary

This command gives you a summary of what we've allocated.  The output will be a little different depending on the version of the runtime you are using.  But for a simple application you get the loader heaps for the two base domain structures (they just hold objects that can be shared in assorted ways) plus the storage for the first real appdomain (Domain 1).  And of course the jitted code.

0:004> !EEHeap
Loader Heap:
--------------------------------------
System Domain: 5e093770
...etc...
Total size: 0x8000(32768)bytes
--------------------------------------
Shared Domain: 5e093fa8
...etc...
Total size: 0xa000(40960)bytes
--------------------------------------
Domain 1: 14f0d0
...etc...
Total size: 0x18000(98304)bytes
--------------------------------------
Jit code heap:
LoaderCodeHeap: 02ef0000(10000:7000) Size: 0x7000(28672)bytes.
Total size: 0x7000(28672)bytes
--------------------------------------
Module Thunk heaps:
...etc...
Total size: 0x0(0)bytes
--------------------------------------
Module Lookup Table heaps:
...etc...
Total size: 0x0(0)bytes
--------------------------------------
Total LoaderHeap size: 0x31000(200704)bytes

So here we've got 200k of stuff associated with what has been loaded, about 28k of which is jitted code. 

Next in the output (same command) is the summary of the GC Heap.

=======================================
Number of GC Heaps: 1
generation 0 starts at 0x00a61018
generation 1 starts at 0x00a6100c
generation 2 starts at 0x00a61000
ephemeral segment allocation context: none
segment begin allocated size
001b8630 7a8d0bbc 7a8f08d8 0x0001fd1c(130332)
001b4ac8 7b4f77e0 7b50dcc8 0x000164e8(91368)
00157690 02c10004 02c10010 0x0000000c(12)
00157610 5ba35728 5ba7c4a0 0x00046d78(290168)
00a60000 00a61000 00aac000 0x0004b000(307200)
Large object heap starts at 0x01a61000
segment begin allocated size
01a60000 01a61000 01a66d90 0x00005d90(23952)
Total Size 0xcdd18(843032)
------------------------------
GC Heap Size 0xcdd18(843032)

You will likely have many fewer small-heap segments than I did because I did this test on an internal debug build, so there are funny things like a 12 byte segment in the dump. But you'll see what segments there are and how big they are, and you can see the current boundaries of the generations, from which you can compute their current exact size. (Note that this is likely different from what the performance counters report, as you can see in Maoni's blog -- those counters are budgeted from the last GC, not the instantaneous value -- it would be too expensive to keep the instantaneous value updated.)
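
To make that concrete, here's the arithmetic for the ephemeral segment above (the one at 00a60000): generation 2 runs from its start to the start of generation 1, generation 1 runs up to the start of generation 0, and generation 0 runs from there to the segment's allocated mark.

gen 2: 0x00a6100c - 0x00a61000 = 0xc       (12 bytes)
gen 1: 0x00a61018 - 0x00a6100c = 0xc       (12 bytes)
gen 0: 0x00aac000 - 0x00a61018 = 0x4afe8   (about 300k)

The tiny gen 1 and gen 2 here just mean that hardly anything has survived long enough to be promoted into them yet.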

So in this case we can see that there is about 843k of GC heap. Comparing that to the VADump output above, there was about 2M of dynamic data in total; the CLR accounts for about 1M of that (the GC heap plus the loader heaps). The rest is likely bitmaps allocated by my WinForms application's controls, but whatever it is, it isn't CLR stuff...

Step 6: Dump the GC Heap statistics

Next we'll want to know, by type, what's on the heap at this exact instant:

0:004> !DumpHeap -stat
... sorted from smallest to biggest ... etc. etc...

7b586c7c 436 10464 System.Internal.Gdi.WindowsGraphics
5ba867ac 208 11648 System.Reflection.RuntimeMethodInfo
7b586898 627 12540 System.Internal.Gdi.DeviceContext
5baa4954 677 39992 System.Object[]
5ba25c9c 8593 561496 System.String
Total 17427 objects

Note that this dump includes both reachable and unreachable objects, so unless you know that the GC ran just before you issued this command you'll see some dead stuff in this report as well. Sometimes it's interesting and useful to force a GC to run before you do this so that you can get a summary of just the live stuff. Sometimes it's useful to do dumps before and after forcing a GC so that you can see what sort of things are dying. This may be a way to gather evidence that a forced GC is necessary. See my blog on When to Call GC.Collect().
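
If you want to try that before-and-after trick, the simplest thing is to give your test application a debug-only way to force a full collection on demand. A minimal sketch of such a helper (the name is mine, the calls are just the standard GC APIs):

// Debug-only helper: force a full collection so the heap dump shows (mostly) live objects.
static void ForceFullCollection()
{
    GC.Collect();                    // collect everything that is currently unreachable
    GC.WaitForPendingFinalizers();   // let finalizers run so their targets can become garbage
    GC.Collect();                    // collect whatever the finalizers just released
}

Hook that up to a hidden menu item or hotkey, take a !DumpHeap -stat before and after triggering it, and compare the counts.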

So let's suppose that there weren't supposed to be 208 System.Reflection.RuntimeMethodInfo objects allocated here and that we thought that was a leak.  One of the things we'll want to do is to use CLR Profiler to see where those objects are being allocated -- that will give us half the picture.  But we can get the other half of the picture right here in the debugger.

Step 7: Dump Type-Specific Information

We can dump each object whose type name includes a given string with a simple command:

0:004> !DumpHeap -type System.Reflection.RuntimeMethodInfo
Address MT Size
00a63da4 5baa62c0 32
00a63e04 5baa6174 20
00a63e2c 5ba867ac 56
00a63e64 5baa5fa8 16
00a63e88 5baa5fa8 16
00a63f24 5baa6174 20
00a63f4c 5ba867ac 56
00a63f84 5baa5fa8 16
etc. etc. etc.
total 630 objects
Statistics:
MT Count TotalSize Class Name
5baa62c0 3 96 System.RuntimeType+RuntimeTypeCache+MemberInfoCache`1[[System.Reflection.RuntimeMethodInfo, mscorlib]]
5baa5fa8 211 3376 System.Reflection.CerArrayList`1[[System.Reflection.RuntimeMethodInfo, mscorlib]]
5baa6174 208 4160 System.Collections.Generic.List`1[[System.Reflection.RuntimeMethodInfo, mscorlib]]
5ba867ac 208 11648 System.Reflection.RuntimeMethodInfo
Total 630 objects

Note that the type we wanted was System.Reflection.RuntimeMethodInfo and we can see that it has method table 5ba867ac -- those are the 56-byte objects. Now we can investigate some of these and see what is causing them to stay alive.
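
By the way, once you know the method table you can filter on it directly instead of matching names; the -mt switch takes the method table value from the statistics, so something like this should list just the RuntimeMethodInfo instances themselves:

0:004> !DumpHeap -mt 5ba867ac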

Step 8: Identify the roots of suspected leaks

One of the lines in the dump was

00a63e2c 5ba867ac 56    

So that tells us there is an object of the type we want at address 00a63e2c. Let's see what's keeping it alive:

0:004> !gcroot 00a63e2c
Scan Thread 0 OSTHread 1598
Scan Thread 2 OSTHread 103c

DOMAIN(0014F0D0):
HANDLE(WeakLn):3f10f0:
Root:00a63d20(System.RuntimeType+RuntimeTypeCache)
->00a63da4(System.RuntimeType+RuntimeTypeCache+MemberInfoCache`1[[System.Reflection.RuntimeMethodInfo,mscorlib]])
->00a63e88(System.Reflection.CerArrayList`1[[System.Reflection.RuntimeMethodInfo, mscorlib]])
->00a63e98(System.Object[])
->00a63e2c(System.Reflection.RuntimeMethodInfo)

DOMAIN(0014F0D0):
HANDLE(Pinned):3f13ec:
Root:01a64b50(System.Object[])
->00a62f20(System.ComponentModel.WeakEventHandlerList)
->00a63fb4(System.ComponentModel.WeakEventHandlerList+ListEntry)
->00a63ec4(System.ComponentModel.WeakEventHandlerList+ListEntry)
->00aa5f6c(System.ComponentModel.WeakDelegateHolder)
->00a63e2c(System.Reflection.RuntimeMethodInfo)

I've added some extra line breaks to the output above to make it easier to read but otherwise it's the raw output.

The gcroot command is trying to tell you whether the object is reachable and, if so, how it is reached from each root. The dump won't include all the ways the object is reachable, but you do get at least one way to find the object -- usually that's enough. If multiple paths are dumped they often have a common tail. Whichever way the object turns out to be reachable (here it looks like only weak references are left, so this guy might go away on the next collection), the output should give you a hint about (some of) the remaining references. From there you can decide which pointers to null so that the object is properly released.
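
To make that last step concrete: in a WinForms app the usual culprit is an event subscription that nobody ever removes -- the publisher's delegate list keeps every subscriber alive. The types below are made up for illustration (they are not from the application above), but this is the classic shape of both the bug and the fix:

using System;

// Classic leak: a long-lived (here static) event keeps every subscriber reachable.
class StatusMonitor
{
    public static event EventHandler StatusChanged;

    public static void RaiseStatusChanged()
    {
        EventHandler handler = StatusChanged;
        if (handler != null)
            handler(null, EventArgs.Empty);
    }
}

class StatusView : IDisposable
{
    public StatusView()
    {
        // Subscribing roots this object via StatusMonitor's delegate list.
        StatusMonitor.StatusChanged += OnStatusChanged;
    }

    void OnStatusChanged(object sender, EventArgs e)
    {
        // refresh the display...
    }

    public void Dispose()
    {
        // The fix: remove the handler so the GC can reclaim the view.
        StatusMonitor.StatusChanged -= OnStatusChanged;
    }
}

If !gcroot keeps pointing you at delegate or event-list objects, that's the kind of unsubscribe (or null-out) you're looking for.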

Resources

You can get information on windbg at this location.
Vadump has usage information and a download available from MSDN and microsoft.com, respectively.

If those links break, searching for windbg and vadump from the microsoft.com home page gave good results; that's how I found those links in the first place.

CLR Profiler is available here.

It comes with documentation, but there is additional material available in Chapter 13 of the Performance PAG.