Notes from debugging a managed memory leak

Recently, I spent a while digging into a managed memory leak. This is a pretty well-worn blog topic, but I figured I would add my two cents to it anyway, as I found a few things that I didn't notice in the existing blogs.

First, Rico wrote up the basic approach back in 2004, so you should start by reading this - https://blogs.msdn.com/ricom/archive/2004/12/10/279612.aspx. This will give you an intro to using sos.dll in WinDbg.

What I would like to add:

#1: How to decide if you have a leak in the first place.

Since GCs happen non-deterministically, it can be hard to know if you actually have a managed leak. For example, if you look at memory usage at the end of a user scenario, you will likely see numbers all over the map depending on when the last GC happened. The best technique I found for this is to stop after gen-2 collections and compare memory usage there. This isn't perfect, since gen-2 collections can still happen at any point in your code, but it gives you a better estimate than stopping after user scenarios.

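To stop after the next gen-2 GC: !findroots -gen 2 (then resume the process with g; the break is one-shot, so reissue the command each time you want to stop again)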

Note that this command is new for the CLRv4 version of sos.dll (also available in Silverlight). I am assuming that you could achieve similar functionality with a well-placed breakpoint in older CLRs, but I am not familiar enough with the inner workings of the GC to tell you where.
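
Once you have a handful of heap sizes recorded at those gen-2 stops, you still have to decide whether they are actually trending upward. Here is a minimal sketch of one way to do that, not something from Rico's post or from sos.dll: it fits a straight line through the samples and flags a leak if the average growth per gen-2 GC exceeds a small fraction of the first sample. The function names, the 1% threshold, and the sample numbers are all made up for illustration, and I am assuming you copied the sizes by hand from something like !eeheap -gc at each stop.

```python
from statistics import mean


def growth_per_gc(heap_sizes):
    """Least-squares slope of heap size (bytes) against gen-2 GC index."""
    n = len(heap_sizes)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(heap_sizes)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, heap_sizes))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def looks_like_leak(heap_sizes, min_growth_fraction=0.01):
    """Heuristic (assumed threshold): average growth per gen-2 GC is more
    than min_growth_fraction of the first sample."""
    return growth_per_gc(heap_sizes) > min_growth_fraction * heap_sizes[0]


# Hypothetical heap sizes in bytes, one per gen-2 stop.
steady = [52_000_000, 50_500_000, 53_000_000, 51_000_000, 52_500_000]
leaking = [52_000_000, 58_000_000, 63_500_000, 70_000_000, 75_500_000]

if __name__ == "__main__":
    print("steady :", looks_like_leak(steady))
    print("leaking:", looks_like_leak(leaking))
```

The slope is used rather than "last minus first" so that one noisy sample doesn't decide the answer. The more gen-2 stops you record, the more this is worth trusting.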

#2: Use CLRProfiler to visualize the leaks

This may have been specific to my scenario, but I didn't have a lot of success with !gcroot. I had more success understanding the problem by loading up a .log file in CLRProfiler (https://www.microsoft.com/downloads/details.aspx?FamilyID=a362781c-3870-43be-8926-862b40aa0cd0&DisplayLang=en). One thing I learned along the way: do _not_ use '-xml' when saving out the log, as CLRProfiler doesn't understand the XML format.

To save the log out: !TraverseHeap c:\users\greggm\desktop\myheap.log

#3: !gcroot doesn't show roots in CCWs

When native code calls into managed code through COM, the native code gets a CCW (COM Callable Wrapper). If native code leaks its CCW, the managed object will be leaked too, but !gcroot will not tell you why.