Tracking down native leaks

I have been spending some time looking at native memory leaks recently, and I decided to blog about some of the techniques which worked well for me.

First, find out what objects are leaking. If your code doesn’t already have some sort of leak detection scheme, you can take advantage of the fact that the operating system’s heap does. KB 268343 explains how to use UMDH to get a text file of all the leaking callstacks. Keep in mind that to get useful output, _NT_SYMBOL_PATH must point to symbols for both the operating system and your application.

So now you know which objects are leaking, but for ref-counted objects, an allocation callstack probably doesn’t get you to a solution. You need to find the missing ‘Release’ call.

Here is what I did that worked pretty well:

Step #1: Get a good repro. Without a reliable repro, you have very little chance of tracking down a leak.

Step #2: Look at all the reported leaks and find a leaf object that is leaked. Leaks often chain: when one object leaks and it holds a reference to a second object, the second object must leak too. Before spending ages looking at all the AddRef/Release calls on a leaking object, make sure that you are looking at a leaf object, meaning one that no other leaked object is keeping alive. For example, if every 'apple' object holds a reference to its 'tree' object, and both trees and apples are leaking, look at the apple leaks instead of the tree leaks.
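To make the apple/tree chain concrete, here is a minimal sketch. The RefCounted base class, the live-object counters, and GrowApple are all made up for illustration; your own ref-counting scheme (COM or otherwise) will look different.

```cpp
#include <cassert>

// Hypothetical ref-counted base class (illustration only).
class RefCounted {
public:
    void AddRef() { ++refs_; }
    void Release() {
        assert(refs_ > 0);              // catches an extra Release in debug builds
        if (--refs_ == 0) delete this;
    }
protected:
    virtual ~RefCounted() = default;
private:
    long refs_ = 1;                     // the creator owns the first reference
};

// Live-object counters, standing in for what a leak report would show.
int g_liveTrees = 0;
int g_liveApples = 0;

class Tree : public RefCounted {
public:
    Tree() { ++g_liveTrees; }
protected:
    ~Tree() override { --g_liveTrees; }
};

class Apple : public RefCounted {
public:
    explicit Apple(Tree* tree) : tree_(tree) {
        tree_->AddRef();                // every apple holds a reference to its tree
        ++g_liveApples;
    }
protected:
    ~Apple() override {
        tree_->Release();               // only runs if the apple itself is released
        --g_liveApples;
    }
private:
    Tree* tree_;
};

// Creates a tree and an apple, then drops the creator's reference to the tree.
// From here on, the apple is the only thing keeping the tree alive.
Apple* GrowApple() {
    Tree* tree = new Tree();
    Apple* apple = new Apple(tree);
    tree->Release();
    return apple;
}
```

If the caller of GrowApple never calls Release on the returned apple, both counters stay at 1 and a leak report would list a tree and an apple. The tree's own AddRef/Release calls are perfectly balanced, though, so staring at them gets you nowhere. One Release on the apple frees both objects.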

Step #3: Find an instance of a particular object that leaks. Set a tracepoint on the constructor and the destructor of the leaking class. On the constructor, I would recommend a When Hit message like ‘New Object: {(void*)this} $CALLSTACK’. On the destructor, try a When Hit message of ‘Delete Object: {(void*)this}’. Run your scenario and do some text processing to find out which callstacks leaked and which didn’t. From a command prompt (inside a .bat file, write %%d instead of %d):

for /f "tokens=3" %d in ('findstr /c:"New Object" debug.txt') do findstr %d debug.txt

Here, debug.txt is a text file that I saved the contents of the Output window to. Now I can look at all the allocation callstacks for the object – both those that leaked and those that didn’t. You will need to find some pattern that lets you predict which object instance will leak. If you’re lucky, the pattern is something simple like ‘the first instance created’.
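If the findstr output gets too long to read by eye, the same pairing can be sketched in C++. This assumes each tracepoint line contains the text ‘New Object: ’ or ‘Delete Object: ’ followed directly by the address, as in the When Hit messages above; the function name FindLeakedAddresses is made up for this example. Unlike the one-liner, it copes with an address being reused after an earlier object at that address was deleted.

```cpp
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Returns the addresses that were reported by a "New Object:" line and never
// reported by a later "Delete Object:" line. Callstack lines are ignored.
std::set<std::string> FindLeakedAddresses(std::istream& log) {
    const std::string kNew = "New Object: ";
    const std::string kDelete = "Delete Object: ";

    // Extracts the whitespace-delimited token that follows `marker`,
    // or an empty string if the marker is not on this line.
    auto tokenAfter = [](const std::string& line, const std::string& marker) {
        const std::size_t pos = line.find(marker);
        if (pos == std::string::npos) return std::string();
        std::istringstream rest(line.substr(pos + marker.size()));
        std::string token;
        rest >> token;
        return token;
    };

    std::set<std::string> live;
    std::string line;
    while (std::getline(log, line)) {
        const std::string created = tokenAfter(line, kNew);
        const std::string deleted = tokenAfter(line, kDelete);
        if (!created.empty()) {
            live.insert(created);       // object constructed at this address
        } else if (!deleted.empty()) {
            live.erase(deleted);        // object at this address destroyed
        }
    }
    return live;                        // whatever is still live at the end leaked
}
```

Open debug.txt with std::ifstream and pass it in. The addresses that come back are the ones whose ‘New Object’ callstacks are worth reading first.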

Step #4: Trace all the AddRef/Release calls. For this I would recommend setting a data tracepoint on the field that holds the ref count. To do this: Debug->New Breakpoint->New Data Breakpoint and set the location to the address where the ref count is stored (e.g. 0x10E89934). Then change the ‘When Hit…’ property to output the current ref count and a callstack -- ‘{*((DWORD*)0x10E89934) } $CALLSTACK’.
When you’re done, you will hopefully have a reasonably short list of AddRef/Release calls on your object, so it’s easy enough to match them up and see what went wrong.
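When the list is too long to match up by eye, a small tally can help. The sketch below assumes you have already gone through the trace output and labeled each hit by hand with an owner (for example, the component a few frames up the callstack from AddRef/Release). The RefEvent struct, the owner labels, and NetRefsByOwner are made up for this example; they are not something the debugger produces.

```cpp
#include <map>
#include <string>
#include <vector>

// One data-tracepoint hit, labeled by hand.
struct RefEvent {
    std::string owner;   // hypothetical label picked from the callstack
    long countAfter;     // ref count value printed by the tracepoint
};

// Returns the net reference change per owner, keeping only the owners whose
// AddRef and Release calls do not cancel out. A positive value means that
// owner took more references than it gave back.
// `initialCount` is the ref count at the moment the tracepoint was set.
std::map<std::string, long> NetRefsByOwner(const std::vector<RefEvent>& events,
                                           long initialCount = 0) {
    std::map<std::string, long> net;
    long previous = initialCount;
    for (const RefEvent& e : events) {
        net[e.owner] += e.countAfter - previous;   // +1 for AddRef, -1 for Release
        previous = e.countAfter;
    }
    // Drop the owners that balanced out; they are not the problem.
    for (auto it = net.begin(); it != net.end();) {
        if (it->second == 0) {
            it = net.erase(it);
        } else {
            ++it;
        }
    }
    return net;
}
```

The labels decide how useful the result is: AddRef and Release calls that belong together need to share an owner label, even when they come from different functions.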