The poor man’s way of identifying memory leaks


There is a variety of tools available for identifying resource leaks, but there's one method that requires no tools or special compiler switches or support libraries: Just let the leak continue until the source becomes blatantly obvious.

Nightly automated stress testing is a regular part of any project. Some teams use screen savers as the trigger, others use a custom program, still others require manual launching of the stress test, but by whatever means, after you've gone home for the day, your computer connects to a central server and receives a set of tests that it runs all night.

One of the things that these overnight tests often turn up is a memory leak of one sort or another, identified by the stress team because your program's resource usage has gone abnormally high. But how do you debug these failures? These machines aren't running a special instrumented build with your leak detection tool, so you can't use that.

Instead, you use the "target-rich environment" principle.

Suppose you're leaking memory. After fifteen hours of continuous heavy usage, your program starts getting out-of-memory failures. You're obviously leaking something, but what?

Think about it: If you are leaking something, then there are going to be a lot of them, whereas things you aren't leaking will be few in number. Therefore, if you grab something at random, it will most likely be a leaked object! In mathematical terms, suppose your program's normal memory usage is 15 megabytes, but for some reason you've used up 1693 megabytes of dynamically allocated memory. Since only 15 megabytes of that is normal memory usage, the other 1678 megabytes must be the leaked data. If you dump a random address from the heap, you have a greater-than-99% chance of dumping a leaked object.
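To put numbers on that, here is a minimal Python sketch of the arithmetic. It assumes that "a random address" means a byte chosen uniformly from the allocated heap, and that each pick is independent; the function name is made up for illustration.

```python
# Figures taken from the example above.
NORMAL_MB = 15      # the program's normal memory usage
TOTAL_MB = 1693     # dynamically allocated memory at the time of failure

leaked_mb = TOTAL_MB - NORMAL_MB

# Assumption: the address is picked uniformly over allocated bytes,
# so the chance of hitting leaked data is just the leaked fraction.
p_leaked = leaked_mb / TOTAL_MB


def chance_all_leaked(samples: int) -> float:
    """Chance that every one of `samples` independent picks is leaked data."""
    return p_leaked ** samples


if __name__ == "__main__":
    print(f"leaked: {leaked_mb} MB, chance per pick: {p_leaked:.4f}")
    print(f"chance that all 12 picks are leaked: {chance_all_leaked(12):.4f}")
```

Even a dozen picks in a row come out leaked most of the time, which is why a handful of dumps is usually enough.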

So grab a dozen or so addresses at random and dump them. Odds are you'll see the same data pattern over and over again. That's your leak. If it's a C++ object with virtual methods, dumping the vtable will quickly identify what type of object it is. If it's a POD type, you can usually identify what it is by looking for string buffers or pointers to other data.
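If you'd rather not eyeball the dumps, the "same pattern over and over" step can be tallied mechanically. This is only a sketch: it assumes you have already saved each sampled block as raw bytes, that the machine is little-endian, and that the vtable pointer (if any) sits in the first pointer-sized word of the object, which is common but not guaranteed. The helper names and sample values are made up.

```python
import struct
from collections import Counter


def leading_word(block: bytes, pointer_size: int = 8) -> int:
    """Read the first pointer-sized little-endian word of a dumped block."""
    fmt = "<Q" if pointer_size == 8 else "<I"
    return struct.unpack_from(fmt, block, 0)[0]


def most_common_leading_word(blocks, pointer_size: int = 8):
    """Return (word, count) for the leading word seen most often."""
    counts = Counter(leading_word(b, pointer_size) for b in blocks)
    return counts.most_common(1)[0]


if __name__ == "__main__":
    # Made-up sample: ten blocks share a leading word, two do not.
    suspect = struct.pack("<Q", 0x00007FF612345678) + b"\x00" * 24
    other_a = struct.pack("<Q", 0x0000000000000001) + b"\x00" * 24
    other_b = struct.pack("<Q", 0x00007FF6AAAA0000) + b"\x00" * 24
    samples = [suspect] * 10 + [other_a, other_b]

    word, count = most_common_leading_word(samples)
    print(f"{count} of {len(samples)} blocks start with {word:#018x}")
```

The winning value is the one to look up in the debugger; if it resolves to a vtable symbol, you have your object type.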

Your mileage may vary, but I've found it to be an enormously successful technique. Think of it as applied psychic powers.

Comments (17)
  1. Anonymous says:

    Of course, this assumes you have the ability to identify what a particular block of memory holds just by looking at a hex dump, even if the addresses you grab start in the middle of a vtable or instance data.

    That also doesn’t work for handle leaks :) Still, not a bad idea if nothing else works.

  2. Wound says:

    We found a great one of these recently. You lose about 300 GDI handles each time you create and destroy the property pages for the Philips webcam capture filter included with Windows XP. Before long you can bring the OS to its knees.

  3. michkap says:

    So, that’s what you mean when you talk from time to time about your "psychic powers" — a debugger and a hex dump! :-)

  4. Anonymous says:

    @Wound

    You don’t even need 3rd party software to hurt the OS. I caught this one yesterday:

    http://weatherley.net/pics/AnotherQualityProduct.png

    A bit of googling told me that wisptis.exe is some tablet PC thingy that everyone gets free with software update – tablet PC or not. Shame it leaks like a sieve ;)

    On a less MS bashing note, some tips on catching GDI leaks would be great. GDI leaks do seem a common problem. Take a look at ‘ypager.exe’ – the Yahoo messenger exe – maybe it’s not leaking, but it’s certainly using up those handles…

    The frustrating part for me is when you know handle 0x01231200 is leaking but you have no idea what it refers to or where it was allocated. I guess a GDI object explorer would be a good thing.

    A colleague and I knocked up a dirty program that enumerates GDI handles for processes and identifies the GDI object type that they refer to. It would even have a go at rendering bitmap and DC objects as well as display font information. We knocked off a ton of leaks in a week with that! It did lots of grovelling around the GDI handle table and made lots of naughty assumptions about what it found there – so I suppose Raymond won’t approve :)

  5. Anonymous says:

    JamesW:

    This cool tool will help you find your leaking GDI resources:

    http://msdn.microsoft.com/msdnmag/issues/03/01/GDILeaks/default.aspx

    If I recall, it also makes lots of assumptions about what it finds in the tables. I have seen a few false positives related to hosted IE browser controls, but it’ll quickly help you find lots of real leaks, too.

  6. Anonymous says:

    Mike, if you want to take a little of the guesswork out of which addresses to select then the !heap extension is available. With the !heap -a command you’re able to display all the information related to a heap. You’ll be able to see all the allocations and their sizes. Look for a large number of allocations with the same size. Randomly examine those for ideas.

    JamesW, handle leaks can quickly be identified by monitoring the Handle Count counter of the process using Performance Monitor. Information about the handle can be displayed with the !handle extension. If you wish to track a handle then it gets a little more complicated. You’ll need to enable Application Verifier for the process and then use the !htrace extension.

    The point of Raymond’s post was to offer a stab in the dark approach to troubleshooting. It’s fast to perform and there’s a chance it will reveal useful results. If not then tools such as UMDH will need to be used against the process to create a user mode stack trace database which can then be studied.

  7. Anonymous says:

    PS – Raymond’s stab in the dark approach is certainly worth bearing in mind. Definitely another tool for the armoury when all else fails.

  8. Anonymous says:

    I use a similar technique as a poor man’s profiler. If you have one part of your program using up 90% of the CPU time, then if you stop the program at random, there’s a 9/10 chance that you stopped it within the bottleneck.

    Repeat 5 or 6 times and you get a pretty good feel for what’s causing the slowdown. A statistical profiler just automates this process :)
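The odds behind that comment can be worked out with a short Python sketch. It assumes each random stop is independent and lands in the bottleneck with the stated probability of 0.9; the function name is made up for illustration.

```python
from math import comb


def chance_at_least(hits: int, stops: int, p: float = 0.9) -> float:
    """Chance that at least `hits` of `stops` random stops land in the hot spot.

    Assumes independent stops, each landing in the bottleneck with chance p.
    """
    return sum(
        comb(stops, k) * p**k * (1 - p) ** (stops - k)
        for k in range(hits, stops + 1)
    )


if __name__ == "__main__":
    print(f"at least 4 of 6 stops in the bottleneck: {chance_at_least(4, 6):.4f}")
```

With six stops, seeing the same code at least four times is very likely, so a handful of manual breaks is usually enough to point at the culprit.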

  9. Anonymous says:

    @ Adrian

    Yeah – we used that. Trouble was it would tell us 2 HBITMAPs had leaked but it couldn’t tie up the handle to an actual bitmap. That was my moan about knowing handle 0x0babe000 leaked – even if I know it was a bitmap it would be nice to know *which* bitmap. Hence we ended up writing our own GDI object crawler that attempted to render the GDI object. Once we could see the rendition of the bitmap that was leaking it was trivial to plug the gap.

    @Dan

    I have looked at App. Verifier but the problem is we’re stuck on XP SP1 – yeah we should get with the SP2 party but the wheels grind slowly. The problem with being on SP1 is that our product is OpenGL heavy, and the first thing App. Verifier catches and forces a crash on is the MS OpenGL implementation… (Fixed in SP2 according to MSDN)

  10. Anonymous says:

    One of my coworkers, a graphician, used to refer to my usual assembler debugger, monam, as "the astral visualizer" :)

  11. Anonymous says:

    This reminds me of the Monte Carlo method for estimating the area ‘X’ of an irregular shape. If I remember correctly, you draw a box of known area ‘A’ around the shape, then pick thousands of spots at random within the box. Then, X = A * (# spots inside the irregular area)/(total # of spots).

    More on MC methods:

    http://www.chem.unl.edu/zeng/joy/mclab/mcintro.html

    TC

  12. Anonymous says:

    You can get this little piece of software, and it is free!

    I wrote it in my spare time. It is a leak explorer: you can see GDI objects, dump buffers, and see what the call stack and the call parameters were when the leaked object was created.

    You can send me bugs or ideas at ltearno@free.fr

    Good deleakage (the software is not incompatible with the poor man’s way…)

  13. Anonymous says:

    If you run the checked build of Windows XP Pro SP2, some applets automatically diagnose and report their own resource leaks. I don’t recall seeing such reports from the checked build of Windows 2000 Pro RTM. My intuitive feeling is that the added functionality was the checking and reporting rather than the leaks themselves (yup, sometimes I forget to be cynical ^_^) but I do wonder if there are plans to fix them.

