Non-psychic debugging: Looking for leaked objects by their vtable

A programmer on the GHI team reported that they were hitting an assertion failure using an internal library and asked for help debugging it.

DEF!CWidget::`scalar deleting destructor'+0xd
ABC!operator delete()+0x6

I didn't work on this internal library, but on the other hand I'm also not afraid to look inside and around.

The assertion failure said, "Assertion failed: All widgets from a factory must be destroyed before you can unregister the factory."

The factory does not keep a list of all the widgets it created. It merely keeps a count and asserts that the count is zero when the factory is unregistered.
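To see why a bare count makes this kind of leak hard to attribute, here is a minimal C++ sketch of a factory that tracks only the number of outstanding widgets. The names (WidgetFactory, OnWidgetCreated, OnWidgetDestroyed, CanUnregister) are made up for illustration and are not the internal library's actual API. The real library asserts at unregistration, whereas this sketch just reports whether the count is zero.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the real factory: it knows how many widgets
// are outstanding, but not which ones or who created them.
class WidgetFactory {
public:
    void OnWidgetCreated() { ++m_outstanding; }

    void OnWidgetDestroyed() {
        assert(m_outstanding > 0);  // destroying more widgets than were created is a bug
        --m_outstanding;
    }

    // The real library asserts here; the sketch returns the condition instead.
    bool CanUnregister() const { return m_outstanding == 0; }

    std::size_t Outstanding() const { return m_outstanding; }

private:
    std::size_t m_outstanding = 0;
};
```

With a design like this, a failed check says only that the count is nonzero. It cannot say which widget is missing, which is why the next step is to go find the widget some other way.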

A good start would be to find the widgets that are still outstanding, so we can try to figure out why they weren't destroyed.

0:000> u ABC!CWidget::CWidget
1071158b mov     dword ptr [esi],offset ABC!CWidget::`vftable' (106da08c)

This gives us the widget vtable, so a memory scan should find all the outstanding widgets.

0:000> !heap -search 106da08c
    _HEAP @ 950000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        01eb12d8 000e 0000  [00]   01eb12e0    00064 - (busy)

Okay, so a search of the heap shows that there is only one widget, and it is at 0x01eb12e0. Let's see what that widget can tell us about who it is.

0:000> dt ABC!CWidget 01eb12e0
   +0x000 __VFN_table : 0x106da08c
   +0x004 m_uBucketId      : 2
   +0x008 m_rgClassData    : 
   +0x050 m_rgSharedData   :
   +0x05c m_fLocked        : 1
   +0x060 m_pszName        : 0x01eba4c0  "GHI_widget"

Hey, how about that. The widget conveniently has the name GHI_widget, which seems like a pretty good sign that the GHI component leaked a widget.

Notice that I didn't use any special knowledge of Widgets, Widget Factories, the ABC component, or the GHI component. All I did was take the error message that said, "You leaked a widget" and said, "Maybe I should go look for that widget. That may tell us something." I disassembled the widget constructor to look for a unique tag common to all widgets, and then scanned memory looking for that vtable. From the found object, I dumped its member variables looking for some sort of clue as to its identity, and by an amazing stroke of luck, the widget had a name.
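The idea behind the scan can be sketched in a few lines of C++. This is only an illustration of the principle, not what the debugger's heap search does internally: Widget, FirstWord, and ScanForWord are hypothetical names, and the sketch assumes the vtable pointer is stored in the first pointer-sized word of the object, which is true of the common compilers for a simple class like this but is not guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical polymorphic class standing in for CWidget.
struct Widget {
    virtual ~Widget() = default;
    int bucketId = 0;
};

// Read the first pointer-sized word at p. For a Widget this is assumed
// to be the vtable pointer (an ABI detail, not a language guarantee).
inline std::uintptr_t FirstWord(const void* p) {
    std::uintptr_t value;
    std::memcpy(&value, p, sizeof(value));
    return value;
}

// Walk a memory block one pointer-sized word at a time and return the
// offsets of every word equal to needle.
inline std::vector<std::size_t> ScanForWord(const unsigned char* begin,
                                            std::size_t size,
                                            std::uintptr_t needle) {
    std::vector<std::size_t> hits;
    for (std::size_t off = 0; off + sizeof(needle) <= size; off += sizeof(needle)) {
        if (FirstWord(begin + off) == needle) {
            hits.push_back(off);
        }
    }
    return hits;
}
```

To use it, take the vtable value from any live Widget with FirstWord and pass it as the needle; every hit is a candidate Widget. The same loop with an object's own address as the needle finds the places that hold a pointer to that object.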

Back in my trainee days in tech support, if a customer asked a question that we couldn't answer, we escalated the problem to the next higher level and were encouraged to tag along and learn from the subject matter expert. That way, when the problem came up again, we could solve it ourselves.

In other words, we were encouraged not to run away from information, but to run toward it.

(It helped that we weren't graded on "number of cases closed per second.")

One of the most important skills in a programmer is the willingness to look at code that you didn't write. When I joined Microsoft, this instinct to run toward information led me to watch as somebody else debugged a problem and learn from them. I would then go back and read the code that they debugged to see how much of it I could understand. And if I ran into a problem of my own, I dove in and read the source code to the component that was giving me trouble, even if it was not a component I remotely had any responsibility for. Maybe I could figure out what it was doing, maybe I couldn't, but at least I gave it a try. And when I went to another developer with my theory, I was told either that my understanding was correct, or that I had gotten it wrong and was told the correct answer. Either way, I learned a little bit more that day.

Exercise: If the widget had not had a name, what would be a reasonable next step in the investigation?

Comments (25)
  1. Joshua says:

    Look which DLL's BSS segment it is in perhaps. Or run with a tagging heap.

    [This is an object created from a factory (presumably not a singleton factory) so the object won't be in a static segment. -Raymond]
  2. Ben says:

    If only one was leaked, the code path was called once, so there is a good chance that it is from initialisation code. Which means a good chance that if you run it twice, it will get the same address. Breakpoint on memory, re-run and look at the call stack. (Turn on debug heap though).

    [Nice idea, but it didn't help in this case. It was a code path that got hit only once during the scenario, but it wasn't initialization. (Besides, the initialization of the offending component happens deep into the scenario, so the heap is pretty randomized at that point.) -Raymond]
  3. Henke37 says:

    For a plan B I would check those "bucketId" and "VFN_table" members. They look like they could contain similarly identifying data.

    [In this case, VFN_table says "I am a CWidget" (0x106da08c) so you didn't learn much from that. But the other members may indeed provide some clues. -Raymond]
  4. Spike says:

    I'd search memory for an instance of the address of the Widget to try to find who's holding it.

    [Good idea. It didn't help in this case (it was a flat-out leak) but it's a good trick to keep in mind. -Raymond]
  5. William says:

    It strikes me that the absence of a name can be as identifying as the presence of one; an empty string is still a string. Look for widget creations that don't specify a name; if only a few such creations (or, ideally, just one) exist, your search space for potentially leaked widgets is sharply reduced.

    [Good one. I hadn't thought of that. -Raymond]
  6. Medinoc says:

    Would this work if Widget were the root of a class hierarchy? Objects of a derived class would have a different vtable, so one would have to check for all vtables of all possible derived classes…

    [Yup, it wouldn't work if the leaked object was derived. We got lucky. -Raymond]
  7. Joshua says:

    [This is an object created from a factory (presumably not a singleton factory) so the object won't be in a static segment. -Raymond]

    But the pointer to it probably is. If it isn't the tagging heap will find who allocated the pointer, unless it's on the stack. If it's on the stack, a stack walk will.

    [Oh, sorry. I thought the "it" you were referring to was the widget, not the pointer to the widget (which was never mentioned in the article). Yes, searching for the pointer, then seeing who holds that pointer would be something to try. (It didn't help in this case, but it was a good try.) -Raymond]
  8. Gabe says:

    I would hook the Widget factory constructor/destructor and log the stack frames when each was called. Assuming that the module which creates the Widget is responsible for destroying it, you'd easily be able to see which module is creating too many Widgets.

    If you also log the Widget pointers, you can match up the creations to the destructions, telling you what code is actually creating the leaking Widget(s).

    This method eliminates the need for a lot of luck (the Widget has a name, the name is unique, the object isn't a subclass), but cannot be done post-mortem. If you can't repro the scenario, you can't do it.

    [This was encountered in a stress run, so reproducibility is unknown but probably low. You will have to debug it post-mortem. -Raymond]
  9. Joshua says:

    [This was encountered in a stress run, so reproducibility is unknown but probably low.]

    I must be the only guy who designs repeatable stress runs.

    [But if you run multiple stress tests simultaneously, you still get nonrepeatability. "The video driver test just happened to perform a resolution change at the exact moment the UI test had the XYZ menu open, which exposed an unhandled condition in the menu code." -Raymond]
  10. Tony Cox [MSFT] says:

    Joshua – it's certainly good to make your tests as close to 100% repeatable as possible, but it's not always possible, especially with stress tests. Often the kind of bugs you're looking for in stress runs are things like nasty race conditions, deadlocks, failure to handle system resource exhaustion, or bugs that only manifest when external operations start unexpectedly timing out. Those things often vary from machine to machine and run to run just by their nature.

  11. Ben says:

    Edit the code to call _CrtDebugBreak if the passed-in name is "GHI_widget" then re-run the test… That's what I would do.

  12. Alois Kraus says:

    I would instrument the factory code and the dtor with ETW events to track creations and destructions and check for any imbalances. Then enable the ETW provider with stack walking on all machines that execute the stress test. One further repro will suffice to get the root cause. If you do not want to change the code, and you have the symbols at hand, you could also use a hooking library to inject the ETW events at runtime like here (…/155056.aspx).

    When I only have a dump I would check who is referencing the widget address. If it is completely orphaned the garbage collector will take care ;-). Oops, C++. Then I would stick to step one.

  13. Pete says:

    This question is asked purely out of practical inexperience, but what tool do you use that lets you type !heap -search whatever and get the print-out that you posted?

    [Debugging Tools for Windows. -Raymond]
  14. Anon says:


    I love this extension+script. Incredibly useful when "I need to find these objects, but I don't actually know their names." …/searching-and-displaying-c-heap-objects-in-windbg

  15. Name withheld says:

    Would performing heap -s 01eb12e0 (UserPtr) help identify another clue as to what could be holding this ABC!CWidget? Maybe whatever is holding it provides a name or another clue as to why it is leaked?

  16. Leif says:

    The final paragraphs of this post remind me of the early days of my career (also in tech support). I am also reminded of one of my greatest resentments throughout my career as a developer: I resent those coworkers who have an "answers at the back of the book" (or end of chapter) mentality. If the answer to their problem isn't immediately obvious, they throw their hands up and give up, and go looking for answers from the "gurus". They never seem to become any more knowledgeable or skilled. Yet they somehow find their way onto development teams sometimes. Kindhearted gurus end up doing all their work for them. They are dead weight.

  17. Doug says:

    Regarding the exercise:

    I'll assume that the leaked CWidget's name is useless (not necessarily empty), since William pointed out that an empty string is still a string. I'll also assume that the other data members aren't much use either (without internal knowledge, at least).

    At this point, I get the (probably bad) instinct to start diving into the internals on a small scale. Still, running with my instincts tells me to assume a few things from the context of this story.

    1) A CFrame holds (zero or more) CWidgets.

    2) A CWidget determines in its destructor when to try to unregister the factory that created it.

    3) A CFrame deletes all its CWidgets when it is destroyed, which involves enumerating over them.

    I would try to walk the stack, grab the this pointer for the CWidget being deleted, and look at its state (and the code for the destructor) to try to determine why it's trying to unregister the factory. Failing that, I'd grab the this pointer for the CFrame being destroyed and try to see why the enumeration hasn't gone over the leaked CWidget yet.

    Regarding my proposed investigation:

    Hmm… I assumed that the bug was in ABC or DEF (probably DEF). Additionally, assumption 3 (also 2) assumes quite a bit (mostly that deletion/destruction (as far as the factory is concerned) is done by the CFrame) about the life cycle of a CWidget that could be wrong. The first mistake would lead me in the wrong direction, which immediately puts me on the side of "I had gotten it wrong and was told the correct answer." Looking at your replies to the other comments now, I'm getting more and more of a feeling that my assumptions about the life cycle of a CWidget are wildly wrong. Knowing a bit about that would probably have helped to avoid the mistake of assuming the bug was in DEF (or ABC), as well. As for why I made such assumptions, I think I was seeing the framework through Swing coloured glasses. Working with different GUI frameworks (and just working with them more) would probably help out with that.

    Regarding the point of the post:

    Yes, I absolutely agree. Sometimes it doesn't work out, but sometimes you just get lucky. And now that I'm basically paraphrasing you I know that I've overthought this post andIneedtowrapthisupbeforeImakeevenmoreofafooloutofmyself.

  18. Brian_EE says:

    I find it satisfying to dig into a failure and to find the source. I remember one time when one of our SW developers complained that hooking up the radio to the power amplifier (I live in the embedded world) caused the radio to crash and the problem must be with my FPGA in the PA.

    I had access to the radio GPP source code and build tools, so I dug in at the source level and through layers of C++ full object-oriented code. I found that they had copied a module from a different product that didn't handle the differing protocol we used. It dereferenced a NULL pointer.

    I had a big grin on my face when I pointed out the exact line in *their* source that was the real problem.

  19. alegr1 says:

    >Or run with a tagging heap.

    The best would be to enable heap tracing. But it just doesn't work.

    That would be GREAT if it worked. But there is a HARD limit on the heap trace size, and it's ridiculously small. Even in x64 Win2012R2 it's equally ridiculously small. There was a time I needed to find out why there was a heap leak in 2012R2 (and not in 2012), and could not use the heap trace because it would fill way before the leak happened.

  20. Anon says:


    100% agree. There's a place for going to ask someone, but too many people never want to learn ANYTHING about their jobs.

    I very often draw an inverse correlation between the amount of education someone willingly pays for and their level of willingness to actually attempt to solve problems on their own…

  21. Timo Kinnunen says:

    Only thing I can think of that hasn't been mentioned yet is looking at the memory page where the leaked CWidget was allocated. The CWidget itself looks to be intact so the allocator likely hasn't freed or reused the memory. Maybe there's a chain of pointers to this allocator that can pinpoint the module responsible for destroying it.

    Another variation, looking at adjacent objects in the memory block. They could have been allocated from the same place the widget was allocated and still alive.

  22. Adam says:

    I'd look at the problem from the other end. You know what has leaked, and there's most likely only a handful of lines of code that actually allocate one of those objects. If you examine them all carefully you might be able to spot the cause of the leak.

    Of course doing things that way round, even if you spot something that looks like it could cause a leak, you can't always be certain that you've found the actual cause.

  23. Quietust says:

    If the library had been built with RTTI included, tracing backwards from the vtable pointer would lead you to a class name which might give you some identification for the leaked widget. Not much, mind you, but better than nothing at all.

  24. Andrew. says:

    m_uBucketId=2 looks interesting to me and might be a container ID for the widget. I'd look at the code that handles that id and see if it led me to the component owning the leaked widget.

  25. Anon says:


    99% of the time, we don't have, or don't have access to, the source. Lucky to get symbols, if anything.

    The real fun starts when you don't even have public symbols, and have to trace through the disassembly just to get enough information to force the people who caused the problem to admit there's a problem and fix it.
