Finding the leaked object reference by scanning memory: Example


An assertion failure was hit in some code.

    // There should be no additional references to the object at this point
    assert(m_cRef == 1);

But the reference count was 2. That's not good. Where is that extra reference and who took it?
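
For readers less used to reference counting, here is a minimal sketch of how a count ends up at 2. The Widget and Holder names are hypothetical and are not from the code being debugged; only m_cRef appears in the assertion above. The point is that whoever took the extra reference usually also stored the pointer somewhere, and that stored pointer is what we can go looking for.

```cpp
// Hypothetical reference-counted object (single-threaded for brevity).
class Widget {
public:
    unsigned long AddRef() { return ++m_cRef; }
    unsigned long Release() {
        unsigned long remaining = --m_cRef;
        if (remaining == 0) delete this;  // last reference gone
        return remaining;
    }
    unsigned long RefCount() const { return m_cRef; }
private:
    unsigned long m_cRef = 1;  // the creator holds the initial reference
};

// Hypothetical client that takes a reference and remembers the pointer.
struct Holder {
    Widget* held = nullptr;
    void Attach(Widget* w) { w->AddRef(); held = w; }  // count goes up
    void Detach() {
        if (held) { held->Release(); held = nullptr; } // count goes back down
    }
};
```

As long as a Holder is attached, the count is 2 and the object's address is sitting in the Holder's memory.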

This was not code I was at all familiar with, so I went back to first principles: Let's hope that the reference was not leaked outright but rather that somebody took it and is still holding it, so that the pointer is still sitting somewhere in memory. And let's hope that the memory hasn't been paged out. (Because debugging is an exercise in optimism.)

1: kd> s 0 0fffffff 00 86 ec 00
04effacc  00 86 ec 00 c0 85 ec 00-00 00 00 00 00 00 00 00  ................ // us
0532c318  00 86 ec 00 28 05 00 00-80 6d 32 05 03 00 00 00  ....(....m2..... // rogue

The first hit is the reference to the object from the code raising the assertion. The second hit is the interesting one. That's probably the rogue reference. But who is it?
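
The search pattern 00 86 ec 00 is the object's address, 00ec8600, written out in little-endian byte order. As a rough sketch of what such a search does (this is not how the debugger implements it, and the function name FindPointerValue is made up for illustration), here is the same scan over an ordinary buffer:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Returns every byte offset in [data, data + size) at which the 32-bit
// value appears in little-endian byte order.
std::vector<std::size_t> FindPointerValue(const unsigned char* data,
                                          std::size_t size,
                                          std::uint32_t value)
{
    // Spell out the little-endian encoding so the result does not depend
    // on the byte order of the machine running the scan.
    const unsigned char pattern[4] = {
        static_cast<unsigned char>(value & 0xff),
        static_cast<unsigned char>((value >> 8) & 0xff),
        static_cast<unsigned char>((value >> 16) & 0xff),
        static_cast<unsigned char>((value >> 24) & 0xff),
    };

    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i + sizeof(pattern) <= size; ++i) {
        if (std::memcmp(data + i, pattern, sizeof(pattern)) == 0) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

Every hit is only a candidate: any four bytes that happen to match will show up, whether or not they are really a pointer to the object.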

1: kd> ln 532c318
1: kd>

It is not reported as belonging to any module, so it's not a global variable.

Is it a reference from a stack variable? If so, then a stack trace of the thread with the active reference may tell us who is holding the reference and why.

1: kd> !process -1 4
PROCESS 907ef980  SessionId: 2  Cid: 06cc    Peb: 7f4df000  ParentCid: 0298
    DirBase: 9e983000  ObjectTable: a576f560  HandleCount: 330.
    Image: contoso.exe

        THREAD 8e840080  Cid 06cc.0b78  Teb: 7f4de000 Win32Thread: 9d04b3e0 WAIT
        THREAD 91e24080  Cid 06cc.08d8  Teb: 7f4dd000 Win32Thread: 00000000 WAIT
        THREAD 8e9a3580  Cid 06cc.09f8  Teb: 7f4dc000 Win32Thread: 9d102cc8 WAIT
        THREAD 8e2be080  Cid 06cc.0878  Teb: 7f4db000 Win32Thread: 9d129978 WAIT
        THREAD 82c08080  Cid 06cc.0480  Teb: 7f4da000 Win32Thread: 00000000 WAIT
        THREAD 90552400  Cid 06cc.0f5c  Teb: 7f4d9000 Win32Thread: 9d129628 WAIT
        THREAD 912c9080  Cid 06cc.02ec  Teb: 7f4d8000 Win32Thread: 00000000 WAIT
        THREAD 8e9e8680  Cid 06cc.0130  Teb: 7f4d7000 Win32Thread: 9d129cc8 READY on processor 0
        THREAD 914b8b80  Cid 06cc.02e8  Teb: 7f4d6000 Win32Thread: 9d12d568 WAIT
        THREAD 9054ab00  Cid 06cc.0294  Teb: 7f4d5000 Win32Thread: 9d12fac0 WAIT
        THREAD 909a2b80  Cid 06cc.0b54  Teb: 7f4d4000 Win32Thread: 00000000 WAIT
        THREAD 90866b80  Cid 06cc.0784  Teb: 7f4d3000 Win32Thread: 93dbb4e0 RUNNING on processor 1
        THREAD 90cfcb80  Cid 06cc.08c4  Teb: 7f3af000 Win32Thread: 93de0cc8 WAIT
        THREAD 90c39a00  Cid 06cc.0914  Teb: 7f3ae000 Win32Thread: 00000000 WAIT
        THREAD 90629480  Cid 06cc.0bc8  Teb: 7f3ad000 Win32Thread: 00000000 WAIT

Now I have to dump the stack boundaries to see whether the address in question lies within any thread's stack range. The second and third dwords of each TEB are the stack base and the stack limit.

1: kd> dd 7f4de000 l3
7f4de000  ffffffff 00de0000 00dd0000
1: kd> dd 7f4dd000 l3
7f4dd000  ffffffff 01070000 01060000
...
1: kd> dd 7f4d7000 l3
7f4d7000  ffffffff 04e00000 04df0000 // our stack
...
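
The range test being done by eye here can be sketched as follows. The StackRange type and the function names are hypothetical. The sketch assumes the 32-bit TEB layout, where the dword at offset 4 is the stack base (the high end) and the dword at offset 8 is the stack limit (the low end).

```cpp
#include <cstdint>
#include <vector>

// Stack bounds as read from the start of a 32-bit TEB.
struct StackRange {
    std::uint32_t base;   // second dword of the dd output (high end, exclusive)
    std::uint32_t limit;  // third dword of the dd output (low end, inclusive)
};

// True if the address lies inside this thread's stack.
bool IsOnStack(const StackRange& stack, std::uint32_t address)
{
    return address >= stack.limit && address < stack.base;
}

// True if the address lies inside any of the given stacks.
bool IsOnAnyStack(const std::vector<StackRange>& stacks, std::uint32_t address)
{
    for (const StackRange& stack : stacks) {
        if (IsOnStack(stack, address)) return true;
    }
    return false;
}
```

Feeding it the three ranges shown above and the address 0532c318 gives false; a real check would include the ranges for all fifteen threads.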

The rogue reference did not land in any of the stack ranges, so it's probably on the heap. Fortunately, since it's on the heap, it's probably part of some larger object. And let's hope (see: optimism) that it's an object with virtual methods.

0532c298  73617453
0532c29c  74654d68
0532c2a0  74616461
0532c2a4  446e4961
0532c2a8  00007865
0532c2ac  00000000
0532c2b0  76726553 USER32!_NULL_IMPORT_DESCRIPTOR  (USER32+0xb6553)
0532c2b4  44497265
0532c2b8  45646e49
0532c2bc  41745378 contoso!CMumble::CMumble+0x4c
0532c2c0  00006873
0532c2c4  00000000
0532c2c8  4e616843
0532c2cc  79546567
0532c2d0  4e496570
0532c2d4  00786564
0532c2d8  2856662a
0532c2dc  080a9b87
0532c2e0  00f59fa0
0532c2e4  05326538
0532c2e8  00000000
0532c2ec  00000000
0532c2f0  0000029c
0532c2f4  00000001
0532c2f8  00000230
0532c2fc  fdfdfdfd
0532c300  45ea1370 contoso!CFrumble::`vftable'
0532c304  45ea134c contoso!CFrumble::`vftable'
0532c308  00000000
0532c30c  05b9a040
0532c310  00000002
0532c314  00000001
0532c318  00ec8600

Hooray, there is a vtable a few bytes before the pointer, and the contents of the memory do appear to match a CFrumble object, so I think we found our culprit.
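
What I did by eye can be sketched in code: walk backward from the rogue pointer, one dword at a time, until a value falls inside the module's address range. This is a simplification, since the debugger resolves real symbols rather than checking a range. The function name and the module bounds used in the usage note are assumptions for illustration; the dump does not tell us where contoso is loaded.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Walks backward from dwords[start] and returns the index of the nearest
// dword whose value lies in [moduleLo, moduleHi), or -1 if there is none.
std::ptrdiff_t FindPrecedingModulePointer(const std::vector<std::uint32_t>& dwords,
                                          std::size_t start,
                                          std::uint32_t moduleLo,
                                          std::uint32_t moduleHi)
{
    if (dwords.empty()) return -1;
    if (start >= dwords.size()) start = dwords.size() - 1;

    // i runs from start down to 0 without underflowing the unsigned index.
    for (std::size_t i = start + 1; i-- > 0; ) {
        if (dwords[i] >= moduleLo && dwords[i] < moduleHi) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

With the eight dwords from 0532c2fc through 0532c318 and an assumed module range of 45ea0000 to 45eb0000, the walk stops at index 2, which is the vtable pointer at 0532c304. A value in the module's range is still only a hint; you have to look at the surrounding memory to decide whether it really is the start of an object.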

I was able to hand off the next stage of the investigation (why is a Frumble being created with a reference to the object?) to another team member with more expertise with Frumbles.

(In case anybody cared, the conclusion was that this was a variation of a known bug.)

Comments (16)
  1. Query says:

    What would you have done if the reference count had been too small instead?

  2. voo says:

    It sounds so easy and obvious when you do it, but I'll admit: I wouldn't have thought of that - good tool in my repertoire though, although I hope I won't need to debug things like this often.. or ever ;)

  3. If the reference count is too small, that usually means the object gets deleted early, and then some poor client of the object tries to use it after it was deleted; this typically results in a crash.

    I've had good luck using the Time Travel Tracing diagnostic to go back in time and dump all the stacks when the m_cRef memory was written to, then matching up AddRef and Release's in pairs until I find the guilty culprit that called Release more times than AddRef.

  4. DWalker says:

    That looks like mainframe VM debugging, which I did many years ago... sometimes from printouts of a memory dump....

  5. Anon says:

    @Maurits

    Unfortunately, us normals still don't have access to iDNA.

  6. You can get much the same effect without Time Travel Tracing by setting breakpoints on AddRef and Release which dump the stack and continue.

  7. gdalsnes says:

    Try to google iDNA: Internationalizing Domain Names in Applications? I would strongly suggest not using the name iDNA when it's impossible to google. It's just confusing.

  8. Joshua says:

    Do not be discouraged by the low comments. This is an example of a good blog post all the same.

  9. Bob_C says:

    @arghhhhhhhhhhh: Just Google "iDNA time". It's there.

  10. B. says:

    Hey Raymond, I have a question but can't find an answer.

    I have a piece of code (assembler) that references the function pointer

    push offset function_x

    push x

    ...

    call y

    ..

    function_x proc uses esi edi ebx...

    but when I compile it in debug mode (using MASM) this code is changed to

    push offset function_x_thunk

    push x

    ...

    call y

    function_x_thunk offset function_x

    and my code fails in debug mode.

    How to avoid that (creating those thunks for referenced functions) :)

    Thank you

  11. sense says:

    I lllllove these posts!! Very insightful.

    Thanks. It means a lot to us.

  12. Joshua says:

    @B: Read the documentation for your linker.

  13. Bob says:

    @arghhhhhhhhhhh: @Bob_C:  

    Or, search using Bing. Its first two links returned from searching for 'iDNA' are relevant. I'm not meaning to imply anything about general search quality, but perhaps Bing does a better job of finding Microsoft-related information. Or maybe more people use it when specifically searching for Microsoft-related information, so it ranks those sources higher in the results.

    (I am not associated with Microsoft other than as a Windows user...and probably owning stock through index funds.)

  14. Ama says:

    how did you dump the heap? i don't understand that part......

    also for the stack.....you dump the entire stack of each thread?

    Thanks.

  15. 1/2 full!! says:

    @Bob. "I'm not meaning to imply anything about general search quality, but ...??"

    Good. Because once upon a time there was a blog entry somewhere about those sort of assumptions.(blogs.msdn.com/.../10127054.aspx).

  16. Mark says:

    Ama: that's just a dump of the symbols leading up to the address of the pointer, something like "ddS 532c318-80 L80"
