Psychic debugging: IP on heap


Somebody asked the shell team to look at this crash in a context menu shell extension.

IP_ON_HEAP:  003996d0

ChildEBP RetAddr
00b2e1d8 68f79ca6 0x3996d0
00b2e1f4 7713a7bd ATL::CWindowImplBaseT<
                           ATL::CWindow,ATL::CWinTraits<2147483648,0> >
                     ::StartWindowProc+0x43
00b2e220 77134be0 USER32!InternalCallWinProc+0x23
00b2e298 7713a967 USER32!UserCallWinProcCheckWow+0xe0
...

eax=68f79c63 ebx=00000000 ecx=00cade10 edx=7770df14 esi=002796d0 edi=000603cc 
eip=002796d0 esp=00cade4c ebp=00cade90 iopl=0         nv up ei pl nz na pe nc 
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010206 
002796d0 c744240444bafb68 mov     dword ptr [esp+4],68fbba44

You should be able to determine the cause instantly.

I replied,

This shell extension is using a non-DEP-aware version of ATL. They need to upgrade to ATL 8 or disable DEP.

This was totally obvious to me, but the person who asked the question met it with stunned amazement. I guess the person forgot that older versions of ATL are notorious DEP violators. You see a DEP violation, you see that it's coming from ATL, and bingo, you have your answer. When DEP was first introduced, the base team sent out mail to the entire Windows division saying, "Okay, folks, we're turning it on. You're going to see a lot of application compatibility problems, especially this ATL one."

Psychic powers sometimes just means having a good memory.

Even if you forgot that information, it's still totally obvious once you look at the scenario and understand what it's trying to do.

The fault is IP_ON_HEAP which is precisely what DEP protects against. The next question is why IP ended up on the heap. Was it a mistake or intentional?

Look at the circumstances surrounding the faulting instruction again. The faulting instruction is the window procedure for a window, and the action is storing a constant into the stack. The symbols of the caller tell us that it's some code in ATL, and you can even go look up the source code yourself:

template <class TBase, class TWinTraits>
LRESULT CALLBACK CWindowImplBaseT< TBase, TWinTraits >
  ::StartWindowProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam) {
    // Recover the C++ object that was stashed before the window was created.
    CWindowImplBaseT< TBase, TWinTraits >* pThis =
              (CWindowImplBaseT< TBase, TWinTraits >*)
                  _AtlWinModule.ExtractCreateWndData();
    pThis->m_hWnd = hWnd;
    // Generate the thunk that supplies pThis to the real window procedure.
    pThis->m_thunk.Init(pThis->GetWindowProc(), pThis);
    WNDPROC pProc = pThis->m_thunk.GetWNDPROC();
    // Route all future messages through the thunk, then forward this one.
    ::SetWindowLongPtr(hWnd, GWLP_WNDPROC, (LONG_PTR)pProc);
    return pProc(hWnd, uMsg, wParam, lParam);
}

Is pProc corrupted and we're jumping to a random address on the heap? Or was this intentional?

ATL is clearly generating code on the fly (the window procedure thunk), and it is in execution of the thunk that we encounter the DEP exception.

Now, you didn't need to have the ATL source code to realize that this is what's going on. It is a very common pattern in framework libraries to put a C++ wrapper around window procedures. Since C++ functions have a hidden this parameter, the wrappers need to sneak that parameter in somehow, and one common technique is to generate some code on the fly that sets up the hidden this parameter before calling the C++ function. The value at [esp+4] is the window handle, something that can be recovered from the this pointer, so it's a handy thing to replace with this before jumping to the real C++ function.
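To make the technique concrete, here is a minimal sketch of what such a thunk could look like as raw bytes on 32-bit x86. It is not ATL's actual implementation: BuildThunk, StoreLE32 and kThunkSize are hypothetical names invented for illustration, addresses are passed as plain 32-bit integers so the sketch stays portable, and the bytes are only assembled, never executed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Size of the two-instruction thunk: 8 bytes for the mov, 5 for the jmp.
constexpr std::size_t kThunkSize = 13;

// Writes a 32-bit value in little-endian byte order, as x86 expects.
inline void StoreLE32(std::uint8_t* out, std::uint32_t value)
{
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}

// Hypothetical helper (an assumption for illustration, not ATL's API).
// thunkAddress is where the thunk bytes will live, thisPointer is the
// object to sneak in, and realProc is the function to jump to.
std::array<std::uint8_t, kThunkSize> BuildThunk(std::uint32_t thunkAddress,
                                                std::uint32_t thisPointer,
                                                std::uint32_t realProc)
{
    std::array<std::uint8_t, kThunkSize> code{};

    // mov dword ptr [esp+4], thisPointer   (C7 44 24 04 imm32)
    // Overwrites the first stack parameter (the window handle) with this.
    code[0] = 0xC7;
    code[1] = 0x44;
    code[2] = 0x24;
    code[3] = 0x04;
    StoreLE32(&code[4], thisPointer);

    // jmp realProc   (E9 rel32)
    // The displacement is measured from the end of the jmp instruction,
    // which is also the end of the thunk.
    code[8] = 0xE9;
    StoreLE32(&code[9],
              realProc - (thunkAddress + static_cast<std::uint32_t>(kThunkSize)));

    return code;
}
```

Passing 0x68fbba44 as thisPointer produces a first instruction of c7 44 24 04 44 ba fb 68, which is byte for byte the faulting instruction in the dump. Those bytes have to live in memory that is both writable and executable, and that is exactly where DEP gets involved.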

The address being stored as the this parameter is 68fbba44, which is inside the DLL in question. (You can tell this because the return address, which points to the ATL thunk code, is at 68f79ca6 which is in the same neighborhood as the mystery pointer.) Therefore, this is almost certainly an ATL thunk for a static C++ object.
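If you want to check that reasoning mechanically, the following sketch decodes the faulting instruction and compares addresses. The function names are hypothetical, and the notion of "same neighborhood" as "within some caller-chosen distance" is an assumption made for illustration, not a rule the debugger applies.

```cpp
#include <cstddef>
#include <cstdint>

// Reads a 32-bit little-endian value from a byte buffer.
std::uint32_t LoadLE32(const std::uint8_t* p)
{
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: returns true and extracts the immediate if the
// bytes encode "mov dword ptr [esp+4], imm32" (C7 44 24 04 imm32).
bool DecodeThisStore(const std::uint8_t* code, std::size_t size,
                     std::uint32_t* imm)
{
    static const std::uint8_t kPrefix[] = { 0xC7, 0x44, 0x24, 0x04 };
    if (size < 8) {
        return false;
    }
    for (std::size_t i = 0; i < 4; ++i) {
        if (code[i] != kPrefix[i]) {
            return false;
        }
    }
    *imm = LoadLE32(code + 4);
    return true;
}

// Hypothetical heuristic: two addresses are "in the same neighborhood"
// if they are no more than maxDistance bytes apart. The threshold is
// the caller's guess at a plausible module size.
bool InSameNeighborhood(std::uint32_t a, std::uint32_t b,
                        std::uint32_t maxDistance)
{
    const std::uint32_t distance = (a > b) ? (a - b) : (b - a);
    return distance <= maxDistance;
}
```

Fed the eight bytes from the dump (c7 44 24 04 44 ba fb 68), DecodeThisStore recovers 68fbba44, which sits 0x41d9e bytes above the return address 68f79ca6. That is comfortably within the size of a typical DLL, whereas the faulting eip is nowhere near either of them.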

In other words, this is extremely unlikely to be a jump to a random address. The code at the address looks too good. It's probably jumping there intentionally, and the fact that it's coming from a window procedure thunk confirms it.

But our tale is not over yet. The plot thickens. We'll continue next time.

Comments (43)
  1. Bryan says:

    Here’s to looking forward to your next installment of this series.

  2. Greg D says:

    This is the kind of stuff that they never taught you in school. . .

    I like to think that I’d have reached the same conclusion, but I imagine it would have taken me somewhat longer to do so.

  3. RUF says:

    For the next episode i would like to know your opinion about that kind of hacks in the code

  4. JS Bangs says:

    This is one of those posts that I initially read with total bewilderment, having no idea what DEP and IP_ON_HEAP meant. I was about to move on to something else when my brain suddenly connected DEP to Data Execution Prevention… which means that IP_ON_HEAP must be Instruction Pointer On Heap. And now the whole article makes sense.

    Maybe this will help someone else who doesn’t use those abbreviations on a regular basis.

  5. Skywing says:

    Hmm – it’s my understanding that by default, there is emulation support for old style atlthunks enabled ( http://msdn2.microsoft.com/en-us/library/bb736299.aspx ).  I do recall something about it being inadvertently disabled in Vista RTM, though.

    Did this predate the atlthunk emulation hack, or was the system booted with /noexecute=alwayson ?

  6. Keithius says:

    -> JS Bangs

    Yes, that does help. I was similarly mystified until you made the connection from DEP to Data Execution Prevention. Thank you!

  7. Erik says:

    “You should be able to determine the cause instantly.”

    OK, smartypants.  I realize this is a Win32 blog and therefore covers many topics beyond my knowledge as a .NET programmer.  However, statements like that turn me off.  I like to read your blog to learn about the inner workings of the Windows platform.  I could do without the cockiness.

    [I think knowing the reasons for IP being on the heap is one of the basic things you need to know when you’re doing any unmanaged programming (not just Win32). You need to know how the CPU works because in the unmanaged world, there’s nobody between you and the CPU. -Raymond]
  8. codekaizen says:

    Whoa, Erik –

    Consider dialing back the edginess a bit. When we only get text, the usual side-band channels of information, like tone, facial-expression and shared setting, are missing. You have to fill them in. I’ve found it more pleasant to imagine these in a way which conduces to my continual learning and improvement.

    I, for one, enjoy a good didactic tone when the subject is obviously mastered by the author. It’s just a rhetorical device, and to me it communicates an encouraging poke to wake up and think. When mixed with the right amount of caffeine, it does a fair bit to shift me into high gear in the morning. I started only knowing .Net, too, but after a few years of interpreting Raymond as expecting me to know more, I now do, and am better for it.

  9. brian says:

    "I, for one, enjoy a good didactic tone when the subject is obviously mastered by the author. It’s just a rhetorical device, and to me it communicates an encouraging poke to wake up and think."

    I second that. As a .net programmer that only briefly has to touch win32, I read the blog to learn. When Raymond says "It Should Be obvious"  it’s a challenge.  It says to me The Answer is on the page and i ask myself "Can I find it?"

  10. Matt Green says:

    It’s a damn shame that we need to resort to hacks like this in order to properly wrap Win32. Thankfully, almost all of the newer APIs have a context parameter that can be associated with objects.

    How does ATL8 fix this, anyway?

  11. Tim Smith says:

    ALT8 fixes the problem by using PAGE_EXECUTE_READWRITE when allocating the thunk block with VirtualAlloc.

  12. AsmGuru62 says:

    There is no need to wrap WndProc – can be done perfectly with GetWindowLong() and then calling virtual message routing method.

  13. Matt Green says:

    I’m fairly certain that some messages are missed with that method, AsmGuru62. Specifically, the pre-create messages.

    [Windows doesn’t have any “pre-create” messages. How could it? How can you deliver a message to a window that doesn’t exist yet? Maybe you’re thinking of something else. -Raymond]
  14. Ugh says:

    "ALT8 fixes the problem by using PAGE_EXECUTE_READWRITE when allocating the thunk block with VirtualAlloc."

    Eeeek. Does that mean a heap overwrite or buffer overflow into the thunk block could result in "remote code execution"? Which is what DEP is supposed to protect against?

  15. name required says:

    “There is no need to wrap WndProc – can be done perfectly with GetWindowLong() and then calling virtual message routing method.”

    Except when you need to wrap the window which is based on existing class, using GetWindowLong slots already.

    [Isn’t that why C++ has derived classes? If you’re talking about Win32 subclassing, then there’s GetProp. -Raymond]
  16. Erik Madsen says:

    “Consider dialing back the edginess a bit.”

    Mine was simply a comment on tone.  The statement “This was totally obvious to me, but the person who asked the question met it with stunned amazement” seemed gratuitous to me.  It serves only to show Raymond’s superiority over his colleague.  Why do I need to know this?  Whereas I do need to know the reasons for IP being on the heap, as Raymond pointed out in his response.

    All things considered, I do enjoy reading this blog.  I feel I learn something each morning.

    [It really should be obvious to an experienced programmer. And I’m assuming an audience of experienced programmers. It says right there what the problem is: IP_ON_HEAP. And it’s clear that it was on purpose rather than accidental once you look at the stack trace. -Raymond]
  17. Evan says:

    I got to the point reading "This was totally obvious to me, but the person who asked the question met it with stunned amazement," and thought for sure that it was going to continue with the asking person getting all indignant that you figured out what they were using, that you were reverse engineering their program and intellectual property, etc.

    (I’ve never done any ATL programming, and didn’t know what IP_ON_HEAP meant until this sentence "The fault is IP_ON_HEAP which is precisely what DEP protects against.")

  18. mh says:

    I see no issue.  I didn’t know it either, Raymond felt that I should, so now I do.

    Result.

  19. Andy says:

    It’s posts like this that are why I love to read this blog so much. Bits of info that you could get no where else in an almost tutorial like form but with more personality than a regular tutorial. I can hardly wait for the second half.

  20. name required says:

    I was referring to Win32 class, and to AsmGuru62’s suggestion to use GetWindowLong to avoid thunking.

  21. movl says:

    Evan: seconded, that’s what gave it all away. But especially the registers dump is pretty baffling.

  22. Neil says:

    [Windows doesn’t have any "pre-create" messages.]

    I think he’s referring to those messages sent before CreateWindow returns, which won’t get subclassed if you SetWindowLongPtr afterwards.

    Presumably the thunks avoid having to store pThis as window words or properties?

  23. Worf says:

    Amazingly, I guessed what the issue was, and was half-right (I figured DEP easily enough – Windows CE’s x86 Emulator is incompatible with DEP on a multiprocessor system (but OK on single processor). IP_ON_HEAP had me scratching my head (IP address? Wha?), until I realized it meant "Instruction Pointer" (to which I usually call PC – program counter).

    Alas, I do not know ATL, so that baffled me.

    All you .NET developers – I’m on the other side of the fence. I know Win32, but the C stuff only. ATL, MFC, COM, OLE, etc baffle me as well. (Last time I coded a GUI, it was all done using CreateWindow – no resource files, no resource editor or other fancy goodies.)

    But I enjoy this blog, for it appeals to me – the low-level stuff of Windows. And, no matter how far away you go, sometimes you end up in the muck (like this ATL example – it helps to understand what’s really happening behind that pretty face).

  24. Frederik Slijkerman says:

    By the way, is a thunk much faster than retrieving the ‘this’ pointer with GetWindowLongPtr(hwnd, GWLP_USERDATA) ?  (Assuming that you have stored that previously.)

    Or are there other benefits? Because the GWLP method seems to be much easier to me…

  25. Jonathan says:

    I, too, thought about IP address, and expected a psychic "0x0100a8c0 = 192.168.0.1" or something.

    BTW, identification of this could’ve been easily added to the debugger – "if IP_ON_HEAP and the stack looks like X then say something about ATL". That would’ve moved this class of incidents from "psychic debugging by gurus" to a simple "read the helpful debugger output". I think Application Verifier had something like that.

  26. OP: “Since C++ functions have a hidden this parameter”

    Umm, not all of them. Just the class methods.

    [I bet you miss the nitpicker’s corner. -Raymond]
  27. Dean Harding says:

    Roman Werpachowski: You must be new here…

    No wait, it’s probably me that’s new here, if I didn’t expect a comment like that…

  28. Name required says:

    Well you know what, in a dozen years of C++ Windows programming the total number of times I’ve needed to know the reasons for IP being on the heap is exactly:

    [Wow, you’ve never invoked a virtual method on an object that has already been deleted. I wish I were as awesome as you. -Raymond]
  29. ace says:

    I know how CPU and messages in Windows work, but I’d still be grateful if somebody can explain: why is the thunking in ATL necessary — which problem is actually solved by it, that can’t be solved otherwise?

  30. Why did that IP_ON_HEAP problem show up all of a sudden?

  31. nick says:

    guys, consider the performance.

    of course, you can store the "this" pointer in window user data. but that end up with calling GetWindowLong() repeatedly. don’t forget the thousands of message a window proc would process.

    there’s a wonderful article on this topic in codeproject, but I forget where it is.

    raymond, could you tell more about thunk emulation in the next post of the series?

    I have ever come across a bug.

    the program is based on ATL3. it does run in a DEP enabled PC, the main window shows up. but when calling a function in a dll which creates a new window, the program crashes at that moment.

    the main window created successfully is because, i suppose, the thunk emulation. what confused me is why the window created in DLL is out of luck? or did i miss anything?

  32. AsmGuru62 says:

    GetWindowLong() should be rather quick – it is just an accessing the data via offset, of course HWND must be also verified… still it should be OK to call it every time WndProc is entered. I have it in my library for a lot of years and I did not see any slowdowns.

  33. Matt Green says:

    Yeah I was thinking of WM_NCCREATE (I always think of it as the “pre-WM_CREATE” message), and I don’t think the GetWindowLong() method works for those. Of course, you don’t need to use WM_NCCREATE too often.

    [Naturally, the first message does the SetWindowLong. Somebody has to do the Set after all. -Raymond]
  34. Ray wrote a post entitled " Psychic Debuggin: IP on heap ", where he talks about somebody being amazed

  35. Amateur says:

    Can someone make a summary what ATL is? I’m not a programmer, but encounter this very often.

    [Don’t be helpless. Live Search. Google. Yahoo. All of them answer your question on the first page of results. I write for an advanced audience. (Whether I actually have an advanced audience is open to debate.) If you’re a beginner, you may want to unsubscribe and switch to a blog that’s more suited to your experience. Don’t worry. When you become an advanced programmer, this article will still be here for you. -Raymond]
  36. AsmGuru62 says:

    I pass ‘this’ with WM_CREATE (lParam points to it in its first DWORD). Not sure what can be done with WM_NCCREATE, which cannot be done with WM_CREATE…

  37. olde farthe says:

    "Whether I actually have an advanced audience is open to debate."

    I wish I was as famous as Raymond, so that I could have a blog, insult my readers, and get away with it.

  38. Dave says:

    Personally, I love the tone here. This blog makes me smile almost every day, and that’s worth the price of admission by itself.

    Sarcasm and snarky comments wake up my brain and make the post memorable, improving my chances of remembering the technical details as well.

  39. MadQ says:

    @Matt Green, AsmGuru62: for overlapped windows, the very first message is actually WM_GETMINMAXINFO. Actually, I think it might be for all windows that don’t have the WS_CHILD style, but I’m too lazy right now to write something to find out for sure.

  40. Name required says:

    "I wish I were as awesome as you"

    Don’t clap, just throw money.

  41. Sidebar Willy says:

    "I pass ‘this’ with WM_CREATE (lParam points to it in its first DWORD). Not sure what can be done with WM_NCCREATE, which cannot be done with WM_CREATE…"

    The old Scratch program uses WM_NCCREATE: http://blogs.msdn.com/oldnewthing/archive/2005/04/22/410773.aspx

  42. Yesterday I had an interesting case that I thought of sharing, even though there's nothing very new
