Who is this rogue operative that filled my object with 0xDD, then sent me a message?


A failure occurred during stress testing, and the component team came to the conclusion that their component was actually the victim of memory corruption and they asked for help trying to see if there was anything still in memory that would give a clue who did the corrupting.

/* static */ LRESULT CALLBACK CContoso::WndProc(
    HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    CContoso *pThis = reinterpret_cast<CContoso *>
                           (GetWindowLongPtr(hwnd, GWLP_USERDATA));
    ...
    pThis->... // crash on first dereference of pThis

According to the debugger, the value of pThis is a valid pointer to memory that is complete nonsense.

0: kd> dv
           hwnd = 0xf0040162
           uMsg = 0x219
           ...
          pThis = 0x10938bf0
           ...
0: kd> dt pThis
Type CContoso*
   +0x000 __VFN_table : 0xdddddddd 
   +0x004 m_cRef           : 0n-572662307
   +0x008 m_hwnd           : 0xdddddddd HWND__
   ...

The CContoso object was filled with the byte 0xDD. Who would do such a thing?

There are a few clues so far, and if you're psychic, you may have picked up on their aura.

But I had a suspicion what happened, so I dug straight into the code to check my theory.

BOOL CContoso::StartStuffInBackground()
{
 AddRef(); // DoBackgroundWork will release the reference
 BOOL fSuccess = QueueUserWorkItem(&CContoso::DoBackgroundWork, this, 0);
 if (!fSuccess) Release();
 return fSuccess;
}

/* static */ DWORD CALLBACK CContoso::DoBackgroundWork(void *lpParameter)
{
 CContoso *pThis = static_cast<CContoso *>(lpParameter);
 pThis->DoThis();
 pThis->DoThat();
 pThis->Release();
 return 0;
}

So far, we have a standard pattern. An extra reference to the object is kept alive as long as the background thread is still running. This prevents the object from being destroyed prematurely.

(Note that this object is not necessarily a COM object. It could be a plain object that happens to have chosen the names Add­Ref and Release for the methods that manipulate the reference count.)

What people often forget to consider is that this means that the final release of the CContoso object can occur on the background thread. I mean, this is obvious in one sense, because they are adding the extra reference specifically to handle the case where we want to delay object destruction until the background thread completes. But what happens if that scenario actually comes to pass?

CContoso::~CContoso()
{
 if (m_hwnd != nullptr) DestroyWindow(m_hwnd);
 ...
}

As part of the destruction of the CContoso object, it destroys its window. But Destroy­Window must be called on the same thread which created the window: "A thread cannot use Destroy­Window to destroy a window created by a different thread."

This means that if the final release of the CContoso object comes from the background thread, the destructor will run on the background thread, and the destructor will try to destroy the window, but the call will fail because it is on the wrong thread.

The result is that the object is destroyed, but the window still hangs around, and the window has a (now dangling) pointer to the object that no longer exists.

Since the window in question was a hidden helper window, the program managed to survive like this for quite some time: Since the program thought the window was destroyed, there was no code that tried to send it a message, and the normal system-generated messages were not anything the object cared about, so they all fell through to Def­Window­Proc and nobody got hurt. But eventually, some other stress test running on the machine happened coincidentally to broadcast the WM_SETTING­CHANGE message 0x0219, and when the object tried to check what settings changed, that's when it crashed. (That was one of the clues I hinted at above: The message that triggered the crash is 0x0219. This is a good number to memorize if you spend time studying stress failures because it is often the trigger for crashes like this where a window has been orphaned by its underlying object.)

The root cause is that the object was treated as a free-threaded object even though it actually had thread affinity.

One way to fix this is to isolate the parts with thread affinity so that they are used only on the UI thread. The one we identified is the destructor due to its use of Destroy­Window. So at a minimum, we could marshal destruction to the UI thread.

LONG CContoso::Release()
{
 LONG cRef = InterlockedDecrement(&this->m_cRef);
 if (cRef == 0)
 {
  if (m_hwnd == nullptr) {
    delete this;
  } else {
    PostMessage(m_hwnd, CWM_DESTROYTHIS, 0, 0);
  }
 }
 return cRef;
}

/* static */ LRESULT CALLBACK CContoso::WndProc(
    HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    CContoso *pThis = reinterpret_cast<CContoso *>
                           (GetWindowLongPtr(hwnd, GWLP_USERDATA));
    ...

    case CWM_DESTROYTHIS:
     delete pThis;
     return 0;
    ...

(The original code had better have been using an interlocked operation on Release because it was releasing from a background thread already.)

If the final Release happens before we have a window, then we just destruct in-place, on the theory that if no window is created, then we are being destroyed due to failed initialization and are still on the original thread. Otherwise, we post a message to the window to ask it to destroy the object.

Note that this design does have its own caveats:

  • Even if the final Release happens on the UI thread, we still post a message, even though we could have destroyed it inline.

  • Posting a message assumes that the message pump will continue to run after the object is released. If somebody releases the object and then immediately exits the thread, the posted message will never arrive and the object will be leaked.

  • Posting a message makes destruction asynchronous. There may be some assumptions that destruction is synchronous with the final release.

As for the first problem, we could do a thread check and destruct in-place if we are on the UI thread. This would most likely solve the second problem because the exiting thread is not the one that will process the message. It will still be a problem if the background thread does something like

  Release();
  DoSomethingThatCausesTheUIThreadToExitImmediately();

For the second problem, we could change the Post­Message to a Send­Message, but this creates its own problems because of the risk of deadlock. If the UI thread is blocked waiting for the background thread, and the background thread tries to send the UI thread a message, the two threads end up waiting for each other and nobody makes any progress. On the other hand, making the destruction synchronous would fix the third problem.

Another approach is to push the affinity out one more step:

/* static */ DWORD CALLBACK CContoso::DoBackgroundWork(void *lpParameter)
{
 CContoso *pThis = static_cast<CContoso *>(lpParameter);
 pThis->DoThis();
 pThis->DoThat();
 pThis->AsyncRelease();
 return 0;
}

void CContoso::AsyncRelease()
{
 PostMessage(m_hwnd, CWM_RELEASE, 0, 0);
}

/* static */ LRESULT CALLBACK CContoso::WndProc(
    HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    CContoso *pThis = reinterpret_cast<CContoso *>
                           (GetWindowLongPtr(hwnd, GWLP_USERDATA));
    ...
    case CWM_RELEASE:
     pThis->Release();
     return 0;

    ...

In this design, we make the asynchronicity explicit in the name of the function, and we require all background threads to use the asynchronous version. This design again assumes that the only reason the window wouldn't exist is that something went wrong during initialization before any background tasks were created.

Unfortunately, this design also retains the original constraint that Release can be called only from the UI thread. That makes the object rather fragile, because it is not obvious that Release has such constraints. If you go this route, you probably should rename Release to Release­From­UI­Thread.

If this object is a COM object, then another option is to use COM marshaling to marshal the IUnknown to the background thread and use IUnknown::Release to release the object. Since you used COM to marshal the object, it knows that Co­Uninitialize should wait for all outstanding references marshaled to other threads, thereby avoiding the "lost message" problem.

Anyway, those are a few ideas for addressing this problem. None of them are particularly beautiful, though. Maybe you can come up with something better.

(The component team fixed this problem by taking advantage of a detail in the usage pattern of the CContoso object: The client of the CContoso object is expected to call CContoso::Stop before destroying the object, and after calling CContoso::Stop, the only valid thing you can do with the object is destroy it. Furthermore, that call to CContoso::Stop must occur on the UI thread. Therefore, they moved the part of the cleanup code that must run on the UI thread into the Stop method. The object's background tasks already knew to abandon work once they detected that the object had been stopped.)

Comments (21)
  1. Joshua says:

    PostMessage(hand, WM_CLOSE, 0, 0)

    [This assumes the class did not override the WM_CLOSE message to display a "You have unsaved changes." prompt or other shenanigans. -Raymond]
  2. Joshua says:

    [This assumes the class did not override the WM_CLOSE message to display a "You have unsaved changes." prompt or other shenanigans. -Raymond]

    Oh right I put that code in WM_SYSCOMAND (SC_CLOSE) now for distinguishing between interactive and non-interactive close.

  3. Darran Rowe says:

    This makes me wonder. Besides the mass breakage that would occur now if it was tried, would making more of the UI functions enforce thread affinity have helped any.

    Even though we are supposed to know that UI functions have thread affinity, it is easy enough to forget that they do, and new programmers often don't even learn this.

    But I guess that is up for speculation and the only way to test this out would not only require a time machine, but the ability to create an alternate universe too.

    [Some of them do enforce thread affinity. For example, DestroyWindow fails if you call it from the wrong thread. That's what got us into this mess in the first place. -Raymond]
  4. Sven2 says:

    The mention of the "hidden helper window" makes me wonder why so many functions in Windows are tied to window handles. For example, couldn't message queues be separate form windows? Each window would have a message queue, but message queues would have their own handle and could also exist without a window.

    If fewer functions were tied to window handles, you might need fewer "hidden helper windows" in the first place.

    [You can have a message queue without a window. It's called a thread. See PostThreadMessage. But watch out for modal loops. That's why people use helper windows. -Raymond]
  5. Gabe says:

    Sven2: What's the difference between a message queue and a hidden window?

    A window is nothing more than a message queue that has some extra metadata that allows it to do some drawing. If you never use that extra metadata (your "window" is "hidden"), then it's just a message queue with a misleading name.

  6. AsmGuru62 says:

    I wonder if EndDialog() can be called from a different thread, that opened a modal dialog box.

  7. Peter says:

    I'm not sure what to think of the component team's solution.  If I forget to call Stop() before releasing the object, it could cause my program to crash?  But only once in a blue moon?  Seems poor.

    My approach would be not to let the worker thread hold a reference to the object at all.  Instead, I would make CContoso's destructor do something like this:

    CContoso::~CContoso()

    {

    if (m_hwnd != nullptr) DestroyWindow(m_hwnd);

    ...

    if (WorkerIsRunning())

    {

       SignalWorkerToExit();

       WaitForWorker();

    }

    }

    [This could deadlock if the worker thread is trying to talk to the main thread. (Which it definitely will do if it tries to send a message.) It also means that you have an object with reference count zero. -Raymond]
  8. mikeb says:

    Since it's never specifically called out in the article and I think it's valuable for every Windows developer to know, the answer to the question "who filled my object with 0xDD?" is: the debug runtime heap did when the object was freed. So in any debug session that you see an object filled with 0xDD, your first instinct should be to say to yourself, "I'm trying to use an object that has already been freed".  

    There are a number of different memory fill values that can be useful to know about when debugging and you encounter variables/objects that hold unusual values. See stackoverflow.com/.../12711

  9. Peter says:

    [This could deadlock if the worker thread is trying to talk to the main thread. (Which it definitely will do if it tries to send a message.) It also means that you have an object with reference count zero. -Raymond]

    Yes, that's certainly a possibility.  It really comes down to what "DoThis()" and "DoThat()" actually do (which you haven't shown).  But I think the approach is a good one in general, and it's still what I'd do unless there was prohibitive amounts of refactoring involved.

  10. Joshua says:

    BTW, 0xDDDD disassembles to fstp st5 so it's probably not a NOP sled.

  11. Henri Hein says:

    I don't like reference counting in general, and I would certainly never voluntarily use it on an object that owns any part of the UI, or a helper window.  This kind of problem is exactly why.

    Another rule of thumb for me is that when I use GWLP_USERDATA to attach an object to a window, I always null it out when I destroy the object (actually before).  My WndProc is always written so that when the GWLP_USERDATA returns NULL, fall through to DefWindowProc.  In this case, that rule would have fixed (or prevented) the crash, but it would not have fixed the inert DestroyWindow() call.

  12. Gabe says:

    mikeb: A pointer to an object filled with 0xDD could be an object that was freed -- or it could be an uninitialized/corrupt variable pointing to random memory.

    Yet another possibility is a double-free bug. You have a pointer to an object 1, you free that object, then the next allocation gets a pointer to the same memory and initializes it, but later somebody else tries to free object 1. This yields a live object that was overwritten rather than a destructed object that was freed.

  13. Darran Rowe says:

    [Some of them do enforce thread affinity. For example, DestroyWindow fails if you call it from the wrong thread. That's what got us into this mess in the first place. -Raymond]

    Yup, but I was thinking more functions checking, like the ones that currently don't.

    But well, it was more of a thought exercise along the lines of, if the UI functions consistently checked for the correct thread, would we see less issues with people modifying the UI state across threads.

    The thought was if they had the constant reminder of the UI function failing, then it would hopefully follow that modifying other state, like destroying objects, or closing a window by posting a message rather than calling a member function, would decrease.

  14. Matt says:

    For the various Window things that require they start on the right thread, why doesn't Windows actually *check* and abort if they don't?

  15. manuell says:

    If you use PostMessage for resource management, don't forget to use PeekMessage after window destruction, to check if there is still some work to be done.

  16. Mike Dimmick says:

    @Gabe: The debug heap initializes blocks to 0xCC on allocation.

    @Joshua: Choosing 0xCC for initialization is indeed because 0xCC is int 3, the breakpoint instruction. Whoever chose 0xDD wasn't really considering attempts to execute code from the buffer.

    [NX means that code execution in the heap is most likely going to be caught some other way. -Raymond]
  17. laonianren says:

    Sven2 wrote "each window would have a message queue, but message queues would have their own handle and could also exist without a window."

    If you ignore the names given to things then this is pretty much what Windows does already.  You create a message queue by calling CreateWindowEx and specifying HWND_MESSAGE as the parent window.  You create a message queue and attach a window to it by calling CreateWindowEx with some other parent window.

    msdn.microsoft.com/.../ms632599%28v=vs.85%29.aspx

  18. Sven2 says:

    laonianren: Cool, thanks, I didn't know that.

  19. Alex Cohn says:

    I believe the problem would be much worse with release runtime libs, which does not setmem(0xDD), so that broadcast message would encounter an almost correct object.

  20. John Doe says:

    You shouldn't confuse a window with a queue, that's vile!

    At most, you can consider a window as a stream processor (e.g. the window procedure of a HWND_MESSAGE descendant), but the queue is still tied to the thread and it requires pumping.  If you don't pump it, no window will process posted messages, or messages sent from other threads.

  21. Andrei says:

    Why isn't there a special setting in App Verifier for this kind of errors? Like DestroyWindow on a different thread?

Comments are closed.

Skip to main content