When you transfer control across stack frames, all the frames in between need to be in on the joke


Chris Hill suggests discussing the use of structured exception handling as it relates to the window manager, and specifically the implications for applications which raise exceptions from a callback.

If you plan on raising an exception and handling it in a function higher up the stack, all the stack frames in between need to be in on your little scheme, because they need to be able to unwind. (And I don't mean "unwind" in the "have a beer and watch some football" sense of "unwind".)

If you wrote all the code in between the point the exception is raised and the point it is handled, then you're in good shape, because at least then you have a chance of making sure every frame unwinds properly. This means either using RAII techniques or judiciously using try/finally (or whatever equivalent exists in your programming language of choice) to clean up resources in the event of an unwind. If you go the RAII route, you may also need to compile with the /EHa flag to convert asynchronous exceptions to synchronous ones, so that Win32 exceptions will also trigger unwind; although that has its own problems, since the C++ exception model is synchronous, not asynchronous.
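To make the difference concrete, here is a minimal sketch of two intermediate frames, one that is in on the joke and one that is not. Everything in it (g_busy, BusyGuard, CallWithGuard, CallWithoutGuard) is a hypothetical name invented for illustration, and it uses an ordinary C++ exception rather than a Win32 exception so that it behaves the same on any compiler.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical state that a frame sets on entry and must clear on exit.
static bool g_busy = false;

// RAII guard: the destructor runs when the stack unwinds through the frame.
class BusyGuard {
public:
    BusyGuard() { g_busy = true; }
    ~BusyGuard() { g_busy = false; }
    BusyGuard(const BusyGuard&) = delete;
    BusyGuard& operator=(const BusyGuard&) = delete;
};

using Callback = void (*)();

// A frame that is in on the joke: cleanup happens even if cb throws.
void CallWithGuard(Callback cb) {
    BusyGuard guard;
    cb();
}

// A frame that is not: if cb throws, the last line never runs.
void CallWithoutGuard(Callback cb) {
    g_busy = true;
    cb();
    g_busy = false;
}

void ThrowingCallback() { throw std::runtime_error("raised in callback"); }

// Throws through the given frame, handles the exception higher up,
// and reports the value of g_busy that the handler is left with.
bool BusyAfterUnwind(void (*frame)(Callback)) {
    assert(frame != nullptr);
    g_busy = false;
    try {
        frame(ThrowingCallback);
    } catch (const std::runtime_error&) {
        // handled higher up the stack
    }
    return g_busy;
}
```

BusyAfterUnwind(CallWithGuard) reports false, whereas BusyAfterUnwind(CallWithoutGuard) reports true: the frame that was not written with unwinding in mind leaves its state stuck, even though the exception itself was handled.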

But if you don't control all the frames in between, then you can't really guarantee that they were written in the style you want.

In Win32, exceptions are considered to be horrific situations that usually indicate some sort of fatal error. There may be some select cases where exceptions can be handled, but those are more the unusual cases than the rule. Most of the time, an exception means that something terrible has happened and you're out of luck. The best you can hope for at this point is a controlled crash landing.

As a result of this overall mindset, Win32 code doesn't worry too much about recovering from exceptions. If an exception happens, then it means your process is already toast and there's no point trying to fix it, because that would be trying to reason about a total breakdown of normal functioning. As a general rule, generic Win32 code is not exception-safe.

Consider a function like this:

struct BLORP
{
    int Type;
    int Count;
    int Data;
};

CRITICAL_SECTION g_csGlobal; // assume somebody initialized this
BLORP g_Blorp; // protected by g_csGlobal

void SetCurrentBlorp(const BLORP *pBlorp)
{
    EnterCriticalSection(&g_csGlobal);
    g_Blorp = *pBlorp;
    LeaveCriticalSection(&g_csGlobal);
}

void GetCurrentBlorp(BLORP *pBlorp)
{
    EnterCriticalSection(&g_csGlobal);
    *pBlorp = g_Blorp;
    LeaveCriticalSection(&g_csGlobal);
}

These are perfectly fine-looking functions from a traditional Win32 standpoint. They take a critical section, copy some data, and leave the critical section. The only thing¹ that could go wrong is that the caller passed a bad pointer. If that happens, a STATUS_ACCESS_VIOLATION exception is raised, and the application dies. (In the case of TerminateThread, we're already in the world of "don't do that".)

But what if your program decides to handle the access violation? Maybe pBlorp points into a memory-mapped file, and there is an I/O error paging the memory in, say because it's a file on the network and there was a network hiccup. Now you have two problems: The critical section is orphaned, and the data is only partially copied. (The partial-copy case happens if pBlorp points to a BLORP that straddles a page boundary, where the first page is valid but the second page isn't.) Just converting this code to RAII solves the first problem, but it doesn't solve the second, which is kind of bad because the second problem is what the critical section was trying to prevent from happening in the first place!

void SetCurrentBlorp(const BLORP *pBlorp)
{
    CriticalSectionLock lock(&g_csGlobal);
    g_Blorp = *pBlorp;
}

void GetCurrentBlorp(BLORP *pBlorp)
{
    CriticalSectionLock lock(&g_csGlobal);
    *pBlorp = g_Blorp;
}

Suppose somebody calls SetCurrentBlorp with a BLORP whose Type and Count are in readable memory, but whose Data is not. The code enters the critical section, copies the Type and Count, but crashes when it tries to copy the Data, resulting in a STATUS_ACCESS_VIOLATION exception. Now suppose that somebody unwisely decides to handle this exception. The RAII code releases the critical section (assuming that you compiled with /EHa), but there's no code to try to patch up the now-corrupted g_Blorp. Since the critical section was probably added to prevent g_Blorp from getting corrupted, the result is that the thing you tried to protect against ended up happening anyway.
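The scenario can be simulated portably. The sketch below is an illustration under loud assumptions: std::mutex stands in for the critical section, and a hypothetical FaultingSource type throws an ordinary C++ exception when its Data is read, standing in for the access violation (a real STATUS_ACCESS_VIOLATION is a Win32 exception, not a C++ one). The second function, SetCurrentBlorpStaged, is not from the discussion above; it is one possible variant that reads everything into a local before touching the shared state.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

struct BLORP {
    int Type;
    int Count;
    int Data;
};

// Hypothetical stand-in for a BLORP that straddles a bad page:
// Type and Count can be read, but reading Data fails.
struct FaultingSource {
    int Type;
    int Count;
    int ReadData() const { throw std::runtime_error("simulated access violation"); }
};

std::mutex g_lock;          // stands in for g_csGlobal
BLORP g_Blorp = {1, 1, 1};  // protected by g_lock

// Mirrors the RAII version: the copy happens field by field under the lock.
void SetCurrentBlorpInPlace(const FaultingSource& src) {
    std::lock_guard<std::mutex> lock(g_lock);
    g_Blorp.Type = src.Type;
    g_Blorp.Count = src.Count;
    g_Blorp.Data = src.ReadData();  // throws: lock is released, g_Blorp is torn
}

// Assumed variant: read into a local first, publish only once it is complete.
void SetCurrentBlorpStaged(const FaultingSource& src) {
    BLORP staged;
    staged.Type = src.Type;
    staged.Count = src.Count;
    staged.Data = src.ReadData();  // throws before the shared state is touched
    std::lock_guard<std::mutex> lock(g_lock);
    g_Blorp = staged;
}

// Runs one setter against a faulting source and returns what g_Blorp holds afterward.
BLORP StateAfterFault(void (*setter)(const FaultingSource&)) {
    assert(setter != nullptr);
    {
        std::lock_guard<std::mutex> lock(g_lock);
        g_Blorp = {1, 1, 1};
    }
    FaultingSource bad = {2, 2};
    try {
        setter(bad);
    } catch (const std::runtime_error&) {
        // somebody unwisely handles the failure and carries on
    }
    // Taking the lock again would hang if it had been orphaned.
    std::lock_guard<std::mutex> lock(g_lock);
    return g_Blorp;
}
```

With SetCurrentBlorpInPlace, the lock is released but g_Blorp is left holding the new Type and Count alongside the old Data. With the staged variant, g_Blorp keeps its old value. That only narrows the window for this one function; it does nothing for the frames you don't control, and it doesn't make handling an access violation a good idea.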

Okay, that was a bit of a digression. The point is that unless everybody between the point the exception is raised and the point the exception is handled is in on the joke, you are unlikely to escape fully unscathed. This is particularly true in the generalized Win32 case, since it is perfectly legal to write Win32 code in languages other than C++, as long as you adhere to the Win32 ABI. (I'm led to believe that Visual Basic is still a popular language.)

There are a lot of ways of getting stack frames beyond your control between the point the exception is raised and the point it is handled. For example, you might call EnumWindows and raise an exception in the callback function and try to catch it in the caller. Or you might raise an exception in a window procedure and try to catch it in your message loop. Or you might try to longjmp out of a window procedure. All of these end up raising an exception and catching it in another frame. And since you don't control all the frames in between, you can't guarantee that they are all prepared to resume execution in the face of an exception.
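One way to keep an exception from ever crossing frames you don't control is to catch it inside the callback, park it, return through the normal path, and rethrow only once the foreign code has returned. The following is a rough sketch of that idea, not a prescription. EnumItems is a hypothetical C-style enumerator standing in for something like EnumWindows, and EnumState, ProcessItem, SafeCallback, and EnumerateAll are likewise invented for the example.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical C-style enumerator. It is not exception-aware:
// it expects the callback to return normally every time.
using ItemCallback = bool (*)(int item, void* context);

bool EnumItems(ItemCallback callback, void* context) {
    assert(callback != nullptr);
    for (int item = 0; item < 5; ++item) {
        if (!callback(item, context)) return false;  // callback asked to stop
    }
    return true;
}

struct EnumState {
    int visited = 0;
    std::exception_ptr error;  // exception parked at the boundary
};

// Work done per item; fails on item 2 to exercise the error path.
void ProcessItem(int item) {
    if (item == 2) throw std::runtime_error("item 2 is bad");
}

// The callback catches everything, so nothing propagates into EnumItems.
bool SafeCallback(int item, void* context) {
    EnumState* state = static_cast<EnumState*>(context);
    try {
        ProcessItem(item);
        ++state->visited;
        return true;
    } catch (...) {
        state->error = std::current_exception();
        return false;  // stop the enumeration through the normal return path
    }
}

// Rethrows on the caller's side, after EnumItems has returned normally.
int EnumerateAll() {
    EnumState state;
    EnumItems(SafeCallback, &state);
    if (state.error) std::rethrow_exception(state.error);
    return state.visited;
}
```

A caller of EnumerateAll still sees the std::runtime_error, but by the time it is rethrown, EnumItems has already finished through its ordinary return path, so no frame belonging to the enumerator is ever unwound.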

Bonus reading: My colleague Paul Betts has written up a rather detailed study of one particular instance of this phenomenon.

¹Okay, another thing that could go wrong is that somebody calls TerminateThread on the thread, but whoever did that knew they were corrupting the process.

Comments (13)
  1. Antonio Rodríguez says:

    Corollary: always handle all reasonable exceptions in a callback. Yes, there are still programmers using Visual Basic 6 (it's reasonably powerful and quite fast to develop in, and believe it or not, many customers still require Windows 98 compatibility in 2012!), and this is one thing you have to keep in mind when you need to subclass (one of those hairy and ugly things in a high level language: subclassing is far more difficult in classic Visual Basic than in C or C++). Inside of a window procedure or a callback, any kind of exception immediately crashes the entire process, and Visual Basic raises exceptions ("runtime errors") for any unexpected condition, like reading beyond the end of a file.

    All in all, most of the time you don't need to work with subclassing, and classic Visual Basic is pretty efficient for general GUI programming – that's why it's still used 15 years after its last release!

  2. Joshua says:

    Which is why I normally catch at module boundaries, stuff the exception into a global variable, translate it to a return code, return, and rethrow at the reentry point.

  3. An aviation person once told me that all landings are controlled crashes.

  4. alegr1 says:

    A checked build of Windows needs to have all those callback calls wrapped in a catch-all block which must terminate the process with a DebugOutput message. But then again, who runs the checked Windows? Even Microsoft proggers don't always care.

  5. Neil says:

    The worst bit is that the 64-bit message loop effectively eats window procedure exceptions, so for example if your paint routine throws an exception, it actually exhibits as a busy hang as your app keeps failing to repaint.

  6. Neil says:

    My above comment was slightly inaccurate. Correcting it is left as an exercise. Now to try out the hotfix mentioned in the linked article.

  7. Joshua says:

    After reading the entire backwards chain, I conclude that EnterCriticalSection has a bug. So, it throws if it needs to allocate an event but cannot. Even in that situation, it should not corrupt state.

    I mean really, the APIs in kernel32 are the effective kernel level APIs. They need to be as stable as kernel, by which I mean all conditions except for memory corruption are handled.

  8. alegr1 says:

    "The worst bit is the that 64-bit message loop effectively eats window procedure exceptions, so for example if your paint routine throws an exception, it actually exhibits as a busy hang as your app keeps failing to repaint."

    It should be obvious that WM_PAINT handler MUST NOT create any modal dialogs and MUST NOT throw before calling BeginPaint.

  9. Gabe says:

    Joshua: As of Win2k, EnterCriticalSection doesn't throw an exception upon being unable to allocate an event. Instead it just uses the preallocated KeyedEvent. It will only throw an exception if your critical section times out, but that causes nothing to get corrupted.

    The corruption happens when your region of code protected by the critical section throws an exception you're not expecting (ACCESS_VIOLATION). The exception handling of RAII exits the critical section, leaving your data structure half-modified and no longer protected by the critical section. It's this partially modified data structure that is the corruption.

  10. Veltas says:

    "When you transfer control across stack frames, all the frames in between need to be in on the joke" Haha!

    Interesting read, especially since I tend not to involve myself with exceptions unless they involve themselves with me.

  11. Anonymous says:

    We had an issue like this at a place I used to work.  We had a product which was built on Berkeley DB.  In some rare and generally unpredictable situation, the DB would apparently decide to corrupt itself, and every call would return a "DB run recovery" error.  Even running recovery, like it said, wouldn't fix it.  Eventually, we tracked the cause down to a C++ exception being thrown out through the key comparison callback function (Berkeley DB is a C library).  Apparently ripping the stack out from under BDB while it was in the middle of some B-Tree operation was a Very Bad Thing…

  12. Bob says:

    "An aviation person once told me that all landings are controlled crashes."

    Only if the pilot is an ex-naval aviator.

  13. Joshua says:

    @Anonymous: Undefined or not I'd suddenly consider abandoning BerkeleyDB. I find it hard to imagine that such behavior could coexist with being able to recover from pulling the plug at any time.

Comments are closed.
