When I intentionally create a stack overflow with SendMessage, why do I sometimes not get a stack overflow?


Take our scratch program and make these changes:

LRESULT CALLBACK
WndProc(HWND hwnd, UINT uiMsg, WPARAM wParam, LPARAM lParam)
{
    switch (uiMsg) {
    ...
    case WM_APP: return SendMessage(hwnd, WM_APP, 0, 0) + 1;
    }

    return DefWindowProc(hwnd, uiMsg, wParam, lParam);
}

int WINAPI WinMain(HINSTANCE hinst, HINSTANCE hinstPrev,
                   LPSTR lpCmdLine, int nShowCmd)
{
    ...
        ShowWindow(hwnd, nShowCmd);

        SendMessage(hwnd, WM_APP, 0, 0);
        MessageBox(hwnd, TEXT("Will this appear?"), TEXT("Hey"), MB_OK);

        while (GetMessage(&msg, NULL, 0, 0)) {
            ...
        }

        ...
}

This program enters infinite recursion upon receipt of the WM_APP message by sending itself another copy of the same message. We add one to the returned value to make sure there's no tail call elimination.

After the program creates the window, it starts the death spiral by sending the first WM_APP message. And then after the deadly message, it displays a message.

The question: What happens? Do you see the message?

When you run the program, you might see the message, or you might crash. It depends on which resource runs out first: The user-mode stack or the kernel-mode stack.

Sending a message consumes some stack in user-mode, because you're calling the Send­Message function, and that does some work in user mode before calling into kernel mode to do the actual message sending.

Upon entry into kernel mode, some more stack is consumed to do the kernel-mode processing, like looking up which thread should be chosen to handle the message and dispatching any active window hooks. In this case, the destination window belongs to the same thread, so the kernel simulates a call into user mode and transitions back to user mode at a function whose job it is to call the window procedure.

Now we're back in user mode, and the helper function calls the window procedure, which allocates a stack frame, detects that the message is WM_APP, and calls Send­Message, and the process repeats.

When does this end?

On entry to kernel mode, the kernel checks how much kernel-mode stack is still available, and if there isn't enough to send a message, then the kernel says, "Whoa, I'm not going to be able to do this Send­Message thing, so I'm going to fail the call." The kernel-mode portion of the Send­Message function returns zero, and that causes Send­Message to return zero instead of sending the message.

On the other hand, there is no such preflight check in user mode. User mode just keeps running until it runs out of stack, at which point a STATUS_STACK_OVERFLOW exception is raised. Assuming nobody handles this exception, the program crashes.

It comes down to a race to see which limited resource runs out first. If the kernel-mode stack runs out first, then one of the Send­Message calls will return zero without actually sending the message, at which point the entire call stack unwinds, and then execution resumes at the call to Message­Box. On the other hand, if the user-mode stack runs out first, then you get a stack overflow exception.

Your program has some control over how much user-mode stack is consumed at each recursion, because some of that stack usage comes from your own window procedure. So if you keep your window procedure's stack usage low, then you're more likely to run out of kernel mode stack first.

But wait, there's still another wrinkle that introduces a level of unpredictability to the calculations. We'll dig deeper next time.

Comments (20)
  1. ranta says:

    That sounds dangerous. I expected the kernel-mode portion to release all of the kernel stack for the duration of the user-mode callback, and if it needs to preserve some information across the callback, then allocate a handle for that information and put the handle on the user-mode stack.

    What if the application recurses until SendMessage notices there is not enough kernel stack available and returns 0, and then the application starts trying other system calls? I imagine the system-call dispatcher must always check the kernel stack limits, not only for SendMessage. However, it might be difficult to know in advance how much kernel stack will be needed, especially for I/O that can go through many filter device objects. So, Windows presumably has a conservative estimate for that. What is a good way for a driver developer to test whether his or her driver can tolerate such a depleted kernel stack?

    1. Gee Law says:

      For your first paragraph, if the kernel put the handle on the user-mode stack, user-mode application could leak the handle, or even worse, trick the kernel into later de-allocating another resource by overwriting it, which would almost surely result a user-mode initiated BSoD, a security hole.

      For the second paragraph, I wouldn’t be surprised if every (good) driver checks the stack limit before actually anything. Otherwise, user-mode initiated BSoD.

    2. kc0rtez says:

      ranta, doesn’t even make sense trying to “tolerate” a depleted kernel stack, if things get to this point you have much bigger issues to deal with.

    3. kantos says:

      Reasonably speaking if you’re in a situation where you’ve run out of kernel stack you’re doing something VERY VERY wrong as almost all the patterns in windows programming encourage an asynchronous style that keeps thread stacks usually fairly small. SendMessage is one of the few places that at least I would assume currently this sort of thing is possible. But AFAIK SendMessage only guarantees the message will be delivered and will return a value, not that it will happen synchronously or even for that matter on the same stack. While I don’t see this changing for performance reasons it is conceivable that this could be re-implemented in a way that is not subject to this sort of abuse. That said deeply nested SendMessage should probably be avoided anyway.

    4. I think you’re suggesting copying the stack into some other location and then resetting the stack pointer. Which means that if you have live pointers into the stack, those pointers are now wild! You would have to leave the old stack where it is and allocate a new one for the next recursion. But allocating a new stack for each kernel transition gets kind of expensive.

      1. Joshua says:

        Plan 9 had in interesting property in this regard. Pointers on the stack could not be shared with other threads because all threads used the same addresses for the stack.

      2. Cesar says:

        I would expect something more like the way Unix signals are handled: the kernel saves the relevant information in the user-mode stack, pushes a special return address into it, updates the return address to point to the signal handler, and _returns from the kernel_ (that is, all the kernel stack is deallocated), which resumes userspace at the signal handler. When the signal handler returns, the special return address points to a thunk which calls the sigreturn() system call. Now back to the kernel (with a fresh stack), it copies and validates whatever is needed from the user-mode stack, restores the user-mode state, and returns.

        With this kind of mechanism, the user-mode stack is the only one which can have unbounded growth. This allows the kernel to use some tiny stacks (on Linux, used to be a bit less than 4K, was later bumped to 8K or 16K due to the rising complexity of driver code).

        1. In Unix, signals always occur as the kernel was exiting to user mode – there was nothing kernel mode was going to do after the signal anyway. In Windows, on the other hand, the kernel can inject a call into user mode, and then act on the return value, which means resuming the continuation from the stack. For example, the kernel injects a call to the hook procedure, and then when the hook procedure returns, the kernel uses the return value to decide whether to call the next hook procedure. And then after the hook is dispatched, the kernel then continues what it was doing before it called the hook procedure. How would you do this using the unix signal model?

          1. Joshua says:

            It depends on which form of uglyness you prefer. return -EINTR where the continuation is written to a buffer provided by the userpsace caller would work. Let’s see, do I prefer kernel calls building up on the stack when the kernel has to call back into usermode or do I prefer continuation buffers that need to be able to grow and other kinds of overhead.

            MS didn’t quite finish their work in x64 Windows and exceptions can’t cross through the kernel mode callbacks anymore. This is a pill in C# Winforms where sometimes throw ex; disappears rather than reaching its catch and there’s no documentation on which callbacks have this property. I guess it finally got too ugly for them.

          2. It’s not really a choice because if you want kernel mode to call into user mode and act on the result, the EINTR model won’t work. The Unix way is basically “Kernel mode never calls into user mode. Instead, it abandons the current operation (EINTR) and returns to user mode, possibly with an altered user-mode continuation.” If you are ever tempted to call from kernel mode into user mode, you are told “We don’t do that here. You should move all your code into user mode if that’s so important to you.”

  2. Yukkuri says:

    “When I intentionally create a stack overflow” – please tell me this wasn’t some customer that thought intentionally creating a stack overflow was the right way to accomplish their goal…

    1. Brian says:

      More likely, a customer had a failing program (more complicated than the one that Raymond shows), that failed in two different ways depending on the weather or the phase of the moon. They asked “hey, what’s going on – we know why we’re failing, but we’re surprised to see two different results”. Then Raymond boiled down the customer’s sample to fit well with his “scratch program” sample code.

    2. Beldantazar says:

      Well, the goal could be to deliberately induce a problem as part of testing some other code to make sure it can handle it gracefully, but that seems like it might be giving people who would try this too much credit.

    3. JAS says:

      The mother of all stack overflows can be had with C# async functions, just move the entire call chain to the multi-GB heap. I secretly wonder if anyone ever had to solve a problem this way.

      1. Voo says:

        That’s the standard solution to transform a recursive function into an iterative one and avoid troubles with the stack: Allocate a list on the heap and every call adds a new level to it.

        1. Peter Doubleday says:

          I think, if I had that problem (and planned to have it), I’d prefer using tail recursion to converting to an iterative loop. I might even use a trampoline function, in environments where tail recursion isn’t available. Either way (hoisting the stack to heap, or simply running a recursion that doesn’t in any meaningful way recurse, ie reduce the working set) you’re probably doing something distasteful and wrong.

          It’s not so much “if only I could recurse 257 times, instead of just 256.” It’s more “Hey, recursing tens of millions of times is fun!”

          And yes, it’s all fun and games … until somebody loses an eye.

  3. Kirby FC says:

    stackoverflow.com was intentionally created September 15, 2008

    1. Yeah. And we are yet to see which resource it will expend faster: Existing users’ tolerance for harassment or market demand.

  4. Tihiy says:

    I wonder why SendMessage is not optimized to call window function directly without jumping to kernel. Windows 95 was.

    1. The point of the article is pointing that it is optimized in the case where are no hooks or other funny things.

Comments are closed.

Skip to main content