How to turn off the exception handler that COM "helpfully" wraps around your server


Historically, COM placed a giant try/except around your server's methods. If your server encountered what would normally be an unhandled exception, the giant try/except would catch it and turn it into the error RPC_E_SERVERFAULT. It then marked the exception as handled, so that the server remained running, thereby "improving robustness by keeping the server running even when it encountered a problem."

Mind you, this was actually a disservice.

The fact that an unhandled exception occurred means that the server was in an unexpected state. By catching the exception and saying, "Don't worry, it's all good," you end up leaving a corrupted server running. For example:

HRESULT CServer::DoOneWork(...)
{
  CWork *pwork = m_listWorkPending.RemoveFirst();
  if (pwork) {
    pwork->UpdateTimeStamp();
    pwork->FrobTheWidget();
    pwork->ReversePolarity();
    pwork->UnfrobTheWidget();
    m_listWorkDone.Add(pwork);
  }
  return S_OK;
}

Suppose there's a bug somewhere that causes pwork->ReversePolarity() to crash. Maybe the problem is that the neutrons aren't flowing, so there's no polarity to reverse. Maybe the polarizer is not properly initialized. Whatever; it doesn't matter what the problem is. Just assume there's a bug that prevents it from working.

With the global try/except, COM catches the exception and returns RPC_E_SERVERFAULT back to the caller. Your server remains up and running, ready for another request. Mind you, your server is also corrupted. The widget never got unfrobbed, the timestamp refers to work that never completed, and the CWork that you removed from the pending work list got leaked.

But, hey, your server stayed up.
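To make the failure mode concrete, here is a small, portable C++ model of what the catch-all does. It is a sketch, not COM's implementation: the real handler is a structured exception handler that also catches things like access violations, whereas this model uses a C++ catch (...). The names Work, Server, and DispatchCall are hypothetical stand-ins for the example above.

```cpp
#include <cstdint>
#include <deque>
#include <stdexcept>
#include <vector>

// RPC_E_SERVERFAULT: the HRESULT COM reports when its handler swallows an exception.
constexpr std::uint32_t kRpcEServerFault = 0x80010105u;
constexpr std::uint32_t kSOk = 0;

// Hypothetical work item whose ReversePolarity always fails, standing in for the bug.
struct Work {
    bool frobbed = false;
    void FrobTheWidget() { frobbed = true; }
    void ReversePolarity() { throw std::runtime_error("no neutron flow"); }
    void UnfrobTheWidget() { frobbed = false; }
};

struct Server {
    std::deque<Work*> pending;
    std::vector<Work*> done;

    std::uint32_t DoOneWork() {
        if (pending.empty()) return kSOk;
        Work* work = pending.front();
        pending.pop_front();      // the item leaves the pending list here
        work->FrobTheWidget();
        work->ReversePolarity();  // throws
        work->UnfrobTheWidget();  // skipped
        done.push_back(work);     // skipped
        return kSOk;
    }
};

// Simplified model of the catch-all: swallow the exception, report a fault, keep running.
template <typename Method>
std::uint32_t DispatchCall(Method&& method) {
    try {
        return method();
    } catch (...) {
        return kRpcEServerFault;
    }
}
```

After DispatchCall([&] { return server.DoOneWork(); }) returns kRpcEServerFault, the work item is in neither list and its widget is still frobbed, yet the Server object is happily waiting for the next call.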

A few hours later, the server starts returning E_OUTOFMEMORY errors (because of all the leaked work items), you get errors because there are too many outstanding frobs, and the client hangs because it's waiting for a completion notification on that work item that you lost track of. You debug the server to see why everything is so screwed up, but you can't find anything wrong. "I don't understand why we are leaking frobs. Every time we frob a widget, there's a call to unfrob right after it!"

You eventually throw up your hands in resignation. "I can't figure it out. There's no way we can be leaking frobs."

Even worse, the inconsistent object state can be a security hole. An attacker tricks you into reversing the polarity of a nonexistent neutron flow, which causes you to leave the widget frobbed by mistake. Bingo, frobbing a widget makes it temporarily exempt from unauthorized polarity changes, and now the bad guys can change the polarity at will. Now you have to chase a security vulnerability where widgets are being left frobbed, and you still can't find it.

Catching all exceptions and letting the process continue running assumes that a server can recover from an unexpected failure. But this is absurd. You already know that the server is unrecoverably toast: It crashed!

Much better is to let the server crash so that the crash dump can be captured at the point of the failure. Now you have a fighting chance of figuring out what's going on.

But how do you turn off that massive try/except? You didn't put it in your code; COM created it for you.

Enter IGlobalOptions: Set the COMGLB_EXCEPTION_HANDLING property to COMGLB_EXCEPTION_DONOT_HANDLE, which means "Please don't try to 'help' me by catching all exceptions. If a fatal exception occurs in my code, then go ahead and let the process crash." Starting in Windows 7, you can ask for the even stronger COMGLB_EXCEPTION_DONOT_HANDLE_ANY, which means "Don't even try to catch 'nonfatal' exceptions."

Wait, what's a 'fatal' exception?

A 'fatal' exception, at least as COM interprets it, is an exception like STATUS_ACCESS_VIOLATION or STATUS_ILLEGAL_INSTRUCTION. (A complete list is in this sample RPC exception filter.) On the other hand, a 'nonfatal' exception is something like a C++ exception or a CLR exception. You probably want an unhandled C++ or CLR exception to crash your server, too; after all, it would have crashed your program if it weren't running as a server. Therefore, my personal recommendation is to use COMGLB_EXCEPTION_DONOT_HANDLE_ANY whenever possible.
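The distinction can be modeled as a small classification function. This is an illustrative sketch only: STATUS_ACCESS_VIOLATION and STATUS_ILLEGAL_INSTRUCTION come from the text above, while the other fatal codes are a partial selection assumed for illustration (the sample RPC exception filter has the authoritative list). ComExceptionMode and ComSwallows are hypothetical names that model the default and the two COMGLB_EXCEPTION_* settings; they are not COM APIs.

```cpp
#include <cstdint>

// A few NTSTATUS codes treated as fatal. Partial list, for illustration only.
constexpr std::uint32_t kStatusAccessViolation       = 0xC0000005u;
constexpr std::uint32_t kStatusInPageError           = 0xC0000006u;
constexpr std::uint32_t kStatusIllegalInstruction    = 0xC000001Du;
constexpr std::uint32_t kStatusPrivilegedInstruction = 0xC0000096u;
constexpr std::uint32_t kStatusStackOverflow         = 0xC00000FDu;

// Exception code the Microsoft C++ runtime raises for a C++ `throw` (nonfatal to COM).
constexpr std::uint32_t kMsvcCppException = 0xE06D7363u;

bool IsFatalException(std::uint32_t code) {
    switch (code) {
    case kStatusAccessViolation:
    case kStatusInPageError:
    case kStatusIllegalInstruction:
    case kStatusPrivilegedInstruction:
    case kStatusStackOverflow:
        return true;
    default:
        return false;
    }
}

// Model of the default plus the two opt-out settings.
enum class ComExceptionMode { Handle, DoNotHandle, DoNotHandleAny };

// Would COM's handler swallow an exception with this code under this mode?
bool ComSwallows(ComExceptionMode mode, std::uint32_t code) {
    if (mode == ComExceptionMode::Handle) return true;  // default: catch everything
    if (mode == ComExceptionMode::DoNotHandle) return !IsFatalException(code);
    return false;                                       // DoNotHandleAny: catch nothing
}
```

Under DoNotHandle, a C++ exception (kMsvcCppException) is still swallowed, which is why DoNotHandleAny is the better choice when it is available.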

"That's great, but why is the default behavior the dangerous 'silently swallow exceptions' mode?"

The COM folks have made numerous attempts to change the default from the dangerous mode to one of the safer modes, but the application compatibility consequences have always been too great. Turns out there are a lot of servers that actually rely on COM silently masking their exceptions.

But at least now you won't be one of them.

Comments (37)
  1. Anthony Wieser says:

    A similar bit of exception swallowing is described here related to 32-64 bit transitions:

    I've run into this quite a lot lately, where my error reporting code doesn't get triggered.

    blog.paulbetts.org/…/the-case-of-the-disappearing-onload-exception-user-mode-callback-exceptions-in-x64

  2. SimonRev says:

    "pwork->ReversePolarity() to crash. Maybe the problem is that the neutrons aren't flowing, so there's no polarity to reverse"

    Been watching classic Jon Pertwee Doctor Who lately?

  3. Tom says:

    I like the homage to Pertwee's Doctor Who: "I just reversed the polarity!"

    And for all the nit-pickers, I know it's hard to imagine how you can reverse the polarity of a neutron (it being neutral, and all) but just let it go and enjoy the rest of the story.  

    Hey!  I sounded like Paul Harvey just then… :)

  4. RoddyP says:

    I think some of this argument is a bit fallacious: If your 'widget' has a persistence beyond the process (maybe it's a file widget or database widget) then if reversePolarity() bombs it's going to stay frobbed regardless of an external try/catch loop. Surely the real solution is to make your widget get frobbed in a RAII style, where a WidgetFrobber object would have a destructor to unfrob it automagically. That way you're safe regardless of what happens inside ReversePolarity()

  5. Adrian says:

    @RoddyP

    RAII is great for making your stuff exception safe when we're talking about C++ exceptions, but this COM handler is also catching system exceptions as well, like an access violation.  If you get a fatal exception like that, you should crash without trying to unwind the stack frames and run destructors.

  6. bg says:

    Is there a performance hit on the try/catch mechanism or to put it another way: will my code that uses lots of com components go faster if I turn the try/catch mechanism off?

    [Even if there were a difference (and I doubt there is; it's probably just a change to the filter), the difference is negligible (and zero on non-x86). Read up on how SEH is implemented for more information. -Raymond]
  7. RoddyP says:

    @Zarat: Exceptions in destructor aren't ignored: They should force terminate() to be called, so your process quits immediately. And no, I'm not advocating keeping the external try/except loop  – just pointing out that removing it doesn't fix the basic exception safety issues in DoOneWork().

    @Adrian: Ah, so system exceptions aren't always C++ exceptions? I guess that's toolchain specific, as I'm sure I can try/catch (EAccessViolation &e) with C++Builder. I don't think there's anything inherently 'non-continuable' about access violations…?

    [It always amazes me how people focus on the details of the example and not the big picture. Who cares about DoOneWork. Substitute whatever you want that meets your personal coding standard. (C++ exceptions are synchronous. Win32 exceptions are asynchronous. Some compilers convert asynchronous exceptions into synchronous exceptions. I would argue that this is attempting to solve a problem by creating a bigger problem.) -Raymond]
  8. Brian says:

    (Hmm, hope this doesn't double-post)

    I realize they are just generic terms, designed not to convey any particular meaning or function, but every time I read a sentence like 'The widget never got unfrobbed' I can't help but laugh.  One of your finest articles, Raymond.

  9. Crescens2k says:

    RoddyP:

    Things like AV etc are SEH exceptions and it isn't toolchain specific, this is how Windows itself works.

    What C++ Builder probably does is like what VC is capable of doing, and that is setting it to catch SEH exceptions with try/catch and wrapping the exception in an object. But a thing to note is that the VC documentation discourages you from using /EHa to catch SEH exceptions with try/catch.

    bg:

    The performance hit isn't with using code which is set up for exceptions, but when the exception occurs. So there should be no, or very little difference in speed between code which is set up to handle exceptions and code which isn't.

  10. o says:

    I find the frobbing widget polarity stuff to be more confusing and annoying than a simple real-world example.

    [I didn't have any simple real-world examples. (And I'm not going to ask a customer "Hey, can I post your source code on my blog?") -Raymond]
  11. MikeCaron says:

    I am reminded of a post your colleague Eric Lippert posted regarding types of exceptions (blogs.msdn.com/…/vexing-exceptions.aspx), and why you shouldn't catch fatal exceptions (blogs.msdn.com/…/asynchrony-in-c-5-part-eight-more-exceptions.aspx). Warning: CLR/.NET stuff beyond these links, but I imagine it's just as relevant to regular exceptions as well.

  12. Michael Kohne says:

    OK, so we know why the COM team haven't fixed this (and I can't imagine they'll EVER be able to fix it – compatibility needs to take first priority here), but what I don't understand is how we got here to begin with! I have my issues with MS, but I don't usually see things that are quite this…problematic. How did this ever get out the door with this 'auto-eat exceptions' thing turned on?

    [That's like asking why people wore stupid clothes in the 1970's. Fashions change. Everybody said "it's about time, you idiots" when Windows 3.1 added parameter validation, which means that back then the programming fashion was KeepRunning rather than FailFast. -Raymond]
  13. Mike Dimmick says:

    @Crescens2k: there is always a performance hit on x86 Windows in fiddling with the exception frame chain when a new exception frame is created, by pushing various structures describing the filter onto the stack, storing the link to the previous handler from FS:[0], and setting FS:[0] to point to the new exception frame. This work has to be done for __finally handling, including any cases where you have stack locals that have non-trivial destructors. If it can prove that a function, and the tree of called functions, does not throw a C++ exception, the compiler will eliminate any auto-generated handlers. It can do this by assuming that only a 'throw' statement generates an exception that it has to handle, which MS calls 'synchronous'. Unfortunately that means if an SEH exception is raised by hardware ('asynchronous'), your stack locals won't be destructed.

    /EHa tells the Microsoft compiler not to discard the auto-generated handlers, so the stack should unwind properly even in the face of SEH exceptions. The cost is much larger code and constant exception frame management. Often the compiler cannot optimize a function with exception frames to the same extent that it could without.

    The compiler does use a single exception frame per function, no matter how many nested __try, try, or nested scopes containing destructible objects you have (assuming at least one, of course!) It generates tables to map the instruction pointer to the appropriate scope and appropriate handler – all exception frames then point to the same handler to process the tables. The CLR only pushes a new frame on transition to unmanaged code – the documentation for /EH says that with /clr, you get /EHa, which is expected because the CLR is already converting SEH exceptions to CLR exceptions (e.g. AccessViolationException).

    For all other processors that Windows has ever run on, including Windows CE on non-x86 processors, SEH exceptions are table-based. It's generally slower to locate handlers, because frequently the tables have either never been referenced or have been removed from the working set since last referenced, but zero code is generated to manage exceptions for the no-exceptions case.

  14. tobi says:

    Any comments on why ignoring exceptions in an ASP.NET context is not harmful? (They just turn into 500 errors.) My guess is that the framework assumes that requests are independent and any state that was corrupted gets thrown away when the request exits.

  15. barrkel says:

    Delphi generally recommends using the 'safecall' calling convention when implementing COM-style interfaces. That calling convention is 'stdcall' with a couple of extra features: a return type of HRESULT, and an automatic exception handler for the method that routes the exception to a base class's method (TObject.SafeCallException). In certain base classes (like TComObject), that method is overridden and changes the return value to E_UNEXPECTED, and does the whole ISupportErrorInfo, SetErrorInfo etc. with data from the native exception. SEH exceptions, meanwhile, are already converted at a low level on the stack to native exceptions, so they get similar treatment. If you want different behaviour (fail fast or whatever), you can override it and provide your own.

    Of course, Delphi has a different philosophy with regards to exceptions. In Delphi, when an exception is raised, it's not generally assumed that the application is toast, that all is lost, and the best thing to do is to terminate. Delphi, by default, has a message pump loop that catches all exceptions and displays their associated message in an error box. Raising an exception is a more or less accepted way of aborting a procedure with an associated error message. The error message associated with the exception is generally meant to be end-user readable where relevant (e.g. not an access violation exception).

    C++ Builder, sharing as it does much of the same RTL as Delphi, inherits many of these features, but I couldn't tell you offhand exactly which bits.

  16. Crescens2k says:

    Mike Dimmick:

    I know there is a hit on x86, I know how exception handling works, and if you read what I wrote then you would have noticed it said "no, or very little", please note the very little. Even if you say optimisation gets in the way, does that really make a staggering difference? I don't think so, and if it did then it is easy enough to just put the code in a separate function and call that so the function doesn't have an exception handling frame. (It is a handful of assembly instructions per function.)

    So even with your lecture on exception handling, the reply stays the same. There should be no difference, or at least very little difference in code set up for exception handling compared to functions not set up for exception handling.

    Of course, this isn't including the actual exception handling. That should be a rare occurrence.

    Also remember, this was in the context of SEH exception handling around the COM functions and if turning them off would make a difference. So going into the whole C++ + /EHa + CLR was pointless.

  17. Dan Bugglin says:

    If you can't use COMGLB_EXCEPTION_DONOT_HANDLE_ANY, couldn't you wrap your program in its own try/except and terminate the process on an exception?

    Not the ideal solution (AFAIK using any sort of try block is slower than not) but it would be a working, if hacky, workaround.  I think.

    [Wrapping doesn't help because exception handling is done innermost first, and the COM handler is inside your wrapper. You would have to put your wrapper *inside* the COM handler (i.e., on every method). -Raymond]
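    For illustration, a minimal C++ sketch of a per-method wrapper that sits inside COM's handler, assuming C++17 and the default /EHsc exception model. CallWithoutSwallowing is a hypothetical name. It relies on standard noexcept semantics: a C++ exception that tries to leave a noexcept function calls std::terminate at that boundary, so COM's outer handler never gets to swallow it (whether the stack is unwound first is implementation-defined). It is aimed at C++ exceptions only and is not a substitute for IGlobalOptions when it comes to access violations.

```cpp
#include <utility>

// Runs a method body behind a noexcept boundary. A C++ exception escaping
// `body` triggers std::terminate here instead of reaching COM's handler.
// Intended for C++ exceptions; do not rely on it for access violations.
template <typename Body>
auto CallWithoutSwallowing(Body&& body) noexcept -> decltype(body()) {
    return std::forward<Body>(body)();
}

// Hypothetical usage inside each COM method:
// HRESULT CServer::DoOneWork() {
//     return CallWithoutSwallowing([&]() -> HRESULT {
//         /* ...original method body... */
//         return S_OK;
//     });
// }
```

    The cost of this approach is exactly what the reply says: every method has to opt in.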
  18. Joshua says:

    Bit by the bug known as backwards compatibility. BeOS was the only one to manage to avoid that to my knowledge, but they didn't live long enough for us to really know if the experiment was successful.

  19. Miral says:

    @Crescens2k: Actually, /EHa vs. /EHs is very relevant.  Under the default /EHsc, destructors are NOT guaranteed to be called on an asynchronous exception such as an access violation.  So even if your code is written in perfect RAII form, you're not going to clean up properly from access violations unless you're using /EHa.

  20. Gabe says:

    Michael Kohne: It's easy to argue that ignoring exceptions should be the default. If you browse the web with a script debugger enabled, you will see a vast number of JavaScript exceptions showing up all over. Generally speaking, these errors are harmless. Usually they either cause something that you'd never notice not to work or they cause something you don't care about not to work. Only rarely will such an error cause a form not to submit or some other unfortunate failure.

    So given the ubiquity of JS errors throughout the web, can you imagine what would happen if the browser closed a web page every time it threw an exception? The web would be unusable! What if the browser itself crashed every time a JS exception went unhandled? The user can't actually fix the problem, so why not just keep running and let the user decide if they can keep going?

    Of course there are many differences between COM servers and web pages, but it's easy to see how KeepRunning was the fashion.

  21. Zarat says:

    @RoddyP

    You've just moved the problem. What if the exception happens during unfrobbing in the destructor? The exception is still ignored and the server continues to run. Besides that, there are kinds of exceptions which make it impossible to safely continue running *any* code in the process, and they'd still be ignored.

  22. jon23423 says:

    How does one turn off the try/except that also "helpfully" wraps (non-COM) RPC servers?

    I believe they are there to catch the exception and instead propagate it to the RPC client, however this can leave the server in an unexpected state.

  23. James says:

    @Gabe: I think the web is a somewhat different case in that the code is being executed by some unknown interpreter over which the page author has no control.  It's harder to expect web developers to program for strict correctness when "correctness" is a moving target (even if you write strictly to some standard, the interpreter might not be strictly conforming).

    That said, if errors were more visible, I'm sure there's a large class of errors that would be fixed.

  24. Yuhong Bao says:

    James: And a JavaScript error is much less severe than an SEH exception.

  25. Exception swallowing wouldn't be so bad if, once an object broke by throwing an exception, it *stayed* broken until destroyed. You couldn't access the state at all in that case.

    Of course, it's too late to change now.

    [You're assuming that the damage is localized to the object which raised the unhandled exception. What if the exception in CWork::ReversePolarity was raised by a singleton CSonicScrewdriver::Enable object? Do you mark the CWork as broken, the singleton CSonicScrewdriver, the CServer? All of them? If the CServer is broken, then what's the point of running? -Raymond]
  26. Cheong says:

    How about COM extensions that are used by web servers?

    Take System.Web.Mail in .NET v1.1, for example; it is really built upon CDO COM objects. Would you prefer all further emails be blocked from sending just because some random email caused CDO to fail? Mind you, most web servers operate unattended and mostly offsite (especially for web hosting solutions).

    IMO, the automatic exception handler would make sense in this case. Although it would have been better if the wrapper could be configured to dump the "core" to some folder for later investigation. (Any chance to get such a function in IGlobalOptions?)

  27. Morten says:

    Lovely. Great explanation of why the design decisions we make today are going to be a royal PITA in 5 or 10 years. What's right today is not right tomorrow. Just got bitten by one of them things recently and it still smarts… There is no silver bullet.

  28. Mason Wheeler says:

    @Goran:

    Your last sentence is a perfect demonstration of what's wrong with the "RAII as a substitute for try/finally" idiom.  You end up thinking of everything as "a resource," when a lot of things aren't.  (When all you have is a hammer…)

    To imitate a try/finally block, you need to create a whole new class for each different type of operation you want to perform, which then of course introduces the potential for new bugs and needs to be tested.

    For example, what's the simplest possible C++ implementation of this standard Delphi idiom?

    MyDataset.DisableControls;
    try
      LoadDataset(MyDataset, DataList);
    finally
      MyDataset.EnableControls;
    end;

    Can it be done without having to create a new class?  Can it be done in anywhere near six lines?  And can it be done in such a way that ensures that EnableControls fires immediately after LoadDataset, and not at the end of the function?

    RAII tends to be implemented with an implicit try/finally frame inserted by the compiler, and IMO the example above demonstrates a textbook case of why abstraction inversion is considered a bad thing.

  29. Mason Wheeler says:

    I have to take exception (no pun intended) with the way this statement is presented as some sort of universal fact:

    "The fact that an unhandled exception occurred means that the server was in an unexpected state. By catching the exception and saying, "Don't worry, it's all good," you end up leaving a corrupted server running."

    Maybe if you're writing in C++, which has abysmal support for exception handling and no try/finally construct to take care of unexpected exceptions properly, (and no, RAII is *not* a proper substitute for a true try/finally,) this might be an issue. But as Barry Kelly pointed out above, things look very different in Delphi.

    It's idiomatic to wrap temporary state changes that need to be reversed in try/finally blocks, and then even if an unexpected exception occurs, your code remains in a consistent state while the stack is unwound, turning this "severe corruption problem" into a non-issue.

    [s/exception/unhandled exception/. Obviously, if your code is expecting the exception, then it's part of your normal code flow. I can't believe I had to write that. -Raymond]
  30. Briar says:

    @Mason Wheeler: that's such a good example. So… LoadDataset essentially crashed. And you now want to re-enable the control that was being loaded ? What's the likelihood it contains what you want ?

  31. Alex Grigoriev says:

    @Mason Wheeler:

    Remember, RC isn't just talking about C++ exceptions. In fact, he's primarily talking about non-C++ exceptions, such as access violation. When those occur, yes, your code is toast. The process memory may be in an unknown corrupted state, in general case.

  32. Gabe says:

    James: The web is different in how it's used (I'm not suggesting that COM servers and web browsers should handle exceptions the same way), but that's not my point. My point is that the web is a demonstration of how 99% of unhandled exceptions can be safely ignored.

    Most null pointer dereferences are just "Oops, I forgot to check for NULL", not "ZOMG!! HOW DID THAT BECOME NULL!!??" so it usually doesn't matter if it's ignored. Of course ignoring them all to avoid annoying the user 99% of the time makes it impossible to debug the 1% of cases where it really is a problem.

  33. Goran says:

    Wow, excellent, I didn't know about IGlobalOptions! 100% agree that catching faulted server errors is a disservice. That said…

    @Barry Kelly: Delphi is IMO wonderful, but even in Delphi, if you get e.g. access violation, in general case, code is toast. AFAIK, people who competently write Delphi code are rather clean WRT exception safety (they are IMO better than C++ people in that respect), but still I see little point in trying to continue upon an AV, even in Delphi. IOW, Borland made the same mistake as MS with /EHa. Perhaps they copied that? BTW, yes, VCL UI code has a message pump that displays exception, but so does e.g. MFC (that's overridable, too), and frankly, any UI framework worth its salt should have that.

    @Mason Wheeler ("RAII is *not* a proper substitute for a true try/finally"):

    Take a look at ScopeGuard. That makes C++ without finally quite bearable. But you should note that if you need a lot of ScopeGuards in your code, you are doing it wrong (you aren't wrapping your resources in C++ types properly).
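    For illustration, a minimal C++17 sketch of a ScopeGuard-style helper applied to the DisableControls/EnableControls example from comment 28. ScopeGuard here is a small hand-written class, not a particular library's version, and Dataset and LoadWithControlsDisabled are hypothetical stand-ins for the Delphi code. The guard is written once and reused; the inner braces make EnableControls run right after the load rather than at the end of the function. As noted earlier in the thread, under /EHsc the destructor is not guaranteed to run for an access violation.

```cpp
#include <utility>

// Minimal ScopeGuard: runs the stored callable when the guard leaves scope,
// whether by normal exit or by a C++ exception unwinding the stack.
template <typename F>
class ScopeGuard {
public:
    explicit ScopeGuard(F f) : f_(std::move(f)) {}
    ~ScopeGuard() { f_(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
private:
    F f_;
};

// Hypothetical stand-in for the dataset in the Delphi example.
struct Dataset {
    bool controlsEnabled = true;
    void DisableControls() { controlsEnabled = false; }
    void EnableControls() { controlsEnabled = true; }
};

void LoadWithControlsDisabled(Dataset& ds, void (*load)(Dataset&)) {
    ds.DisableControls();
    {
        ScopeGuard enable([&] { ds.EnableControls(); });
        load(ds);
    }   // EnableControls runs here, even if load throws
}
```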

  34. Mason Wheeler says:

    Briar: No, it didn't "essentially crash".  Something somewhere threw an exception, and we don't know what or which or where. It may or may not be recoverable, but we don't care at this point; that's the whole point of a try/finally block.  We just want to make sure that if it *is* recoverable, (and most exceptions are, because the Delphi standard libraries are designed to continue working correctly if possible even if one thing fails,) the state of the program remains consistent.

    For all we know, something could have raised EAbort, a silent exception that basically signals a rollback of the current GUI operation.  In that case, re-enabling the controls is *exactly* what you're supposed to do.

  35. Amit says:

    FYI – IGlobalOptions is not available with Windows SDK 6.0 and 6.0A, it is available from SDK 7.0 onwards defined in ObjIdl.h

    To use Windows SDK 7.0 with VS 2008, look at blogs.msdn.com/…/using-the-win-7-sdk-build-environment-with-vs-2008.aspx

  36. random says:

    Vectored Exceptions can give you a hook when any exception is raised.  This hook comes with great danger.  msdn.microsoft.com/…/ms679274(v=VS.85).aspx has a good entry point.

  37. Ben says:

    I've had to argue this one out a few times. There is a perception that the worst thing a program can do is "crash"; and from a marketing perspective, perhaps there is some truth to this. Certainly the exception dialog looks very bad from a user perception point of view. However, the worst thing a program can do is actually to "continue to run in a corrupted state". Now if your program is a media player, running in a corrupt state is probably no big deal, but if you're a database or controlling some industrial hardware, real damage could occur. If you crash, you just stop; if you run in a corrupt state, you corrupt databases and pour molten steel in places it wasn't supposed to go (like people's heads). It's not always obvious that you're creating this situation ("we just handle any exception from this task by logging an error"). The sad fact is that you will probably end up with more crashes by doing this right (that method can throw the "directory not found" exception as well as the "file not found"?), but you'll avoid much worse scenarios that prove to be extremely difficult to resolve.
