Structured Exception Handling Considered Harmful


I could have sworn that I wrote this up before, but apparently I’ve never posted it, even though it’s been one of my favorite rants for years.

In my “What’s wrong with this code, Part 6” post, several of the commenters indicated that I should be using structured exception handling to prevent the function from crashing.  I couldn’t disagree more.  In my opinion, SEH, if used for this purpose, takes simple, reproducible and easy to diagnose failures and turns them into hard-to-debug subtle corruptions.

By the way, I’m far from being alone on this.  Joel Spolsky has a rather famous piece “Joel on Exceptions” where he describes his take on exceptions (C++ exceptions).  Raymond has also written about exception handling (on CLR exceptions).

Structured exception handling is in many ways far worse than C++ exceptions.  There are multiple ways that structured exception handling can truly mess up an application.  I’ve already mentioned the guard page exception issue.  But the problem goes further than that.  Consider what happens if you’re using SEH to ensure that your application doesn’t crash.  What happens when you have a double free?  If you don’t wrap the function in SEH, then it’s highly likely that your application will crash in the heap manager.  If, on the other hand, you’ve wrapped your functions with try/except, then the crash will be handled.  But the problem is that the exception caused the heap code to blow past the release of the heap critical section – the thread that raised the exception still holds the heap critical section. The next attempt to allocate memory on another thread will deadlock your application, and you have no way of knowing what caused it.
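A minimal sketch of that failure mode, written in portable C++ rather than real SEH so that it compiles anywhere. Everything here is a hypothetical stand-in: `ToyLock` plays the heap critical section, `toy_free` plays the heap manager, and a thrown C++ exception plays the fault. The point is only that swallowing the fault leaves the lock held.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the heap critical section. A real critical
// section would block forever; this one reports failure so the effect
// can be observed.
struct ToyLock {
    bool held = false;
    bool try_acquire() {
        if (held) return false;
        held = true;
        return true;
    }
    void release() { held = false; }
};

ToyLock g_heap_lock;

// Stand-in for the heap's free routine: it takes the lock, notices the
// double free, and "faults" before it reaches the release call.
void toy_free(bool already_freed) {
    g_heap_lock.try_acquire();
    if (already_freed) {
        throw std::runtime_error("simulated fault inside heap code");
    }
    g_heap_lock.release();
}

// The "robust" wrapper: swallow everything and carry on.
bool free_swallowing_faults(bool already_freed) {
    try {
        toy_free(already_freed);
        return true;
    } catch (...) {
        return false;  // fault hidden; nobody released g_heap_lock
    }
}

// What the next allocation would experience.
bool next_allocation_can_proceed() {
    if (!g_heap_lock.try_acquire()) return false;  // real code: deadlock
    g_heap_lock.release();
    return true;
}
```

After `free_swallowing_faults(true)` returns, `next_allocation_can_proceed()` reports `false`: the process is still alive, but the lock is orphaned and nothing records why.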

The example above is NOT hypothetical.  I once spent several days trying to track down a hang in Exchange that was caused by exactly this problem – because a component in the store didn’t want to crash the store, they installed a high level exception handler.  That handler caught the exception in the heap code, and swallowed it.  And the next time we came in to do an allocation, we hung.  In this case, the offending thread had exited, so the heap critical section was marked as being owned by a thread that no longer existed.

Structured exception handling also has performance implications.  Structured exceptions are considered “asynchronous” by the compiler – any instruction might cause an exception.  As a result of this, the compiler can’t perform flow analysis in code protected by SEH.  So the compiler disables many of its optimizations in routines protected by try/except (or try/finally).  This does not happen with C++ exceptions, by the way, since C++ exceptions are “synchronous” – the compiler knows if a method can throw (or rather, the compiler can know if a method will not throw).

One other issue with SEH was discussed by Dave LeBlanc in Writing Secure Code, and reposted in this article on the web.  SEH can be used as a vector for security bugs – don’t assume that because you wrapped your function in SEH that your code will not suffer from security holes.  Googling for “structured exception handling security hole” leads to some interesting hits.

The bottom line is that once you’ve caught an exception, you can make NO assumptions about the state of your process.  Your exception handler really should just pop up a fatal error and terminate the process, because you have no idea what’s been corrupted during the execution of the code.

At this point, people start screaming: “But wait!  My application runs 3rd party code whose quality I don’t control.  How can I ensure 5 9’s reliability if the 3rd party code can crash?”  Well, the simple answer is to run that untrusted code out-of-proc.  That way, if the 3rd party code does crash, it doesn’t kill YOUR process.  If the 3rd party code that is processing a request crashes, then the individual request fails, but at least your service didn’t go down in the process.  Remember – if you catch the exception, you can’t guarantee ANYTHING about the state of your application – it might take days for your application to crash, thus giving you a false sense of robustness, but…

 

PS: To make things clear: I’m not completely opposed to structured exception handling.  Structured exception handling has its uses, and it CAN be used effectively.  For example, all NT system calls (as opposed to Win32 APIs) capture their arguments in a try/except handler.  This is to guarantee that the version of the arguments to the system call that is referenced in the kernel is always valid – there’s no way for an application to free the memory on another thread, for example.

RPC also uses exceptions to differentiate between RPC initiated errors and function return calls – the exception is essentially used as a back-channel to provide additional error information that could not be provided by the remoted function.

Historically (I don’t know if they do this currently) the NT file-systems have also used structured exception handling extensively.  Every function in the file-systems is protected by a try/finally wrapper, and errors are propagated by throwing exceptions.  This way, if any code DOES throw an exception, every routine in the call stack has an opportunity to clean up its critical sections and release allocated resources.  And IMHO, this is the ONLY way to use SEH effectively – if you want to catch exceptions, you need to ensure that every function in your call stack also uses try/finally to guarantee that cleanup occurs.
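As a rough portable analogy (not SEH, and not the file-system code itself), the sketch below uses a C++ destructor where the file-systems would use try/finally. The names `Cleanup`, `inner`, `middle`, `outer` and `run` are hypothetical. It shows each frame on the call stack getting its chance to clean up before the exception reaches the one place that handles it.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> g_log;

// Plays the role of a try/finally block: the destructor runs when the
// scope exits, whether normally or while an exception unwinds the stack.
class Cleanup {
public:
    explicit Cleanup(std::string what) : what_(std::move(what)) {}
    ~Cleanup() { g_log.push_back("cleanup " + what_); }
    Cleanup(const Cleanup&) = delete;
    Cleanup& operator=(const Cleanup&) = delete;
private:
    std::string what_;
};

void inner() {
    Cleanup c("inner");
    throw std::runtime_error("disk error");  // the error being propagated
}

void middle() {
    Cleanup c("middle");
    inner();
}

void outer() {
    Cleanup c("outer");
    middle();
}

// Runs the call chain and returns the order in which things happened.
std::vector<std::string> run() {
    g_log.clear();
    try {
        outer();
    } catch (const std::runtime_error& e) {
        g_log.push_back(std::string("caught ") + e.what());
    }
    return g_log;
}
```

The log ends up as cleanup of `inner`, `middle`, `outer`, and only then the catch. If any one of the three frames omitted its guard, whatever it held would be left dangling, which is why the discipline has to cover every function on the stack.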

Also, to make it COMPLETELY clear.  This post is a criticism of using C/C++ structured exception handling as a way of adding robustness to applications.  It is NOT intended as a criticism of exception handling in general.  In particular, the exception handling primitives in the CLR are quite nice, and mitigate most (if not all) of the architectural criticisms that I’ve mentioned above – exceptions in the CLR are synchronous (so code wrapped in try/catch/finally can be optimized), the CLR synchronization primitives build exception unwinding into the semantics of the exception handler (so critical sections can’t dangle, and memory can’t be leaked), etc.  I do have the same issues with using exceptions as a mechanism for error propagation as Raymond and Joel do, but that’s unrelated to the affirmative harm that SEH can cause if misused.

Comments (33)

  1. Mike Dunn says:

    I’m pretty much in the same camp as Joel wrt exceptions. I just wish that __try/__finally were actually usable in C++. Not being able to use it in a scope that contains C++ objects makes it pretty much useless.

  2. Skywing says:

    Use _set_se_translator if you have to use try/catch — it lets you take SEH exceptions and throw them as C++ exceptions.

  3. Joel Spolsky says:

    The only time I ever had to use SEH was to recover from predictable crashes in IE when I was hosting the web browser control from shdocvw.

  4. Pavel Lebedinsky says:

    SEH is not evil; __try/__except(1) and catch(...) are.

  5. Pavel Lebedinsky says:

    > Use _set_se_translator if you have to use

    > try/catch — it lets you take SEH exceptions

    > and throw them as C++ exceptions.

    Don’t forget to enable /EHa (and as a result, say good-bye to a lot of compiler optimizations) if you do this.

  6. Skywing says:

    What optimizations are affected by /EHa?

  7. Skywing says:

    Nevermind; uncovered it in the VC documentation eventually.

    For the curious:

    "In previous versions of Visual C++, the C++ exception handling mechanism supported asynchronous (hardware) exceptions by default. Under the asynchronous model, the compiler assumes any instruction may generate an exception.

    With the new synchronous exception model, now the default, exceptions can be thrown only with a throw statement. Therefore, the compiler can assume that exceptions happen only at a throw statement or at a function call. This model allows the compiler to eliminate the mechanics of tracking the lifetime of certain unwindable objects, and to significantly reduce the code size, if the objects’ lifetimes do not overlap a function call or a throw statement. The two exception handling models, synchronous and asynchronous, are fully compatible and can be mixed in the same application.

    Catching hardware exceptions is still possible with the synchronous model. However, some of the unwindable objects in the function where the exception occurs may not get unwound, if the compiler judges their lifetime tracking mechanics to be unnecessary for the synchronous model."

  8. Pavel,

    The problem is that if you’re not going to do __try/__except(1) or (catch(...)), then what do you do?

    The hard part of getting SEH correct is that people don’t know what to do for the __except(1) part – that is very, very hard to get right, and is app specific (so Microsoft can’t provide a "right" answer).

    People look at SEH and their first assumption is that they can use it to add robustness to their applications. All I’m trying to say is that SEH cannot be used as a robustifier, it usually has the exact opposite effect.

  9. Pavel Lebedinsky says:

    If /EHa is used, compiler assumes that every instruction could raise a C++ exception so it needs to do a lot of bookkeeping to ensure for example that local variables with destructors are cleaned up properly.

    I don’t know how much of an impact this has on performance but presumably it was important enough to switch to /EHs by default in VC6 (or was it VC5?) and even add things like __declspec(nothrow).

  10. Pavel Lebedinsky says:

    > The problem is that if you’re not going to

    > do __try/__except(1) or (catch(...)), then

    > what do you do?

    In theory, you could use SEH to do relatively safe things like lazily committing memory by catching access violations when the buffer grows beyond its initial size (I think FormatMessage does this when you tell it to allocate the buffer for you). You just need to be careful to not catch more than you need – instead of using __except(1), write a filter that makes sure the exception code is right, the referenced address is where you expect it to be, etc. Return EXCEPTION_CONTINUE_SEARCH for everything that you don’t recognize.
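The narrow filter described above might look something like the following sketch. It is written as a plain function so it compiles without the Windows headers; the `k...` constants are local copies of the values of `EXCEPTION_ACCESS_VIOLATION`, `EXCEPTION_EXECUTE_HANDLER` and `EXCEPTION_CONTINUE_SEARCH`, and `ReservedBuffer` and `lazy_commit_filter` are hypothetical names, not anything FormatMessage is known to use.

```cpp
#include <cstddef>
#include <cstdint>

// Local copies of the Windows values, so the sketch builds anywhere.
constexpr std::uint32_t kAccessViolation = 0xC0000005u;  // EXCEPTION_ACCESS_VIOLATION
constexpr int kExecuteHandler = 1;                       // EXCEPTION_EXECUTE_HANDLER
constexpr int kContinueSearch = 0;                       // EXCEPTION_CONTINUE_SEARCH

// Hypothetical description of the address range we reserved ourselves.
struct ReservedBuffer {
    std::uintptr_t base;
    std::size_t reserved_bytes;
};

// Claim the exception only if it is an access violation AND the faulting
// address lies inside our own reserved range. Everything else is passed
// on untouched so that real bugs still crash where they happen.
int lazy_commit_filter(std::uint32_t code,
                       std::uintptr_t fault_address,
                       const ReservedBuffer& buf) {
    if (code != kAccessViolation) return kContinueSearch;
    if (fault_address < buf.base) return kContinueSearch;
    if (fault_address - buf.base >= buf.reserved_bytes) return kContinueSearch;
    return kExecuteHandler;
}
```

On Windows this would presumably be called from the `__except(...)` expression, with the code taken from `GetExceptionCode()` and the address from the exception record obtained through `GetExceptionInformation()`. A filter that commits the page itself would return `EXCEPTION_CONTINUE_EXECUTION` instead; that variation is left out here.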

    In practice however I think that you’re right – most apps should probably stay away from SEH. It doesn’t play well with C++ exception handling, and complicates debugging, especially if you use it to handle critical exceptions like AVs (windbg stops on 1st chance AVs).

  11. Ian says:

    I had to chuckle a little when I saw this posted right after I removed a __try/__except(1) that was wrapping an entire program and was met with a chorus of other developers objecting with "but that stops it from crashing!". Now I can just send a link instead of a long winded explanation whenever I hear that! Thanks Larry!

  12. Niclas Lindgren says:

    As I mentioned in a post the other day, I still think you argue from a point where you assume that the one catching an exception would not handle it properly. Also it seems presupposed without saying that adding exception handling to your application as a way of adding fault tolerance is always a bad thing. And here I will of course disagree with you, as a properly designed exception strategy will not in any way endanger the state of your process.

    It is all about care in design. Exceptions are indeed resource demanding and performance degrading, but if properly designed and used where it makes sense, they will spare you a lot of grief. I could give you an endless list of successful implementations of this from my experience, and never did we have troubles assuring the state of the process. We might be naive, but I would like to believe that we were rather more specific in what kinds of exceptions we wanted to guard against and we tried to fully analyze the impact of those exceptions occurring.

    You mentioned the double free scenario, and I do agree that it is an unfortunate situation, and catching it will not make your life better. HOWEVER, it will not make your life worse; if it did, the design of the diagnostics built into your application was not good enough, since an exception’s origin can be pinpointed through proper logging. The double free will still set your application in a bad state, it will not run, however it was already not running, so you gained nothing, but lost nothing. (or then again you might be so lucky that the heap did detect it and you are "safe").

    Now, since we are building a fault tolerant application, a hanging process does not pose a problem for us, since we do indeed have supervision of the process itself, which will merely terminate the process if it stops responding in the way we expect it to. So your application will still hang, but your application will still restart, just as it would without the exception handling. Some faults are not meant to be caught is the lesson learned, such faults will be weeded out, and so will all other exceptions that you do actually handle, except that the process was kept alive and kicking.

    And if you are indeed trying to debug a double free problem, then you are apparently able to somehow reproduce the problem, and if you are, there are plenty of good tools available to pinpoint that problem for you.

    You mentioned that the only effective way to use SEH is to have try/finally in your entire call stack. However at some point you have to swallow that exception or crash, and if you crash then what good does the exception handling do? (except of course if you weren’t fiddling with process global resources).

    In any case, I will not argue the fact that exception handling can be disastrous, but then again, nor do I argue that casting is lethal and buffer overruns shouldn’t occur. There are ways to avoid them all by proper design, still they occur. I will not argue that threading can be the worst thing that ever happened to you, but yet again I will say it is a very powerful design tool that should be used. Exception handling is a very powerful design tool, and as such it should be used, with the same precautions that you would use with any other tool. I suppose you see where I am going, there are always bad apples to make the cake go sour, but that does not mean there aren’t plenty of apples that will make it sweet.

    I do however agree that exceptions are not the best way to communicate an error condition, but they might be the very easiest way to communicate a severe, critical and rare occurrence, which I believe is written into the word exception. I do not agree with the people using this as a means of "normal" error propagation.

    I also do not agree that simple crash analysis gets harder or simpler. If it was hard, then the diagnostic functions of the application were not up to par with the rest of the design.

    C++ exceptions are indeed synchronous by default, but they can be set to, or rather the compiler can be, asynchronous if you wish. Which is why you can use C++ exception syntax to catch asynchronous exceptions too, or via _set_se_translator.

    And finally, you will not get a robust application just by adding a try/catch sporadically in the code, it comes from a proper and thorough design of the full system, not by each function by itself. You can have 95% of code not bothering with exception handling, and still have a very robust application.

    In my experience the grief caused by exceptions is nowhere near the grief caused by race conditions and other subtle problems you might encounter. Race conditions, contrary to the common view (in my experience), are equally hard in single threaded applications as in multi threaded ones. Subtle race conditions are temporally dependent, and time, as we know, is a funny beast.

    And in the end, exceptions should not be used IMHO if proper function is more important than availability. If proper function is a higher concern, then your main concern is to keep your state intact at all costs, and exceptions make this extremely difficult, at least if you want a 100% consistent state. However if minor glitches in the system are tolerated, but availability is important, then proper exception handling will save you time.

    As I tend to think, a malfunctioning car is tolerated by most, but a total shutdown of the car is a big annoyance, actually that annoyance might discourage you from ever buying that brand again, but on the contrary the car that actually still worked might encourage you to actually buy it again. This is of course given that both cars were put through the same kind of failure scenario.

    Then again one could argue that this is just an issue of robust application design, true, it indeed is, and robust exception handling for failure scenarios you weren’t fully thinking of is part of it IMHO.

    Also in a robust environment, if the supervision of the application detects a high error rate in any part of the application, it should alert, or even actually reboot the process (in rare cases however, as you want an operator to do it).

    An improper exception design can make your application more vulnerable to DoS, but then again a proper design without exception handling can make you less or equal to one with…

    Exceptions are exceptions and not a rule, apply that to your design, but do not turn down a powerful and useful tool.

  13. Niclas Lindgren says:

    As it took me forever to compile that post in this small window, I now notice that plenty of posts have come in since I started.

    Ian:

    What did you gain by removing the catch(1)?

    Pavel:

    In what way does exception handling complicate debugging?

  14. Rob says:

    Great post Larry.

    I’d also like to add one further argument against SEH:

    I believe it is patented and so is not supported on other compilers and platforms.

  15. Pavel Lebedinsky says:

    > In what way does exception handling complicate debugging?

    Extensive use of exceptions, especially low-level ones like access violations, complicates debugging because you can no longer tell truly exceptional cases from normal program operation.

    Here’s a scenario that I’ve seen many times. You suspect that you have a crash somewhere in your program. You don’t know for sure because somebody is catching it with __except(1) or catch(...), so instead of a nice memory dump with a callstack that tells you exactly where the problem happened, you get a deadlock with some orphaned locks, or your process simply disappears, or dies with some undebuggable error.

    You try running the program under debugger so that you can catch 1st chance access violations and other "bad" exceptions, only to find that it actually raises dozens of such exceptions during its normal operation. You waste even more time trying to filter out the noise and locate the real problems.

    All this because two fundamental rules of exception handling have been violated:

    1. Don’t catch exceptions that you don’t know how to recover from.

    2. Only use exceptions for exceptional cases.

  16. Ian says:

    Niclas,

    Remember that we’re talking about SEH here, not exceptions in general, so we’re looking at pretty bad events like access violations and guard page exceptions. I can probably count on one hand the number of cases where a program could properly recover from these events. Usually they’re indicative of a bug in your code.

    For example, as I mentioned in my last post, I removed a __try/__except(1) block that was wrapping an entire program. The program in question was a server, and if it caught an exception, it would log it and then happily go on serving clients. But the program couldn’t tell what had caused that exception to be thrown, and something like an access violation often points to very bad things like memory corruption. So, rather than trying to keep going, it was better to crash and let the service control manager restart the server in a clean state.

  17. At this point I’m pretty well convinced that it’s nigh unto impossible to write reliable software that uses exceptions for error propagation. To get a flavor, see http://blogs.msdn.com/mgrier/archive/2004/02/18/75324.aspx.

    Exceptions only really work reliably when nobody catches them.

    And in that case, I don’t understand why we don’t just call something like BugcheckApplication() instead of throwing an actual catchable exception.

    It was clever on VMS to have continuable exceptions which led to the SEH design on NT. I’m not sure that giving code the ability to do fun things like user-mode fixups of things like uncommitted virtual address space or adjust FP results etc. is worth the complexity that this design entails.

    Catching exceptions in an exception rich environment (like the CLR or Java for example) is nearly impossible to do correctly. If we were to start writing in C again, it’s do-able but the fact that all the new languages include capabilities like implicit conversions and operator overloading means that it’s impossible to understand whether the scope of the try/catch is correct. (And even if it was correct, changes to other parts of the code can invalidate your careful analysis and coding.)

    So, Larry’s point is entirely valid but once you accept it, it’s not hard to see that the use of exceptions in modern languages fundamentally makes it impossible to write reliable software.

    Which is, of course, funny since most people think that exceptions are about writing reliable software finally. Well, I guess throwing the exceptions is OK. It’s just those super geniuses who think that they can catch them that mess it all up. :-)

  18. Anything is better than crash simply because user still has a chance to save his work. Yeah, corruption may happen and app should warn user that exception has been caught and it might be a good idea to restart the application. But it is BETTER than crash. In debug build crash is better.

  19. Pavel Lebedinsky says:

    > Anything is better than crash simply because

    > user still has a chance to save his work.

    For a text editor, maybe. For a non-interactive service that processes financial transactions, definitely no.

    And even in a text editor a crash is better than an undebuggable deadlock after some COM object corrupts the heap then swallows the resulting AV leaving the default process heap critical section orphaned.

    If you want to allow user to save his work in case of an unhandled exception, that’s fine. Nobody is saying you shouldn’t do that. But catching unknown exceptions and not reporting them properly (using ReportFault() or something similar to that) is often worse than no exception handling at all.

  20. Niclas Lindgren says:

    Pavel:

    Well then I would agree with you if the design uses throw extensively. I do not like a design that throws extensively as it complicates debugging =), that is why I tried to explain that exceptions should only happen in rare conditions. So it is not the exception theory itself that complicates it for you, it is the implementation of it.

    If the caught exception leaves the application with unreleased resources, then it was not caught in all levels it needed to be caught to clean up properly. It is not the exceptions fault.

    You want a nice memory dump, even if a nice memory dump is heaven, a logged stack trace and full detail of the exception(and maybe even a hexdump of the surrounding memory) will provide you with almost equally interesting information, and you can let the memory dumps stay in house as much as possible.

    But I do agree with you, if you have a service that is not allowed to glitch, then don’t start guessing on the state of your application, you don’t want a $10 transaction turn into a $100000 one, unless of course it is your paycheck =)

    ————-

    Extensive use of exceptions is a bad thing according to me, I do not like the philosophy of Java/C#. To me it is just a lazy way of getting out of trouble, to get less nested if statements. But your code path has more than one exit point, and I don’t like that because it tends to trick the developer into resource leaks. I have seen so many cases where a developer grabs some resource when the function begins, and then adds some check afterwards in the code that merely does a return in the middle of it, but forgets to return the resources. I believe in simple design.

    grab resource

    work with resource, record error/success

    release resource

    return error/success

    And in an extensive exception philosophy this would be

    grab resource

    try

    work with resource throw on error

    catch or finally(cleanup, which is so much better)

    cleanup

    throw again if the work part threw, else return success.

    Which is a design that I do not like.
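A minimal sketch of the first, single-exit pattern, next to the early-return mistake described above. All names (`Handle`, `grab`, `release`, `g_open_handles`, `process`, `process_with_early_return`) are hypothetical and exist only to make the leak observable.

```cpp
// Counts resources currently held, so a leak shows up as a non-zero value.
int g_open_handles = 0;

struct Handle {
    bool valid;
};

Handle grab() {
    ++g_open_handles;
    return Handle{true};
}

void release(Handle& h) {
    if (h.valid) {
        --g_open_handles;
        h.valid = false;
    }
}

enum class Status { Ok, BadInput };

// Single exit: grab, work and record the outcome, release, return.
Status process(int value) {
    Status status = Status::Ok;
    Handle h = grab();                 // grab resource
    if (value < 0) {
        status = Status::BadInput;     // record error, do not leave yet
    }
    release(h);                        // release resource
    return status;                     // the only return
}

// The mistake: a check added later returns from the middle of the
// function and skips the release.
Status process_with_early_return(int value) {
    Handle h = grab();
    if (value < 0) {
        return Status::BadInput;       // leaks h
    }
    release(h);
    return Status::Ok;
}
```

Calling `process(-1)` leaves `g_open_handles` at zero, while `process_with_early_return(-1)` leaves it at one.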

    I have had many of these discussions before, and the only way to convince anyone of the opposite is to show it in practice, implemented in a way which I think is proper and safe. I have never had complicated, odd crashes due to it, but instead I have slept better knowing that even if we have a bug we might survive, and if we didn’t then too bad. Our starting state is already a crash so it can’t get worse.

    One of the worst applications of exceptions I can think of is COM objects used with the non raw interface wrappers, any E code is merely turned into a throw…

    Not far from this discussion is the question if you should leave asserts active in release build. I surely don’t think so, and assert in general is a lazy way to do things, it tends to lead the developer to not care of the failure scenario, and thus not to the proper clean up. Why should he/she? It will crash on the assert anyway.

    Most of the examples brought up are very rare conditions of asynchronous exceptions; almost always they are much less harmful, and even more often they are merely a NULL pointer exception, which of course is a bug, but usually not fatal in any way to the program state. The non NULL pointer exceptions however are more scary.

    But it is just a question of determining which part of your program state that is most likely corrupt, rinse that and go again.

    If the same exception keeps thrashing (because your estimate of which parts of your program state must be corrupt was wrong) then it is about time to abort. But then again that logic applies to any kind of unconditional loop, if you don’t supervise them in some way it can be a possible hang.

    Catching exceptions is not, by itself, a way to make your application more robust, but it is a tool among many to make it more robust.

  21. Tim Smith says:

    >> Anything is better than crash simply because user still has a chance to save his work. Yeah, corruption may happen and app should warn user that exception has been caught and it might be a good idea to restart the application.

    Dear User,

    Something evil this way comes. The application you are currently running did something really bad, but we don’t really know what. I know what you are thinking, "Did I save my data five minutes ago or 30 minutes ago?" You have to ask yourself, "Am I feeling lucky?" Well do you punk? BTW, I would exit this application and restart.

    [OK]

    Problem #1: Users do not read dialog boxes. "Hey, I was typing and this dialog got in my way so I am going to OK it to get it to go away. I worked for three more hours and then when I went to save, it trashed my data."

    Problem #2: Users do not understand the concept of a program that has "crashed" but is still running. "Hey, I can still type, things must be good".

    Problem #3: Users do not understand that what they save might be totally trashed and cannot be read back in. They will save over their current version of their file. If you force them to save to another file name they will curse you for forcing them to do something they don’t want to do and then promptly delete their original and rename the newly saved trashed file. If they don’t know how to rename files, they will just load up the old file and that trashed file will remain in their directory haunting them until they get a new computer. (Nah, I’ve never seen this happen. Right…)

    Problem #4: In a mission critical application you run a great risk of sending bad data to other applications. I have seen this nearly happen. People can die from bad data. Usually caused by a series of procedural errors and a computer error. "Stupid didn’t turn the panel into service mode and remove power before starting to work on the screw pump. A hardware fault then caused the software to fail and the screw pump was turned on."

    If an application has an "unexpected exception" , all bets are off as far as any level of functionality.

  22. Niclas Lindgren says:

    "If an application has an "unexpected exception" , all bets are off as far as any level of functionality."

    True, but that does not mean that it will work better because you restart it. The user or the application will probably retry what he/she/it just did and probably hit the same bug again, rinse and repeat. If this is an online service I am sure you can already hear the phones ringing from the slightly upset customer.

    "Stupid didn’t turn the panel into service mode and remove power before starting to work on the screw pump. A hardware fault then caused the software to fail and the screw pump was turned on."

    Any kind of memory corruption bug could cause this to happen, or any other kind of bug too for that matter.

    Crashes are just too expensive when it comes to customers perception of the stability of the application. In many cases it is better to stay alive and hope for the best, because most of the time it will be fine.

  23. Pavel Lebedinsky says:

    > If an application has an "unexpected

    > exception", all bets are off as far as any

    > level of functionality.

    That’s taking it to the other extreme.

    Certainly there are applications where saving user’s work in case of a crash makes sense. Like email processors for example.

    This is a separate issue from where to handle unknown exceptions and how to report them.

  24. Trying to run code in an address space which is likely to be corrupt is just plain bad for the user. If you really want to preserve the value of the keystrokes/operations that occurred before the crash, then journal them!

    All the editors on VAX/VMS journalled; I was shocked to come to the PC world and that we never do such things.

    This is a much smarter approach than to try to continue to run code in the corrupt address space.
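A minimal sketch of what journaling could look like, assuming a toy editor with only two operations. `Journal`, `Editor` and `replay` are hypothetical names, and the journal lives in memory here; a real one would be written and flushed to disk before each operation is applied, so that it survives the crash.

```cpp
#include <string>
#include <vector>

// Append-only record of the operations the user performed.
struct Journal {
    std::vector<std::string> entries;
    void record(const std::string& op) { entries.push_back(op); }
};

// Toy editor: every operation is journaled BEFORE it touches the text.
struct Editor {
    std::string text;
    Journal* journal;

    void type(const std::string& s) {
        journal->record("T" + s);   // 'T' + the typed characters
        text += s;
    }
    void backspace() {
        journal->record("B");       // 'B' = delete last character
        if (!text.empty()) text.pop_back();
    }
};

// After a crash, a fresh process rebuilds the text from the journal
// instead of trusting anything left in the old address space.
std::string replay(const Journal& j) {
    std::string text;
    for (const std::string& e : j.entries) {
        if (e.empty()) continue;
        if (e[0] == 'T') {
            text += e.substr(1);
        } else if (e[0] == 'B' && !text.empty()) {
            text.pop_back();
        }
    }
    return text;
}
```

Replaying the journal in a clean process reproduces the document without running any further code in the possibly corrupt one.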

  25. Niclas Lindgren says:

    Well, journalling might sound smart, but remember that journalling will recreate what you just did, which means if you hit the bug doing it, journalling to back track to it will most likely hit it again and voila…(at least if it is one of those bugs that I like to catch with this kind of exception handling, a state problem).

    There are very few occasions where you actually do corrupt the address space; most exceptions occur due to subtle race conditions where a pointer dangled, and these may or may not corrupt your user space. They may or may not corrupt the user space without you noticing it for a long time (that is no exception). An exception is not a receipt telling you that your address space is corrupt; it is more often than not a programming error, where the program happens to get into a state which wasn’t fully analyzed. Rarely does that mean that the user space is corrupt.

    You can actually have user space corruption that you _never_ notice, and that actually didn’t matter, and if it didn’t matter why bother?

    As I said, if you have a program where you need to 100% know that the entire state of the application is healthy, then exceptions should not be part of such a model, but if you don’t need that, exceptions will catch the numerous NULL pointer exceptions that hurt no one (unless the application crashes of course).

    I have one very fine real life occurence, that happen only 2 days ago. Our system had a NULL pointer bug in the provisioning part of the system, when a certain order of events occured. Our application sadly uses C on FreeBSD, so exceptions are not portable. Anyway if we had nicely caught that exception with nice logging so we could fix it in the normal process, the customer could keep using the system with a _slight_ defect.

    As it was now, since it crashed the entire system (it is a Mobile IP Telephony exchange), with approx 80000-100000 active mean users 24/7, repeatedly, since this one user kept running with the same setup, it turned into a class A red alert where we had to bring in the right people and go through en emergency build procedure. The costs of doing that are huge, and I would take the bad sides of exceptions any day of the week to avoid those.

    I should probably add, that we wouldn’t lose all 100k users right away, but the user causing the crash was moved around within the system each time it connected, so eventually it had crashed literally all users and kept doing so. So even with process seperation you are far from safe, nor will the problem get any better because you reboot.

    The system was funtioning flawlessly to 99.9%, the 0.1% that casued the crash made 100% unusable, that is just not exceptable, anything is better than that.

    I frankly don’t care if I could get into worse problems using exceptions, because they don’t get worse than that. What should have happened was a nice little SNMP trap to the operator due to the exception, a nice little email with a traplog from the operator, and a call to a designer for analysis to advise the customer. This is by far cheaper than having to bring in half the staff for spinning a new build, for a missing "if (pPointer)"; imagine what those roughly 15 bytes of code cost.
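    For concreteness, the missing guard might look something like the sketch below. The `Subscriber` type, the `provision` function and the result codes are hypothetical stand-ins, not the actual telephony code. Note that this is the eventual bug fix (an explicit check at the point of use), not a catch-all exception handler.

    ```cpp
    #include <cassert>
    #include <cstdio>

    // Hypothetical subscriber record; the names are illustrative only.
    struct Subscriber {
        int id;
        int active_sessions;
    };

    enum class ProvisionResult { Ok, RejectedNullSubscriber };

    // Validate the pointer up front: log and reject this one request
    // instead of dereferencing NULL and taking the whole process down.
    ProvisionResult provision(Subscriber* pSubscriber) {
        if (!pSubscriber) {
            std::fprintf(stderr, "provision: NULL subscriber, request rejected\n");
            return ProvisionResult::RejectedNullSubscriber;
        }
        ++pSubscriber->active_sessions;
        return ProvisionResult::Ok;
    }
    ```

    With a check like this, only the one offending request fails and is logged; every other user of the process is unaffected.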

    I know what you will say: the (possible) memory corruption could cause the system to, for instance, charge users more than it should. But I will wager a lot that the exception handling will not be the reason for that; it will be the X number of other bugs. (Yes, all applications have bugs, no matter how much testing is done; some just have fewer.)

    Even if the exception had been a dangling pointer, chances are that it was a race and that the object was removed prematurely; that will not have caused any harm to the user space either. These are all programmatic errors that are _far_ more common than buffer overruns (which are the most common memory corruptions) or double deletes.

    The class of bugs that are dangerous to catch with exceptions (those which cause some kind of corruption) is _far_ smaller than the class of stupidity bugs that can be found in every application. And even if you do catch a bug which is a result of corruption, you are most likely able to clean up the state problem and restart that little part of the system.

    If we lived in an ideal world I would agree to never catch exceptions, but anywhere I turn my head the world is not ideal and all we do is try to plan and foresee all kinds of coming trouble.

    If trouble was never coming then we wouldn’t even have invented the word exception.

    Those are my pennies and I will continue to be paranoid about crashing an application.

  26. Norman Diamond says:

    9/12/2004 2:39 PM Niclas Lindgren

    > journalling will recreate what you just did,

    > which means if you hit the bug doing it,

    > journalling to back track to it will most

    > likely hit it again and voila

    Bingo. I once edited a journal, deleting the mention of the keystroke that was mishandled by the editor, so that replay of the modified journal would not hit the bug. Then I saved my work at that point, i.e. saving with a loss of one known keystroke instead of saving with a loss of a forgotten number of minutes and changes. Then I found some other way to proceed with the next necessary change.

    Theoretically the journal would also be of immense value to any coder who wanted to fix the editor.
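    A rough sketch of that workflow, under assumed names: the journal is modelled as a list of text records, `first_failing_record` replays them through a caller-supplied handler until one is mishandled, and `drop_record` produces the edited journal. Both helpers are hypothetical and only illustrate the idea.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <functional>
    #include <string>
    #include <vector>

    // Replay records through `apply` and return the index of the first record
    // the handler rejects, or journal.size() if the whole journal replays.
    std::size_t first_failing_record(
            const std::vector<std::string>& journal,
            const std::function<bool(const std::string&)>& apply) {
        for (std::size_t i = 0; i < journal.size(); ++i) {
            if (!apply(journal[i])) return i;
        }
        return journal.size();
    }

    // Return a copy of the journal without the record at `bad_index`,
    // so that a second replay loses only that one known operation.
    std::vector<std::string> drop_record(const std::vector<std::string>& journal,
                                         std::size_t bad_index) {
        std::vector<std::string> edited;
        edited.reserve(journal.size());
        for (std::size_t i = 0; i < journal.size(); ++i) {
            if (i != bad_index) edited.push_back(journal[i]);
        }
        return edited;
    }
    ```

    The index returned by the first helper is also exactly the piece of information a maintainer would want attached to a bug report.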

    > we had to bring in the right people and go

    > through an emergency build procedure

    Well, I’ll repeat here an idea which got me a black mark on my record at one previous employer, and which could have got me fired if my boss had been present. Maybe someone can say what was wrong with it. When the current build wasn’t working and you had an emergency situation where you needed a working build, boot the previous working build. While the production system is operating on its previous build, configure one test system to match, boot the failing build there, and debug it on the test system. Orange flag emergency instead of red flag. During the time that the previous build is running, you don’t get to charge customers for features that aren’t being provided, but you still get to charge for basic telephone service and customers still have it.

    (Well actually yes I do know what was wrong with my suggestion. I’m an engineer and corporate politicians are corporate politicians, and there’s no room for engineers in companies that are run by politicians.)

  27. Michael says:

    For user apps, auto-saving work, automatically restarting the app, and giving the user the option to recover their old documents is usually fine.

    For online services doing financial transactions, you better not be messing with my money in corrupt address space.

  28. rsd says:

    > As a result of this, the compiler can’t perform flow analysis in code protected by SEH. So the compiler disables many of its optimizations in routines protected by try/catch (or try/finally). This does not happen with C++ exceptions

    Shouldn’t that read "try/except (or try/finally)"?

  29. probably try/except/finally, you’re right rsd.

  30. I just ran into this post by Eric Brechner who is the director of Microsoft’s Engineering Excellence