catch considered harmful


Spot the bug:


void CFoo::Bar() {
    m_array1[m_i++] = null;
    m_array2[m_j++] = null;
}


I’ll give you a hint – it relates to my last posting about “i = i + 1;” being a bug.


One answer was to let the increment overflow.  Well, in that case the bug seems pretty obvious.  Another answer is that you just have to be really careful – but nobody wants to be careful any more when programming.  The remaining reasonable option was that integer overflows raise exceptions.
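To make that last option concrete, here is a hedged sketch of a value type whose increment throws instead of silently wrapping – `CheckedIndex` is an invented name, not something from the post or any library:

```cpp
#include <limits>
#include <stdexcept>

// Sketch (assumption): an index type whose post-increment throws rather
// than wrapping, approximating "integer overflows raise exceptions".
class CheckedIndex {
    unsigned m_value;
public:
    explicit CheckedIndex(unsigned value = 0) : m_value(value) {}

    // Post-increment: detect the wrap before it happens and throw.
    CheckedIndex operator++(int) {
        if (m_value == std::numeric_limits<unsigned>::max())
            throw std::overflow_error("index increment would wrap");
        CheckedIndex old(m_value);
        ++m_value;
        return old;
    }

    unsigned Value() const { return m_value; }
};
```

With such a type, `m_array1[m_i++]` can throw on the increment itself – which is exactly the scenario the rest of the post explores.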


OK, let’s explore that option.  Let’s say that the second of the two increments throws.  As long as the exception isn’t caught, everything’s OK – the failure was critical, and the process terminated without any code attempting to rely on the (now invalid) invariants in the CFoo object…


But why throw an exception if you don’t expect it to be caught?  Shouldn’t you call exit()/TerminateProcess()/whatever to just terminate the process?


So if you’re interested in throwing the exception, perhaps you should have written:


void CFoo::Bar() {
    int i = m_i++;
    try {
        Object o1 = m_array1[i];
        try {
            m_array1[i] = null;
            int j = m_j++;
            try {
                m_array2[j] = null;
            } catch {
                m_j = j;
                throw;
            }
        } catch {
            m_array1[i] = o1;
            throw;
        }
    } catch {
        m_i = i;
        throw;
    }
}


Well, that’s still not quite right either: both GetValue and SetValue on System.Array may throw exceptions, so even the recovery code may fail, in which case the global invariants were not restored.  C++ has a hope of getting this right; let’s assume the indices are a value type with overloaded operators that throw on overflow…


void CFoo::Bar() {
    CResetter<CSizeT> r1(m_i);
    CResetter<CSizeT> r2(m_j);
    CObject *pArray1 = m_array1; // could throw
    CObject *pArray2 = m_array2; // could throw
    CResetter<CObject> r3(pArray1[m_i]);
    CResetter<CObject> r4(pArray2[m_j]);
    pArray1[m_i++] = null;
    pArray2[m_j++] = null;
    r1.Disarm(); r2.Disarm();
    r3.Disarm(); r4.Disarm();
}


Who writes code like that?  Who wants to write code like that?  There’s actually no way to do this “right” in the current CLR class library design, since there’s no operator on System.Array that returns a reference, which is the key to being able to do this in C++.
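For reference, a minimal sketch of what a CResetter along these lines might look like – the post doesn’t show an implementation, so every detail here is an assumption: snapshot the value up front, write it back during unwind unless disarmed.

```cpp
// Hedged sketch of a CResetter (implementation assumed, not from the
// post): holds a reference to a variable plus a copy of its value at
// construction time. If Disarm() is never called -- e.g. because an
// exception unwound the frame -- the destructor rolls the value back.
template <typename T>
class CResetter {
    T &m_ref;
    T m_saved;    // note: this copy itself could throw for class types
    bool m_armed;
public:
    explicit CResetter(T &ref) : m_ref(ref), m_saved(ref), m_armed(true) {}
    void Disarm() { m_armed = false; }
    ~CResetter() {
        if (m_armed)
            m_ref = m_saved;   // rollback on early (exceptional) exit
    }
};
```

Note the residual hole: for class types, both the copy in the constructor and the write-back in the destructor can themselves throw, which is why reference-returning operators and non-throwing primitives matter so much here.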


If someone catches the exception raised in the first case, whose bug is it?  The implementor of CFoo might say “well, I don’t expect anyone to catch exceptions my code throws” but the fact is that there are a lot of try / catch blocks sitting in interesting places like thread pools etc.


People mistakenly believe that there are kinds of exceptions which are uncatchable.  The truth is that there is a contract from the thrower through all the intervening frames to the catcher.  Anyone writing reusable code needs to understand this contract but do we really expect all the code in the world to be this rigorous and convoluted?


Note that the thread pool’s catch (Exception e) causes intervening unwinders to run, even if the exception is to be later propagated.  Running any code while invariants are false is dangerous at best.
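The same hazard is easy to demonstrate in C++, where destructors are the unwinders. A small sketch (all names invented here): a destructor on an intervening frame runs while an object is mid-update, and can observe its broken invariant.

```cpp
#include <stdexcept>

struct Pair {
    int m_i = 0;
    int m_j = 0;   // class invariant: m_i == m_j
};

static bool g_sawBrokenInvariant = false;

// Any object with a destructor sitting on the stack between the throw
// and the catch runs while Pair's invariant is false.
struct Observer {
    const Pair &m_p;
    explicit Observer(const Pair &p) : m_p(p) {}
    ~Observer() { g_sawBrokenInvariant = (m_p.m_i != m_p.m_j); }
};

static void Step(Pair &p) {
    Observer o(p);   // this frame's "unwinder"
    p.m_i++;         // first half of the update succeeds...
    throw std::runtime_error("second half failed");
    // p.m_j++ never runs, so the invariant is false during unwind
}
```

Anything the destructor does with the object during that window – logging it, releasing a lock guarding it, retrying an operation – runs against a state the class author never intended to be visible.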


Most of the exception literature came from people working on functional languages.  In a functional language this issue is moot, since there are no persistent side effects.  (I also worked on a programming environment once where each frame was a nested transaction and unwinding through a frame typically just rolled back the appropriate nested transaction.  But nobody wants to pay that kind of per-frame cost.)


I come back to postulate 1: programming is hard.  Programming shared components is harder.


 


Comments (20)

  1. dal says:

    Correct exception handling is one of the least documented aspects of .net programming. The guidelines that exist are completely inadequate – they usually address trivial issues such as how to format a message and ignore the important (read: hard) issues such as how/when to throw or catch based on component type and system design. I’ve been involved in establishing internal standards about exception handling and while it’s easy to agree on the trivial junk it’s almost impossible to agree on the hard issues.

    One of the hard issues has to do with the side-effects of exceptions. There are two basic questions: how does an exception affect outbound arguments, and how does it affect object state. When you design a component/method you need to think about these issues and establish a policy: will a given method ensure there are no side-effects, or will it allow side-effects to remain visible if an exception is thrown from the method? In the example no policy was stated, so it is not possible to determine whether the code is correct. Coding so that there are no side-effects is not impossible, but it does require thought and careful implementation.

    In .NET it is almost impossible to examine source code and be able to predict where an exception cannot be thrown. The only safe approach is to assume that an exception can occur anywhere at all. For example, another thread with sufficient security rights can inject a thread abort exception into any other thread.

    It’s usually easy to write code that implements a particular feature; it’s hard to write code that does that and is also reliable and robust.

    You stated that the thread pool’s catch will cause intervening unwinders to run…that is true, but only on the thread where the exception occurred. Other threads will not be affected until the exception is later propagated. Anyone writing multithreaded programs needs to understand the synchronization requirements of the design.

    As a side note, unless you want to introduce far more serious bugs into your code, never ever call TerminateProcess, Environment.Exit, Process.Kill, or any other variants, especially from within a catch handler buried deep within your code.

  2. Re: terminating a process:

    The thing about terminating the process is that (a) if the process is system critical, you will bugcheck the system, and (b) it may be an unpleasant experience for the user.

    On the other hand, running code while global invariants are not true has the potential of revealing private information, corrupting data or getting privileged code to execute based on false assumptions.

    I’ll take "fail safe" any day.

  3. Pavel Lebedinsky says:

    > Correct exception handling is one of the least documented aspects of .net programming. The guidelines that exist are completely inadequate…

    If you have any suggestions on improving design guidelines make sure you let Brad Abrams (http://weblogs.asp.net/brada) know.

    > unless you want to introduce far more serious bugs into your code never ever call TerminateProcess, Environment.Exit, Process,Kill, or any other variants, especially from within a catch handler buried deep within your code.

    Can you give an example of what you mean here? I can’t think of anything especially bad that could happen after I call Environment.Exit (unless the process was totally messed up to begin with).

  4. dal says:

    Calling Process.Kill is the worst way to shut down because it does a hard stop on the entire process; no finalizers run, threads are killed, etc. This is similar to calling Win32’s TerminateProcess; do not try this at home.

    The problem with calling Environment.Exit is that while it is a mostly-graceful shutdown (finalizers run), it still is not completely graceful. It operates completely differently than unloading an appdomain. Refer to Chris Brumme’s most excellent blog for a detailed description; here’s a summary (there’s a lot left out):

    1. The Finalizer thread finalizes all unreachable objects. This is normal and occurs in all appdomains.

    2. ProcessExit event is raised.

    3. Suspend all managed threads – these threads are never restarted.

    4. Finalize all objects, including rooted objects.

    5. Exit.

    Two big problems: first, all appdomains are unloaded, and second, code in finally blocks is not executed.

    In step 3 all threads are suspended and never resumed. This means that code in finally blocks never executes! While finalizers for rooted objects still run (step 4), which is useful for reclaiming unmanaged resources, finally blocks are not run, which means that any transaction-type cleanup code never runs.

    A minor problem for unwary code; in step 4 finalizers are run for all objects, even those that are still rooted, and there is no defined order in which these are called. This means that some portion of the runtime may be unavailable for use, like remoting, web services, etc., so some code may expect services to exist when in fact they are unavailable.

    So here’s my take on this: low-level code in components should not call an exit method just because it hit an unexpected condition. This is a policy decision that the application itself should make. This is especially true for applications that host multiple appdomains; it’s bad practice for a problem in appdomain1 to cause all appdomains to be unexpectedly dumped. It would be better for the application to have a method that components can call to signal this condition (fatal error) and let the application decide what it wants to do; it can still shutdown the app, but perhaps in a more orderly fashion (unload all appdomains before exiting, cleanup critical operations, etc), or it can choose to only shutdown the offending appdomain. There should always be a backstop shutdown policy; e.g. start a watchdog and force an unload if it times out.

    I completely agree that code should not run while global invariants are invalid and that a fail-safe is necessary; the question is how a managed application should 1) make the decision to unload the entire process, and 2) initiate that unload.

    Here’s another problem; if any component anywhere can cause a system shutdown then how do testers go about testing all the possible permutations that can occur? Will open files be left open? Databases corrupted due to partially completed transactions? etc.

    Dave

  5. Bunch of misconceptions here:

    1. The contract for (most) native APIs in Windows is that you may not catch any exceptions which propagate through the native API call frame. Yes, some people do it and sometimes it works but often it doesn’t and when it doesn’t, it won’t get fixed. There are native APIs which have made code changes so that you can catch exceptions which pass through them but these are vastly in the minority and I won’t name them on principle.

    2. If you can’t deal with process termination, you have a bug in your code. There are many user mode conditions which will immediately and unilaterally terminate the process which you cannot easily guard against. (Easy case is touching your thread guard page without resetting it.)

    3. There is great internal debate about the utility of app domains. The debate goes something like this: "if all the code is managed and well mannered, app domains are sufficient" vs. "except for the cases in the real world where they aren’t. Being in the same process is an optimization which needs to be dialable."

    This isn’t meant to be a CLR issue. This is a generic programming one. The CLR has the notable problem that since there are no throw specifications, no reference returning members (I don’t know if they even have reference types per se in the C++ sense), no conventions around the use of a non-throwing primitive like swap() etc. it’s subject to some basic correctness limitations. But the same is true of most of these modern higher level languages which haven’t yet been used to run critical software in stressful environments.

    The whole point actually of this thread of issues is that in the move towards higher level programming we seem to have forgotten or lost some of the basics.

  6. Jay Krell says:

    Cool, CResetter – you don’t need the Disarm calls, though. What you do is, in the destructor, check whether an exception is going by (I forget the standard library function that tells you this), and based on that do the rollback or not: roll back if an exception is propagating, do nothing if there isn’t one.
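    The standard library function in question is std::uncaught_exception() (C++98; C++17 replaced it with std::uncaught_exceptions()). A sketch of the Disarm-free resetter Jay describes, with all names invented and the modern plural form used:

```cpp
#include <exception>

// Sketch (assumption, not Jay's actual code): roll back only when the
// destructor runs because an exception is propagating. The 2004-era
// query was std::uncaught_exception(); std::uncaught_exceptions()
// (plural, C++17) is its replacement and is what is used here.
template <typename T>
class CAutoResetter {
    T &m_ref;
    T m_saved;
public:
    explicit CAutoResetter(T &ref) : m_ref(ref), m_saved(ref) {}
    ~CAutoResetter() {
        if (std::uncaught_exceptions() > 0)
            m_ref = m_saved;   // exception in flight: roll back
        // normal scope exit: keep the new value
    }
};
```

    One caveat with this approach: if such a resetter lives inside an object whose destructor runs during unwind of some unrelated exception, the check misfires and rolls back work that actually completed – which is one argument for keeping the explicit Disarm.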

  7. James says:

    I think you’re cheating to make this look a lot harder than it is: you’ve cleverly avoided giving us all the details, to try and lull us into thinking we’re still able to write meaningful code.

    Let’s make some reasonable assumptions and try again… after all, you were able to implement CResetter, so we know that at least some operations have safe and well-defined behaviour; besides, the problem wouldn’t be solvable otherwise.

    What’s wrong with:

    void CFoo::Bar()
    {
        CObject& rElem1 = m_array1[m_i];
        CObject& rElem2 = m_array2[m_j];
        rElem1 = rElem2 = null;
        ++m_i;
        ++m_j;
    }

    If only the subscript operation can throw and it’s a non-mutating operation, we can do them first. If the assignments can go wrong, I’d like to see your CResetter implementation… we can always use the same technique here. If integer overflow is an issue, even when the subscript operator would succeed, check for that up front.

    Presumably, these members have invariants maintained by your class, so there’s something you’re not telling us: is overflow a realistic case? Do we need to do anything to maintain the invariants here? Is any of the code correct if those invariants don’t hold?

    If you find some real code that’s this hard, think about the design a bit more.

    Integer overflow is the next round of exploitable security bugs, as evidenced by some of the recent security bulletins.

    So you’re still stuck. I agree that CResetter had to make a copy of the value; maybe there’s an even more clever solution which involves a single construction of a value in the CResetter and then a Swap (and a Swap in the destructor).
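    A sketch of that swap-based variant, under the assumption that the single construction is the copy made before any mutation and that T has a non-throwing swap:

```cpp
#include <utility>

// Sketch (assumption): same shape as CResetter, but the rollback path
// in the destructor is a swap rather than a copy assignment, so for
// types with a non-throwing swap the rollback itself cannot fail.
template <typename T>
class CSwapResetter {
    T &m_ref;
    T m_saved;     // the single copy, made before any mutation
    bool m_armed;
public:
    explicit CSwapResetter(T &ref) : m_ref(ref), m_saved(ref), m_armed(true) {}
    void Disarm() { m_armed = false; }
    ~CSwapResetter() {
        if (m_armed) {
            using std::swap;
            swap(m_ref, m_saved);   // non-throwing rollback
        }
    }
};
```

    The copy in the constructor can still throw, but it happens at a safe point – before any state has been mutated – so the only operation left on the failure path is the swap.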

    I’m actually more optimistic for C++ here than the other exception-friendly languages because C++ has (a) deterministic finalization and (b) reference types. Reference types seem to be the real key here to being able to do anything.

    One might argue that exception specifications are the final key for utility, but the problem is that while necessary, they are not sufficient to demonstrate that you have done the proper level of unwinding along all your potential error-exit paths. They tend to lead to big try/catch blocks written to quiet the compiler rather than to encourage higher quality code.

  9. James says:

    Hm, I didn’t mean to suggest that integer overflow wasn’t a bug – I’ve read a lot on that subject, including your prior blog entry and the recent MSDN article at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dncode/html/secure01142004.asp (which has good content but atrocious code that will likely introduce more subtle security bugs – for example, overloading operator&& will prevent short-circuit evaluation and could be fatal if the programmer used some variant of ‘if (p && p->foo)’ involving SafeInts).

    I guess that I agree with you that programming is hard, but look at it from a different angle: if you make sure you know enough about what you’re writing before you start, it becomes a lot easier.