How to Design Exception Hierarchies

I still get a lot of questions on how to design exception hierarchies, despite several attempts to describe it in talks, the FDG book, and in posts on this blog. Maybe the guidance gets lost in the complexities of the full guidance surrounding exception handling, or I am a bad communicator. Let me assume the former :-), so here is one more attempt at describing the guidance in the most succinct way I am capable of:

· For each error condition your reusable routine can get into, decide whether the condition is a usage error or a system error.

o A usage error is something that can be avoided by changing the code that calls your routine. For example, if a routine gets into an error state when a null is passed as one of its arguments (an error condition usually represented by an ArgumentNullException), the calling code can be modified by the developer to ensure that null is never passed. In other words, the developer can ensure that usage errors never occur.

o A system error is something that cannot be avoided by simply writing code that tries to avoid the error condition. For example, File.Open gets into an error condition when it tries to open a file that does not exist (and it throws FileNotFoundException). This error condition cannot be fully avoided. Even if you check whether the file exists before calling File.Open, the file can get deleted or corrupted between the call to File.Exists and the call to File.Open.
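To make the distinction concrete, here is a minimal C# sketch. The ConfigReader class and its method names are hypothetical and exist only for illustration; File.Exists, File.ReadAllText, and the exception types are the real Framework APIs. The first method shows a usage error the caller can always avoid, and the second shows why a system error still needs a handler even after an up-front check.

```csharp
using System;
using System.IO;

// Hypothetical helper used only to illustrate the two kinds of errors.
public static class ConfigReader
{
    // Usage error: the caller can always avoid this by never passing null.
    public static string Normalize(string path)
    {
        if (path == null)
        {
            throw new ArgumentNullException(
                nameof(path),
                "path must not be null; pass the location of the configuration file.");
        }
        return path.Trim();
    }

    // System error: checking first does not remove the need for a handler.
    public static string ReadOrDefault(string path, string defaultText)
    {
        if (!File.Exists(path))
        {
            return defaultText;
        }
        try
        {
            // The file can still be deleted between File.Exists and this call.
            return File.ReadAllText(path);
        }
        catch (FileNotFoundException)
        {
            return defaultText;
        }
    }
}
```

Note that the File.Exists check only narrows the window for the error; the catch block is what actually makes the routine robust.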

· Usage errors need to be communicated to the human developer calling the routine. The exception type is not the best way to communicate errors to humans. The message string is a much better way. Therefore, for exceptions that are thrown as a result of usage errors, I would concentrate on designing a really good (that is, explanatory) exception message and on using one of the existing .NET Framework exception types: ArgumentNullException, ArgumentException, InvalidOperationException, NotSupportedException, etc. In other words, don’t create new exception types for usage errors, because the type of the exception does not matter for usage errors. The .NET Framework already has enough types (actually too many, IMHO) to represent every usage error I can think of.
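As a rough sketch of what this looks like in practice, the hypothetical Connection class below (not a Framework type) reports its usage errors with existing exception types and puts the effort into the message, telling the developer what went wrong and how to fix the calling code.

```csharp
using System;

// Hypothetical type used only to illustrate usage-error messages.
public sealed class Connection
{
    private bool _isOpen;

    public void Open()
    {
        _isOpen = true;
    }

    // Returns the number of characters "sent"; real sending is omitted.
    public int Send(string payload)
    {
        if (payload == null)
        {
            throw new ArgumentNullException(
                nameof(payload),
                "Send requires a payload. Pass an empty string to send an empty message.");
        }
        if (!_isOpen)
        {
            throw new InvalidOperationException(
                "Send was called before Open. Call Open() on this Connection first.");
        }
        return payload.Length;
    }
}
```

No new exception type is needed here: nobody is expected to catch these exceptions, only to read the message and correct the call.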

· System errors need to be further divided into two groups:

o Logical errors are system errors that can be handled programmatically. For example, if File.Open throws FileNotFoundException, the calling code can catch the exception and handle it by creating a new file. (Side note: this is in contrast to the usage error described above, where you would never first pass a null argument, catch the ArgumentNullException, and then pass a non-null argument.)

o System failures are system errors that cannot be handled programmatically. For example, you really cannot handle an out-of-memory exception resulting from the JIT running out of memory.

· System failures should result in the shutdown of the process. This might sound scary at first read, but I would say it's by definition: a system failure is an error that can be handled neither by the developer nor by the program. The best way to shut down a process in such cases is to call Environment.FailFast, which logs the state of the system, and that is very useful in diagnosing the problem. The good thing is that system failures are extremely rare in reusable libraries. They are mostly caused by the execution engine.

· Logical errors are errors that can be, and often are, handled in code. The way to handle such errors is to catch the exception and execute some “compensating” logic. Whether a catch block executes is determined by the type of the exception the catch block claims it can handle. This means that logical errors are the conditions (and actually the only conditions) where the exception type matters and where you should consider creating a new exception type.

· If you think you are dealing with a logical error, I would validate this belief by actually writing the code, or describing very precisely what the catch handler would do when it catches the exception to allow the program to continue its execution. If you cannot describe it, or the error can be avoided by changing the calling code, you are dealing with a usage error or a system failure.
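Here is one way that validation exercise might look for the file example used earlier. The Journal class is hypothetical; the point is only that the catch handler can be written down and that it lets the program continue.

```csharp
using System;
using System.IO;

// Hypothetical caller used to show "compensating" logic in a catch block.
public static class Journal
{
    // Returns the existing contents, or creates an empty journal if none exists.
    public static string LoadOrCreate(string path)
    {
        try
        {
            return File.ReadAllText(path);
        }
        catch (FileNotFoundException)
        {
            // Compensating logic: create the missing file and carry on.
            File.WriteAllText(path, string.Empty);
            return string.Empty;
        }
    }
}
```

The handler is selected purely by the exception type, which is why the type matters for this error and not for usage errors.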

· Note that logical errors are still pretty rare. As a rule of thumb, I would say that error conditions in a typical reusable library break down as: <1% system failures, 5% logical errors, and the rest (~95%) usage errors.

· If you are convinced that your error is a logical error, you need to decide whether to reuse one of the existing exception types from the framework or create a new exception type.

o You should use an existing Framework exception if both of the following are true:

§ The exception type makes sense for your error condition. For example, you don’t want to throw FileNotFoundException if your threadpool implementation runs out of the thread quota.

§ The exception will not make an error condition ambiguous. That is, the code that wants to handle the specific error will always be able to tell whether the exception was thrown because of the error or because of some other error that happens to use the same exception. For example, you don’t want to throw (reuse) FileNotFoundException from a routine that calls Framework’s file I/O APIs that can throw the same exception, unless it does not matter to the calling code whether your code or the Framework threw the exception (BTW, it’s quite often that it does not matter).

o If you decided not to reuse an exception type from the .NET Framework, you will have to create a new exception type. When you create a new exception type, as a rule of thumb, I would simply inherit it from System.Exception or from a single subtype of System.Exception representing custom exceptions declared in your component/framework. I would create the root only if you have more than 3 custom exceptions. There are limited cases where it’s good to create a more elaborate hierarchy, but it’s extremely rare. I would say in a typical library 99% of custom exceptions should follow this guidance. The reason is that when you catch exceptions you almost never care about the hierarchy. You almost never care about the hierarchy because you almost never want to catch more than one error at a time. You almost never want to handle more than one error at a time, because if the errors could be handled the same way, they should not be expressed using two different exception types. Now, please note that I said “almost” and “should” in several places. If your case falls into the small number of corner cases, you need to create a deeper hierarchy.

o Lastly, keep in mind that it’s not a breaking change to change an exception your code throws to a subtype of the exception. For example, if the first version of your library throws FooException, in any future version of your library, you can start throwing a subtype of FooException without breaking code that was compiled against the previous version of your library. This means, when in doubt, I would consider not creating new exception types till you are sure you need them. At which point, you can create a new exception type by inheriting from the currently thrown type. Sometimes it might result in a slightly strange exception hierarchy (for example, a custom exception inheriting from InvalidOperationException), but it’s not a big deal in comparison to the cost of having unnecessary exception types in your library, which makes the library more complex, adds development cost, increases working set, etc.
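The last two points can be sketched as follows. FooException comes from the example above; FooTimeoutException, FooLibrary, and ExistingCaller are hypothetical names introduced for illustration. The custom exception inherits directly from System.Exception, and a later version introduces a subtype without breaking a caller written against the first version.

```csharp
using System;

// Version 1 of a hypothetical library declares and throws this type.
public class FooException : Exception
{
    public FooException(string message) : base(message) { }
}

// Added in version 2, once a caller really needs to tell timeouts apart.
public class FooTimeoutException : FooException
{
    public FooTimeoutException(string message) : base(message) { }
}

public static class FooLibrary
{
    // Version 2 behavior: throws the more specific subtype.
    public static void DoWork()
    {
        throw new FooTimeoutException("The operation timed out.");
    }
}

public static class ExistingCaller
{
    // Written against version 1; knows nothing about FooTimeoutException.
    public static string Run()
    {
        try
        {
            FooLibrary.DoWork();
            return "completed";
        }
        catch (FooException ex)
        {
            // Still runs, because a FooTimeoutException is a FooException.
            return "handled: " + ex.Message;
        }
    }
}
```

Code that wants the finer distinction can add a catch (FooTimeoutException) block ahead of the existing one; code that does not care keeps working unchanged.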