Error handling, part 2: error types

<< Part 1 Part3 >>

Over the years I've seen various error handling conventions and systems, and have even written a couple of them myself. I've done one for Triceps, with support for C++ and Perl, and another one for C++ on Windows; I plan to show and discuss both in detail in the later installments of this series. Their limitations differ a bit, but the second one is based on the concepts from the Triceps error handling, so they have a lot in common. I also plan to talk about the other ways of error handling that I've found in Windows. But first I want to talk about what a decent error handling system should do to avoid the pitfalls shown in Part 1.

The simplest way of reporting an error is by returning an error type or an error message. The "or" is there because I've said "simplest". A good way of reporting an error should contain both, and I'll explain why.

In plain C (and similar languages) and in the OS APIs built on it, the error type is usually reported as an integer error code. Unix and the C standard library derived from it have errno. Windows has at least three formats for integer error codes: two of them manage to coexist and be auto-distinguishable, and an incompatible third one comes from the C standard library. The languages with exceptions (C++, Java, C# and .NET in general, etc.) represent the error type as the class of the exception, with all the fancy inheritance.

The error type really has two separate purposes:

  • Let the caller code automatically react in some sensible way to the errors it can handle.
  • Tell the human user what happened.

These two purposes are almost independent of each other, even though they share the common namespace of the error types.

The automatic handling is generally about making the decision "should I try again?" or "should I try again in some different way?". The examples of "try again" errors are EAGAIN or ETIMEDOUT/WAIT_TIMEOUT: something didn't happen in time, so the program gets a chance to do some housekeeping and try again later. An example of "try again differently" is EEXIST/ERROR_FILE_EXISTS on a file or directory creation, when the program might just say "okay, no problem, I'll use the existing one instead".
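Both kinds of automatic reaction can be sketched in Python, which exposes the errno codes both as the errno attribute of OSError and as exception subclasses. This is only an illustration: the helper names ensure_dir and call_with_retry and the RETRYABLE set are made up for the example, and which codes deserve a retry is an assumption that each program has to make for itself.

```python
import errno
import os
import time


def ensure_dir(path):
    """'Try again differently': an existing directory is good enough."""
    try:
        os.mkdir(path)
        return "created"
    except FileExistsError:  # the exception class Python maps EEXIST to
        if os.path.isdir(path):
            return "reused"
        raise  # something else occupies the name, so pass the error up


# Assumption: which codes count as "try again" is the caller's decision.
RETRYABLE = {errno.EAGAIN, errno.ETIMEDOUT}


def call_with_retry(operation, attempts=3, delay=0.0):
    """'Try again': repeat the call while it fails with a retryable code."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as exc:
            if exc.errno not in RETRYABLE or attempt == attempts:
                raise  # not retryable, or out of attempts
            time.sleep(delay)  # a chance to do some housekeeping
```

Any error that is not in the retryable set goes straight up to the caller, which is where the second purpose of the error type takes over.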

EEXIST is also an example of the overlap between the two purposes: some programs might decide to try again differently, while others might just return the error to the user, and then it would be the user's responsibility to figure out what this error means.

And then of course there are the plain abuses of the error type, where a function returns its value mixed in with the error code, such as the range starting with WAIT_OBJECT_0 in WaitForMultipleObjects(), which also happens to overlap with the real error codes returned by other functions.
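To make the mixing concrete, here is a small Python sketch that takes such a return value apart. The four constants have the values defined in the Windows SDK headers; the function decode_wait_result is a hypothetical helper written for this illustration, and it assumes a wait for any one of the handles rather than for all of them.

```python
# Constants with the values from the Windows SDK headers.
WAIT_OBJECT_0 = 0x00000000
WAIT_ABANDONED_0 = 0x00000080
WAIT_TIMEOUT = 0x00000102
WAIT_FAILED = 0xFFFFFFFF


def decode_wait_result(result, handle_count):
    """Split a WaitForMultipleObjects() result into (kind, handle index)."""
    if WAIT_OBJECT_0 <= result < WAIT_OBJECT_0 + handle_count:
        return ("signaled", result - WAIT_OBJECT_0)
    if WAIT_ABANDONED_0 <= result < WAIT_ABANDONED_0 + handle_count:
        return ("abandoned", result - WAIT_ABANDONED_0)
    if result == WAIT_TIMEOUT:
        return ("timeout", None)
    if result == WAIT_FAILED:
        # The actual error code has to be fetched with GetLastError().
        return ("failed", None)
    return ("unknown", None)
```

With 8 handles, a result of 5 means "handle number 5 got signaled", while the same number 5 coming from GetLastError() means "access denied". Only the knowledge of which function produced the number tells the two apart.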

In the flexible interpreted languages like PowerShell or Python, there is one more use for the error type. Sometimes you want to run a command or a chunk of code and ignore the errors. So you'd write something like:

try {
my_command
} catch { }

Only then it turns out that if you've accidentally made a syntax error or called a non-existing command in that block, and it gets detected only during execution, you'd catch and ignore that as well. So your code never works right, and hopefully you have a test to catch that. The right way to do it is to let the major errors through, and catch the rest. For PowerShell, the following seems to do the trick:

try {
my_command
} catch
[System.Management.Automation.ParseException],
[System.Management.Automation.CommandNotFoundException],
[System.Management.Automation.MethodInvocationException]
{
throw
} catch { }

The major errors get re-thrown, and the rest are caught. Of course, it would be nice to have one superclass for all the exceptions generated by PowerShell itself. Dragging around the list of 3 classes (and there might be more, I might have missed something) is ugly.
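The same pattern carries over to Python, which has the same problem of no single superclass for "the script itself is broken". In the sketch below the helper run_ignoring_errors is hypothetical, and the list of classes treated as major errors is an assumption of mine, very possibly incomplete.

```python
def run_ignoring_errors(action):
    """Run action(), ignoring its failures but not the bugs in the script."""
    try:
        action()
    except (NameError, AttributeError, TypeError, SyntaxError):
        # Assumed list of "major" errors: these usually mean a typo or a
        # wrong call in the code itself, so they are let through.
        raise
    except Exception:
        pass  # everything else is deliberately ignored
```

A misspelled function name inside the action raises NameError and gets through, while a failed operation raising, say, ValueError or OSError is silently dropped.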

The second purpose, for the human, usually requires the code to be translated to a message. Don't you hate it when you just get a message that the program had experienced the error number 0x80070005? (Incidentally, the code 0x80070005 is an example of one of these complicated numeric error formats in Windows, an HRESULT: the real error code in it is 5, "access denied", the top bit says that it's indeed an error and not just a message, and the 7 points to the subsystem that reported this error, here the plain Win32 API.) The error code is still useful though. First, the messages may be translated to various languages, and the number serves to point into the translation table. Second, even after translating the code to text, including the code with the text is also useful: if you get a support request from Japan with an error message in Japanese, you can still find its meaning from the code even if you can't read Japanese.
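Taking 0x80070005 apart is simple bit arithmetic. The sketch below follows the documented HRESULT layout (the top bit is the severity, bits 16 to 26 are the facility, the low 16 bits are the code); the function name split_hresult is made up for the illustration.

```python
def split_hresult(hr):
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    hr &= 0xFFFFFFFF  # accept the value in signed or unsigned 32-bit form
    return {
        "failure": bool(hr >> 31),       # top bit: 1 means an error
        "facility": (hr >> 16) & 0x7FF,  # 11 bits: the reporting subsystem
        "code": hr & 0xFFFF,             # low 16 bits: the error code proper
    }


print(split_hresult(0x80070005))
# prints {'failure': True, 'facility': 7, 'code': 5}
```

Facility 7 is FACILITY_WIN32, which marks a plain Win32 error code wrapped into an HRESULT, and code 5 is ERROR_ACCESS_DENIED.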

However, for the human a message looked up by the error code doesn't replace a proper error message. A good error message must also provide the concrete details. Not just "Access denied" but also at least the name of the file to which the access is denied. And ideally also the information about the current account, and the permissions on the file. Or when you get an "invalid arguments" error from your system call, wouldn't it be nice to also know which exact argument is invalid in which exact way?
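Python's OSError is a reasonable example of an error that carries some of these details: next to the numeric errno it keeps the generic text in strerror and the offending path in filename. A minimal sketch of building a message out of all three could look like this (describe_os_error is a hypothetical helper, not a standard function):

```python
import errno
import os


def describe_os_error(exc):
    """Build a message with the code, its generic text and the details."""
    name = errno.errorcode.get(exc.errno, "unknown")
    text = f"{name} ({exc.errno}): {exc.strerror}"
    if exc.filename is not None:
        text += f", file '{exc.filename}'"
    return text


try:
    open(os.path.join("no_such_dir", "data.txt"))
except OSError as exc:
    print(describe_os_error(exc))
```

The symbolic name and the number stay readable in any locale, while the file name tells the user where to look. The account and the permissions would still have to be collected separately.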

Some languages, like Perl, provide only the free-form text messages in their thrown errors (with die()), and there you can feel that the error messages without a code also have their limitations. The automatic handling of errors without a code presents challenges, even with such powerful string matching as Perl has. My version of error handling for Triceps also doesn't have the error codes, and shows the same limitations. However, neither Perl nor Triceps generally intends its thrown errors to be handled automatically. The errors that can be handled are normally reported in other ways, while the thrown errors are normally fatal and are intended to be passed to the human user.

PowerShell is somewhere in the middle: the .NET classes in it throw proper exception classes, but the typical scripts throw strings, for which PowerShell constructs a basic wrapper class. Throwing the smart exceptions from the PowerShell scripts is possible but requires too much work. I've been playing with some wrapper functions that make it easier, and I'll show them in the later installments, but I'm still working out the better ways.

So, the summary of this installment: In a good error reporting system, the errors should have both the types/codes for the automatic handling and the free-form human-readable strings with the detailed description.
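As a rough illustration of this summary (not the Triceps or Windows implementation that the later installments will cover), an error object with both parts might look like this in Python. All the names here (ErrorCode, ReportedError, create_account, ensure_account) are hypothetical.

```python
import enum


class ErrorCode(enum.Enum):
    ALREADY_EXISTS = 1
    ACCESS_DENIED = 2


class ReportedError(Exception):
    """An error with a code for the program and a detailed text for the human."""

    def __init__(self, code, detail):
        super().__init__(f"{code.name}: {detail}")
        self.code = code      # for the automatic handling
        self.detail = detail  # free-form, with the concrete details


def create_account(name, existing):
    if name in existing:
        raise ReportedError(ErrorCode.ALREADY_EXISTS,
                            f"account '{name}' is already present")
    existing.add(name)
    return "created"


def ensure_account(name, existing):
    try:
        return create_account(name, existing)
    except ReportedError as err:
        if err.code is ErrorCode.ALREADY_EXISTS:
            return "reused"  # handled automatically, by the code
        raise                # anything else goes to the human, with details
```

The caller never has to parse the text to make its decision, and the human never has to look up a bare number.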
