A new friend of mine, J.D. Meier, is leading a project intended to provide renewed architecture guidance for the Microsoft platform, and specifically for .NET applications
J.D. has been compiling input from different sources and launched a web site for his project on CodePlex (a Microsoft portal for community projects, most of them available with code, samples, screencasts and so on)
I had the privilege of being included on J.D.'s list, and one of his inquiries was about my beliefs on exception handling. I had written a blog post about that three years ago although, ugh!, I wrote it in Spanish, as I was living in South America at the time
I was about to translate my blog post for J.D., but rather than just emailing him the result, I thought it worth making my ideas public, so here we go:
There are four common anti-patterns (that is, bad practices, bad habits) when dealing with exceptions on today's platforms like .NET and Java. They are:
- Exaggerated "exceptionalization"
- Unnecessary intervention
- Negligent inaction
- Inopportune handling
Let's review each.
- Exaggerated "exceptionalization". I have heard about several software projects whose management thought that exceptions were made to address any abnormal situation. All abnormal situations. That way, data entry modules validated the received input and, upon the first mandatory field left blank, the first entered date field violating the "MM/DD/YYYY" mask, etc., an exception was thrown. Just because the policy was to throw an exception for each abnormal situation
That made programming more expensive, as all that plumbing of exception throwing and catching was the developers' responsibility. And that was just the start: the code became illegible, as the handling of these exceptions lived several lines of code away. Occasionally, several entries down the call stack
So we can say they got two undesired consequences: too much code and illegibility. A better approach in cases like these would have handled validations in situ, giving control back to the user when some field was entered incorrectly, in a simple and direct manner (that is, without exchanging one error for another)
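To make the in-situ idea concrete, here is a minimal sketch in Java (class and field names are illustrative, not from any real project): validation problems are collected and returned to the caller, so control goes straight back to the user with the whole list of errors and no exception plumbing is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: validate a data-entry form by collecting problems instead of
// throwing an exception on the first abnormal field.
public class OrderForm {
    private static final Pattern DATE_MASK = Pattern.compile("\\d{2}/\\d{2}/\\d{4}");

    public static List<String> validate(String customer, String deliveryDate) {
        List<String> errors = new ArrayList<>();
        if (customer == null || customer.trim().isEmpty()) {
            errors.add("Customer is mandatory");
        }
        if (deliveryDate == null || !DATE_MASK.matcher(deliveryDate).matches()) {
            errors.add("Delivery date must match MM/DD/YYYY");
        }
        return errors; // an empty list means the input is valid
    }

    public static void main(String[] args) {
        // Both fields are wrong: the user gets both messages at once.
        System.out.println(validate("", "2008-01-15"));
    }
}
```

The caller simply checks whether the returned list is empty; exceptions stay reserved for situations the user cannot fix from the form.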
- Unnecessary intervention. It's normal to have code blocks that call modules that may eventually throw exceptions. What I find unnatural -accepting that not everybody thinks as I do- is having to be prepared to react to every possible exception. I remember from my participation in a project at a multinational bank that enclosing in a try/catch block every method call that could throw an exception was mandatory. And doing nothing inside the catch braces was forbidden. To enforce this, the project management asked certain development team members to build a robot that reviewed the project source code, hunting for empty catches. After a while, the managers finally became convinced that there isn't always something to do about an exception, but they only made their position (and, consequently, the robot) a bit more flexible, by allowing in such cases a comment inside the catch explaining why there was no intervention. Since then, developers in a hurry to finish their part started entering explanations like /* dkjfhashakjbadshkjdkfjdhkj */ in empty catches, and the robot has never found a violation of the rule again
The real point behind this story is that, in any code segment, there will be exceptions thrown by module invocations: sometimes we can handle them and act accordingly; sometimes our involvement is simply senseless. An example? The Data Access Layer (DAL) throws a System.Data.SqlClient.SqlException because the connection string was sent with an expired password. The exception is caught in the Business Layer, the one that called the DAL. Does it make sense to catch the exception there? To make the answer more obvious, the question could be rephrased as "can the Business Layer create a new password for the DAL?"
The right thing to do in that case is to mask, before leaving the DAL, the low-level exception (coupled to ADO.NET in this case) as a high-level one, without losing the original exception, as it's the one that holds the stack trace. The original exception is usually known as the root cause and is logged, while the high-level exception is shown to the user. For more on this, read on
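The masking idea can be sketched in a few lines of Java (the equivalent in .NET would pass the original exception as InnerException). The class names here are illustrative: a low-level SQLException is wrapped at the DAL boundary in a layer-neutral exception that carries the root cause along.

```java
import java.sql.SQLException;

// Illustrative high-level exception for the DAL boundary.
class DataAccessException extends Exception {
    DataAccessException(String message, Throwable rootCause) {
        super(message, rootCause); // the cause keeps the original stack trace
    }
}

class CustomerDal {
    public void save(String customer) throws DataAccessException {
        try {
            openConnection(); // imagine a real JDBC/ADO.NET call here
        } catch (SQLException e) {
            // Callers see a high-level message; the root cause travels along
            // inside the new exception for logging and diagnosis.
            throw new DataAccessException("Could not save customer", e);
        }
    }

    private void openConnection() throws SQLException {
        throw new SQLException("Password expired"); // simulated low-level failure
    }
}
```

The Business Layer now catches DataAccessException, a type it can reason about, while the nested SQLException stays available for the log.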
- Negligent inaction. This is the opposite of the previous case, and just as negative. It happens every time we let an exception pass through without intervening, not even to mask certain error messages so as to avoid slapping users in the face with low-level explanations like the expired password when trying to connect to the database
Eluding any intervention, as a policy, leads to code with unpredictable behavior. That erodes its reliability from a development perspective, and its usability from the user's perspective
However, how do we set the correct degree of involvement without falling into the previous anti-pattern? As said before, by masking the low-level exception into a higher-level one before leaving the layer or module where the exception is produced. Take special care here, as I previously said, to nest the original exception: the new exception will be useful for presentation purposes (thinking of the user), while the original one will make it possible to understand what provoked it (when the responsible IT personnel take over) and is the first step toward preventing these situations in the future, especially if the cause is a software bug (the expired database password example does not belong to that category). Don't you remember seeing a web site where, after clicking some submit button, you got a beautiful message in red characters like [ODBC Exception, SQL code = ...]? Wouldn't it have been better to log that error while telling the user something like "We are experiencing some difficulties. Please try again later or call our operators at (XXX)XXX-XXXX"?
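A minimal sketch of what that boundary handler might look like, in Java (the class name, the log destination and the friendly text are all illustrative assumptions): the root cause is extracted from the nested chain and logged for IT, while the user only ever sees a neutral message.

```java
// Sketch of handling at the presentation boundary: log the root cause,
// show a friendly message, never leak low-level details to the user.
public class ErrorBoundary {
    static final String FRIENDLY =
        "We are experiencing some difficulties. Please try again later.";

    public static String handle(Exception e) {
        // A real system would write to a log file or monitoring tool.
        System.err.println("ROOT CAUSE: " + rootCauseOf(e));
        return FRIENDLY; // this is all the user gets to see
    }

    // Walk the cause chain down to the original (root) exception.
    static Throwable rootCauseOf(Throwable t) {
        while (t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }
}
```

Note the asymmetry: the full chain goes to the log, where detail helps diagnosis; only the top-level, user-oriented text goes to the screen.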
- Inopportune handling. A customer of mine complained about the low reliability of MS .NET regarding exception handling. He told me that, for a certain range of exceptions, they had a handler invoked in the catch section. The handler generated a database record for the exception, with the goal of gathering monthly statistics on their software failures. Concretely, his recrimination of .NET was that, while the record was being generated in the database (at a low level: acting in a transaction-oriented manner, with locking mechanisms, etc.), the web user kept waiting like a fool for an answer that, worst of all, was abnormal
When we dug deeper, we found that the catch logic was, by design, executed in the same thread where the exception had occurred. Therefore, if we are urged to let the user know that we couldn't succeed, but we also want to generate a record for stats that are reviewed on a monthly basis, the advice here is to perform simple logging (a plain text file, not even XML), loading the database offline using, for instance, ETL mechanisms
Moreover, if stats may be checked at any time and we want information that is no more than, say, 10 minutes old, we can forget ETL and apply fire-and-forget mechanisms in the exception handler. For instance, the exception handler can post a message to a queue whose consumer -necessarily another thread or process- will complete the action by registering it in a database. Thus, the handler just posts to the queue and returns control to its invoking catch, which in turn finishes and returns the error and control to the user. The persistence logic stays decoupled that way and is executed asynchronously
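The fire-and-forget pattern can be sketched with an in-process queue in Java (in production the queue would typically be a durable one, such as MSMQ or a message broker; the class names here are illustrative). The catch block only posts to the queue and returns immediately; a background consumer thread does the slow persistence work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of a fire-and-forget failure log: report() returns at once,
// while a daemon consumer thread persists entries asynchronously.
public class AsyncFailureLog {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> persisted =
        Collections.synchronizedList(new ArrayList<>());

    public AsyncFailureLog() {
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String entry = queue.take(); // blocks until a message arrives
                    persist(entry);              // the slow, transactional part
                }
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();
    }

    // Called from the catch block: just enqueue and return control.
    public void report(Exception e) {
        queue.offer(e.getMessage());
    }

    // Stand-in for the database insert; records entries for inspection.
    private void persist(String entry) {
        persisted.add(entry);
    }

    public int persistedCount() {
        return persisted.size();
    }
}
```

The user-facing thread never waits on locks or transactions: it reports the failure and moves straight on to rendering the error page.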
I want to invite everyone to share their experiences with J.D.: anti-patterns and their respective best practices in exception handling
PS: An interesting study on exception handling can be found in Rod Johnson's best seller "Expert One-on-One J2EE Design and Development" (Wrox, 2003, pages 125-132). I strongly recommend reading it