Why haven’t our Exception Management Practices evolved from the 1960s?

It is amazing to think how much has changed over the last decade or so: the evolution of the cloud, broad adoption of natural user interfaces and general acceptance of agile development techniques, just to name a few. But one thing that causes me no end of frustration is that during this entire period we do not appear to have improved the user experience when exceptions occur. In fact, I would argue that it hasn’t changed significantly since abends were incorporated into IBM OS/360 in 1964.

Before we talk about the solution, let me give you some recent examples that have frustrated me. To make sure you don’t think I am saying Microsoft is any better or worse than anyone else, I have included a Microsoft example as well.

- I recently purchased an iTunes card for my dad for Christmas. We entered the card ID exactly as printed on the card but kept receiving an invalid card error message. It didn’t say why it was invalid or what we should do to resolve the problem; it just said the ID was invalid. We went into the Apple store and asked the support person for help. First he tried to get us to call support instead, but finally, under duress, he started the process of getting us a replacement card. During that time I mentioned it was urgent because dad was returning to Australia, at which point he went “aha”: that was the problem. US cards do not work with Australian iTunes accounts! Why didn’t the error inside iTunes provide this information?

- Inside Microsoft we have deployed DirectAccess, which is arguably one of the coolest security innovations in years, allowing us to connect to the Microsoft network remotely without having to use our Smart Cards. However, it doesn’t always work. And when it doesn’t work, this is the error you receive: “Direct Access Connectivity is not working”. There is nothing to help you understand what the problem is or how to resolve it. The problem is usually caused by recent security patches not having been installed, but not always.

So how do we as an industry resolve this? In my opinion there are two changes that have to occur:

1. Add a “User Exception Testing” phase to your development lifecycle – As part of code reviews during development, we should investigate every single exception that is thrown and triage the user experience as we would any other user interaction. If the user can’t diagnose the error with the information provided, then raise a bug. This would have resolved both of the above issues.

2. Incorporate links to “Live Troubleshooting Guidance” – In some cases there are legitimate reasons why an exception occurs, and it isn’t always possible to know every problem and solution during development, so a more flexible solution is required. A good example is how our Information Experience team is now resolving this problem: certain events in the Event Viewer link directly to the TechNet Wiki. For example, Event ID 100016 redirects to a wiki page where Microsoft support and community members can update the content as new problems and solutions are identified.
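
To make the two changes above a little more concrete, here is a minimal Python sketch of what they might look like in code. Everything in it is an assumption made for illustration: the `UserFacingError` class, the `triage` function, the wording of the messages and the placeholder URL are all hypothetical and are not taken from iTunes, DirectAccess or any real product.

```python
from typing import List, Optional


class UserFacingError(Exception):
    """Hypothetical exception that carries what a user needs in order to recover."""

    def __init__(
        self,
        summary: str,
        cause: Optional[str] = None,
        remedy: Optional[str] = None,
        help_url: Optional[str] = None,
    ) -> None:
        super().__init__(summary)
        self.summary = summary    # what went wrong
        self.cause = cause        # why it went wrong, if known
        self.remedy = remedy      # what the user can do about it
        self.help_url = help_url  # link to guidance that can be updated after release

    def render(self) -> str:
        """Build the text shown to the user from whichever fields are present."""
        parts = [self.summary]
        if self.cause:
            parts.append(f"Why: {self.cause}")
        if self.remedy:
            parts.append(f"What to do: {self.remedy}")
        if self.help_url:
            parts.append(f"Latest guidance: {self.help_url}")
        return "\n".join(parts)


def triage(error: UserFacingError) -> List[str]:
    """Return the findings a 'User Exception Testing' review would raise as bugs."""
    findings = []
    if not error.cause:
        findings.append("missing cause")
    if not error.remedy and not error.help_url:
        findings.append("missing remedy or help link")
    return findings


# The kind of message I received: technically true, but no help to the user.
vague = UserFacingError("The card ID is invalid.")

# The same failure with a cause, a remedy and a live guidance link (placeholder URL).
helpful = UserFacingError(
    "The card ID is invalid.",
    cause="This card was issued for a different country than your account.",
    remedy="Redeem it with an account from the card's country, or ask the retailer to exchange it.",
    help_url="https://example.com/help/card-region",
)

if __name__ == "__main__":
    print(triage(vague))
    print(helpful.render())
```

In this sketch the vague error fails the review on both counts, while the second version passes. The `help_url` field is what lets the guidance keep improving after the software has shipped.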

I would love to hear your thoughts on what needs to occur – plus feel free to use this blog as a way to rant about any specific exceptions that are driving you nuts at the moment.