It’s annoying when an application crashes, but there are worse things. In my opinion, for end-user applications (as opposed to mission-critical apps):
– The best thing is for the program to just work as expected.
– a crash is better than data-corruption: When a program crashes, you can at least restart. Data-corruption can cause much larger loss.
– debuggable crashes are better than non-debuggable crashes. A crash that occurs immediately at the point of the bug is generally pretty easy to triage. (This is what Watson does.) You’ve got a callstack that catches the culprit red-handed. This is like catching a criminal right at the crime scene. In contrast, sometimes a bug corrupts state and the program doesn’t actually crash until much later. In this case, it may be very difficult to work back from the crash to the original bug.
I derive this from my belief in optimizing for simplicity. A debuggable crash is more likely to get fixed than a non-debuggable one, and thus go away. (Rick plus enough Watson bugs have influenced my thinking here.)
– a crash is better than a deadlock. When a deadlock occurs, you sometimes wonder if the UI is just temporarily hung and if it’s coming back. A crash doesn’t have the suspense. Also, crashes generally have a single callstack pointing to the immediate culprit. With deadlocks (especially ones that aren’t just lock based), it may be harder to assign blame.
To summarize the above in list form, I’d say:
- (Best): Application works as expected.
- Mainline scenarios work as expected. (E.g., bugs exist, but they’re in such rare corner cases that nobody really notices or cares.)
- Application crashes immediately at a bug. This is generally easy to triage (and therefore hopefully fix).
- Application deadlocks. Usually easy to triage.
- Application crashes long after the relevant bug. This is usually hard to triage and determine what the original bug was.
- (Worst): Serious data-loss or corruption
Practical design principles?
I think there are a few practical design principles that come from this.
1) There’s a tension between #3 (crash early) and #5 (crash late). If your program detects that some invariant is fatally broken, how hard should you try to recover? If you can reasonably recover, avoid the crash, and get back to a sane state, then great, do that. But if you really can’t recover and are just postponing the crash, then keep it simple and crash sooner rather than later.
2) If your program operates on some large data file, ensure that the program never leaves the file in an inconsistent state, in case it crashes before it can restore the file. (Outlook 2007 is really great about this. Despite all the Outlook crashes, it has never corrupted my inbox.)
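Here is a rough sketch of how the two principles might look together in code. It is only an illustration under my own assumptions, not how Outlook or any particular product does it: the names `InvariantError`, `check_invariants` and `save_atomically`, the "unique id" invariant, and the JSON file format are all made up for the example. The idea is to fail at the moment a broken invariant is detected, and to write new data to a temporary file that is swapped into place in one step, so a crash mid-save leaves the old file intact.

```python
import json
import os
import tempfile


class InvariantError(Exception):
    """Raised as soon as a broken invariant is detected (hypothetical name)."""


def check_invariants(records):
    # Hypothetical invariant for this sketch: record ids must be unique.
    ids = [record["id"] for record in records]
    if len(ids) != len(set(ids)):
        raise InvariantError("duplicate record ids")


def save_atomically(path, records):
    # Principle 1: fail right where the problem is detected, before the
    # bad state can reach the file on disk.
    check_invariants(records)

    # Principle 2: write to a temporary file in the same directory, then
    # swap it into place. The existing file is never partially overwritten.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(records, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)  # rename over the old file in one step
    except BaseException:
        # Best-effort cleanup of the temporary file, then re-raise.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Calling `save_atomically("inbox.json", records)` with duplicate ids raises `InvariantError` at the call site and leaves any existing `inbox.json` untouched. If the process dies partway through the write, the worst case should be a stray `.tmp` file next to the intact original.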