Troubleshooting Software Problems: A Scientific Approach


Years ago, when I was working on an Escalation Team, we decided to write documentation formalizing the approach we used to isolate software problems. I had been using this approach for years, ever since I read a great book called The McKinsey Way. The book has nothing to do with computers, but everything to do with problem solving! In fact, it’s the same scientific approach people use in many other fields.

At that time, not all my peers had a clear, structured methodology for isolating software problems, so we decided to create a framework. It was a good opportunity to share what I had learned from that book and adapted to the software world.

The most common scenario when isolating software problems is that several different problems cause one specific symptom. This is worth stating up front because, 90% of the time, the customers I work with expect me to show them a single smoking gun.

However, during an investigation, the most common or most visible problems are usually the first ones found. After fixing them, we should expect one of the following outcomes:

a) The symptom no longer occurs because all the problems causing or contributing to it were fixed.

b) The symptom still occurs, but less often or with less intensity. This may indicate one of the following:

- Some of the diagnosed problems are not fixed yet.

- All identified problems were fixed, but other, less frequent or less visible problems are still causing the symptom. At this point, these problems should be easier to identify because fewer problems remain.

c) The symptom persists as before. This means the problem we fixed was not causing the symptom we are investigating.

Other outcomes are possible, but they should be the exception rather than the rule. For example, you fix a problem and the symptom changes.

Just keep in mind that a single symptom is usually caused by several different software problems. When I work on a support incident and monitoring after a fix proves it was the only problem, I consider myself lucky, because this is uncommon.

If you already use this approach, this blog post says nothing new. However, when the people in your organization, or your customers, share a common “framework” for solving problems, communication becomes easier.

In my reports to customers, I always describe the structured approach I used to isolate the problems, so they can understand why I did what I did.

That said, the “Problem Resolution Framework” presented in this post is the result of a team effort, not an individual one.