You Can't Fix Every Bug

01/26/2005

Apoorva writes:

Handling bugs when you are relatively close to ship date? Would like to hear some "real time" stories about some difficult decisions you had to make while shipping a product and what made you take those decisions. Say, convincing dev to get a bug fixed or dev convincing you it's not worth fixing right now.... anything along similar lines. The kind of trade-offs you had to make and why you made them?

Since my product hasn't been announced yet I can't go much beyond generalities. The most interesting story that comes to mind actually happened to a friend of mine. This person is the best tester I have ever had the pleasure of working with. He is so good, in fact, that his developers have been known to band together to buy him lunch and thus stop him from entering bugs for a while! Picture a tester this good assigned to testing the rewrite of the entire import/export of a foreign file format. The previous version of this functionality worked OK, but it was problematic enough that, combined with new round-trip infrastructure in the host app, a complete rewrite made sense.

Complete rewrites are major undertakings. This one fell behind schedule. The team tackled import first. There were three developers; through the vagaries of fate, my friend ended up as the only tester. Import didn't work well enough for the devs to start on export until after code complete had passed. (This project was just one component in a larger application and so had no control over the schedule. And no, the irony of not starting the second half of development until after code complete had passed wasn't lost on the team.) Export had to work, but with the limited time available there was no way it could reach the level everyone wanted. The team made a yeoman effort but was still forced to postpone bunches of bugs.

This process is always hard. The only way to survive is to dispassionately combine the heinousness and annoyance factor of each bug with a guess as to how likely it is that large numbers of customers will run into it, and from that determine its priority relative to every other bug. The riskiness of the fix is a factor too: the more of the app a fix will affect, the less likely it is to be taken. Once the bugs are prioritized you just start at the top and work your way down until it's time to ship.
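The triage rule above can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual process: the article names the factors (heinousness and annoyance, the likelihood that customers hit the bug, and the riskiness of the fix) but gives no formula, so the `Bug` fields, the 1-to-5 scales, the multiplicative score, the effort estimates, and the example bugs are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Bug:
    # Hypothetical triage inputs; the scales are assumptions, not from the article.
    title: str
    severity: int       # how heinous the bug is, 1 (cosmetic) to 5 (data loss)
    annoyance: int      # how irritating it is to run into, 1 to 5
    likelihood: float   # guessed fraction of customers who will hit it, 0.0 to 1.0
    fix_risk: float     # guessed share of the app a fix could destabilize, 0.0 to 1.0
    fix_days: float     # estimated developer-days to fix


def priority(bug: Bug) -> float:
    """Illustrative score: impact, scaled by reach, discounted by fix risk."""
    impact = bug.severity + bug.annoyance
    return impact * bug.likelihood * (1.0 - bug.fix_risk)


def plan_fixes(bugs: list[Bug], days_until_ship: float) -> tuple[list[Bug], list[Bug]]:
    """Start at the top of the ranked list and work down until time runs out.

    Returns (bugs to fix, bugs to postpone)."""
    ranked = sorted(bugs, key=priority, reverse=True)
    remaining = days_until_ship
    for i, bug in enumerate(ranked):
        if bug.fix_days > remaining:
            # Time is up: this bug and everything ranked below it is postponed.
            return ranked[:i], ranked[i:]
        remaining -= bug.fix_days
    return ranked, []


if __name__ == "__main__":
    # Made-up bugs for illustration only.
    bugs = [
        Bug("Crash exporting nested tables", 5, 4, 0.3, 0.2, 3),
        Bug("Wrong font after round-trip", 2, 4, 0.6, 0.1, 2),
        Bug("Import drops rare metadata", 3, 2, 0.05, 0.5, 4),
    ]
    fix, postpone = plan_fixes(bugs, days_until_ship=6)
    print("Fix:", [b.title for b in fix])
    print("Postpone:", [b.title for b in postpone])
```

With six days left, the widespread font bug outranks the rarer but nastier crash, both fit in the schedule, and the metadata bug is postponed. The point of the sketch is the shape of the decision (rank everything against everything else, then cut at the ship date), not the particular weights, which any real team would argue about.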

The team ended up with a pretty solid feature, but plenty of bugs remained. Every bug customers have found so far was found by the team before shipping but was postponed in favor of fixing a worse bug.

Another example I am intimately familiar with is our automation stack. As much as I would like it to be perfect, it isn't. We have to deal with bugs in Avalon and in our application. We are trying to do some rather complicated things. We depend on components from other teams, and those components have bugs. And of course our primary job is to test our application; we're writing all this code to make that job easier, but we have to balance our time between the two tasks. Complicating things even further is the never-ending list of new functionality waiting to be implemented.

Again, the only thing to do is prioritize each of these items against each other. We take a first crack at doing this at the beginning of each milestone and we continue to adjust priorities throughout the milestone. You can imagine the heated discussions that sometimes occur! It's not as simple as "feature work wins"; the whole point of our automation stack is to let us write more effective test cases and to write them faster. Every time we postpone an automation stack work item in favor of feature work we are effectively choosing to write some number of test cases now rather than being able to write some larger number of test cases later.

Clearly more test cases are better than fewer test cases. Clearly finding bugs now is better than finding bugs later. Finding the best mix of the two (that is, deciding how much time to spend now writing tools that should help us find more bugs and write better test cases more effectively later) is not as simple as plotting the two curves and finding the point at which they intersect. That my team dedicates a Test Tech Lead (i.e., me <g/>) to building infrastructure rather than testing features is a sign of the importance we place on tools, but the tester in me is never happy with the testing we end up postponing, and the architect in me is never happy with the infrastructure work we end up postponing.

Similar decisions are made by our feature teams every day. I have lots of ideas and opinions about how my feature could and should work. So does everyone else on my feature team. If we do everything we would like to do we'll never ship. Once again, the only thing to do is prioritize each item against the others and decide where to draw the line. This holds true even as ship day approaches. Certainly the criteria get stricter, but if something has to be done before shipping it will get done, regardless of whether the task is fixing a bug, finishing up a test case, or enhancing a tool.

*** Comments, questions, feedback? Want a fun job on a great team? Send two coding samples and an explanation of why you chose them, and of course your resume, to me at michhu at microsoft dot com. I need a tester and my team needs program managers. Great coding skills required for all positions.