Got bugs?

“Any idea how many bugs may still be in the product?” A question I’ve been asked and heard others get asked on many occasions. It drives me nuts. Not because it’s a dumb question, as we’d all certainly love to have the answer to it. It’s just that it’s pretty much impossible to answer. But saying it’s impossible isn’t a good answer either. It’s pretty much a cop-out. Those who ask the question are more than aware you’re not going to have an exact number of bugs in response. All they’re trying to get at is a feel for your confidence in the quality of the product at that specific point in time.

There are several metrics you can use to come up with a somewhat acceptable partial answer. Things like bug trends and code coverage data come to mind. But there’s nothing out there that gives you a ton of confidence in your answer. And man does that suck.

I decided to spend a few hours today looking for any information describing what those depending on software in mission-critical environments do to feel okay with their answers to this much loved question. Google kept leading me to NASA documents, which were definitely pretty interesting but still left me searching for more. It turns out that “process” is quite big at NASA, which is far from a surprise since it’s simply expected in environments like theirs. They’ve written lots of documents that describe processes for processes. They also seem to do a pretty good job at analyzing past projects and have collected some interesting data points. Some of the most interesting I found were in this slide deck. It’s not always clear if the source of the data is the Standish Group’s CHAOS Report or NASA’s own, but it’s interesting either way. Here’s an example of some of the fun facts they mention:

  • 53% of 8,380 software projects were over budget by 189%, late by 222%, and missing 39% of capabilities.  31% of the projects were cancelled and the remaining 16% were successful (e.g. in budget, on time, etc.).

There’s another slide titled “Where are software errors introduced?” which showed “requirements specification” as the answer to that question 68% of the time, “design and implementation” 23% of the time, and “installation and commissioning” with the remaining 9%.  It’s not a big surprise that specs are leading this race but it makes you wonder what can be done to improve in this area.  Obviously, writing clearer, more complete specs is what needs to happen but it’s definitely easier said than done. 

On the projects I’ve worked on, we sometimes have features that are really well spec’d out (e.g. the C# language in general) and others that aren’t spec’d out nearly as well (e.g. Alink, a tool Grant has posted about a few times recently). Even in the case where a solid spec exists for a feature, it oftentimes isn’t a “solid spec” until way later on in the product cycle. This can lead to developers implementing a feature with a design that has many open questions and testers verifying a not-so-clear level of correctness. It’s clear that if we got the spec right from the beginning, before developers wrote code and testers tested, we’d probably end up with fewer bugs, but it’s just not feasible. Plans and feature designs change significantly over a product cycle for a variety of reasons (customer feedback, design issues, change of priorities, etc.). We need to find ways to keep things as dynamic as they need to be but as concrete as possible throughout. Fun. (Going to try to stop rambling now.)

Anyway, another site I ended up at is run by Brian Marick and is incredibly rich in content related to software testing. I’ve read several of Brian’s papers over time and been to a couple of his talks at conferences before. He’s one of the few testing industry leaders I’ve really been able to agree with most of the time and relate to in general, as he seems to speak from real experience and is able to clearly communicate it.

I read two of his “writings” today. The first was Classic Testing Mistakes and the other one was A Manager’s Guide To Evaluating Test Suites (written with James Bach and Cem Kaner). An interesting section of the latter was Appendix A, where they describe the approaches for evaluating test suites that they rejected. The approaches rejected were error seeding and mutation testing, which can to some extent be used to predict future defects in a program. Both of these involve explicitly adding bugs to your code to get a feel for what percentage of bugs your tests are catching and missing. Clearly this approach has several problems, which they do a good job of pointing out, but oh how I wish something along these lines could really work. I might actually play around with something in these areas if I come up with anything that’s even remotely promising.
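To make the error seeding idea concrete, here is a minimal sketch of the arithmetic behind it. It is not taken from the paper; the function name and the numbers are made up for illustration, and the whole thing rests on the (very shaky) assumption that your tests find seeded bugs at the same rate as real ones.

```python
def estimate_remaining_bugs(seeded_total, seeded_found, real_found):
    """Estimate how many real bugs are still undiscovered.

    Assumption (hypothetical, and the weak point of the technique):
    the fraction of seeded bugs the tests caught equals the fraction
    of real bugs the tests caught.
    """
    if seeded_total <= 0:
        raise ValueError("seeded_total must be positive")
    if not 0 < seeded_found <= seeded_total:
        raise ValueError("seeded_found must be between 1 and seeded_total")
    if real_found < 0:
        raise ValueError("real_found cannot be negative")

    # If we caught seeded_found / seeded_total of the planted bugs, assume
    # real_found is the same fraction of all real bugs.
    estimated_total_real = real_found * seeded_total / seeded_found
    return estimated_total_real - real_found


# Made-up numbers: 50 bugs planted, tests caught 40 of them,
# and the same tests also found 120 real bugs.
print(estimate_remaining_bugs(seeded_total=50, seeded_found=40, real_found=120))  # 30.0
```

The estimate is only as good as the seeded bugs are representative, which is exactly where this kind of approach tends to fall apart.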

Anyone have any thoughts/ideas on ways that can lead to answers to my favorite question?

Comments (13)

  1. says:

    Do you really want to know why most bugs are truly missed… if so then please read on.

    Well, most of the bugs are missed (and yes, it is a fair enough assumption) because of the database itself. Customers and testers will not always know what to call each bug; word usage is what it boils down to.

    Example: video driver testing (fictitious, but a quick way to get my point across).


    These drivers cause display corruption.


    There are crazy characters across my screen.


    I can hardly see anything that should be displayed on my screen.

    All of these examples are of the same issue.

    How does the database know these are the same issue? It doesn’t, so it reports them as, you guessed it, 3 different issues (bugs).

    Though yes, I do know that steps are taken with Windows errors to try to at least identify the file that caused the error and put it in its bin, so to speak.

    But what of errors that do not cause errors? Sounds funny, but take web page issues, for example: as far as the browser is concerned, it displayed the page correctly. But what if you coded the page to display pink with blue text, and on the customer’s computer, due to an update that changed the browser, it displays blue?

    The customer reports this: I cannot read your web page at all (blue text on a blue page).

    You go back to view your site and you see pink with blue text (case resolved). Then your site fails to get hits and business is lost.

    Sometimes you need to mine the data a little further and allow other bugs to be grouped with that one.

    That is just my thought on why some bugs get missed even after thorough testing.

  2. Alex Barnett says:

    Great post Gus.

    In my experience, it is usually poor requirements gathering & speccing that leads to a mess later on. Get this right and the chances of a relatively trouble-free dev and test phase increase dramatically (as the numbers in your post suggest).

    To Redvamp’s point: The role of removing the semantic ambiguity of reported bugs as well as their categorisation – especially when you’re involving end users – should be done by a QA/UAT manager – a clearly defined responsibility and role within the team and engineering process. If a bug can’t be clearly categorised, or is vague and non-specific, they should go to the source if possible and clarify. The scenario you describe above really shouldn’t happen if this resource is in place.

  3. Gus says:

    Redvamp, we get a significant number of bugs logged against us that are duplicates. These bugs are practically never just dismissed and ignored. In fact, we’re probably spending more time investigating "dupes" and "no repro" bugs than we should in some ways. When we look back at bug stats over time, because of how we resolve them (if the same bug was logged five times, only one of them is typically resolved as "fixed"), we have a pretty accurate list that separates bugs from dupes/no repro/by design bugs reasonably well. Thanks for the feedback!

    Alex, thanks. And yes, categorizing incoming bugs is a team effort (not just the QA side, if anything, dev leads end up doing this the most) and we definitely send bugs back to the source for more info whenever necessary. Thanks!

  4. Requirements is NOT a phase.

    Hell, nothing in development is "a phase". Not as in the phase parents talk about when their teen picks up some crazy idea and runs with it (becoming a vegetarian, dyeing their hair bright pink, etc.), you know – "it’s just a phase."

    The requirements "phase" is complete when the system is built. Only then do all people involved in the project have the same unambiguous model of the requirements (sometimes not even then!)

    It all comes back to the saying:

    "Build the system right. Build the right system."

    Well, how do you know when you have ? Is zero bugs the answer ? Does it even exist ?

  5. Gus Perez says:

    Udi, I agree in some ways but not in others. I don’t believe in strict independent phases in a software project, but we sure do have phases (planning, coding, stabilization, etc.), though they do bleed into each other and overlap a whole lot at times. I do believe, though, that mission-critical projects (like many of those I was reading about yesterday from NASA) do have, or at least try a lot harder to have, a real requirements phase, development phase, etc.

    > It all comes back to the saying:

    > "Build the system right. Build the right system."

    Agreed. But in some cases you have to prove you _can_ build the right system. For example, if you’re convinced a certain feature should make it in the product but it’s late and you go to upper mgmt to convince them and all you can say is that you think you’re building the right system, it won’t always go your way. They might agree with you on adding the feature but might not be able to fully grasp the risk of committing. The more data you can give them the better. That’s the kind of thing I’m after.

    > Well, how do you know when you have ? Is zero bugs the answer ? Does it even exist ?

    There’s definitely no such thing as zero bugs. There are always more bugs. I’ve definitely learned that. But I’m just looking for more clarity and ways to get more confidence and not have to depend on my "gut feel" as much.

    Thanks for the comments Udi.

  6. On a good note, I am glad to see that once a bug has been found and a fix is known (or known to be a possibility), Microsoft provides a link where the person can obtain the fix.

    That is a big step in eliminating a lot of the unneeded and erroneous bugs.

    Would anyone know a good link to a testing method for tracking down disappearing memory (memory leaks)?

    Thank you for taking the time to reply to my post. And yes I do know your job is hard tracking down bugs.

    Too many things can lead up to certain things going wrong with programs.

    Viruses, other running programs, improper drivers, the order in which things have been installed and uninstalled, changes programs make to system files in order for their program to run properly, the many hardware configurations, and sometimes things that fall under the category of some unknown force.

    Though I try when I report an error to give full details – sometimes overkill… In too many of the reports I have read about people and their issues with programs, mainly on forums, the user is too vague on what is going on. So that would also hinder the finding of bugs. The only true answer is that over time the bugs surface and are fixed (though the process of fixing one thing can lead to more bugs). So bugs become a never-ending process.

    Would it not be nice if everyone had the same computer configuration? That would truly make it so the perfect operating system and program could be made without flaws. 🙂

  7. Juan J. Perez says:

    I love this problem! To figure out how many "problems", "defects", "deviations from requirements" (or whatever terminology you prefer to use) exist in a "software" system is almost like asking "what is success?", but not really. Software is a bunch of instructions that run on a system that is (somewhat) deterministic at its electronic layer (I’m not going to get into the physics of inductors and capacitors or the lower levels, that’s a completely different thread :)), and yet it is so complex, with so many unintended results (bugs). We’ve managed to create so many abstraction layers that help us constrain a problem to its core concepts (and disregard the underlying abstractions) and hurt us at the same time. Why does it hurt us? Are all layers in the software stack considering every other layer above and below at any time t? And at t+n? Absolutely not! When you design a ‘flexible’ and powerful software system, you are pretty much guaranteed to not consider every scenario that a user can implement. Multiply that 5, 6, or 50 times and you have the perfect petri dish for the software complexity that we call "buggy software". I like to refer to it as "organic software". Organic software is software that lives and evolves. It starts out being simple (at its conception during ideation) and grows in complexity, either to death or survival, depending on the simple decisions made by the software’s contributors.

    If a developer (or anyone who contributes to a piece of the software, be it specs, code, anything) introduces a problem that its end users find so annoying that they decide not to continue using or recommending the software to others, the software will likely die quickly. If lots of people find enough value in the software and the software is "reliable enough", then the software lives on and may even grow to significant user adoption.

    The question then is what is "reliable enough" software? That is dependent on the end users’ expectations and thresholds (regardless of how many end users exist). In the NASA case, there are lots of end users – astronauts, NASA engineers, government officials, citizens, Gus’ Pug :), etc… It is not ok when people die and therefore NASA works really hard to create processes for processes (recursively about 10 times :)).

    So how do we get to answer this question: "how many bugs are there left in the product?" It’s a 100% statistical answer depending on thousands of factors.

    So how do we solve the statistical challenge? We play the complexity game and distribute the problem with some simple rules.

    Microsoft OCA (Watson) has an interesting approach by catching global exceptions…

    Why is this a hard challenge? Fixing the problem is as hard as (or maybe harder than) making the software.

    The good news is that software has changed the way we think, solve problems, communicate and on and on… Even more good news: we definitely have some serious work to do!


    Great thread Gus!

    The comments on this post are the opinion of Juan J. Perez and not of Microsoft Corp.

  8. Paul Dietz says:

    I don’t think the question ‘how many bugs are left?’ is very well-posed, even if you know exactly what the software is supposed to do.

    A more answerable question would be: how reliable is the software in the hands of the users? This is related to the presence of bugs, but isn’t the same, since different bugs will have different customer impacts.

    The way this has traditionally been measured is by testing the software with tests drawn from a random distribution based on an operational profile that reflects how users will be using the software. If the profile is an accurate model of user behavior, the rate of failures in testing will reflect the rate of failures in the field.

    Getting that operational profile is the problem, though, and it’s going to vary from user to user. You may want to instrument the software so that when it’s being used in the field, it collects usage statistics that can be fed back into your operational profile. This is different from having the software send in bug reports — it’s reporting on user actions even in the absence of failures.
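
The operational-profile approach described in the last comment can be sketched roughly as follows. This is only an illustration under made-up assumptions: the operation names, the usage counts, and the stand-in `run_operation` callback are all hypothetical, and a real harness would drive the actual product instead of a lambda.

```python
import random


def profile_from_counts(usage_counts):
    """Turn raw usage counts (e.g. collected in the field) into probabilities."""
    total = sum(usage_counts.values())
    if total <= 0:
        raise ValueError("need at least one recorded operation")
    return {op: count / total for op, count in usage_counts.items()}


def estimate_failure_rate(profile, run_operation, trials=10_000, seed=0):
    """Sample operations according to the profile and count failures.

    run_operation(op) is assumed to return True when the operation
    succeeds and False when it fails.
    """
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    operations = list(profile)
    weights = [profile[op] for op in operations]
    failures = 0
    for _ in range(trials):
        op = rng.choices(operations, weights=weights, k=1)[0]
        if not run_operation(op):
            failures += 1
    return failures / trials


# Hypothetical usage counts, and a stand-in "product" that always fails on export.
usage_counts = {"open_file": 600, "edit": 300, "export": 100}
profile = profile_from_counts(usage_counts)
rate = estimate_failure_rate(profile, lambda op: op != "export")
print(f"estimated failure rate: {rate:.3f}")
```

With this profile the estimate should land near 0.1, because the only failing operation makes up about 10% of usage. A bug in a rarely used operation would barely move the number, which is the point of the comment: the result measures reliability as users experience it, not a bug count.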