I wish I had written these…

From Eric Sink, via Dana Epp:

"My Life as a Code Economist"

And Dana's original article:

"The cost of fixing bugs..."

Comments (25)

  1. till_pero says:

    Basically, shipping products with bugs (known or unknown) is lame – it’s tolerated only because fixing everything would lead to a cost explosion; it is already very costly to buy and use any software with even a small level of complexity.

    That does not of course mean that shipping *bug free* software is impossible – that claim is a huge insult to the engineering profession. (Imagine aeroplanes, nuclear power plants, etc. built and commissioned with bugs; even consider CPUs – for their complexity, they have far fewer bugs than any operating system.)

    Software engineering could benefit hugely from having completely effective processes, intelligent tools, and engineers who religiously follow them. Anyone who ships buggy products is missing at least one of the above.

  2. Marvin says:

    The cost of a fix that these articles talk about directly depends on how much the code is garbage to begin with. It is never zero but it is far greater for MS than it should be. Instead of committing 100% of your resources to pushing the next big wave of unreliable crap (aka .NET) concentrate on techniques to write robust code. This will decrease your bug fix times and costs and change the balance in the equation. Note that "robust" is more than "secure". The security scrub you went through is just the tip of the iceberg.

  3. Norman Diamond says:

    Economics? I’ve had enough previous employers where the cost of fixing a bug was blamed on the person who reported the bug. After seeing a posting by one of your colleagues who was penalized for helping to improve bug reporting, plus a few comments on MiniMsft’s site, it sort of looks like Microsoft is par for that course.

    There are more than two groups of software engineers. Some have different opinions than others about what kind of bug should be considered a show-stopper. Some have different opinions than others about whether warranty service should be provided.

  4. * full disclosure: I am the moderator of webappsec@securityfocus.com

    I find this discussion amusing as it is more than 10 years old, and still hasn’t really been put to bed yet.

    "Responsible disclosure" requires "responsible software vendors".

    When vendors do not fix their software, we get people pushing out exploits prior to fixes being available – a lose-lose situation.

    When vendors exploit a disclosure policy to put off doing the right thing, we all lose. Back in the day, SGI used to be the bogeyman. CERT would sit on vulnerabilities literally for years, until the vendor notified CERT that they had a fix and had spread it around their customers. SGI exploited CERT’s policy. full-disclosure@… was born out of that *failed* "responsible disclosure" policy.

    By common assent, bugtraq mostly adheres to RFP’s disclosure policy to *encourage* vendors to fix and test in a reasonable time frame.

    However, I know how hard moderating a high volume list can be, as I look after webappsec. I do not let 0days out on my list, but I allow discussion which may lead to new attack types. This is responsible: if the security professionals are thinking about it, you can bet your bottom dollar that attackers are thinking about it too, or already have.

    I strongly advocate development of preventative measures along with pure attack paths. Without preventative measures, we will just become more and more vulnerable as time goes on. That’s why I re-wrote the OWASP Guide 2.0 to be a positive book, and minimized the total number of negative "thou shalt not" entries.

    However, getting back to the original topic. In 2002, I presented a talk to linux.conf.au stating unequivocally that if you use dangerous languages like C or C++, you are responsible for not releasing code which has buffer overflows and other preventable flaws. Most user land utilities and applications could be recoded with some effort in C#, which is for the most part unaffected by buffer overflows.

    Dana has a point, but it’s a very naive point of view. We need to have *some* pressure on vendors, as they have shown themselves to be less than forthcoming in the past (SGI, Cisco); otherwise we all lose. But we also need to allow security researchers to freely communicate their findings and research so that we can develop techniques to prevent security flaws.

  5. till_pero, clearly you’ve never worked as a professional software developer (or at least not on any project of reasonable complexity).

    There are NO projects that ship without bugs. None. Even mission critical ones like the projects that run airplanes, nuclear power plants and space ships. Those applications have a very low tolerance for bugs, and have a massive amount of overhead associated with ensuring that the project has as few defects as possible, but they still ship with bugs.

  6. Dustin Long says:

    The problem is that most bugs can’t be foreseen. Who knows how a system is going to act when it’s living in the wild, when it’s performing with input that it wasn’t tested against, when the components are grinding against each other in ways they never have before. Until we come up with a way to test against all possible input cases imaginable (ha!), or adopt formal code proofs as the norm (double ha!), software with bugs will ship.

    "Program testing can be used to show the presence of bugs, but never to show their absence." – Dijkstra

  7. NoDeadCode says:

    "…how much the code is garbage".

    No, it is how complex the code is already.

    Nobody with a sane mind *WANTS* to ship a product with bugs. The visibility is greater in systems that expose an API to an external team or a component.

  8. People who compare engineering to software engineering and claim that since bridges don’t fall down on a regular basis software should be flawless bother me.

    Tell you what, when you’re told to design a car that one month later has to be able to go under water, then 6 months later, right before it’s due to ship, also has to be able to go into outer space, then we’ll talk.

    Engineering has a set number of variables whereas software has a (practically) unlimited number of code paths.

    Given a limited amount of resources one has to determine where efforts should be directed to.

  9. till_pero says:

    Larry – makes me wonder if you read my post at all. Ok, let’s get this straight – it’s this attitude of MS that has caused it to ship products with *nasty* bugs that explode and cause you *loss*.

    I call it a bug when a *normal* use of the product causes loss of _some_ kind for which there is no work around.

    (Mind you Windows has plenty of such bugs)

    Going by your negligent statement, we should have seen loads of aircraft crashes every day. Aircraft software does NOT have bugs that occur with normal use of the aircraft – and heck, maybe not even with abnormal use. PERIOD. Mission critical s/w that ships with bugs ships with enough prevention mechanisms to avoid the circumstances causing the bugs – meaning it no longer is a bug but reduced functionality, which is acceptable.

    I don’t recall nuclear power plant explosions caused by bugs – it performs exactly per specification.

  10. till_pero says:

    Those applications have a very low tolerance for bugs, and have a massive amount of overhead associated with ensuring that the project has as few defects as possible, but they still ship with bugs.

    In engineering there is always a tolerance – either very low, or very high, or somewhere in between. We are discussing bugs – which are caused by non- or mal-function of the software, which in turn is caused by higher tolerance.

    Saying that low tolerance is still tolerance, and is comparable to high tolerance, is funny. The latter causes you loss and classifies as ‘does not work per specification’ == BUG; the former, despite being a little off specification, doesn’t cause you any loss – it’s acceptable.

    Now give me an example – what bug(s) did a mission critical s/w ship with, and if it did, why was it called ‘mission critical’ at all? I tell you, no real mission critical s/w did – all of them performed ‘well within specification’ under all practical circumstances. That is NO bug at all.

    Makes me wonder, however, if Boeing sells aeroplanes with the kind of NO WARRANTY OF ANY KIND clauses that MS ships their OSes with!

  11. vince says:

    > till_pero, clearly you’ve never worked as a professional software developer (or at least not on any project of reasonable complexity).

    So you’re a "professional software engineer"? Does that mean you’ve taken an exam provided by the state and are legally liable for any damages caused by errors in your code?

    > There are NO projects that ship without bugs. None. Even mission critical ones like the projects that run airplanes, nuclear power plants and space ships.

    Yes, but if an airplane or nuclear plant somehow has an error that causes millions of dollars in damages, the company has to pay. So they are very careful to avoid such things.

    Microsoft, on the other hand, can cause millions of dollars of lost time and money (see the spyware problem and virus problem) and for reasons I don’t understand no one ever sues.

    It’s not as if MS doesn’t bring in enough profits to fix most of their critical bugs if they wanted to.

  12. Mike says:

    there’s a difference between:

    – shipping products with bugs.

    – shipping products with *known* bugs.

    It is possible to ship even large systems without known bugs. I’m not talking about the kind of bugs you get due to a defective (or even non-existent) QA, but the kind of bugs that slip by even extensive (but provably not extensive enough) QA.

    There are many reasons products ship with known bugs, but usually it boils down to engineers and developers being overruled by the ones wanting to make a quick buck. But whatever the reason, it’s always a deviation from good business ethics – at best.

  13. Yes, the POC should have been sent to Microsoft privately before (shall we say, MONTHS before) being released into the wild.

    My question, though, is this…

    How much effort does Microsoft make to investigate whether known bugs are more serious than they at first appear?

    Clearly, not enough.

    If someone points out that there’s a crack in the dam, then calls the person who built the dam and says "look at the crack, it’s leaking water", whose responsibility is it to check whether that crack is a minor inconvenience (leaking water) or a sign of major structural damage?

    The person who noticed the crack, or the person who built the dam?

    It AMAZES me that this person was able to discover a working exploit, even without access to the code.

    Microsoft should have been able to assess the risk of a possible exploit much more easily. Not only does Microsoft have access to the code, but they don’t even have to make the exploit work… they just have to think "hey look, there’s a potential buffer overrun here."

    Easy fix.

    Food for thought:

    "… if a man builds a house badly, and it falls and kills the owner, the builder is to be slain…"

    St. Francis Dam – worst civil engineering disaster in US history

    "… On the morning of March 12, Harnischfeger discovered a new leak and worried that it was undermining the dam. Mulholland, his son Perry, and assistant Harvey van Norman investigated. Perry thought the leak looked serious, but Mulholland felt it to be typical of concrete dams, and declared the leaks safe.

    The dam finally crumbled at 11:57 p.m. on March 12, 1928, scarcely 12 hours after Mulholland had inspected it.

  14. Maurits, that’s a good point. Personally, I classify this in the same category as off-by-one overruns and heap overruns.

    Two years ago, if you’d asked a security professional, they would have told you that a one byte overrun of the stack wasn’t exploitable. Then the hackers showed how it was exploitable. Similarly with heap overruns.

    Our knowledge of what is exploitable and what is not exploitable constantly changes over time. This is why Microsoft requires that developers take ANNUAL security training – because the exploit landscape constantly changes.

    From what little I’ve seen about this particular problem, it appears that something like that happened here – a condition that was previously thought to be unexploitable was shown to be exploitable.

  15. Norman Diamond says:

    Monday, November 28, 2005 11:10 AM by vince

    > and for reasons I don’t understand no one ever sues.

    Legal proof is far far harder than engineering proof. Even when you can accomplish legal proof, it’s rare for awards to even cover the expenses. The outcome shown in "A Civil Action" is more common than "Erin Brockovich", but more common than both is for the case to just be abandoned because victims can’t afford to produce legal proof.

    Monday, November 28, 2005 5:52 PM by Maurits

    > http://www.fordham.edu/halsall/ancient/hamcode.html

    > "… if a man builds a house badly, and it falls and kills the owner, the builder is to be slain…"

    Yeah, but if the builder can afford better lawyers, and if the builder isn’t stupid enough to do something like telling the truth in court, then how are you going to prove it…

  16. ilja says:

    > Two years ago, if you’d asked a security professional, they would have told you that a one byte overrun of the stack wasn’t exploitable. Then the hackers showed how it was exploitable. Similarly with heap overruns.

    Hm, it’s a bit longer than 2 years though – it’s 6:


    The timeline surrounding new bug classes (or partially new bug classes) is interesting though. Usually they are known for a very long time to a select group of people, and it’s not until someone raises a big stink about it that other people know or care about it.

    A good example is format string bugs: they’ve been documented since at least 1988 (The C Programming Language, 2nd edition). Very few people knew about them until 2000, when all hell broke loose and everything turned out to be vulnerable to them.

    Whenever the public at large is introduced to a new bug class, it’s always interesting to go look for earlier references to it. You usually can find them! It’s just that no one seemed to care about it back then.

    The thing is, most bugs have the potential to be security bugs when placed in the right context. It’s just expensive to think about a bug (that’s not yet seen as a security bug) for a while and try to come up with a security-sensitive context for it.

    I agree with Larry that constant developer education regarding security is needed. He’s absolutely right: the exploit landscape is constantly changing, and it’s changing more rapidly than it did a decade ago!

  17. James Risto says:

    Sometimes I wonder if people just need a target to rant against! Seriously … relax, people. Speaking from 20 years in IT, MS is by no means the worst bug shipper/admitter/fixer. All other vendors provide shoddy support from time to time … in my opinion the cumulative worst is another major sw/hw/services vendor.

  18. Feroze Daud says:

    Building software is not easy. It is a complex task. There will always be bugs. You say it is lame to ship with bugs (known and unknown). First, if it is an unknown bug, you didn’t know it existed when you shipped the product, so there is nothing that could have been done about it in hindsight. As regards known bugs – again, it depends on the severity/priority of the bug. If you take the position that you will never ship software with a known bug, then very few projects will get to customers. The reason? Fixing bugs sometimes introduces more bugs. Also, QA tests are sometimes not complete, so they don’t catch all the bugs in software. And you get to the point of diminishing returns after a while. Do you want to hold up a product ship just because the "OK" button in a dialog box is off by one pixel?

    Let us talk about mission critical software. I don’t have the link offhand, but if you go to Slashdot, you can search for a posting which listed the top 10 bugs of all time. There have been incidents where people were injected with the wrong dose by a medical device because of a bug in the software. NASA lost the Mars Climate Orbiter because one contractor was working in metric units while the other was working in imperial units. It was a bug in the process!

    Having said that – is there room for improvement? Absolutely – and we are working on it every day. Sometimes we are successful and sometimes not, but I can say with confidence, at least about my team, that we are focused on delivering quality code.

  19. Feroze Daud says:

    I wanted to add to my post on this subject. Another bug that hit a mission critical piece of software was on the Mars Pathfinder mission, which carried the first Mars rover. They had a bug in the scheduling software (a priority inversion), and it caused the spacecraft to malfunction. It was a very subtle bug, I grant you that, and they could fix it remotely as well. Anyway, the reason I brought this up was to give another example of a bug that hit a critical piece of software.

    I would also like to add that sometimes mistakes do get made. People make the wrong judgement about a bug and don’t realize the impact it might have down the road. So, yes, sometimes software does get shipped with "known" bugs. However, cases like this should not be many. No project manager, whether at MS or elsewhere, wants to ship software that will have bugs in mainline customer scenarios. And where this has happened, managers are held accountable. Now, again, as I said previously, there are definitely instances where this process does not work perfectly. However, there shouldn’t be too many of them.

    Bottom line is – there are other alternatives now. We know that if we don’t do a good job and ship shitty, bug-ridden software, then customers will vote with their dollars and go elsewhere.

  20. Chris Moorhouse says:

    Interesting… we keep coming back to market impact as the driver for bug fixes. Many businesses do not have this vaunted ability to "vote with their dollars". Consider that there is a cost attached to system migration, and that such a cost might be so high for technically-reliant businesses that they will surely go under if the migration is undertaken. A business in such a position will stick with the buggy system they have and pray.

    I see these "economics of software design" themes fairly frequently in the MS blogs, and why not? That is the model that MS approaches business with. Eric Lippert’s "How many Microsoft employees does it take to change a lightbulb?" is a somewhat more humourous article in a similar vein. Unfortunately, I see a few people missing from Eric’s list, like the number of marketing people involved in promoting said new feature, or the number of people making said feature accessible for his blind Catalan-speaking Spaniard, who happens to be a deaf quadriplegic as well.

    All this brings up a point that occurred to me: what if all that effort devoted to market-broadening localization, accessibility, and marketing were put into software development instead? As someone once mentioned, "Fewer clients, less money". Not that the disabled or those who don’t speak English don’t deserve great software, it’s just that they (and everyone else) don’t deserve to have to wonder if their computer will be working tomorrow.

    Of course, if Microsoft did anything that increased quality at any cost to their customer base or advertising juggernaut (and hence to their bottom line), some other voracious being would just step in and start it all over again. These tradeoffs are made not to stay in business, but to remain on the top.

  21. Dana Epp says:

    As some of my comments on my blog point out in the original article, I agree that we need to place pressure on the vendors from time to time. My point though is that pressure needs to be applied through a responsible workflow. If security researchers really wish to protect the safety and security of their clients while elevating their own credibility in the industry, they must follow responsible disclosure practices.

    Researchers have every right to be able to disclose their findings. The balance is doing so while respecting the well-being of the rest of the Internet. This wasn’t the case. They didn’t even make an effort to notify Microsoft beforehand.

    And I am not absolving Microsoft from responsibility here. They have a TERRIBLE track record when it comes to responding to some threats (see eEye’s Upcoming Advisories list for just a few examples of vulnerabilities outstanding for over 200 days now). But it’s hard to work on those when they have to respond to new attack patterns that are in the wild.

    Further to this, Microsoft showed their human frailty in their security response practices with this incident. During triage of the original bug, a threat model would have been performed, and it is apparent this attack vector wasn’t even considered. It should have been. But now it’s a moot point. Now they are in defensive response mode in an effort to protect all their clients.

    How does the irresponsible disclosure benefit us as the user? It doesn’t. It actually put us all at MORE risk. And that’s not acceptable.

    Naive? Perhaps. But that’s because I believe in the disclosure process. It requires both sides to work. When either side collapses, the whole thing is shot. This incident is proof of that.
