Why building software isn’t like building bridges

I was having a conversation with a friend the other night and we came across the age-old “software should be like building buildings” argument.  It goes something like this: software engineering should be more like other engineering disciplines, such as building bridges or buildings.  Those, it is argued, are more mature engineering practices.  If software engineering were more like them, programs would be more stable and projects would come in on time more often.  This analogy is flawed.

Before I begin, I must state that I’ve never engineered buildings or bridges before.  I’m sure I’ll make some statements that are incorrect.  Feel free to tell me so in the comments section.

First, making software, at least systems software, is nothing like making buildings.  Engineering a bridge does not involve reinventing the wheel each time.  While there may be some new application of old principles, there isn’t a lot of research involved.  The problem space is well understood and the solutions are usually already known.  Software engineering, on the other hand, is by its very nature new every time.  If I want two bridges, I need to engineer and build two bridges.  If I want two copies of Windows XP, I engineer and build it only once; I can then make infinite perfect copies.  Because of this, software engineering is more R&D than traditional engineering.  Research is expected to have false starts, to fail and backtrack.  Research cannot be put on a strict timeline.  We cannot know for certain that we’ll find the cure for cancer by March 18, 2005.

Second, the fault tolerances for buildings are much looser than for software.  More often than not, placing one rivet or one brick a fraction off won’t cause the building to collapse.  On the other hand, a buffer overflow of even a single byte can allow a system to be exploited.  Buildings are not built flawlessly.  Not even small ones.  I have a friend whose large brick fireplace ended up inside the room rather than outside the house because the builders placed it incorrectly.  In large buildings, there are often lots of small things wrong.  Wall panels don’t line up perfectly and are patched over, walls are not square to each other, and so on.  These are acceptable problems.  Software is expected to be perfect.  In software, small errors are magnified.  It takes only one null pointer to crash a program or one small memory leak to bring a system to its knees.  In building skyscrapers, small errors are painted over.

Third, software engineering is incredibly complex—even compared to building bridges and skyscrapers.  The Linux kernel alone has 5.7 million lines of code.  Windows 98 had 18 million lines of code.  Windows XP reportedly has 40 million lines of code.  By contrast, the Chrysler building has 391,881 rivets and 3.8 million bricks.

Finally, it is a myth that bridge and building engineering projects come in on time.  One need look no further than Boston’s [thanks Mike] Big Dig project to see that.  Software development often takes longer and costs more than expected.  This is not a desirable situation, and we, as software engineers, should do what we can to improve our track record.  The point is that we are not unique in this failing.

It is incorrect to compare software development to bridge building.  Bridge building is not as perfect as software engineers like to think it is, and software development is not as simple as we might want it to be.  This isn’t to excuse the failings of software projects.  We can and must explore new approaches like unit tests, code reviews, threat models, and scrum (to name a few).  It is to say that we shouldn’t ever expect predictability from what is essentially an R&D process.  Software development is always doing that which has not been done before.  As such, it probably will never reliably be delivered on time, on budget, and defect-free.  We must improve where we can, but hold the bar at a realistic level so we know when we’ve succeeded.

Comments (23)

  1. Mike says:

    I actually agree with the meat of your post 100%, and don’t care to argue those points at all. I would however like to point out that the Big Dig fiasco took place in the jolly old city of Boston, rather than in NJ.

  2. David Smith says:

    But it’s easy to see why the confusion might arise for an inexperienced person.

    Often software serves the same function as a bridge or a building, and so the metaphor suggests itself. Extending the metaphor to places it doesn’t belong is the problem.

  3. Steve Rowe says:

    Thanks for the correction Mike. I’ve made a change in the post to give credit to Boston for the Big Dig.

  4. mike says:

    Some new wheels were invented in the construction of the Millau Bridge. http://www.technologystudent.com/struct1/millau1.htm

  5. I think the gap between software engineering and traditional engineering can be bridged by acknowledging the essential difference between "R" and "D," and trying hard not to ship "R" material before it has gone through a proper cycle or two of "D."

    You put your finger on a key source of the problem: the amount of uncertainty and novelty inherent in virtually all "interesting" and "ambitious" software projects, especially for system software.

    To the extent that ANY project is large, ambitious, and fraught with novelty and uncertainty, it is also fraught with risk. Among the key goals of traditional engineering are to understand, assess, characterize, constrain, and minimize risk. You can’t do that effectively, unless you know what you are doing, including a thorough understanding of your materials and methods. That isn’t, however, the nature of at least the R part of R&D. In the R phase, you are in discovery mode. It may be a long time before you have anything that works, much less know anything about it, and even longer before you can say with any certainty what you have and what you know. Still, it’s only at this latter point of knowledge that you can responsibly toss the products of your work over the transom into the "D" room.

    In "D," the assumption is that at least the basic principles of what is being done are well understood, and that there is also some good experience with several relevant materials and methods. "D" processes may synthesize from the less-refined ingredients that are produced by the "R" stages, but in the end, what "D" processes yield are reasonably well-measured, tested, and characterized packages of materials and methods, which other people (probably application developers, if we’re talking about system software) can proceed to use as "safe" ingredients or approaches in their own projects.

    The point is to filter out the risks by adding information that reduces uncertainty at each successive stage of refinement. Further, avoid making commitments about resources, costs, deadlines, features, or other potentially embarrassing details until your uncertainty about such things has fallen below an acceptable level. You are right that nobody can guarantee a cure for cancer by March 18th — that is, unless it has already been done and/or you already have it in hand.

    There, I think, is the key to becoming more realistic about the delivery, cost, and reliability of software. Quit gambling so much. Don’t commit to resources or a schedule unless and until 1) someone else has done the same thing before and you have the documentation of the experience; or, 2) you have a working prototype (not just a "proof of concept") or even a "kit" of all of the necessary components in hand.

    What I think the above implies, is constant invention within "R," in response to new technology opportunities, great ideas that pop up, or articulated market needs — usually combinations of all three. Anything that gets to the point of working (operating prototype) needs to be studied and characterized, in terms of its theoretical limits, spectrum of applications, cost/benefit vs. competing approaches, etc. Perhaps the "R" group produces a "developer’s workbench" for the technology. Only then does it become OK to use it in first-level "D" projects, which aim at creating well-tested and characterized, practical, component packages out of the "R" items, which integrate well into the ongoing production architectures. In second-level "D" projects, the packages produced by first level "D" are assembled and polished into systems and applications that developer customers and end users can employ.

    By the time you get to 2nd-level "D" projects, most of the uncertainty about what you have and how efficiently or reliably it works is gone. (Along with much of the excitement, many developers would argue. 🙂) At that stage, it becomes fairly straightforward to commit to a project and to hit your delivery, performance, reliability and cost targets, because so much is already known about the ingredients you’ll be using and the methods you’ll use to put them together. Making commitments at the 1st-level "D" stage is more risky, because the great demo workbench that comes out of "R" may or may not be helpful in producing a reliable, characterized, practical software package (which can function well in production code) with X resources by Y date. Making product-delivery commitments at the "R" stage is foolhardy or desperate, because of the uncertainty and novelty that you rightly cite as a key aspect of "R" work.

    After having watched numerous application and systems-level software projects, large and small, from the inside and the outside, I think that software development does not become "engineering" until at least the 1st level of "D." Yet it still seems all too common in the industry for schedules and resources to be negotiated — and high expectations of success to be set — while most of a project’s key components are still deep in the "R" stage. I’ve also observed that many of those resource agreements, and many negotiated schedules, guarantee that a product must be declared final and shipped while it is still in an "R" stage, or an early "D" stage at best.

    The problem is exacerbated in the case of an operating system, as it necessarily embodies new approaches, architectures, and components. Yet even in that case, you can do a fair amount of the "prep cooking" before having to commit to a time when the soup will be done. As it is, software of this class often must be released into the world (for beta tests or even general availability) before the real "engineering" begins. That’s like expecting thousands of people to put their lives on the line to travel across a prototype bridge, with the engineers watching like hawks and ready to spring into action, to address any defects that may become apparent in actual usage, before they become injurious or fatal. That M.O. was barely tolerable for NASA’s manned missions. I don’t think it’s tolerable or sustainable for infrastructure software upon which millions depend. So I hope that you folks get a handle on the uncertainty factor one way or another, sooner rather than later. Your customers and your investors will thank you. You might even start to earn some real respect from traditional engineers.

  6. Drew says:

    I agree that, for the most part, making software is not engineering. I’ve been struggling with the pros/cons of that lately. Ok – maybe I’ve just been inspired by the sudden wealth of new "process" we have for Longhorn. Or maybe inspired a bit by my father, who is an actual engineer and likes to poke fun at my job title.

    (IMHO) The real reason is because engineering involves building things whereas making software consists mostly of creating bad analogies. Watch me try my hand at it, proving I’m not an engineer. 😉

    The reason working on Windows isn’t really engineering (despite our job titles) is that people’s lives don’t depend on it. Why do those civil engineers building the bridges tend to use proven techniques/materials/etc.? Why do they have professional licensure? Because bridges can kill people. If (when?) Windows runs on a kidney dialysis machine, there will be a need for real engineers and not just engineers in name. Like those NASA folks James mentioned, for example.

    I’m not buying the "fault tolerances are different" argument, either. Software can have bugs and still work. There can even be buffer overruns (gasp!). Sad as it may seem, people don’t expect software to be perfect. Have any idea how commonly people reboot to solve their problem on a home computer? They probably complain about it in something like the way I complain about the poor design of the paper-towel dispenser in the men’s room at work. I try to pull one towel out and half the time the whole stack falls out and … you get the idea. If the paper-towel dispenser doesn’t work as an analog to a bug for you, I can similarly rant about how there’s always at least one elevator broken in one building I go to, or how the exit door nearest my office always seems to stick. My architectural UX sucks. Minor hassles. I can’t fix them, so I just accept it and move on.

    Note: Those flaws in the building were not dire problems in the building’s core architecture. If that were horribly flawed the building would be condemned. People’s lives again. Not like software, really. I’d like to formally apologize to that tortured analogy.

    Also note: I’m not saying any of this is an excuse for being a bad tester and I’m not saying I want to ship bugs – just trying to add perspective.

  7. Jeremy Kelly says:

    Interesting discussion… I’m in the same boat as Drew. My father is a Civil Engineer and I catch hell sometimes for the "improper" use of the word engineer. Some of my thoughts can be found here:



  8. I agree with Drew. As long as computers are not central to life or livelihood, the systems that control them can be treated as "art." The artist says, "this is my statement, deal with it." Once they ARE central to life or livelihood — necessary for our jobs, controlling our automobiles or home appliances, regulating medical equipment, etc. — then they must leave the realm of art and enter the realm of true science and (civil) engineering.

    I think we are at, or very close to, the time when people are not just able, but REQUIRED by the realities of society and economy, to "live" in the infomatic "homes" our industry produces. So it might be worthwhile to recall the Code of Hammurabi:

    "If a builder build a house, and it collapse and kill the owner, the builder shall be put to death."

    How many people are trusting their jobs and their fortunes to Windows, or personal computers in general? Let’s rephrase that: how many MILLIONS of people are doing so? How soon before lives are at stake? What if we had to live under the Code of Hammurabi, as applied to software "structures"?

    Back when I was writing software for a living, I always tried to pretend that I was subject to Hammurabi’s laws. I got as close as I could to meeting that standard, but had Hammurabi’s laws actually been enforced, I wouldn’t be here now! Fortunately, putting the bar that high was considered a game of "overkill" in those days. Perhaps not so much anymore.

  9. vincem says:

    I think that software development definitely belongs in an engineering class. Mainly because engineering is the process of solving difficult or new problems. I think the comparisons here in this post are apples and oranges but both require the same type of problem-solving skills. I mean this is why they made us take all of those silly ‘weed-out’ courses in college, right?

  10. Steve Rowe says:

    Everyone, make sure to catch the parallel discussion going on over at JeremyK’s blog as well (link above).

    I think there are probably 2 issues here:

    1) Is software development an "engineering" discipline?

    2) Can adopting more process from "real engineering" make software engineering better?

    To the first, I say it is largely a semantic argument. There are aspects of software development that are very much like engineering. If engineering is, as Jeremy says, "the application of known technology, materials, and knowledge to solve a problem," then software development is clearly engineering. It is, however, a different sort of engineering than civil engineering. Civil engineering, with so many lives on the line, is by its nature very conservative. Software engineering can be this way too. I recall back in high school hearing a Boeing engineer explain what they had to do to develop software for airplanes. One thing that stuck out to me is that they don’t ever use dynamic memory allocation. There are no news or deletes in their code. This is software engineering that is more analogous to civil engineering. It’s also software engineering that will develop new capabilities very slowly. Most software development is not so conservative. It doesn’t have to be. Does this make it not an engineering practice? Opinions will differ, but I say no.

    To the second point, which is where the meat of my original essay was, I think the answer is not really. The answer to the first question, that software development and civil engineering have vastly different goals, dictates that the answer to the second has to be largely no. Unless software development wants to cut back vastly on what we try to accomplish, we cannot become as fault-tolerant as, say, a bridge. A good example is Trusted Solaris. I don’t have any recent experience with this product, but 10 years ago it was much more secure and stable than the untrusted variant; it was also much slower and more feature-deprived. There is a tradeoff.

    The point here is that we must acknowledge that our craft is materially different from civil engineering and that there is no free lunch in becoming more like civil engineers. It would have a radical effect on what we can accomplish.

  11. I agree with Steve that there is "no free lunch" in becoming more like civil engineers, and also that doing so would have a radical effect on "what you can accomplish." Both are true, as far as they go.

    The questions that such statements dodge are, however, "What is ‘accomplishment’?" And, "To what extent is a software package ‘art and entertainment,’ as opposed to a practical, reliable tool?" When you ship software under the status quo, what utility and value have you provided the customer, exactly? If you ship something that has to be rebooted five or six times a day, occasionally eats data (inspiring a paranoid "save often" and "backup nightly" regimen), is subject to frequent re-installation or updating, and is susceptible to virus attacks or other security breaches, you have to count the user’s frustration, not to mention time and data lost, against whatever "improvement" in his situation your software purports to provide.

    The challenge to produce trusted, reliable software on anything faster than a glacial timetable is, probably, not as sexy as the challenge to get the latest bell or whistle to market before the other guy. But it is a REAL engineering challenge. Software developers have to decide how much they want to be entertainers and dabblers, and how much they want to produce packages that "just work," conveniently, efficiently, and reliably. If they decide to go down the latter path, then their task will be to try to put as much well-characterized innovation into each successive product generation as possible, and to reduce the amount of time between generations. That will NOT be done by maintaining that software is essentially "R&D" (and more "R" than "D"). It WILL be done (I believe) by people who understand the different, yet complementary and interdependent natures of "R" and "D," and who structure their operations to optimize the execution of each development stage.

    Anyone who likes the "R&D," high-uncertainty approach to development should probably veer over into games and entertainment wares. I don’t say this mockingly. The point is that, while art and toolmaking (or appliance making) can be blended, the expectations for art are entirely different than the expectations for tools and appliances: still, both are vitally needed in our society, and the practitioners in each area are to be respected for the different value they contribute.

    Just, please, don’t treat an OS as if it were a one-of-a-kind work of art. It IS a building. Architects can make buildings beautiful, and that’s to be encouraged. But at heart, they are to serve a purpose with convenience, safety and reliability.

  12. Steve Rowe says:

    James, I’m afraid you’re reading too much into what I’m saying. I am not trying to justify shoddy code, especially at the operating-system level. We must, however, recognize that there are different standards for different levels. The scheduler and the virtual memory system have to be like bridges. In Windows NT/2000/XP, they are. Code changes are taken very seriously and a lot of effort is put into quality at that level. Because of this, you’ll rarely see an XP machine blue screen except for a driver bug (which we don’t write).

    That, however, is not really the point of my original essay. What I was attempting to do was to counteract the notion that if we just act more like civil engineers, we’ll finish projects on time, on budget, and with no bugs. Civil engineering projects often run over their deadlines and over their budgets. They do ship with few (vital) bugs. However, they also ship with few features.

    The implications of this are clear. Treat the critical parts of an OS or system like civil engineering: make them stable first and add features later. However, at the periphery, this doesn’t work. History is replete with examples of solid, featureless products that were defeated by flashier but less stable products. Consumers want features and price, not stability. Why else would they buy $30 no-name DVD players instead of something from Toshiba or Sony, which would probably last twice as long and have half as many bugs?

    We at Microsoft are trying, via the Trustworthy Computing initiative, to walk the line between features and stability on the periphery. How well we are doing that remains to be seen. Windows Server 2003 had a lot of features and was also very solid. We might be doing well. Watch and see.

  13. Dave Froslie says:

    Software Development IS like bridge building when you have a bunch of software developers involved in a contest to make a bridge out of some random parts. See my post, "Bridge Building and Software Development" at http://blogs.msdn.com/dave_froslie/archive/2005/03/07/389113.aspx.

    One point that you can take out of my post is that I took most of the engineering out of bridge building. I didn’t know the strength of my materials. I didn’t try to calculate the forces applied to various components to see where the weak points on my bridge were. I didn’t run a simulation to learn more about my design. I’m not a civil engineer, but I would guess that these would all be part of a civil engineer’s approach to a typical design.

    I would agree with much of Steve’s commentary, but there are still some similarities between the disciplines in terms of challenging requirements, the use of design patterns, and project management challenges. As Steve concludes, there is no doubt that we have to continue to get better in our software development endeavors.

  14. I was listening to an interview with Alistair Cockburn tonight on my way home and thought he had some…

  15. Scott Rosenberg just published a new book called Dreaming in Code about a project to create a new personal

  16. Jt Gleason contends that building software is not like building bridges because of the halting problem.

  17. Quicklinks says:

    Attention Mapping: The 10-Point Exercise (tags: webdesign process) A Freelance Programmer’s Manifesto (tags: freelance business software) Why building software isn’t like building bridges (tags: software development)…