Raymond’s reading list: The Mythical Man-Month, The Design of Everyday Things, and Systemantics


The first two of these books are probably on everybody else's reading list, but I'm going to mention them anyway since I consider them required reading for managers and designers.

The Mythical Man-Month is over 30 years old, but the lessons contained therein are as true now as they were back in 1975, such as what is now known as Brooks' law: Adding manpower to a late software project makes it later.

I much preferred the original title for The Design of Everyday Things, namely, The Psychology of Everyday Things, but I'm told that booksellers ended up mistakenly filing the book in the psychology section. Once you've read this book, you will never look at a door the same way again. And you'll understand the inside joke when I say, "I bet it won an award."

The third book is the less well-known Systemantics: How Systems Work and Especially How They Fail. The book was originally published in 1978, then reissued under the slightly less catchy title, Systemantics: The Underground Text of Systems Lore, and re-reissued under the completely soul-sucking title The Systems Bible. I reject all the retitling and continue to refer to the book as Systemantics.

Systemantics is very much like The Mythical Man-Month, but with a lot more attitude. The most important lessons I learned are a reinterpretation of Le Chatelier's Principle for complex systems ("Every complex system resists its proper functioning") and the Fundamental Failure-Mode Theorem ("Every complex system is operating in an error mode").

You've all experienced the Fundamental Failure-Mode Theorem: You're investigating a problem, and along the way you find some function that never worked. A cache has a bug that results in cache misses when there should be hits. A request for an object that should be there somehow always fails. And yet the system still works in spite of these errors. Eventually you trace the problem to a recent change that exposed all of the other bugs. Those bugs were always there, but the system kept on working because there was enough redundancy that one component was able to compensate for the failure of another. Sometimes this chain of errors and compensation continues for several cycles, until finally the last protective layer fails and the underlying errors are exposed.
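
To make this concrete, here is a minimal sketch (all names hypothetical) of the cache scenario: keys are canonicalized when they are added but not when they are looked up, so any mixed-case key always misses, and the fallback to the backing store quietly compensates.

    // Minimal sketch of a silently compensated bug; all names are hypothetical.
    #include <algorithm>
    #include <cctype>
    #include <map>
    #include <optional>
    #include <string>

    static std::string ToLower(std::string s)
    {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return std::tolower(c); });
        return s;
    }

    static std::map<std::string, std::string> g_cache;

    static void AddToCache(const std::string& key, const std::string& value)
    {
        g_cache[ToLower(key)] = value; // keys are canonicalized on insert...
    }

    static std::optional<std::string> LookupInCache(const std::string& key)
    {
        auto it = g_cache.find(key); // ...but not on lookup, so mixed-case keys never hit
        if (it != g_cache.end()) return it->second;
        return std::nullopt;
    }

    static std::string LoadFromBackingStore(const std::string& key)
    {
        return "value for " + key; // stands in for the slow-but-correct path
    }

    std::string GetValue(const std::string& key)
    {
        if (auto hit = LookupInCache(key)) {
            return *hit; // rarely taken, but nobody notices
        }
        // Compensation: the backing store absorbs the cache's failure,
        // so the bug stays hidden until this fallback itself breaks.
        auto value = LoadFromBackingStore(key);
        AddToCache(key, value);
        return value;
    }

Every call still returns the right answer; the only symptom is that the cache never helps, which is exactly the sort of error a system can run with for years.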

That's why I'm skeptical of people who look at some catastrophic failure of a complex system and say, "Wow, the odds of this happening are astronomical. Five different safety systems had to fail simultaneously!" What they don't realize is that one or two of those systems are failing all the time, and it's up to the other three to prevent the failure from turning into a disaster. You never see a news story that says, "A gas refinery did not explode today because simultaneous failures in the first, second, fourth, and fifth safety systems did not lead to a disaster, thanks to a correctly functioning third system." The roles of failure and savior may change over time, until eventually all of the systems choose to have a bad day on the same day, and something goes boom.
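
You can convince yourself with a back-of-the-envelope simulation (the 10% daily failure rate is invented purely for illustration): give each of five redundant safety systems an independent chance of having a bad day, then compare how often at least one system is down with how often all five are down at once.

    // Toy simulation: five redundant safety systems, each independently
    // "having a bad day" with 10% probability (a rate invented for
    // illustration). Disaster requires all five to fail on the same day.
    #include <cstdio>
    #include <random>

    int main()
    {
        std::mt19937 rng(42); // fixed seed for reproducibility
        std::bernoulli_distribution badDay(0.10);

        const int days = 1000000;
        int daysWithSomeFailure = 0;
        int disasters = 0;
        for (int d = 0; d < days; ++d) {
            int failures = 0;
            for (int s = 0; s < 5; ++s) failures += badDay(rng);
            if (failures > 0) ++daysWithSomeFailure;
            if (failures == 5) ++disasters; // nothing left to compensate
        }
        std::printf("days with at least one failed system: %d of %d\n",
                    daysWithSomeFailure, days);
        std::printf("days where all five failed at once:   %d of %d\n",
                    disasters, days);
        // Expected: some failure on about 41% of days (1 - 0.9^5),
        // total failure on about 0.001% of days (0.1^5).
        return 0;
    }

The disaster looks astronomically unlikely only if you never count the ordinary days on which one or two systems were already quietly down.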

Comments (27)
  1. mvadu says:

    Ray, you made my day…

    ("Every complex system resists its proper functioning")

    Now I know why every chunk I start to test usually fails… Now I can explain that it's not the developer's fault… it's proven maths… it has to fail because it's complex… :)

  2. mvadu says:

    If you want to have a glance at Systemantics:

    http://en.wikipedia.org/wiki/Systemantics

    Good reading… Now I have to get the actual book…

  3. MadQ says:

    I think I need to make a trip to the library for The Design of Everyday Things. Maybe it will finally give me some insight as to why the automatic paper-towel dispensers in public restrooms are so over-designed. You know the kind: you have to wave your hand over a sensor, and voila, you get a paper towel. Of course merely walking past the dispenser accomplishes the same thing. But never fear, because it also has a sensor to detect whether a paper towel is still hanging from it, so as to avoid paper towel avalanches. If it didn’t have the hand-o-wave sensor, and just spit out a paper towel after the last one was removed, it wouldn’t be quite so buggy.

    And don’t even get me started about the ghosts that randomly turn on the wave-o-hand faucets while you’re relieving yourself.

  4. Functional architecture says:

    The most annoying thing about architecture is that it's gotten even less functional since POET (The Psychology of Everyday Things) was published. Universities are especially prone to Frank Gehry monstrosities.

    And they win awards, too.

    I'll take a boring old rectangular building any day, so long as I can actually work in the building.

  5. IgorD says:

    And this comment at Amazon reveals why we need such books in the first place:

    http://www.amazon.com/review/product/0385267746/ref=cm_cr_dp_hist_1?%5Fencoding=UTF8&filterBy=addOneStar

  6. Illuminator says:

    Reading here about the "Fundamental Failure-Mode Theorem" makes me laugh on the inside. I marvel at this every day with all the try-catches in the C# code I work with. It's amusing how they've kept the code running "normally" through the most catastrophic stuff that would have torched and blasted C++ programs.

  7. malduarte says:

    Every newcomer to a complex system believes (I know, I’ve been one) that the system is overly bloated and too complex. Since they’re usually eager to show how good they are, the first lesson is to let them refactor a piece of it and test it afterwards. After they have felt the pain, they are introduced to Systemantics. The scars will never disappear :-)

  8. Jim says:

    Sometimes the problem is not with the developers or engineers. It's with management, middle or higher. They always believe in process and systems, and they won't quit until they've built a screwy one. Even when their systems don't work, they can blame the fault on the ones who built them. And so the cycle goes on: systems will always be prone to failure as long as humans are involved.

  9. codekaizen says:

    After DOET, you'll also enjoy "Things That Make Us Smart", by Norman. It considers the human-technology interface at a more fundamental level: technology as an extension of our minds, and what that can mean for its design. It's more philosophical than the actionable principles discussed in DOET, but also more primal, seeing as the organization and expression of our cognition is a more basic task.

  10. Qian says:

    "Universities are especially prone to Frank  Gehry monstrosities.

    And they win awards, too.

    I’ll take a boring old rectangular building anyday, so long as I can actually work in the building"

    MIT’s Gehry monstrosity seems to have won a lawsuit instead.  Then there’s also I.M. Pei’s Green Building, which is definitely boring, old, and rectangular.  And it’s got a built-in airflow amplifier that would prevent you from opening or closing the doors to the building on a windy day.

  11. pingpong says:

    Weren't DOET and Polya's HTSI (How to Solve It) mandatory reading for all new hires at Microsoft at one time?

  12. The Embedded Avenger says:

    I have heard Don Norman say that he prefers the original title because it makes a better acronym.

    If I were to add a fourth title to Raymond's list, it would be Peopleware by Tom DeMarco and Tim Lister. Another must-read. Get the 2nd edition with the additional chapters.

  13. Can I recommend one other book? "How Buildings Learn", by Stewart Brand. It looks at buildings and how they adapt to changing needs, which can be done either by not having changing needs, by adapting to changes, or by being knocked down and rebuilt. The longest-lasting buildings tend not to be those that have a single function built in, but those which are strongly adaptable in ways you haven't thought of.

    Obviously, this is very relevant to software design.

  14. David Walker says:

    The Mythical Man-Month is a great book, although I have read some criticism of it (which is not entirely off the mark). Still, it's a great book.

    The other big point that I remember from the book is that if you constrain the amount of space or time or anything else for one functional area, the developers are forced to "throw it over the wall" and declare that the functionality is the responsibility of a different group.

    I think this came up in the design of the various OS/360 control blocks.

  15. DrkMatter says:

    @Illuminator

    "I marvel at this every day with all the  try-catch’s in the C# code I work with.  It’s amusing how they’ve kept the code running "normal" through the most catastrophic stuff that would have torched and blasted C++ programs."

    You know, C++ has try/catch, too! And they keep our programs running through catastrophic failures time and time again.
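
    The pattern looks the same in either language. Here is a minimal C++ sketch (all names invented) of a catch-all handler that keeps a program running "normally" while hiding the failure:

        #include <stdexcept>
        #include <string>

        // Invented stand-in for anything that can throw: parsing, I/O, a logic bug.
        std::string LoadUserPreferences()
        {
            throw std::runtime_error("preferences file is corrupt");
        }

        std::string GetPreferencesOrDefault()
        {
            try {
                return LoadUserPreferences();
            } catch (...) {
                // Swallows everything, including bugs that deserve a crash.
                // The program keeps running "normally", and nobody notices
                // that the preferences have been silently ignored for months.
                return "";
            }
        }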

  16. Gabe says:

    My preferred law from Systemantics is "A complex system that works is invariably found to have evolved from a simple system that works."

    It explains most OSes nowadays.

  17. steveg says:

    MadQ: "Maybe it will finally give me some insight as to why the automatic paper-towel dispensers in public restrooms are so over-designed."

    Reading The Design of Everyday Things won't tell you why something is badly designed, but it will help you appreciate that many things around you in your everyday life are badly designed, things you may not have realised.

    It can kind of be summarised by this phrase: "Don't take any <bleep> from a toaster", meaning if something that should be making your life easier isn't… get a new something. When you replace the toaster with software design (especially UI), it really can open your eyes to what is dreadful behaviour in software.

    Such as, for instance, let's say you see an 8-year-old kid in tears because they can't find Print after the File menu of a major office package changed into a logo. That kid was taking <bleep> from a toaster.

    It is amusing/frustrating/surprising to see how many of his examples are real, with my favourite being doors. For instance, fancy-pants kitchens in these here parts often have cupboard doors with no handles; you just pull them open… but how do you know which side of the door to pull? And why do many push doors have handles? You pull a handle and push a plate.

    I’d recommend it. Some of his other books are interesting, too.

  18. Drak says:

    Systemantics sounds like a book I would enjoy. I recognise many of the ideas (as they are stated on the Wikipedia page).

  19. Julio says:

    I used to work with fault-tolerance techniques in college. That's why this is my favorite quotation from Systemantics:

    "The Fail-Safe Theorem: When a Fail-Safe system fails, it fails by failing to fail safe."

  20. Mikkin says:

    I keep several copies of Brooks and Norman on hand for sharing around. Now I just have to get a copy of Gall, but he seems to be out of print. I will keep him on the watch list.

  21. Kelvin says:

    While ‘The Systems Bible’ is not available from Amazon at the moment, you can get it from John Gall’s company directly – I have just ordered a copy.

    Go to http://www.generalsystemantics.com/SystemsBible.htm

  22. Entity says:

    "catastrophic failures"

    I would like to know: how do you guarantee the state of the program in the case of a catastrophic failure? This tells me that, instead of finding the root cause of the problem, you're simply turning an easy-to-find bug into an impossible-to-find bug.

    A catastrophic failure, to me, is defined as a NULL pointer, dereferencing a null pointer, or a memory access violation. There is no possible way to recover from these states; you have to quit the program and restart. By continuing, you assume that the world is a good place and the damage is local, when you have no idea how much damage has been done.

    The other problem is that any routine that does anything useful must also take into account all the possible situations where it could be used incorrectly. That is the main reason why so much software turns into a complex mess.

    I think you will tell me next that every routine should check whether new or delete failed and then take action in case new failed?

  23. Dave says:

    Have you seen the reviews for the Systemantics books on Amazon?  100% five-star reviews.  When did you last see a book that rated like that?

  24. Yuhong Bao says:

    "how you garentee the state of the program in case of a catastrophic failure?"

    That is why catching access violations are not a good idea.

  25. Myria says:

    I just love that type of mistake. I set a thread window hook in a DLL at initialization but forgot to remove it at destruction. It never crashed until a year later, when, in a new patch, we started getting sporadic crashes while shutting down the application.

    It turned out that the new patch fixed a bug where the application was leaking a reference to the DLL. The DLL was then getting unloaded, exposing the underlying bug and crashing when the hook was called.

    In this case, the "safety systems" are actually themselves bugs.
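
    In code terms it was roughly this shape (SetWindowsHookEx and UnhookWindowsHookEx are the real Win32 calls; everything else is invented for illustration):

        #include <windows.h>

        static HHOOK g_hook;

        static LRESULT CALLBACK HookProc(int code, WPARAM wParam, LPARAM lParam)
        {
            // ... per-message work ...
            return CallNextHookEx(g_hook, code, wParam, lParam);
        }

        // Called when the DLL is initialized.
        void DllStartup(HINSTANCE hinst)
        {
            g_hook = SetWindowsHookEx(WH_CALLWNDPROC, HookProc, hinst,
                                      GetCurrentThreadId());
        }

        // Called when the DLL shuts down. The forgotten step: without this
        // unhook, the hook outlives the DLL. The leaked DLL reference kept
        // the code mapped and hid the bug; fixing the leak let the DLL
        // unload, and the next hooked message called into unmapped memory.
        void DllShutdown()
        {
            UnhookWindowsHookEx(g_hook);
        }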

Comments are closed.