How is good software like good science?

I’m not one who believes mainstream large-scale software development really deserves the title of “computer science” (or “software engineering” for that matter).  However, I have been thinking lately that there is an interesting analogy between good software development and good scientific theories.  Here are some examples:


Software program vs. scientific theory:

What is it?
  Software: a description of some desired computer behavior.
  Science: a postulated description of reality.

Who creates it?
  Software: a programmer.
  Science: a theorist.

Who first validates it?
  Software: a tester.
  Science: an experimentalist.
Provability
  Software: usually can’t ever be proven 100% correct.  The longer we go without finding any bugs, the more we suspect it may be bug-free.
  Science: can never be proven correct, only proven wrong.  The more a theory resists being proven wrong, the more we believe it is probably correct.
Testability
  Software: good software is designed from the ground up to be tested effectively.  If you build a large system first and only later start thinking about how to test it, you’re likely to end up with a lot of bugs that are hard to find.
  Science: for something to qualify as a “scientific theory” it must be falsifiable – that is, there must be a way, at least in principle, to prove it wrong.  In general, the more ways a theory could be shown to be wrong, the better the theory.  For example, many people argue that most forms of string theory are not sufficiently testable to be seriously considered science.
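As a minimal sketch of what “designed from the ground up to be tested” can mean in practice (the function and its behavior are invented purely for illustration): injecting a dependency like the current time, rather than reading it inside the function, makes every branch directly falsifiable with a fixed input.

```python
from datetime import datetime, timezone

def greeting(now=None):
    """Return a time-of-day greeting.

    The current time is an injected parameter rather than a hidden
    call inside the function, so tests can exercise every branch
    deterministically instead of depending on when they happen to run.
    """
    now = now or datetime.now(timezone.utc)
    if now.hour < 12:
        return "Good morning"
    if now.hour < 18:
        return "Good afternoon"
    return "Good evening"

# Each branch is falsifiable with a concrete, repeatable input:
assert greeting(datetime(2024, 1, 1, 9, tzinfo=timezone.utc)) == "Good morning"
assert greeting(datetime(2024, 1, 1, 20, tzinfo=timezone.utc)) == "Good evening"
```

Had `greeting` called `datetime.now()` internally, two of the three branches could never be reliably exercised – the software analogue of an unfalsifiable claim.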
Predictive power
  Software: if you write a program precisely to pass a certain set of tests, you don’t really know whether it’s correct until you run it against some new test cases.
  Science: theories are generally designed to fit known experimental results.  Properly reproducing the results of experiments that have already been done is important, but the real test comes when the theory predicts results that aren’t yet known.
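A small illustration of this point (the example is mine, not from the original post): an implementation written against a handful of known cases only earns trust when it also gets the cases nobody consulted while writing it.

```python
def is_leap_year(year):
    """Gregorian leap-year rule: divisible by 4, except centuries,
    except centuries divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# The cases the code was written against ("known experimental results"):
assert is_leap_year(2000)
assert is_leap_year(2024)
assert not is_leap_year(1900)

# The real test – inputs not considered while writing the code
# ("predicting experiments not yet performed"):
assert not is_leap_year(2100)
assert is_leap_year(1600)
```

A version that merely hard-coded the first three answers would pass the “known experiments” and fail the predictions, which is exactly the distinction the row above is drawing.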
Reproducibility
  Software: no matter how much testing you do, you ultimately need some real customers to try your software and see whether they confirm your belief in its correctness.  Often the best tests come from the users who are most critical of you.  For example, seeing positive comments on Slashdot about Silverlight has been a confirmation to me that my team is making some good decisions.
  Science: no matter how well one group’s experiments agree with a theory, little weight is attached to them unless other independent groups are able to reproduce the results.  Often the best tests of a theory come from people who believe it to be false.  The history of quantum mechanics is full of scientists who disbelieved it, but whose arguments and experiments ultimately strengthened it (such as Einstein and the EPR paradox).
Simplicity
  Software: correct software tends to have a simpler, smaller, and/or more elegant implementation.  This doesn’t mean its behavior needs to be simple.
  Science: all things being equal, simpler theories tend to be the correct ones – this is known as Occam’s razor.  There are many examples in science of theories which are conceptually and mathematically very simple, but whose implications are very complex and non-obvious.  A classic example is the explanation of the planets’ complex movements across the sky: Ptolemy’s model with the Earth at the center was very complex, while Copernicus presented a much simpler model with the Sun at the center.
Believability
  Software: good software tends to be well structured and documented so that a person can reason about its correctness.
  Science: good theories tend to have a logical and believable explanation for why they should be correct.
Specificity
  Software: good software tends to have a concrete specification of what counts as correct behavior.  We try to avoid the temptation to build something, write some tests for it, and call whatever those tests produce “correct”.  When the tests are first executed, there is ideally only one possible correct output.
  Science: good theories tend to have fewer “free variables” – parameters that must be determined by experiment.  For example, each planet in Ptolemy’s model of the solar system had a number of epicycles of various sizes associated with it, whereas in the Newtonian model a planet’s motion is determined entirely by its mass, its orbital axes (major and minor), and the universal gravitational constant.  When new experiments are performed, there is ideally just one result that would be consistent with the theory.
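The “only one possible correct output” point can be made concrete with a small invented example: a specific assertion pins down exactly one acceptable answer, while a loose assertion is the software analogue of a theory with too many free variables.

```python
def moving_average(xs, window):
    """Average of each consecutive `window`-sized slice of xs."""
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window + 1)]

# A specific test admits exactly one correct output:
assert moving_average([1, 2, 3, 4], 2) == [1.5, 2.5, 3.5]

# A vague test admits many incorrect implementations – it only
# constrains the shape of the result, not its values:
assert len(moving_average([1, 2, 3, 4], 2)) == 3
```

Many wrong implementations would satisfy the second assertion; only a correct one satisfies the first.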
Generality
  Software: the field of software (and often an individual large system) advances by replacing special-purpose components with general-purpose frameworks.  Operating systems and managed runtimes like the CLR are obvious examples here.
  Science: good theories often supersede many previous (apparently unrelated) results, encompassing them all under one larger umbrella.  A great example here is the realization that electricity, magnetism, radio waves, and light are all aspects of the same electromagnetic force (and ultimately of the electroweak force).
Reusability
  Software: good software tends to build on previous successes, reusing components that are known to be of high quality while avoiding dependencies on high-risk pieces.  It’s extremely difficult (and wasteful) to build a completely new large system from scratch.  Personally, I believe this is an area where we could do better on the CLR.
  Science: the progress of science has obviously only been possible by building on previous successes.


So what can we learn from the study of good science (which has had a lot longer to mature) about how we should approach software?  Here are some ideas:

  • Be evidence-based – rely as much as possible on concrete data; it helps you avoid the inevitable temptation to deceive yourself.

  • Have a culture of humility – accept that it’s a lot easier to be wrong than it is to be right, and that your work should be assumed to be incorrect until there is enough independent evidence to suggest otherwise.  Recognize that certainty and black-and-white positions are usually overly simplistic and damaging.

  • Extraordinary claims require extraordinary evidence – there is no silver bullet, and chasing one can leave you running in circles.

  • Be willing to accept a paradigm shift when necessary – it’s sometimes necessary to abandon a long-held and cherished philosophy and accept well-justified radical new ideas in order to keep making progress.

  • Strive for simplicity – adding more total lines of code (like adding more special cases to your theory) should be a last resort rather than the normal process of growth.  You can only tack on new ad-hoc solutions to problems for so long before the maintenance costs become stifling.  Removing code is more important for the quality of your software than writing new code.

  • Be self-critical – it’s human nature for a group of like-minded intelligent people to be blinded to the truth by their arrogance.  Recognize this and seek out opportunities to prove yourself wrong.

  • Re-architect when necessary – it’s sometimes advantageous to combine a group of previously independent things into a new component which replaces them all.

  • Study the past – learn from the patterns of past mistakes and successes, and recognize how to predict the most likely avenues for success.  Often negative results (understanding why a theory or piece of software failed to be successful) are more valuable than positive results.

  • Know when to start over – sometimes we have to be willing to let go and give up on an idea or piece of software and start from scratch.  Clinging to the past can be very destructive in the long-run.

I’d love to hear your comments about where this analogy works and where it doesn’t.  I keep thinking about it whenever I read something about “good science”, but I’m not yet sure whether it’s just the natural tendency to make connections between things you know well, or whether there is some deeper underlying principle here connecting these two ideas.

Comments (1)

  1. Frank Hileman says:

    An analogy between software development and institutional science: both have a tendency to become "religious" and inflexible. I recently heard someone describe some current development trends as "cult-like," implying these trends do not live up to your standards of good software development: convention and inflexible beliefs over logical analysis; synthetic complexity in addition to intrinsic complexity.