Unit testing and the Scientific Method

Has it ever happened to you that, after investing time in a bug fix, you realized that the fix does not actually fix the bug?

Or, even worse, that new bugs appeared in the production system as a result of your change?

This is similar to what happens when scientists assume a hypothesis to be true without actually running an experiment to confirm it. This is something that no good scientist would ever do; however, it happens in the Software Engineering field more often than not.

If you compare the Scientific Method to the process followed to perform a bug fix, you will find a lot of similarities. I will go as far as to say that they are the same process.

Let me elaborate. The Scientific Method defines the following major tasks:

1. Define the question

2. Gather information and resources (observe)

3. Form hypothesis

4. Perform experiment and collect data

5. Analyze data

6. Interpret data and draw conclusions that serve as a starting point for new hypotheses

7. Publish results

8. Retest (frequently done by other scientists)

Let's see how they apply to software development, and in particular to the process of “bug fixing”.

1. Define the question

For software development the question is simple: “Is the Software under development done-done?”

By done-done, I mean: Does it comply with user requirements? Does it have bugs? Is it shippable to customers?

Most of the time (if not always), the answer is no. Software will always have bugs and new feature requests from customers.

2. Gather information and resources (observe)

This step is formalized in software development by testing and filing bugs against the software in development. A good bug report contains the right information and no more; extra data just creates noise and makes the bug harder to read and understand. The report helps the triage team decide whether the team wants to move to step 3.

3. Form hypothesis

This is the step in software development when a developer gets assigned a bug. The next step for the developer is to “guess” what’s going on. I say “guess” because, by just reading the bug, most developers will not know for a fact what is going on (unless they were the ones who filed the bug).

The developer may use the debugger to understand and to gather more data about the bug; this could already be done during step 2.

A failing unit test is a great way to formulate a hypothesis that describes the bug being fixed. If you find yourself in a position where it is impossible to write a unit test that expresses the bug, then you need to step back and invest in building the right infrastructure to be able to formulate hypotheses against your system under development. This is a fancy way of saying that you need to build infrastructure that lets you write unit tests.

Another sign of a problem in the process is when extensive debugging sessions (lasting more than 10 minutes) are required to formulate a hypothesis about the problem. But that is an entire subject for a different blog post.

Let me emphasize the point about investing in test infrastructure. If you compare it with the kind of investment the scientific community makes in “experimentation infrastructure”, we in the software development community fall short by a long shot. When has a software project ever needed to build a particle collider or put a huge telescope in orbit just to be able to “test” a hypothesis?

4-5. Perform experiment and collect data and analyze data

These steps are together because, unlike scientists, developers have it easy. They always get to use computers to run unit tests and integration tests, and most tools collect enough data about the system under test.

I have to note that, just like scientists, developers sometimes also need to design and implement experiments. Typical examples of this are:

a. Design a stress test framework.

b. Implement your own testing framework from scratch, like XUnit or NUnit, if your development system lacks one.

6. Interpret data and draw conclusions that serve as a starting point for new hypotheses

This is the step where important questions need to be asked, like: Are there more cases that our hypothesis is not covering? Why was this bug not discovered earlier? Is this a regression?

Do we have invalid hypotheses (unit tests that do not actually test the system) in our system?
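One way to spot an invalid hypothesis is to check that the test fails against the buggy code and passes against the fixed code. The sketch below assumes a hypothetical clamp function whose buggy version forgets the lower bound; all names here are invented for the illustration.

```python
from typing import Callable

Clamp = Callable[[int, int, int], int]


def buggy_clamp(value: int, low: int, high: int) -> int:
    # Hypothetical bug: the lower bound is never applied.
    return min(value, high)


def fixed_clamp(value: int, low: int, high: int) -> int:
    return max(low, min(value, high))


def vacuous_check(clamp: Clamp) -> bool:
    # Passes for both versions, so it says nothing about the bug.
    return clamp(5, 0, 10) == 5


def targeted_check(clamp: Clamp) -> bool:
    # Exercises the lower bound, which is where the bug lives.
    return clamp(-3, 0, 10) == 0


def discriminates(check: Callable[[Clamp], bool], buggy: Clamp, fixed: Clamp) -> bool:
    """A check is a valid hypothesis about the bug only if it fails on the
    buggy version and passes on the fixed one."""
    return (not check(buggy)) and check(fixed)
```

Here discriminates(targeted_check, buggy_clamp, fixed_clamp) is True, while discriminates(vacuous_check, buggy_clamp, fixed_clamp) is False. In day-to-day work the same idea amounts to watching the new test fail before applying the fix.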

Once the developer confirms that the unit test is in fact valid, she must proceed to perform the actual fix for the bug, which is equivalent in science to formulating a new theory or model. In the case of software development, we need to find “the simplest solution that could possibly work, but no simpler”.

Scientists and developers both love finding a solution that is simple, small and complete. Both groups of professionals are also very suspicious of a solution that is complex and large. Some scientists will go as far as to say that these kinds of solutions are really an intermediate step toward the final, elegant solution. I tend to agree with that.

But developers are also engineers, and we need to balance cost and time to be able to ship. Shipping is something that does not concern scientists (although there have been some famous races between scientists to see who could publish first).

7. Publish results

This is simple; just hit the button to do the submission in your source control system.

8. Retest.

In the scientific community, a new theory, model or result is published with the express intent of being re-validated by other members of the community. It is enough for a single person in the community to find the result “invalid” to restart the process back at step 2. Software development is the same: once a bug is declared fixed, another member of the team (not the same developer) must retest the bug and validate the fix.

In software development, part of this step is automated if you have a gated check-in system.

Developers hate it when a bug fix is re-opened. It means that the formulated hypothesis was incorrect and that the unit test created (if any) is not testing the right aspect of the system under development.

If the bug is actually fixed, then the process goes back to step 1, which is to ask: Are we done-done?

Conclusion

I focused the comparison on the process of bug fixing, but it also applies to the process of developing user stories. And with that, almost the entire software development process is covered.

I would like to emphasize step 3, being a software developer myself. When someone asks you why a bug happens, do not guess; instead, formulate a hypothesis, run an experiment, and then answer the question. That’s what a sane scientist would do. Even worse, do not check in code “hoping” that it will fix a bug. That’s like a scientist publishing a new theory without experimentation. Any scientist who does that will not only be ridiculed by her peers, she can also forget about making a career as a scientist.

In short, engineers who do software development for a living should act more like scientists because, like it or not, we are doing the same job scientists do.