Copy + Pasting in Unit Tests

Some people love designing. Others love writing the code. While it’s true that some like testing and breaking software, I haven’t met a lot of people who enjoy writing unit tests. There’s something about them (chasing 100% code coverage even though you know you’ll never encounter some situations, the repetitiveness of the set-up, or the hassles you have to go through when dealing with external dependencies such as files) that makes developers in particular want to get done with their unit tests as soon as possible.

The result of that is, among other things, that we get sloppy and start taking shortcuts; one of them involves copy + pasting code from the code you’re actually testing.

For example, let’s say you have a program which, based on some input, will perform some action. It can be as simple as:

    public int ConvertStatus(string status)
    {
        switch (status)
        {
            case "status - valid":
                return 123;
            case "status – invalid":
                return 456;
        }

        return 0;
    }

Unit testing this method is relatively straightforward: pass in the first status and make sure the return code is “123”, pass in the other status and make sure the return code is “456”, pass in a different value, and make sure the return code is “0” (my college courses also tell me to pass in null and empty string and make sure the return code is “0”). What some developers will end up doing is look at the code, copy the statuses and check the return value. All tests pass, the little green check mark is there, and we’re good to go.
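The tests described above might look like the following sketch. It assumes a few things that are not in the original: the method is made static so the example is self-contained, and `AssertEqual` and `CopyPastedTests` are hypothetical stand-ins for whatever test framework you actually use. The status literals in the tests were pasted straight from the method under test.

```csharp
using System;

public static class StatusConverter
{
    // Method under test, reproduced from the example above
    // (made static here only to keep the sketch self-contained).
    public static int ConvertStatus(string status)
    {
        switch (status)
        {
            case "status - valid":
                return 123;
            case "status – invalid":
                return 456;
        }

        return 0;
    }
}

public static class CopyPastedTests
{
    // Hypothetical stand-in for a test framework's equality assertion.
    private static void AssertEqual(int expected, int actual, string name)
    {
        if (expected != actual)
            throw new Exception($"{name}: expected {expected}, got {actual}");
    }

    public static void RunAll()
    {
        // The two status literals below were pasted from ConvertStatus itself.
        AssertEqual(123, StatusConverter.ConvertStatus("status - valid"), "valid status");
        AssertEqual(456, StatusConverter.ConvertStatus("status – invalid"), "invalid status");
        AssertEqual(0, StatusConverter.ConvertStatus("something else"), "unknown status");
        AssertEqual(0, StatusConverter.ConvertStatus(""), "empty string");
        AssertEqual(0, StatusConverter.ConvertStatus(null), "null");
    }

    public static void Main()
    {
        RunAll();
        Console.WriteLine("All tests passed.");
    }
}
```

Every check passes, because each test literal is a character-for-character copy of the case label it exercises.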

So what’s wrong with that? Does the unit test actually test the code? Yes. Do the tests cover all the possible cases? Yes. When you actually run this code in the real world, is it going to perform as designed? Maybe. Let’s look at why.

In a larger organization (where larger is defined as not you and your buddy), there’s usually a delegation of duties, and different individuals will provide input in the process. At Microsoft, usually all engineering work is split between three roles: Program Manager, Developer and Tester. It is the PM who defines the requirements, the dev who designs, codes and unit tests the software, and the tester who ensures the software developed meets the requirements defined. In our example above, the statuses can be defined either by the PM or the dev, and will be in a document (Functional Specification or Design Document). This ensures that everyone (including people outside the triad mentioned above) can easily read the document and understand what’s being delivered, rather than having to read code. So let’s assume that the PM wrote the statuses, and when the dev went to code it, they copied the text into their code. Then they wrote the unit tests in the manner described above, and decided they were code complete. However, when the tester runs their automated tests, they fail miserably. The dev’s first reaction is “no way, your test must be invalid; I wrote unit tests, and they passed fine”. How can this be?

As I discovered recently while doing a code review, word processors have a nifty way of changing your characters. For example, if you type “word1”, space, “-”, space, “word2”, the “-” (a plain hyphen) will actually become “–” (an en dash). In a document, that actually does look better, but in code, looks don’t matter, and the two are different characters.

So what happened above is exactly that: in the second status, instead of having a regular “-”, we ended up with a “–” (copy this text into a hex editor, or change the encoding to UTF-8, and you’ll notice). In order to make sure there weren’t any typos, the developer copied and pasted the text from the Functional Specification into their code. Afterwards, that same text was copied into the unit tests, thus ensuring everything passes. The tester, however, typed it by hand, and now things break.
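If you would rather not reach for a hex editor, a few lines of code can point out the offending character. The sketch below is only an illustration: `LiteralInspector` and `FindNonAscii` are hypothetical names, and the en dash is written as the escape `\u2013` so that it stays visible in the source.

```csharp
using System;
using System.Collections.Generic;

public static class LiteralInspector
{
    // Returns a description of every character outside the 7-bit ASCII range.
    public static List<string> FindNonAscii(string text)
    {
        var findings = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c > 127)
                findings.Add($"index {i}: U+{(int)c:X4}");
        }
        return findings;
    }

    public static void Main()
    {
        string typed = "status - invalid";        // hyphen-minus, U+002D
        string pasted = "status \u2013 invalid";  // en dash, U+2013

        Console.WriteLine(typed == pasted);  // False
        foreach (string finding in FindNonAscii(pasted))
            Console.WriteLine(finding);      // index 7: U+2013
    }
}
```

The two strings look nearly identical on screen, yet the comparison fails because they differ at index 7.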

So what? If the system has that status everywhere, what’s the problem? Well, the other developer who wrote the component which stores the status in the database didn’t actually copy + paste the values, for one. Or this method can be part of a public API, and you have no idea who’s calling it. Chances are, 99% of your users won’t put the text in a document first and then copy + paste the value, but rather type it by hand.

So after this long story, the moral is pretty clear: don’t copy + paste things from a text document, just type them by hand. Actually, there can be problems with that too, and the most obvious is a typo. You copy + paste the typo into your unit tests, things pass again, yet you’ve got the same problem. This is how you end up with some APIs that have typos in them and cannot be fixed due to backwards compatibility.

Whether you copy the text from a document or type it by hand in your code, you should always write your unit tests from scratch. Any time you use Ctrl-C, Ctrl-V in a unit test, you risk masking a bug. Other than taking more time, actually typing your unit tests can lead to one of three things:

  1. You do a good job, and the tests that pass are actually proper.
  2. You write the proper test, and discover a bug in the code.
  3. You make a mistake in the test, and it fails. You take a look and fix the mistake (either in the code or the test).
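The second outcome can be sketched as follows. This is an illustration under stated assumptions, not a definitive test suite: the method is made static, `HandTypedTests` and `Check` are hypothetical helpers rather than a real framework, and the en dash in the method under test is written as `\u2013` so it stays visible. The test literals are typed by hand with a plain hyphen.

```csharp
using System;
using System.Collections.Generic;

public static class StatusConverter
{
    // Method under test; the second case label still holds the en dash
    // (written as \u2013) that came in with the pasted text.
    public static int ConvertStatus(string status)
    {
        switch (status)
        {
            case "status - valid":
                return 123;
            case "status \u2013 invalid":
                return 456;
        }

        return 0;
    }
}

public static class HandTypedTests
{
    // Hypothetical helper: records a message when a check fails.
    private static void Check(List<string> failures, string name, int expected, int actual)
    {
        if (expected != actual)
            failures.Add($"{name}: expected {expected}, got {actual}");
    }

    // Runs every check and returns the messages of the ones that failed.
    public static List<string> Run()
    {
        var failures = new List<string>();
        // Literals typed by hand, with a plain hyphen in both statuses.
        Check(failures, "valid status", 123, StatusConverter.ConvertStatus("status - valid"));
        Check(failures, "invalid status", 456, StatusConverter.ConvertStatus("status - invalid"));
        Check(failures, "unknown status", 0, StatusConverter.ConvertStatus("something else"));
        return failures;
    }

    public static void Main()
    {
        foreach (string failure in Run())
            Console.WriteLine(failure);  // invalid status: expected 456, got 0
    }
}
```

The failing check points directly at the second status, which is where to start looking.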

Finding these types of bugs is usually a pain. Even when the error is right in front of you, it’s hard to spot because our brain masks it and reads what it expects to see. Unit tests are the best, and in the end the fastest, way to find these issues early and isolate the problem easily.