It Depends

My previous post stirred up a bit of controversy and spurred a bit of discussion. Fun stuff! The test plan for my new team states:

All functional, integration, performance, and stress tests are automated and none run manually in test passes. This will allow us to evaluate our product state quickly and spend more time focused on dogfooding, directed exploratory testing, and working with customers. This will set us up to provide better post-release support.

I think a key phrase is missing: which are worth automating. As in “All tests…which are worth automating are automated.” I hope this is implicit to my new teammates’ thinking. I will be making it explicit. <g/>

Determining which tests to automate is one of the Hard Problems in the testing world. I don’t have any absolute answers or rules to follow. I have, however, developed some heuristics – fallible methods for solving a problem – which have proved useful. Possibly you will find them useful too. How relevant they are for you will to some degree track how closely your context matches mine: shrink-wrapped software which is localized into many different languages and supported by sustained engineering for five to ten years.

  • The first pass through an area or feature pretty much has to be manual. I do not know how to write tests for something I do not understand. On the other hand, I don’t need to know anything about a .Net assembly full of APIs in order to write a program which uses reflection to dynamically discover each API, spew bunches of test data at it, and log failures when a certain piece of data results in an unhandled exception. On the third hand, sometimes unhandled exceptions escaping is OK. An application which loads and runs user-created addins, for example, might explicitly choose to not attempt to catch every exception which some random piece of user code could throw.  It depends.
  • Tests which are boring or repetitive or tedious or error prone are likely candidates for automation. If you have two million lines of data to inspect, for example, it’s probably worth writing a mini-tool or a regular expression to do the work. On the other hand, a bout of manual Blink Testing might be just the ticket. On the third hand, maybe, just maybe, there is a way to make the tests less boring or repetitive or tedious or error prone. It depends.
  • A test which can be easily and reliably automated in a short amount of time is probably worth automating. Unless you won’t ever run it again, or subsequent executions are highly unlikely to tell you anything interesting, in which case it may not be. On the other hand, a test which is difficult to automate reliably is probably not worth automating. Unless you are going to run it on every checkin, or it hits code which has proven to be fragile. On the third hand, tests which seem easy to automate sometimes turn out to be incredibly difficult to automate, and building an abstraction layer or two can simplify things. It depends.
  • Lower-level code is more likely to be usefully automated than higher-level code. Automated unit tests for individual classes and components are almost always useful. Some of these tests may have been generated in exploratory-ish ways. On the other hand, automated tests against the UI are generally more complicated and less reliable. On the third hand, model-based testing can in certain situations help scripted tests become more exploratory. It depends.
  • Automating tests tends to have more of a payoff as the number of environments in which they will run increases. A typical Microsoft application must work on at least three different versions of Windows and will be localized into five or ten different languages. That’s fifteen to thirty separate contexts right there. On the other hand, if ninety percent of your application has no dependencies on the platform and over the last five releases neither you nor your customers have found any platform-related defects, perhaps you don’t need to run the same tests across all those environments. On the third hand, maybe a nasty bug is lurking which you won’t find until you localize into Lower Elbonian. It depends.
  • Code churn can be a useful metric. That is, the more often a piece of code is changed, or the more developers are changing it, the more useful automated tests for it are. On the other hand, creating automated tests for a constantly-changing feature might not be worthwhile if the changes are very low-risk or will be cheap to verify. On the third hand, perhaps eighty percent of the code changes are the output of code generators, the input to which is easily verified. It depends.
  • The longer your ship cycle, or the more it costs to deploy a new release, the more likely automated regression tests are to be useful. On the other hand, if you ship a new version every day or every hour, as many web-based services do, regressions can be fixed and deployed immediately and so automated regression tests may be less useful. On the third hand, certain defects might be so horrid that you want to ensure they never ever ever recur. It depends.
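
The reflection idea from the first heuristic can be sketched in a few lines. This is only a rough Python analogue built on the standard `inspect` module, not .NET reflection; the `SAMPLE_VALUES` pool, the `fuzz_module` helper, and the choice of which exceptions count as acceptable are all assumptions made up for illustration.

```python
import inspect

# Hypothetical pool of "interesting" inputs; a real run would use far more.
SAMPLE_VALUES = [0, -1, 1.5, "", "abc", None, [], float("nan")]

# Assumption: these exception types mean "the API rejected bad input",
# which this sketch does not treat as a failure.
EXPECTED = (TypeError, ValueError)


def fuzz_module(target, values=SAMPLE_VALUES, expected=EXPECTED):
    """Call every public callable found on `target` with each sample value.

    Returns a list of (name, value, exception) tuples, one for each call
    that raised something outside the `expected` exception types.
    """
    failures = []
    # inspect.getmembers discovers the callables without knowing them up front.
    for name, func in inspect.getmembers(target, callable):
        if name.startswith("_"):
            continue  # Skip private and dunder members.
        for value in values:
            try:
                func(value)
            except expected:
                pass  # The API refused the input in an accepted way.
            except Exception as exc:
                failures.append((name, value, exc))  # Worth a human look.
    return failures
```

The `expected` parameter is where the "third hand" shows up: which escaping exceptions are fine is a judgment call for each product, so the sketch leaves it to the caller. Pointing something like this at a module whose functions have side effects would be unwise.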

Oftentimes several of these guidelines come into play. Say you have a test which is somewhat difficult to automate and would be tedious to execute manually, but where you can afford some manual munging of the results in order to find missed failures and weed out false positives. Should you automate it? It depends. <g/>

A few examples from my personal experience:

  • I tested a change to Microsoft Visio’s handling of VBA projects which affected its handling of empty VBA projects (i.e., projects with no user code). This change had to be tested across three languages, five operating systems, and seven file formats – one hundred five different environments, with twenty-some scenarios to check in each. I developed a single parameterized automated test, manually set up each of the environments, and manually kicked off the test in each environment.
  • In the early days of Microsoft Blend its file format was changing on a regular basis, and maintaining the sample projects we used for testing was eating a lot of our time. We decided to automate the creation of those projects, enabling us to push a button to recreate the projects after each change. Then we discovered that maintaining the auto-create scripts was at least as time consuming as directly maintaining the projects was. We dropped the automation.
  • On a previous team we developed a tool which let us easily batch-diff large numbers of files, vastly simplifying the process of verifying compatibility with previously released file formats. Building the tool was a complex task which was nonetheless well worth the effort.
  • I remember needing to check file version numbers for a few hundred files. None of us on the team knew the Windows APIs well enough to do this programmatically. We decided to split up the files between us and manually inspect each file. An hour later we were done.
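
The arithmetic in the Visio example, and the shape of a single parameterized test, can be sketched like this. The post gives only the counts (three languages, five operating systems, seven file formats), so every name below is a hypothetical placeholder, and `run_matrix` is an illustrative helper rather than the harness that was actually used.

```python
import itertools

# Hypothetical placeholder names; only the counts 3, 5, and 7 come from the post.
LANGUAGES = [f"language-{i}" for i in range(1, 4)]
OPERATING_SYSTEMS = [f"os-{i}" for i in range(1, 6)]
FILE_FORMATS = [f"format-{i}" for i in range(1, 8)]


def environments():
    """Return every (language, operating_system, file_format) combination."""
    return itertools.product(LANGUAGES, OPERATING_SYSTEMS, FILE_FORMATS)


def run_matrix(check):
    """Run check(language, operating_system, file_format) for each combination.

    Returns the list of combinations for which `check` returned a falsy value.
    """
    return [env for env in environments() if not check(*env)]


if __name__ == "__main__":
    # 3 * 5 * 7 combinations in total.
    print(len(list(environments())))
```

One test body plus a generated matrix is what keeps this manageable; in the Visio case the environments themselves were still set up and kicked off by hand.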

Should you automate your tests? Probably. Should you test manually? Almost certainly. How do you choose when to do what? It depends!

*** My new team is hiring! Want a fun job on a great team? I need a tester! Interested? Let’s talk: Michael dot J dot Hunter at microsoft dot com. Great coding skills required. 

Comments (3)

  1. Jim Bullock says:

    It depends even more than that.

    – The first pass is never automated? Except for stuff you can’t get at without automation. Performance, scale and throughput issues come to mind. So does low-level protocol testing, where the protocol is timing sensitive.

    – The "quick / instantaneous complete automated test cycle execution" thing is nonsense. (Not that you suggest this, but it is common in the "automate everything" world.) To start with, a "quick" test cycle can’t uncover anything that shows up over time. The counter-argument is that if we build it right, things work or don’t with one pass. That’s a problem though. It’s the stuff you didn’t think about correctly that bites you.

    – Similar is combination testing, realizing that every time we slice down the number of tests by declaring an equivalence class, the assertion that "these things act the same" could be wrong. That gets really interesting when we talk about timing, concurrence, and initial state, in addition to the "inputs."

    And more. The "100% automation" approach contains the stealth assertion that "that which can be automated is all we can, or should test." That’s nonsense. More important, that’s a kind of sloppy thinking that no good tester should ever fall for.

  2. Part of my route from the bus to my office goes across a chunk of undeveloped land. This is my favorite