Test Me

Our BVTs have grown rather encumbered and encrusted of late, so we recently embarked on a project to trim them down to a more reasonable size. This turned out to be an interesting process, mostly because it quickly became clear that we didn't have a single agreed-upon definition of what exactly these tests' purpose was.

That old saw about asking opinions of three different people and getting four different answers certainly held true in this case. It still amazes me somewhat that a group of people can work together for months or years and still have such disparate understandings of concepts fundamental to the work we do every day. (Aside: This is why feature teams are so important: it's pretty much impossible to write a spec sufficiently precise that all parties involved will have exactly the same understanding of it.) This state of affairs isn't unique to my current team, either. Every team I've been on has defined the various levels of testing differently than every other team.

Despite all this variation, however, the broad strokes are similar even though the details (most especially the names) can be wildly different. Here are the names and definitions my team is using (this week, anyway <g/>):

  • Build Verification Tests, or BVTs (often called smoke tests): A very small set of tests that check whether the app itself is worth even looking at. BVTs for Notepad would be something like "Launch, type a sentence, save, close". To paraphrase one of my developers: if BVTs fail you don't even want to sync your enlistment, because things are so badly hosed that just grabbing that version of the source might format your hard drive. The content of these tests is tightly controlled and doesn't change very often. (A rough sketch of what such a test might look like follows this list.)
  • Feature Verification Tests: Per-feature BVTs. A small set of tests that check whether a specific feature is worth even looking at. A program like Microsoft Visio might have separate FVTs for loading/saving, drawing and editing shapes, printing, and so on. FVTs are generally under the control of the feature team.
  • Exit Criteria: A set of tests that check whether important aspects of a particular feature work correctly. These are defined by the feature team and should be recorded in the spec. The name reflects their meaning: they must all be passing in order to exit the milestone. Exit criteria for a program like Microsoft Paint would likely include tests that each of the tools in the toolbox functions correctly in a few of the most common scenarios. These tests are completely under the control of the feature team.
  • Basic Functionality Tests: I personally believe this name is somewhat misleading, because these tests cover more than just what I would consider basic functionality. The 80/20 rule is perhaps the simplest way to describe these tests: eighty percent of your users will use about twenty percent of your application's functionality, and the remaining eighty percent of the functionality will only ever be touched by twenty percent of your users. Basic functionality tests cover that heavily used slice, with perhaps a few excursions into the other regions that the tester or feature team judges to be especially interesting or likely to be trouble-prone. For an application that can load from and save to previous file formats, BFTs would likely include roundtripping to and from files from each previous product version that is still in active use by its customers. Basic functionality tests are the bread and butter of a tester's daily work and are completely under her control (although a good feature team will want to at least review them).
  • Extended Functionality Tests: All those weird edge conditions and unlikely scenarios that teams rarely have time to get to. Everything else, in other words. <g/> The Notepad bug I mentioned a few posts ago may lie in this category. It's worth taking the time to plan out your extended functionality tests even though you may never get to execute them -- at the very least, it's important to know what testing you aren't doing!
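
To make the BVT idea concrete, here is a minimal sketch of what a "launch, type a sentence, save, close" check might look like as an automated test. It's written in Python against a hypothetical EditorDriver class, a stand-in for whatever UI-automation layer your team actually uses, so every name in it is illustrative rather than a real API:

    import os
    import tempfile


    class EditorDriver:
        """Stand-in for a UI-automation wrapper; a real BVT would drive the app itself."""

        def __init__(self):
            self._text = ""

        def launch(self):
            self._text = ""          # pretend the editor started with an empty document

        def type_text(self, text):
            self._text += text       # pretend the keystrokes landed in the document

        def save_as(self, path):
            with open(path, "w") as f:
                f.write(self._text)  # pretend File > Save wrote the document to disk

        def close(self):
            self._text = ""          # pretend the app shut down cleanly


    def test_bvt_launch_type_save_close():
        """If this fails, the build isn't worth syncing, let alone testing further."""
        sentence = "The quick brown fox jumps over the lazy dog."
        path = os.path.join(tempfile.mkdtemp(), "bvt.txt")

        editor = EditorDriver()
        editor.launch()
        editor.type_text(sentence)
        editor.save_as(path)
        editor.close()

        with open(path) as f:
            assert f.read() == sentence, "saved file doesn't match what was typed"


    if __name__ == "__main__":
        test_bvt_launch_type_save_close()
        print("BVT passed")

The specific driver doesn't matter; what matters is that a BVT exercises only the shortest end-to-end path through the app and fails loudly when that path is broken.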

A failure at any point invalidates the results of tests further down the line. If BVTs fail, FVTs aren't even looked at. (On some teams they aren't even run.) If the exit criteria aren't all passing, the results of the basic functionality runs are ignored. And so on. The sum of all these results is used to rate each build; the common terms at Microsoft are below, followed by a rough sketch of the gating logic:

  • Self Toast for builds that fail BVTs (that is, your computer is toast if you install them)
  • Self Test for builds that pass BVTs and FVTs (and so are usable for testing)
  • Self Host for builds that pass all Exit Criteria tests (and thus work well enough to be demoed and dogfooded)
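
Here is that rough sketch of the gating logic: a few lines of Python, purely my own reading of the scheme rather than anything we actually run, that rate a build from per-level pass/fail results and throw away any results an earlier failure has invalidated. Note that the terms above don't name the pass-BVTs-but-fail-FVTs case, so the sketch just calls it unrated:

    # Levels in the order they gate each other; a failure at one level means the
    # results further down the list are ignored.
    LEVELS = ["BVT", "FVT", "ExitCriteria", "BasicFunctionality", "ExtendedFunctionality"]


    def rate_build(results):
        """Rate a build from a dict of level name -> True if every test at that level passed."""
        if not results.get("BVT", False):
            return "Self Toast"   # your computer is toast if you install this build
        if not results.get("FVT", False):
            return "unrated"      # the terms above don't name this case; not yet Self Test
        if not results.get("ExitCriteria", False):
            return "Self Test"    # usable for testing, not yet demo/dogfood quality
        return "Self Host"        # works well enough to be demoed and dogfooded


    def meaningful_results(results):
        """Keep only the results that haven't been invalidated by an earlier failure."""
        kept = {}
        for level in LEVELS:
            kept[level] = results.get(level, False)
            if not kept[level]:
                break             # everything below this level is ignored
        return kept


    if __name__ == "__main__":
        tonight = {"BVT": True, "FVT": True, "ExitCriteria": False,
                   "BasicFunctionality": True, "ExtendedFunctionality": False}
        print(rate_build(tonight))           # -> Self Test
        print(meaningful_results(tonight))   # BasicFunctionality's pass is dropped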

BVTs and FVTs are critical to daily work because they tell you when you had better not sync your enlistment to the latest source if you want to get anything done that day. The rest of the tests are one indicator of your app's quality and correctness. Taken together they help keep your project on the right track.

*** Comments, questions, feedback? Want a fun job on a great team? I need a tester! Send two coding samples and an explanation of why you chose them, and of course your resume, to me at michhu at microsoft dot com. Great coding skills required.