Test Me


Our BVTs have grown rather encumbered and encrusted of late, so we recently embarked on a project to trim them down to a more reasonable size. This turned out to be an interesting process, mostly because it quickly became clear that we didn’t have a single agreed-upon definition of what exactly these tests’ purpose was.

That old saw about asking opinions of three different people and getting four different answers certainly held true in this case. It still amazes me somewhat that a group of people can work together for months or years and still have such disparate understandings of concepts fundamental to the work we do every day. (Aside: This is why feature teams are so important: it’s pretty much impossible to write a spec sufficiently precise that all parties involved will have exactly the same understanding of it.) This state of affairs isn’t unique to my current team, either. Every team I’ve been on has defined the various levels of testing differently than every other team.

Despite all this variation, however, the broad strokes are similar even though the details (most especially the names) can be wildly different. Here are the names and definitions my team is using (this week, anyway <g/>):

  • Build Verification Tests, or BVTs (often called smoke tests): A very small set of tests that check whether the app itself is worth even looking at. BVTs for Notepad would be something like “Launch, type a sentence, save, close”. To paraphrase one of my developers: If BVTs fail, you don’t even want to sync your enlistment, because things are so badly hosed that just grabbing that version of the source might format your hard drive. The content of these tests is tightly controlled and doesn’t change very often.
  • Feature Verification Tests: Per-feature BVTs. A small set of tests that check whether a specific feature is worth even looking at. A program like Microsoft Visio might have separate FVTs for loading/saving, drawing and editing shapes, printing, and so on. FVTs are generally under the control of the feature team.
  • Exit Criteria: A set of tests that check whether important aspects of a particular feature work correctly. These are defined by the feature team and should be recorded in the spec. The name reflects their meaning: they must be passing completely in order to exit the milestone. Exit criteria for a program like Microsoft Paint would likely include tests that each of the tools in the toolbox function correctly in a few of the most common scenarios. These tests are completely under the control of the feature team.
  • Basic Functionality Tests: I personally believe this name is somewhat misleading, because these tests cover more than just what I would consider basic functionality. The 80/20 rule is perhaps the simplest way to describe these tests: eighty percent of your users are going to use about twenty percent of your application’s functionality, and twenty percent of that functionality won’t be touched by eighty percent of the users. Basic functionality tests cover the cross-section between those two groups of people, with perhaps a few excursions into parts of the other regions that the tester or feature team judges to be especially interesting or likely to be trouble-prone. For an application that can load from and save to previous file formats, BFs would likely include roundtripping to and from files from each previous product version that is still in active use by its customers. Basic functionality tests are the bread-and-butter of a tester’s daily work and completely under her control (although a good feature team will want to at least review them).
  • Extended Functionality Tests: All those weird edge conditions and unlikely scenarios that teams rarely have time to get to. Everything else, in other words. <g/> The Notepad bug I mentioned a few posts ago may lie in this category. It’s worth taking the time to plan out your extended functionality tests even though you may never get to execute them — at the very least, it’s important to know what testing you aren’t doing!

A failure at any point invalidates the results of tests further down the line. If BVTs fail, FVTs aren’t even looked at. (On some teams they aren’t even run.) If the exit criteria aren’t all passing then the results of the basic functionality runs are ignored. And so on. The sum of all these results is used to rate each build; common terms at Microsoft are:

  • Self Toast for builds that fail BVTs (that is, your computer is toast if you install them)
  • Self Test for builds that pass BVTs and FVTs (and so are usable for testing)
  • Self Host for builds that pass all Exit Criteria tests (and thus work well enough to be demoed and dogfooded)
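
To make the gating concrete, here is a minimal sketch in Python of how those ratings could fall out of the tier results. Everything in it is my own illustration rather than any real tool: the names `TIERS`, `valid_results`, and `rate_build` are hypothetical, each tier is reduced to a single pass/fail flag, and the rating for a build that passes BVTs but fails FVTs is an assumption (the list above doesn't cover that case, so the sketch lumps it in with Self Toast).

```python
from enum import Enum


class Rating(Enum):
    SELF_TOAST = "Self Toast"
    SELF_TEST = "Self Test"
    SELF_HOST = "Self Host"


# Tiers in the order they gate one another (hypothetical labels).
TIERS = [
    "BVT",
    "FVT",
    "Exit Criteria",
    "Basic Functionality",
    "Extended Functionality",
]


def valid_results(results):
    """Keep results up to and including the first failing tier.

    `results` maps a tier name to True (every test in the tier passed)
    or False. Anything after the first failure is ignored, as is
    anything after a tier that was never run.
    """
    counted = {}
    for tier in TIERS:
        if tier not in results:
            break
        counted[tier] = results[tier]
        if not results[tier]:
            break
    return counted


def rate_build(results):
    """Map tier results to a build rating."""
    counted = valid_results(results)
    if not counted.get("BVT", False):
        return Rating.SELF_TOAST
    if not counted.get("FVT", False):
        # Assumption: BVTs pass but FVTs fail is treated as Self Toast.
        return Rating.SELF_TOAST
    if not counted.get("Exit Criteria", False):
        return Rating.SELF_TEST
    return Rating.SELF_HOST


if __name__ == "__main__":
    nightly = {"BVT": True, "FVT": True, "Exit Criteria": False}
    print(rate_build(nightly).value)
```

Note that `valid_results` drops everything after the first failing tier, so a green Basic Functionality run sitting behind red exit criteria never influences the rating.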

BVTs and FVTs are critical to daily work, as they tell you when you had better not sync your enlistment to the latest source if you want to get anything done that day. The rest of the tests are one indicator of your app’s quality and correctness. Taken together they help keep your project on the right track.


*** Comments, questions, feedback? Want a fun job on a great team? I need a tester! Send two coding samples and an explanation of why you chose them, and of course your resume, to me at michhu at microsoft dot com. Great coding skills required.

Comments (8)

  1. I liked this sentence:

    (Aside: This is why feature teams are so important: it’s pretty much impossible to write a spec sufficiently precise that all parties involved will have exactly the same understanding of it.)

    Apropos of that, you might like to look at James Bach’s heuristic for ambiguity.

    http://blackbox.cs.fit.edu/blog/james/archives/000178.html

    I actually had some other suggestions for him, and we’re working together on a list. More ways to tell if a sentence is potentially ambiguous:

    – A human, or a machine programmed by a human, was involved in its creation.

    – The sentence attempts to make an assertion about something either existing or anticipated.

    – The sentence will be read or heard by someone or something other than its author.

    – The sentence is more than a few seconds old.

    I’m finding the expression "sync your enlistment" fascinating. Would I be correct if I assumed that it meant, "use the latest build"? How did this term come into existence?

    I liked this sentence, too:

    "The rest of the tests are one indicator of your app’s quality and correctness."

    Most people would contend that each of the tests is one indicator, but it is very useful to note, as you have, that collections of tests can be seen as one indicator too. That is, each test AND each collection of tests is heuristic (heuristic: a fallible guideline used for the purpose of solving a problem or learning about something). No test, and no collection of tests, speaks definitively and conclusively to the quality of the app.

    From your reference to coding in the help-wanted section of the post, I infer that all of your tests might be automated. If you’re interested,

    http://www.developsense.com/newsletter/2005-01.html#oracles

    Note that automation tends to have a small and narrow selection of oracles, but the tests get performed very quickly. People have a large and diverse set of oracles, but they perform the tests relatively slowly compared to automation, especially if they’re working from scripts. People liberated from scripts and given charters instead will explore and investigate, finding the sorts of bugs that automation can’t due to its limited set of oracles.

    As I see it, automation makes the most sense the closer you are to BVT; as you move away from that, the cognitive power of people will tend to find a broader set of bugs. I think it’s important to recognize that automated testing is essentially confirmatory, exploratory testing is essentially investigative, and that the practice of getting people to work from scripts removes the advantages of both automation and exploration.

    That’s a long background for my question: With reference to the tests you describe above, how are they executed?

    Cheers,

    —Michael B.

  2. The Braidy Tester says:

    "sync your enlistment" is Microsoft-speak for getting the latest source files from version control. We "enlist" to a particular source tree, and when you grab the latest you "sync"hronize your local copy with version control.

    I completely agree that scripted test cases — the most typical type of automated test cases — have little value beyond ensuring that a specific bit of functionality works in a specific way or that a specific bug never reoccurs.

    Most of our test cases today are scripted and automated. We’re putting a lot of thought and effort into making those test cases trivial to write (enough so that even devs and PMs can write them <g/>) so that we can spend our time writing tools to help us test more effectively and find more bugs, and also do all that manual exploratory testing that, as you say, is so important.

  3. As promised, here’s a small primer on testing. Note: if you’re an inveterate tabbed-browsing researcher like me, hit up my testing linkblog page for hot hyperlink action with all the sites mentioned herein. The best introduction to testing in general,…

  4. j_in says:

    Hi,

    Can I choose software testing as a long-term career? I heard that when a firm or organization has no projects to do, we testers will be the first to be thrown out. Is it true?

  5. The Braidy Tester says:

    Certainly testing is a long term career! Some companies may indeed dump testers at the first chance, but there are plenty of companies out there that value us testers!

  6. Things are so badly hosed just grabbing that version of the source might format your hard drive