Keep Your BVTs Clean

At Microsoft we build each of our products on a daily basis.  After each successful build, we run a series of automated tests we tend to call BVTs (Build Verification Tests).  If the BVTs fail, no further testing is done and developers are called in to fix the issue ASAP.  The idea is simple, but actually implementing it can reveal unexpected complexities.  One point that is often not considered is which tests to put in the BVT.
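Before getting to that question, here is a minimal sketch of the daily flow itself, assuming a command-line build and two test directories.  The msbuild invocation, the solution name, and the pytest commands are placeholders for whatever your build lab actually runs, not real tooling from the post:

```python
import subprocess
import sys

def run(stage: str, cmd: list[str]) -> bool:
    """Run one stage of the daily pipeline and report whether it succeeded."""
    print(f"=== {stage}: {' '.join(cmd)}")
    return subprocess.call(cmd) == 0

def main() -> None:
    # Stage 1: the build itself.  If it breaks, nothing else happens today.
    if not run("build", ["msbuild", "Product.sln"]):
        sys.exit("Build failed -- no test passes run.")
    # Stage 2: the BVTs.  A failure here stops all further testing until it is fixed.
    if not run("BVTs", ["python", "-m", "pytest", "tests/bvt"]):
        sys.exit("BVTs failed -- hold further testing and get developers on it now.")
    # Stage 3: everything else, only reached when the canary is still alive.
    run("full test pass", ["python", "-m", "pytest", "tests/full"])

if __name__ == "__main__":
    main()
```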

It is sometimes tempting to put all of your automated tests into a BVT.  If they don't take too long to run, why not?  Because only critical tests belong in your BVT suite.  BVTs are supposed to be like the coal miner's canary: if they fall over, there is danger ahead.  Drop everything and fix the issue.  If you put all of your automated tests into the BVTs, you'll have lots of non-critical failures.  You'll have something more akin to a Tennessee fainting goat than a canary.  It will fall over often, but you'll quickly learn to ignore it.  If you see a failure and say "That's okay, we can continue on anyway," that test shouldn't be in the BVT.  The last thing you want is to become numb to failure.  Put only those tests into your BVT that indicate critical failures in the system.  Everything else should run in a separate test pass after the BVTs pass.
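One way to make that split explicit is a marker or naming convention in the test framework.  A pytest-flavored sketch — the "bvt" marker name and these toy checks are mine, for illustration only:

```python
import pytest

@pytest.mark.bvt
def test_config_parses():
    # Critical: if the product's configuration can't even be loaded,
    # nothing downstream is worth running.  A canary test.
    config = {"timeout_seconds": 30}
    assert config["timeout_seconds"] > 0

def test_report_footer_wording():
    # Worth knowing about, but not worth stopping the whole team for --
    # this one belongs in the later, non-BVT pass.
    footer = "Page 1 of 1"
    assert footer.startswith("Page")

# After registering the marker (pytest.ini or pyproject.toml):
#   BVT run:                 pytest -m bvt
#   Separate pass afterward: pytest -m "not bvt"
```

The exact mechanism matters less than the discipline: anything you would shrug off on failure goes in the second bucket.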

It is imperative to keep your BVTs clean.  By that, I mean that the expected behavior should be for every test to pass.  It is not okay to have a certain number of known failures.  Why?  Because with a baseline of known failures, there is no clear signal when a critical one appears.  "I can't recall, do we usually have 34 or 35 failures?"  There are two things to consider in keeping the BVTs clean.  First, are the tests stable?  Second, are the features the tests cover complete?  If the answer to either question is no, those tests shouldn't be in the BVTs.

When I say tests should be stable, I mean that their outcome is deterministic and that they always pass unless something actionable goes wrong.  Instability in tests can come from poorly written tests or poorly implemented features.  If the tests behave in a seemingly nondeterministic manner, they shouldn't be in your BVT; you'll be constantly investigating false failures.  Fix the tests before enabling them.  If a feature is flaky, you shouldn't be testing it in the BVT.  It is not stable enough for the project to be relying on it.  File bugs and make sure that developers get on the issues.
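Timing is one of the most common sources of that kind of nondeterminism.  A small sketch — the toy worker below stands in for whatever asynchronous piece of the product you are testing.  The first test races the worker and will pass or fail depending on scheduling; the second waits on the actual condition and only fails when something actionable is wrong:

```python
import queue
import threading
import time

def start_worker(out: "queue.Queue[str]") -> threading.Thread:
    """Toy background worker that eventually reports it is ready."""
    def work() -> None:
        time.sleep(0.05)  # simulate variable startup cost
        out.put("ready")
    t = threading.Thread(target=work, daemon=True)
    t.start()
    return t

def test_worker_flaky():
    out: "queue.Queue[str]" = queue.Queue()
    start_worker(out)
    time.sleep(0.01)           # races the worker; sometimes long enough, sometimes not
    assert not out.empty()     # nondeterministic -- keep this out of the BVT

def test_worker_stable():
    out: "queue.Queue[str]" = queue.Queue()
    start_worker(out)
    # Block on the real condition with a generous timeout; a failure here
    # means the worker genuinely never came up, which is actionable.
    assert out.get(timeout=5) == "ready"
```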

BVT tests should only cover aspects of features that are complete.  It is tempting to write tests to the spec and then check them all in even before the feature is spec-compliant.  That leaves you with a standing set of known failures, and as above, that is a sure way to overlook the important ones.  Instead, only enable tests once they are passing and you don't expect that behavior to regress.  If the feature is still in a state of constant flux, it shouldn't be in the BVTs; you'll end up with expected failures.  BVT tests should reflect what *is* working in the system, not what *should be* working.
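If you do want to write the spec-driven tests early, one option is to keep them checked in but held out of the results until the feature settles.  A sketch, using a hypothetical feature-complete flag and a stand-in for the real export code:

```python
import pytest

EXPORT_FEATURE_COMPLETE = False  # flip to True once the export format stops churning

@pytest.mark.skipif(
    not EXPORT_FEATURE_COMPLETE,
    reason="export format still in flux; not ready to gate the build",
)
def test_export_matches_spec():
    # Stand-in for calling the real export code once it exists and is stable.
    exported = "name,value\n"
    assert exported.splitlines()[0] == "name,value"
```

Skipped tests show up as skipped, not failed, so the BVT report stays at zero failures and still tells the truth about what is working.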