Should BVTs pass 100%?

This is the gray part of testing. Should the goal be that BVTs pass 100%? On every build, all the time? The right answer is YES. But is that really the right answer for all cases? Of course not, though I do believe it is the right answer for most cases. BVTs (build verification tests) are tests that run right after a build compiles and deploys successfully. They should be the test cases that cover the most common functionality or scenarios, proving that the build is sound enough to be tested further. BVTs keep a team from spending time installing a build only to discover that it doesn't work well enough for further testing.
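
To make that concrete, here is a minimal sketch of what a small BVT suite might look like, written as pytest tests. The product API here (app.launch, open_document) is a hypothetical stand-in for whatever "most common functionality" means for your product:

```python
from myproduct import app  # hypothetical product API


def test_app_launches():
    # If the product can't even start, nothing deeper is testable.
    session = app.launch()
    assert session.is_running()
    session.close()


def test_open_document():
    # One representative common scenario: open a known-good file.
    session = app.launch()
    doc = session.open_document("smoke/sample.txt")
    assert doc.page_count >= 1
    session.close()
```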

For small projects, BVTs should pass 100%. There's really no reason this goal can't be attained. Testers and developers on small projects should be communicating enough that feature changes are reflected in the BVTs, and the complexity of these projects, though potentially high, shouldn't be high enough to put a 100% pass rate out of reach. As the automation maturity of the team grows (and the automation suite grows with it), the expectation of a 100% BVT pass rate should be enforced even more strictly. If certain tests always fail and keep the BVTs from running at 100% because of problems in the automation code, they should be disabled until they can be corrected. Otherwise, these continuously failing tests force testers to re-investigate them more often than necessary. A BVT suite is no place for unstable test automation. Also, BVTs should be automated: a manual BVT is almost a contradiction in terms, since the goal of BVTs is to streamline the process, and any manual intervention slows it down.
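
Quarantining an unstable test can be as lightweight as a skip marker that carries a tracking bug, so the suite stays at 100% while the automation gets fixed. A sketch in pytest (the bug ID is a placeholder):

```python
import pytest


# Disabled, not deleted: the skip reason records why and points at a
# tracking bug so the test isn't re-investigated on every run.
@pytest.mark.skip(reason="Unstable automation, tracked by BUG-1234")
def test_flaky_copy_paste_scenario():
    ...
```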

So when is it OK to not pass at 100% (and still allow the team to do further testing on a build)? I know of three situations: 1) the feature set is huge, 2) related to that, the dependency chain is long, and 3) the number of test cases in the BVT suite is huge. For these situations, suppose the product you're testing is Windows. What would BVTs for Windows builds look like? Even a set of common scenarios still makes the BVT suite very large. When I owned running BVTs for the Client Windows Division, there were failures almost every day. But should we stop the release of a build to the Printer team just because the Media Player BVT tests failed? No.

Of course, the real question is why the Media Player tests failed. If the goal is 100% passing, shouldn't it be as simple as fixing or disabling those tests in time for the next build? Well, yes, until the next build uncovers another set of failing tests, and the trend continues. This is because Media Player sits at the top of the technology stack: the UI features depend on the underlying SDK, the SDK features depend on the underlying codecs, and those depend on the deep underpinnings of the OS, and so on. So a bug (or even a deliberate change) can land lower in the stack, surface as different behavior in Media Player, and fail the BVT tests. Is there really a way to fix this? The first solution that comes to mind is for dependent teams to communicate their breaking changes, and that's a great solution when they know a change will break the teams above them. But many of these changes aren't expected to break anything, and communicating every change would overwhelm the other teams. So instead, you accept BVTs not passing at 100% and work to make the investigation of those failures as efficient as possible.
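
One way to make that acceptance workable is to gate each team's build on its own BVT results rather than on the whole suite, so a Media Player failure never blocks the Printer team. In this sketch, the team names, pass rates, and shape of the results are all invented for illustration:

```python
def teams_cleared_for_build(bvt_results: dict[str, float],
                            bar: float = 100.0) -> list[str]:
    """Return the teams whose own BVT pass rate meets the bar."""
    return [team for team, rate in bvt_results.items() if rate >= bar]


results = {"Printer": 100.0, "Media Player": 96.5, "Shell": 100.0}
print(teams_cleared_for_build(results))  # ['Printer', 'Shell']
```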

Finally, let's consider a team like the Graphics team in Windows. They run tests on every configuration of graphics cards and drivers, which becomes a huge matrix. Then multiply that by the number of BVTs that need to run on each configuration: we were producing over a million BVT results each day. Even when the BVTs passed at 99%, that left 10,000 test cases to investigate. Investigating them all was impossible before the next build dropped and the BVTs ran again, so we watched the trends and only investigated failures when the pass rate dropped significantly. If we hadn't, we would have been stuck in BVT "analysis paralysis", unable to do anything else because all our time would go to investigating BVT failures.
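
A rough sketch of that trend watching: compare today's pass rate against a rolling baseline and only trigger a full investigation when the drop is significant. The one-week window and two-point threshold here are arbitrary illustrations, not the values we actually used:

```python
from statistics import mean


def needs_investigation(history: list[float], today: float,
                        drop: float = 2.0) -> bool:
    """history: recent daily pass rates (%); today: today's rate (%)."""
    baseline = mean(history[-7:])  # rolling one-week baseline
    return (baseline - today) >= drop


recent = [99.1, 99.0, 99.2, 98.9, 99.1, 99.0, 99.2]
print(needs_investigation(recent, today=99.0))  # False: normal noise
print(needs_investigation(recent, today=96.5))  # True: significant drop
```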

So you see, the simple question of whether BVTs should pass 100% has a lot of complex answers. That's the gray world of test engineering!