We have occasional debates here at Microsoft on the value of ‘white-box’ testing.
(For the sake of this discussion, ‘white-box’ testing means testing that takes advantage
of internal product knowledge, while ‘black-box’ testing relies solely on the published
API and behavior.)
On the one hand, if the person who developed the code writes the test for it, or if the
tester was intimately involved in the design of the feature, then the test design
is often blinded by the dev design. For example, if the development of the feature
goes to great lengths to ensure that the threading and locking behavior is correct,
then the tester may spend a majority of their test effort there. This is a problem
if, for example, there are major flaws in the way the feature interacts with its config
files. You don’t want a certain focus in product development dictating a similar
focus in test development.
On the other hand, there are some kinds of bugs that can *only*
be found with white-box testing. If code has an off-by-one error that only kicks
in at the exact buffer size that switches between the fast-path algorithm and the
slow-path algorithm, there is every possibility that even well-designed equivalence
classes against the published API won’t find the problem. A quick peek at the
code, however, can tell you exactly what buffer size to use, if you want to give the
product a conniption fit.
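To make the boundary argument concrete, here is a small Python sketch of that kind of bug. Everything in it is hypothetical and invented for illustration: `SCRATCH_SIZE`, `_fast_path`, `_slow_path`, `echo`, and `failing_sizes` are not from any real product. Sizes chosen from the published contract (empty, small, large) all pass. Only sizes read off the dispatch condition expose the defect.

```python
SCRATCH_SIZE = 64  # hypothetical fast-path buffer; one byte is reserved for a terminator


def _fast_path(data: bytes) -> bytes:
    """Copies through a fixed scratch buffer, silently truncating oversize input."""
    scratch = bytearray(SCRATCH_SIZE)
    n = min(len(data), SCRATCH_SIZE - 1)  # usable space excludes the terminator byte
    scratch[:n] = data[:n]
    return bytes(scratch[:n])


def _slow_path(data: bytes) -> bytes:
    """General-purpose path that handles any size correctly."""
    return bytes(data)


def echo(data: bytes) -> bytes:
    """Public API: should return its input unchanged."""
    # Deliberate off-by-one: the condition should be len(data) < SCRATCH_SIZE.
    if len(data) <= SCRATCH_SIZE:
        return _fast_path(data)
    return _slow_path(data)


def failing_sizes(sizes):
    """Returns the input sizes for which echo() does not round-trip."""
    return [n for n in sizes if echo(b"x" * n) != b"x" * n]


black_box_sizes = [0, 10, 1000]  # equivalence classes from the published API
white_box_sizes = [SCRATCH_SIZE - 1, SCRATCH_SIZE, SCRATCH_SIZE + 1]  # read from the code

print(failing_sizes(black_box_sizes))  # []
print(failing_sizes(white_box_sizes))  # [64]
```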
However, there is one kind of white-box testing that is pretty much universally seen
as useful – fault injection testing. This is where you inject code at compile-
or run-time to force faults that would not normally occur. These are great!
Once you have a fault-injection test framework set up, a fault scenario that might
ordinarily require an IBM mainframe database and a female goat can be duplicated in
a couple of minutes, with a few simple lines of code. If you are hard-core
about testing error handling in your program, you owe it to yourself to investigate
fault-injection schemes. It is similarly good for increasing code-coverage numbers,
if you use that metric.
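As a rough illustration of how little test code a fault scenario can take once a framework exists, here is a minimal run-time sketch in Python. The `FaultInjector` class, its `arm`/`disarm`/`check` methods, the `"config.open"` fault point, and `load_config` are all hypothetical. A compile-time variant might instead wrap the call in a macro that only exists in test builds.

```python
class FaultInjector:
    """Hypothetical run-time fault injector.

    Product code calls check(point) at each named fault point; a test arms
    a point with an exception to force that failure on demand.
    """

    def __init__(self):
        self._armed = {}

    def arm(self, point, exc):
        self._armed[point] = exc

    def disarm(self, point):
        self._armed.pop(point, None)

    def check(self, point):
        exc = self._armed.get(point)
        if exc is not None:
            raise exc


faults = FaultInjector()


def load_config(path):
    """Hypothetical product code with a fault point just before the file access."""
    try:
        faults.check("config.open")
        with open(path) as f:
            return f.read()
    except OSError:
        return None  # the error-handling path we want to exercise


# The "few simple lines" in a test: force the failure, exercise the path, clean up.
faults.arm("config.open", OSError("injected: disk unavailable"))
print(load_config("app.cfg"))  # None, with no real disk failure required
faults.disarm("config.open")
```

The error path runs on every test pass, so it shows up in coverage without anyone having to break a real disk.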
Now, there are a lot of different ways you can do fault injection. Recently
I’ve seen some work done injecting faults at the protocol level during SOAP message
transmission. It is interesting to send SOAP messages from one place to another,
and have a little bitty guy who sits in between and drops, alters, or otherwise interferes
with the messages flowing by. (Would that be an example of MaXMLwell’s Demon?)
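The "little bitty guy in the middle" can be sketched as a relay that applies a rule to selected messages before delivering them. This is a hypothetical Python illustration, not a real SOAP stack: plain strings stand in for envelopes, and `make_relay`, `drop`, and `truncate` are invented names. Keying rules by message index keeps each run deterministic.

```python
from typing import Callable, Dict, Iterable, List, Optional

# A mutator returns the (possibly altered) message, or None to drop it.
Mutator = Callable[[str], Optional[str]]


def make_relay(rules: Dict[int, Mutator]) -> Callable[[Iterable[str]], List[str]]:
    """Hypothetical in-the-middle relay: applies rules[i] to the i-th message
    (0-based) and passes every other message through untouched."""

    def relay(messages: Iterable[str]) -> List[str]:
        delivered = []
        for i, msg in enumerate(messages):
            rule = rules.get(i)
            out = rule(msg) if rule is not None else msg
            if out is not None:
                delivered.append(out)
        return delivered

    return relay


def drop(msg: str) -> Optional[str]:
    return None  # the message never arrives


def truncate(msg: str) -> Optional[str]:
    return msg[: len(msg) // 2]  # simulates a cut-off envelope


relay = make_relay({1: drop, 2: truncate})
sent = ["<Envelope>a</Envelope>", "<Envelope>b</Envelope>", "<Envelope>c</Envelope>"]
print(relay(sent))  # ['<Envelope>a</Envelope>', '<Envelope>c']
```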
Going further back in the past, I’ve seen a system that systematically generated every
possible memory allocation fault while running our test suite.
Now, that last example brings up the ‘dark side’ of fault injection testing.
You see, there were a couple of problems with that memory allocation testing. Here
is how it worked:
1. Take a single test variation from your test suite, and run it. Keep a count of the
   number of memory allocations executed.
2. Run the variation again. Fail the first memory allocation, and all subsequent ones.
3. Run the variation again. Fail the second memory allocation, and all subsequent ones.
4. Keep doing this until you’ve gone through the entire set of memory allocations that a
   successful variation run does.
5. Move on to the next test variation.
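The procedure above fits in a few dozen lines. This Python sketch rests on assumptions: the product's allocator is replaced by a hypothetical `CountingAllocator` hook, an unhandled exception stands in for an AV or crash, and `variation_with_bug` is a toy variation whose second allocation is unchecked. The strings `"S_OK"` and `"E_OUTOFMEMORY"` only borrow COM naming.

```python
class CountingAllocator:
    """Hypothetical allocation hook. Counts calls and, if fail_from is set,
    fails allocation number fail_from (1-based) and every one after it."""

    def __init__(self, fail_from=None):
        self.count = 0
        self.fail_from = fail_from

    def alloc(self, size):
        self.count += 1
        if self.fail_from is not None and self.count >= self.fail_from:
            raise MemoryError(f"injected failure at allocation #{self.count}")
        return bytearray(size)


def variation_with_bug(allocator):
    """Toy test variation: three allocations, the second one unchecked."""
    try:
        allocator.alloc(16)
    except MemoryError:
        return "E_OUTOFMEMORY"
    allocator.alloc(32)  # bug: no error handling here
    try:
        allocator.alloc(64)
    except MemoryError:
        return "E_OUTOFMEMORY"
    return "S_OK"


def run_all_allocation_faults(variation):
    """Returns (allocations in a clean run, fault positions that 'crashed')."""
    counter = CountingAllocator()  # step 1: clean run, counting allocations
    variation(counter)
    crashes = []
    for k in range(1, counter.count + 1):  # steps 2-4: fail allocation k onward
        try:
            variation(CountingAllocator(fail_from=k))
        except Exception:  # anything escaping is treated as a crash
            crashes.append(k)
    return counter.count, crashes


suite = [variation_with_bug]
for variation in suite:  # step 5: move on to the next variation
    print(variation.__name__, run_all_allocation_faults(variation))
# variation_with_bug (3, [2])
```

Each variation with n allocations costs n + 1 runs, which is where the running time discussed next comes from.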
Now, the most obvious problem is that this test takes *forever*
to run. It literally took weeks to run through a simple set of COM tests.
(Oops, did I accidentally reveal the product in question?) You really have to
look at the bugs you find with this method, and ask yourself “is finding these bugs
worth the cost of the testing?” There might be other testing you could be doing
that would find better bugs, or find bugs more efficiently. So if you are doing
‘high-volume’ fault injection, you really need to balance the cost of the testing
against its value to you.
Another problem, really more of a nit, is that while this method seems comprehensive,
there are many cases it doesn’t hit. What if the memory allocations are only
failing intermittently? If something fails only on the pattern “allocation failed,
allocation succeeded, allocation succeeded again, allocation now fails again”, then
the test above won’t find it. I don’t have an answer to this one, except to
offer the sad truth that you can always think of more things to test than you could
possibly write (or run) tests for.
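To see why the "fail k and everything after it" scheme misses intermittent failures, here is a hypothetical Python sketch. A `ScheduledAllocator` fails an arbitrary set of allocation numbers, and `variation_with_retry` is toy code that only breaks under "fail, succeed, succeed, fail". It does not solve the problem: with n allocations there are 2^n possible fail/succeed schedules, which is why exhaustive coverage stays out of reach.

```python
class ScheduledAllocator:
    """Hypothetical allocation hook: fails allocation n (1-based) only if n is in fail_on."""

    def __init__(self, fail_on):
        self.fail_on = set(fail_on)
        self.count = 0

    def alloc(self, size):
        self.count += 1
        if self.count in self.fail_on:
            raise MemoryError(f"injected failure at allocation #{self.count}")
        return bytearray(size)


def variation_with_retry(allocator):
    """Toy code whose bug needs 'fail, succeed, succeed, fail' to surface."""
    recovered = False
    try:
        allocator.alloc(8)
    except MemoryError:
        try:
            allocator.alloc(8)  # retry once
        except MemoryError:
            return "E_OUTOFMEMORY"
        recovered = True
    try:
        allocator.alloc(8)
    except MemoryError:
        return "E_OUTOFMEMORY"
    if recovered:
        allocator.alloc(8)  # bug: unchecked, and only reached after a recovery
    return "S_OK"


def crashes(variation, allocator):
    try:
        variation(allocator)
    except Exception:
        return True
    return False


# "Fail allocation k and everything after it" for each k a clean run reaches (2 here).
suffix_results = [
    crashes(variation_with_retry, ScheduledAllocator(range(k, 100))) for k in (1, 2)
]
print(suffix_results)  # [False, False]

# The intermittent pattern: fail #1, succeed #2 and #3, fail #4.
print(crashes(variation_with_retry, ScheduledAllocator([1, 4])))  # True
```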
Another, much more significant, problem is this: how do you know if it failed?
We took the easy way out, and said “if it AVs, blue-screens, or otherwise crashes
– it failed.” So if it appeared to succeed, returned a wrong error code, or just
behaved wrongly in a non-crashing way, we would not detect it. Why did we do this?
Well, remember our discussion of costs
and tradeoffs. I’ll give you a million lines of test code, developed over ten
years by 40 different people. Are you gonna track through that and fix it so
it reports errors correctly *for every possible
memory allocation failure path*? Note that I’m not claiming that
level of detailed testing isn’t useful – it just wasn’t feasible in that case, on
the scale of the entire test suite. Picking a set of core functionality and
writing some exceptionally robust test code specifically for this type of fault injection
scenario might be a useful endeavor. We’ll see. <grin>
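The gap between the two kinds of oracle can be shown in a short Python sketch. All names here are hypothetical: `FailNthAllocator` is an invented hook, `save_document` is toy code that swallows an allocation failure, and the result strings only borrow COM naming. The crash-only check passes the broken code. The stricter check catches it, but only because the test was told the expected result for that specific failure path, which is exactly the knowledge that is expensive to retrofit into a large suite.

```python
class FailNthAllocator:
    """Hypothetical hook: fails only allocation number fail_at (1-based)."""

    def __init__(self, fail_at):
        self.fail_at = fail_at
        self.count = 0

    def alloc(self, size):
        self.count += 1
        if self.count == self.fail_at:
            raise MemoryError("injected")
        return bytearray(size)


def save_document(allocator):
    """Toy product code: swallows an allocation failure and reports success."""
    try:
        allocator.alloc(1024)
    except MemoryError:
        pass  # bug: the error is lost and nothing is saved
    return "S_OK"


def crash_only_oracle(variation, allocator):
    """The cheap oracle: any escaping exception fails, everything else passes."""
    try:
        variation(allocator)
    except Exception:
        return "FAIL"
    return "PASS"


def expected_result_oracle(variation, allocator, expected):
    """A stricter oracle: the test must also know what the right answer is."""
    try:
        result = variation(allocator)
    except Exception:
        return "FAIL"
    return "PASS" if result == expected else "FAIL"


print(crash_only_oracle(save_document, FailNthAllocator(1)))  # PASS
print(expected_result_oracle(save_document, FailNthAllocator(1), "E_OUTOFMEMORY"))  # FAIL
```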
For those concerned about Microsoft code quality, note that these days we do also
have some static analysis tools that will churn through a reasonable subset of possible
call graphs in our programs, and report possible problems in error paths. It
even files bugs automatically – the Windows folks love that, I’m sure!
Some of my readers may be familiar with Microsoft’s stress-testing efforts, where
we often hammer a machine with tests to the point of program failure. While
stress testing is useful, don’t be fooled into thinking that it is an adequate replacement
for fault-injection testing. The biggest problem with stress testing is the
“early exit” problem. If you are crushing a machine to the point that memory
allocations are failing, then most programs are just going to die immediately when
you run them. You are only going to end up testing the first 10% of the program,
which presumably is not the intent. Another issue is that stress test
failures tend to be non-deterministic; you don’t necessarily know when (or if) a particular
failure will occur, and it can be very difficult to determine the actions that led
up to the failure. This makes debugging a stress failure much more, well, stressful
than debugging an equivalent failure in a fault injection test.
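Both stress-testing weaknesses have fairly direct counterparts in a fault injector. The Python sketch below is an assumption-laden illustration, not a description of any Microsoft tool. The hypothetical `SeededFaultInjector` lets the first `warmup` allocations succeed, so faults land past the startup code rather than causing an early exit. Random failures come from a seeded `random.Random`, so a failing run can be replayed exactly from its seed. The `exercise` workload is also invented.

```python
import random


class SeededFaultInjector:
    """Hypothetical allocation hook with reproducible random faults.

    The first `warmup` allocations always succeed, so the program can get
    past startup; after that each allocation fails with probability `rate`,
    drawn from an RNG seeded with `seed`.
    """

    def __init__(self, seed, rate=0.2, warmup=0):
        self._rng = random.Random(seed)
        self.rate = rate
        self.warmup = warmup
        self.count = 0
        self.failed_at = []  # allocation numbers that were failed, for the bug report

    def alloc(self, size):
        self.count += 1
        if self.count > self.warmup and self._rng.random() < self.rate:
            self.failed_at.append(self.count)
            raise MemoryError(f"injected failure at allocation #{self.count}")
        return bytearray(size)


def exercise(injector, allocations=50):
    """Stand-in workload that tolerates failures and keeps allocating."""
    for _ in range(allocations):
        try:
            injector.alloc(8)
        except MemoryError:
            pass
    return injector.failed_at


first = exercise(SeededFaultInjector(seed=1234, warmup=10))
replay = exercise(SeededFaultInjector(seed=1234, warmup=10))
print(first == replay)  # True: same seed, same failure schedule
```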
Lastly, note that, as with so many of the topics I write about, there is much, much
more to it than I have written. For example, there are many different kinds
of faults that you can inject. We talked about memory faults and protocol faults,
but there are also file access faults, security faults (a biggie!), system object
access faults (events, mutexes, etc.), and registry faults. You can even inject
faults into your own internal product code: what happens when your lower-level
stuff throws an exception to the higher-level stuff?
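Injecting faults into your own layers can be as simple as handing the higher-level code a lower-level component that throws on demand. In this hypothetical Python sketch, `StorageError`, `MemoryStore`, `FaultyStore`, and `SettingsManager` are all invented. The documented lower-level error is handled cleanly. An exception type the higher layer never anticipated escapes, which is the kind of finding this style of testing is meant to surface.

```python
class StorageError(Exception):
    """The error the lower layer is documented to raise (hypothetical)."""


class MemoryStore:
    """Normal lower-level component: keeps values in a dict."""

    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value


class FaultyStore:
    """Test double for the lower layer: raises the exception it was given."""

    def __init__(self, exc):
        self.exc = exc

    def write(self, key, value):
        raise self.exc


class SettingsManager:
    """Higher-level component under test; the store is injected."""

    def __init__(self, store):
        self.store = store

    def save(self, settings):
        try:
            for key, value in settings.items():
                self.store.write(key, value)
        except StorageError:
            return False  # documented failure: reported cleanly
        return True


print(SettingsManager(MemoryStore()).save({"theme": "dark"}))  # True
print(SettingsManager(FaultyStore(StorageError("disk full"))).save({"theme": "dark"}))  # False

try:
    SettingsManager(FaultyStore(KeyError("surprise"))).save({"theme": "dark"})
except KeyError:
    print("unanticipated lower-level exception escaped")
```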
It is really not that hard to get started, however, and I highly recommend it.