Buggy Test Code

All the technical jargon of the world put together cannot compensate for raw stupidity. I recently had this epiphany when I ran into one of those "DUH!" bugs in my own test code. It made me feel like a fool, so I thought I would redeem some of my hurt ego by confessing and discussing general test code bugs and why they occur, in the hope that it brings me some closure!

Here's the nasty little not-so-secret of the testing world that is rarely talked about – the test code is full of bugs. Big, bad, ugly BUGS. The product code is blessed, since it will have people beating it to death and using it in different ways, thus exposing most (hopefully!) of the flaws it contains. But the story of the test code is quite sad – it starts out on any regular afternoon when a tester codes up a few test cases for an API or a particular scenario. The code is run once (depending on time constraints it might be run only ½ a time – what that means, we'll get to in a second) and if everything looks nice and dandy, it is checked in. This is a big moment for the test code, because my guesstimate is that a huge percentage (I would speculate almost 75%) of this code will never be looked at again. I know, I know – that's not the way things should be. We should have every line of code reviewed (hell – let's have it reviewed multiple times while we are daydreaming) and periodic re-reviews to ensure the test code stays in sync with the API behavior. But then we should also have world peace. But we don't. So let's suck it up, get real and move on.

So our story continues with checked-in code that contains a bug, as follows:

bool MyBuggyTestCase()
{
    bool testPassed = false; //Default should be: the test has failed

    //Do a bunch of things for setup

    //Invoke the API(s) we are interested in

    //Check the thing worked
    testPassed = VerifyThingsAreFine();

    //Do the required cleanup
    //(this could be a lot of cleaning – hang in there)

    return true;
}

The problem here is that we do correctly obtain the pass / fail result in the line:

  testPassed = VerifyThingsAreFine();

however, at the end of the test module we just return the constant value "true". In the dry run that we do of this test case, where we step through the code, we find the API does what it's supposed to do and testPassed = true as expected. We also step through and verify that the value actually returned by the module and logged in the automation harness is true (of course missing the fact that it is actually the hardcoded "true"). Now we are all set up for trouble. All that needs to happen is a regression – the API stops behaving the way it currently does, and all of a sudden we find (or rather don't find) that even though testPassed is correctly calculated as "false", the value returned is still "true". This test case will happily go on logging the scenario as "Pass" till the end of eternity (or until a breaking API change causes an exception or a compile-time failure – whichever comes first).

We have sort of started avoiding this problem by throwing exceptions in the test code when a failure case is hit, expecting the automation harness to catch and log those as failures. This works much better at avoiding errors such as the one above, but it has its own gotchas, since many times you are interested in catching the exception yourself and logging some information about the state you are in. Ideally you should rethrow the exception after you're done logging, but humans have a tendency to make the same mistake in wonderfully different ways – so we might not.
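To make that concrete, here is a rough sketch of the pattern and its gotcha (VerificationException, Log and GetCurrentStateDescription are invented names, not any particular harness's API):

void MyExceptionBasedTestCase()
{
    //Setup and invoke the API(s) as before

    try
    {
        VerifyThingsAreFine(); //now throws (say) a VerificationException on failure
    }
    catch (VerificationException ex)
    {
        //Log whatever state information is useful for debugging the failure...
        Log("Verification failed: " + ex.Message + " in state " + GetCurrentStateDescription());

        //...and this is the line humans love to forget – without it the
        //harness sees a test that completed normally and logs a "Pass":
        throw;
    }

    //Do the required cleanup
}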

Another approach that can help ameliorate this situation is the use of intentional programming, which would make the code much more lean and mean, thus making the bug a little easier to spot (currently I think the long cleanup section of the code helps cover up this issue as well).
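As a rough sketch of what "lean and mean" could look like here (the helper names are made up), the verification result flows straight out of the function and the long cleanup is pushed into a finally block, so there is simply no room left between the check and the return for a hardcoded constant to sneak in:

bool MyLeanerTestCase()
{
    SetupScenario(); //hypothetical setup helper

    try
    {
        InvokeApiUnderTest();
        return VerifyThingsAreFine(); //the result is returned directly – nothing to fat-finger
    }
    finally
    {
        CleanupScenario(); //the long cleanup no longer sits between the check and the return
    }
}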

But wait, there is more – remember when I mentioned that you might run some test cases only ½ a time (and you mumbled to yourself, "whatever that means!")? What I was referring to is the fact that when you have a lot of similar test cases built on well factored-out code (so that they all look terribly similar), you will not (and I'm ready to bet on this!) step through each of these cases. In fact you will, like any other tester with his head screwed on tight, put breakpoints at key places and run through the cases by ensuring things at those breakpoints are as expected. But that does mean that a lot of code paths will be running "unanalysed" – with the usual baggage that comes with that word (such as unintended goof-ups).
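As a rough illustration (every name here is invented), the factored-out code usually ends up looking something like this – dozens of near-identical cases funneling through one shared helper, and only that helper ever getting a breakpoint:

//Shared setup, invocation and verification for every case
bool RunScenario(ScenarioData data)
{
    Setup(data);
    var result = InvokeApiUnderTest(data.Input);
    return Verify(result, data.Expected);
}

bool TestCase_SmallInput()   { return RunScenario(ScenarioData.Small);   }
bool TestCase_LargeInput()   { return RunScenario(ScenarioData.Large);   }
bool TestCase_UnicodeInput() { return RunScenario(ScenarioData.Unicode); }
//...and thirty more that only ever get "verified" via breakpoints in RunScenario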

This phenomenon is exacerbated in model based tests, since the test cases are by definition dynamic and so numerous that you cannot step through each one of them. You step through the action handlers and try to make sure things will work as designed. Plus, you tend to get lost in all the jargon such as action handlers and transition rules. The net effect is that test code which will be executed in millions of different ways has been stepped through for only a handful of those ways. Of course, this is exactly what we want from model based tests, but it goes to highlight that the few runs we do walk through and take a hard look at need to be looked at REALLY HARD!
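For readers who haven't bumped into model based testing, here is a stripped-down (and entirely made-up) shape of such a test. The generated action sequences are what explode in number; the handful of handlers below are the only code a human ever steps through:

//A toy model: the harness walks LegalActions() to generate action
//sequences, then replays each sequence through Execute().
enum ModelState { Closed, Open }

class FileModel
{
    ModelState state = ModelState.Closed;

    //Transition rules: which actions are legal in the current state
    public string[] LegalActions()
    {
        return state == ModelState.Closed
            ? new[] { "Open" }
            : new[] { "Read", "Close" };
    }

    //Action handlers: invoke the product API, verify against the model, update the model
    public void Execute(string action)
    {
        switch (action)
        {
            case "Open":  /* call product Open(), verify */  state = ModelState.Open;   break;
            case "Read":  /* call product Read(), verify */                             break;
            case "Close": /* call product Close(), verify */ state = ModelState.Closed; break;
        }
    }
}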

DCRs (design change requests) don't help in the least, since you end up patching code on top of existing code, creating the classic spaghetti (with no marinara sauce).

So, that brings me to the main agenda (I knew we’d get there someday). Here is a small bullet-list of things I believe can be useful when doing a test plan/code/coverage review (we’ll leave the differences between these for another day).

  • Make the test case FAIL. I think we don't do this enough, and it is the most important item here. It is in fact the whole purpose of creating the test code – you hope that someday it fails and catches a regression. So make sure you cause failures and check whether the net of your test code can catch those fish or not (see the sketch after this list). Sometimes this is too easy (you just say "of course it will fail if I do that") and therefore doesn't get done, and sometimes the excuse is that it is too hard ("how the hell do I get that to occur!").
  • Run through test cases in different buckets. Run through them end-to-end – right from setup, through invoking the API(s), to cleanup, logging etc.
  • Refactor and modularize code so that the things where a goof-up is most costly are hopefully done in only a single place.
  • Look at refactored, modularized code that is called again and again and again using different data sets. The potential damage that bugs in these areas can cause is much larger, so spend more time on them.
  • Spend time on algorithmic and mathematical sections. If you are doing something fancy in your test code, make sure it gets looked at more than other parts.
  • This one would be my favorite, but it is something I don't see happening in the near future – injecting the product code with bugs to see if the test harness can catch those bugs. This would of course entail not checking in the bugs; the "buggy" version of the software is purely a test exercise to check the effectiveness of the test cases being automated. This also goes by the name of Mutation Testing and was recently covered in this MSDN Magazine article.
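On the first bullet, here is the cheapest version of "making the test case fail" that I can think of – a throwaway fault-injection switch (FAULT_INJECTION is just a made-up compilation symbol, not anything a real harness defines) that forces the verification to report failure so you can confirm the run actually goes red:

bool MyTestCase()
{
    //Setup and API invocation as usual

    bool testPassed = VerifyThingsAreFine();
#if FAULT_INJECTION
    //Throwaway sanity check: with this symbol defined, the automation run
    //MUST report this case as a failure. If it still shows "Pass", the test
    //(or the harness plumbing) is lying to you.
    testPassed = false;
#endif
    return testPassed;
}

Had something like this been run even once against the buggy test case at the top of this post, the hardcoded "return true" would have been caught on day one.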

One more solution that I know some teams do take is to hold the test code to the same bar and the same rigorous processes that the product code has to go through. I haven't tried this first-hand, but my gut feeling is that it would bog down testing more than the value it would provide. Still, it would be interesting to hear from someone who has actually tried this and found it to be "the fix".

What other things would you do to make your test code more reliable? Are these steps even practical to do in the real world? What would be practical?