Some Thoughts on Test Frameworks

It might not be the sexiest of topics, but building, consuming, extending and debugging Test Frameworks is a large part of any SDET's day. Traditional wisdom teaches us to write abstraction layers - often many of them - to shield test cases from changes in the system under test: in API testing this often means creating scenario-focused wrappers over the product; when testing UI this might mean adding layers of automation on top of the actual controls a user would interact with. We generally refer to these abstractions as Test Frameworks - they allow us to write test cases at a much higher level and reduce the maintenance burden of fixing test cases as a system under development inevitably changes.

Recently, when struggling through a set of changes to one such framework, it struck me that we pour a lot of time and effort into these frameworks: would life be simpler, tests better and productivity higher without our obviously needed framework-friend? Maybe.

Firstly: we're building these frameworks to hide changes in the system under test. Aren't changes in the product exactly one of the things we'd like to be detecting? In simple tests, and certainly in tests of APIs, small changes which break tests might be pretty relevant for a feature tester to look at; in higher-level, end-to-end scenarios, we're probably better off having those changes hidden. So, if it's mainly our end-to-end, high-level scenarios that benefit from frameworks, what level of investment do we want to make in them? It probably depends on what portion of the test plan is comprised of end-to-end tests.

Another use of frameworks is to hide complexity in the product itself - consider an API where we need to instantiate an object, make a method call to initialize it, then execute an operation on it in order to achieve some customer scenario:

    var iceCreamMachine = iceCreamTruck.CreateIceCreamMachine(Flavors.Chocolate);
    iceCreamMachine.TurnOn();
    var treat = iceCreamMachine.MakeIceCream();

Since spreading code like this all over our test cases is going to get messy, we add this Helper Method to the test framework:

    T MakeTreat<T>(IceCreamTruck truck, Flavors flavor) where T : Treat...
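Filled out, the helper might look something like this - a minimal sketch assuming the fictional ice cream API from the snippet above; the cast at the end is my own guess at how the generic constraint gets used:

    // Sketch only: assumes the post's fictional IceCreamTruck API and a
    // Treat base class; the cast is illustrative, not real framework code.
    T MakeTreat<T>(IceCreamTruck truck, Flavors flavor) where T : Treat
    {
        var iceCreamMachine = truck.CreateIceCreamMachine(flavor); // hide creation
        iceCreamMachine.TurnOn();                                  // hide initialization
        return (T)iceCreamMachine.MakeIceCream();                  // hide the operation
    }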

It's great, it works and it makes our test code look a lot nicer. There's certainly no reason not to do this, but ask yourself this: If I want this helper method to make writing my code easier and cleaner, wouldn't my customers want it too?
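If the answer is yes, the same convenience might belong in the product itself - a hypothetical sketch, imagining it as an instance method on the fictional IceCreamTruck:

    // Hypothetical: the same shortcut offered by the product rather than the
    // test framework, so real customers get the cleaner code too.
    public Treat MakeTreat(Flavors flavor)
    {
        var machine = CreateIceCreamMachine(flavor);
        machine.TurnOn();
        return machine.MakeIceCream();
    }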

We can add these abstractions and wrappers because we ignore or assume details: the system under test often allows a great deal of flexibility, often more than is required for the test cases in question. As the suite grows, more of the product's 'knobs and levers' are altered by tests, meaning that more options creep into our wrappers too: sometimes this is a 'helper' class with more properties than the class it's helping you to use; sometimes it's a 'helper' method with 15 oddly named parameters. Often - despite the best intentions of the framework's original designers - the framework becomes more complicated than the product it tests, and it is always less well documented.
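To make that concrete, here's the kind of signature such a helper tends to grow into over time - an invented example in the spirit of the post's API, not taken from any real framework:

    // Invented illustration: a helper that has accumulated a parameter for
    // every 'knob and lever' the suite ever touched. None of these names
    // come from a real framework.
    Treat MakeTreat(
        IceCreamTruck truck,
        Flavors flavor,
        Flavors? secondaryFlavor = null,
        bool useCompositeFlavor = false,
        bool turnMachineOnFirst = true,
        int scoopCount = 1,
        bool allowSavoryMix = false,
        TimeSpan? warmUpTime = null)
    {
        // ...each flag forks the logic until the helper re-implements the product.
        throw new NotImplementedException();
    }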

I came to a basic conclusion: Test Frameworks are fun to write, but that doesn't always mean writing one is the right thing to do!

When dealing with the sordid topic of abstractions, it's worthwhile to remember a 1987 essay by Frederick P. Brooks Jr., No Silver Bullet, in which we're all reminded that "The complexity of software is an essential property, not an accidental one. Hence, descriptions of a software entity that abstract away its complexity often abstract away its essence." - translating this roughly into test parlance: Avoid abstracting away the thing you're trying to test.

In general development, any learned software engineer will tell you that readability of code is really important: for every time a line of code is written, it is read many, many times. Optimize for the time spent reading, not the time spent writing. In test, I'd like to propose something similar: for every time a test case is written, it will be read and maintained several times, but executed and investigated an order of magnitude more. Maybe we should start optimizing our tests for the times that they fail. After all, a test case is surely at its most useful when failing, since it is hinting at a defect in the product being tested. As soon as a test fails, I usually run through three steps of analysis:

  1. What does this test actually do?
  2. Is the framework working properly? Is this a framework issue?
  3. What do the repro steps for this bug look like?

Often, the exception actually thrown by the test case is fairly general - it usually comes from somewhere deep in the test framework and gives little clue as to what actually happened or what the expected outcome of the failing operation was supposed to be. Consider these two exception messages, the first from a test framework:

    The expected 'TastesIckyException' was not thrown.

... and this one from a test case:

    Calling MakeIceCream with custom composite flavor 'Mango' and 'Bacon' should fail,
    because mixing sweet and savory ice creams together is not allowed.

The two messages describe the same failure, but the second tells you quite a bit more: the test was trying to use custom flavors; the test was using 'composite' flavors; the test was not expecting 'Mango' and 'Bacon' to be a valid combination. You instantly have some idea of what the test is supposed to be doing and where the failure(s) might be.

In the first case, above, you'd need to dig into the framework to figure out what's going on - is the wrong overload of MakeIceCream() being called? Is the IceCreamMachine correctly configured with Mango and Bacon? Is it the right type of machine? Consider a second test case that looks like this:

    var flavors = new CompositeFlavor(Flavors.Mango, Flavors.Bacon);
    var iceCreamMachine = iceCreamTruck.CreateIceCreamMachine(flavors);
    iceCreamMachine.TurnOn();
    try
    {
        // If MakeIceCream succeeds, the test has failed: this flavor should be rejected.
        var treat = iceCreamMachine.MakeIceCream();
        Fail("Calling MakeIceCream with the custom composite...");
    }
    catch (TastesIckyException e)
    {
        Log.Expected(e);
    }

It's pretty obvious what's going on here, and it has a nice side effect: the test method describes the repro steps. A dev can read these and see:

  • We're using a CompositeFlavor
  • We're passing it into CreateIceCreamMachine
  • We call an overload of MakeIceCream() with no parameters
  • We expected a TastesIckyException
  • The problem is that sweet and savory can't be used together in an ice cream flavor

When your test case reads the way your repro steps would, a developer can easily determine what the problem is, where it might be, and what the expected outcome was; you can easily figure out what the expected outcome of the test is and why it's failing; both of you save time.

Obviously, this is a fairly trivial - and specific - example. However, it's led me to think about three things when writing new tests:

  1. Write tests which look like the repro steps you'd want to hand off to a developer;
  2. Explicitly encode the Expected and Actual outcomes of your test directly into the test case (see the sketch after this list);
  3. File bugs about the 'Helper Methods' you need to add to make using the product more pleasant - your customers will thank you!
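As a minimal sketch of the second point, reusing the fictional API from above - the message string here is my own invention, but it spells out both the expected and the actual outcome right in the failure:

    // Sketch: the failure message carries both Expected and Actual outcomes,
    // so whoever reads the log doesn't need to open the test framework first.
    try
    {
        var treat = iceCreamMachine.MakeIceCream();
        Fail("Expected: MakeIceCream throws TastesIckyException for composite " +
             "flavor 'Mango' + 'Bacon'. Actual: it returned a treat of type " +
             treat.GetType().Name + ".");
    }
    catch (TastesIckyException e)
    {
        Log.Expected(e);
    }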

And lastly: it goes without saying that re-use isn't a bad thing, and neither are common libraries or frameworks. We'll always benefit from common code, abstraction and toolkits - just consider how, where and why you're using them.