The purpose of a good test case


Many experts have written articles and devoted chapters of books to the attributes of a ‘good’ test case. Unfortunately, most books repeat a common (yet limited) perspective that a good test case is one with a high probability of finding bugs, and Kaner goes to the extreme by stating, “A test that did not reveal a problem was a waste of time.” I pondered this for a while and thought: if I run a test to prove that a feature functions as expected, is that really a waste of time? Isn’t it a good thing that testers actually spend some time executing tests that demonstrate the product does what the customer expects it to do? Or do we simply want to restrict the value of testing and reduce our testers to keyboard monkeys who bang away in search of bugs? (I believe this limited perspective on the purpose of a test in the testing literature has only perpetuated the faulty assumption that the purpose of testing is simply to find bugs; a point I previously dismissed.) So, what is the purpose of a test?


Jorgensen states in his book Software Testing: A Craftsman’s Approach that “a test case has two purposes: either to expose an error, or to demonstrate correct execution.” This explanation broadens the purpose of a test case to include both verifying that the product meets the requirements and revealing potential errors or defects. I tend to agree more with Jorgensen in this matter, but I also think the purpose of a test case involves even more.


There are several objectives of any effective test, whether manually executed or automated. The rationale for a test case usually falls into one of seven categories. Each category provides value to the organization in different ways, but they all essentially function to reduce risk and qualify the testing effort. So, a good test provides some measurable value to the organization with the explicit purpose of (a brief illustrative sketch follows the list):



  • Verifying conformance to applicable standards, guidelines, and customer requirements
  • Validating expectations and customer needs
  • Increasing control flow coverage
  • Increasing logic flow coverage
  • Increasing data flow coverage
  • Simulating ‘real’ end user scenarios
  • Exposing errors or defects
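
To make the list concrete, here is a minimal sketch of how even a tiny suite can serve several of these purposes at once. This is my own hypothetical illustration (the validator, its limits, and the test names are all invented), written as Python tests runnable under pytest:

    def accepts_name(value: str) -> bool:
        # Stand-in for product code: accept 1 to 32 characters.
        return 0 < len(value) <= 32

    def test_valid_input_accepted():
        # Validates expectations: demonstrates the product does what the
        # customer expects (Jorgensen's "demonstrate correct execution").
        assert accepts_name("Ada Lovelace")

    def test_empty_input_rejected():
        # Exposes errors: probes a boundary where defects tend to hide.
        assert not accepts_name("")

    def test_over_length_input_rejected():
        # Increases data flow coverage by exercising the upper bound.
        assert not accepts_name("x" * 33)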

Comments (6)

  1. Shrini says:

    You might be referring to this article by Cem Kaner – “What is a good test case?”

    A careful reading of this article and several others by Cem and James Bach indicates that the notion of “a test case is good if it finds a bug” has changed to “a good test case reveals new information”. Classifying that new information as a bug or merely as information is a context-based decision.

    We all know about the “pesticide paradox” – a test case is good only for the first few executions if looked at purely from a “bug-finding ability” standpoint. As a test case is executed a number of times, it wears out and becomes less likely to find a bug or expose new information about application behavior.

    Here are my observations about the “goodness” of a test case:

    1. A test case can be good in a number of different ways.

    2. A good test case will not (cannot) remain “good” all the time or throughout its lifetime.

    3. A good test case offers easy ways to create variations of its own – hence it does not become a victim of the pesticide paradox.

    4. Attributes of a test case that are important from an overall “goodness” standpoint:

    a. Information Value

    b. Ease of maintenance

    c. Ease of creating variations of it

    d. Credibility and Power

    e. Provides insight into the feature being tested – serves as a knowledge repository

    f. Neither too simple (fit to be disposed of after running once or twice) nor too complex (requiring real effort to understand the flow – hence more time for execution)

    You mentioned that “The rationale for a test case usually falls into one of six categories.”

    Which six categories? Where are they mentioned?

    Shrini

  2. I.M.Testy says:

    Hi Shrini,

    Actually, I am referring to a quote in the book "Testing Computer Software 2nd ed."

    As I stated previously, I absolutely concur that one of the primary objectives of testing (and thus of a test case) is to provide information.

    The pesticide paradox applies primarily to completely scripted, non-regression-type tests that don’t allow the person executing the test to deviate from the instructions (or steps). The paradox doesn’t apply to non-functional test cases. For example, a performance test will run over and over many times, yet its overall value does not diminish during the product lifecycle (if perf is a critical factor for release). It also doesn’t apply to true regression tests. Regression tests historically have a very low rate of identifying new defects; however, they are of significant value in providing information and confidence that specific features continue to function as expected. (Build verification tests are critical and provide high value, but the base set changes very little during the lifecycle.)

    You state “A test case can be good in a number of different ways.” I was simply trying to point out those specific ways.

    I don’t agree with your second point. In fact, some tests (specifically those for establishing baseline measures) will and do remain “good” during the product lifecycle, and there are some tests (primarily regression tests) that will remain “good” for several years while the product is in maintenance.

    WRT variations, they should be built into the test case, not spun off as additional test cases. For example, I really don’t like hard-coded test data. I realize there are times when it is valuable and necessary, but if I am designing a test to validate an input control that takes a string of characters, you have to believe my approach is going to throw randomly generated strings of valid characters and random length at that control. So, each time the test is executed it uses a different string of Unicode characters of varying length. Now I have variability built into a single test.
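
    To illustrate, here is a minimal sketch of that kind of built-in variability in Python. The length bounds and character ranges are assumptions of mine for the example, not Bj’s actual harness:

        import random

        MIN_LEN, MAX_LEN = 1, 256  # assumed limits of the input control

        def random_unicode_string() -> str:
            # Random length, random valid characters. Restricted to the
            # Basic Multilingual Plane with surrogate code points skipped,
            # purely to keep the sketch short.
            target = random.randint(MIN_LEN, MAX_LEN)
            chars = []
            while len(chars) < target:
                cp = random.randint(0x20, 0xFFFD)
                if 0xD800 <= cp <= 0xDFFF:  # surrogates are not valid alone
                    continue
                chars.append(chr(cp))
            return "".join(chars)

        # Each execution feeds the control a different string, so the
        # variability lives inside one test rather than in hundreds of
        # near-identical hard-coded cases.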

    BTW…I meant seven categories to match the bulleted list, and corrected that in the post. Thanks for catching that!

    – Bj –

  3. Michael Bolton says:

    >Actually, I am referring to a quote in the book "Testing Computer Software 2nd ed."

    Kaner has rescinded his view that “a test that did not reveal a problem was a waste of time” since the book was published, and has said so. See "The Ongoing Revolution in Software Testing", http://www.kaner.com/pdfs/TheOngoingRevolution.pdf

    That paper also has a few more potential purposes for a test that you might like to add to your list.

    What’s the difference between a single test that has 30,000 variations and 30,000 tests that are very similar to one another?

    The thing that links them is ‘reification error’ – the notion that a test case, an idea, is a countable thing, rather than a concept.

    Cheers,

    —Michael B.

  4. I.M.Testy says:

    Hi Michael,

    Yes, I realize that Kaner has changed his view (and more recently he has also changed his view of testers’ need for programming skills). The use of the quote was not to disparage Cem. My comment is meant to point out that a common misconception regarding testing as simply a bug-finding activity was sometimes perpetuated by leading industry experts, and I thought the quote from Testing Computer Software, 2nd ed. epitomized this limited view of the role of software testing.

    I design a test case based on the intended purpose of proving or disproving consistency with established facts (those that are known or can be determined). A test case may comprise several ‘tests’ which support the test case.

    For example, if I am testing an edit control’s ability to handle strings composed of valid Unicode characters, then I would first iterate through an array, enumeration, or data file of static strings containing valid characters derived from failure analysis models, and then auto-generate strings of random length and random valid characters for subsequent iterations of that test case. (NOTE: The explicit purpose here is to validate proper handling of valid Unicode strings > minLength and < maxLength. I am not including boundary testing, invalid characters, or a myriad of other tests that I would also run.)
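
    A rough sketch of that two-phase test case in Python (the control, its set_text API, the sample strings, and the iteration counts are stand-ins I invented for illustration):

        import random

        def random_valid_string(min_len: int = 2, max_len: int = 128) -> str:
            # Random length, random valid BMP characters; surrogates skipped.
            target = random.randint(min_len, max_len)
            chars = []
            while len(chars) < target:
                cp = random.randint(0x20, 0xFFFD)
                if not 0xD800 <= cp <= 0xDFFF:
                    chars.append(chr(cp))
            return "".join(chars)

        # Phase 1 data: static strings derived from failure analysis models
        # (a token sample here; a real suite would load hundreds from a file).
        STATIC_STRINGS = ["ASCII only", "Ümläuts", "日本語テキスト", "Ελληνικά"]

        def run_test_case(control) -> bool:
            # One test case comprising many 'tests'; any failure fails the case.
            tests = STATIC_STRINGS + [random_valid_string() for _ in range(50)]
            failures = [s for s in tests if control.set_text(s) != s]
            if failures:
                print(f"{len(failures)} of {len(tests)} tests failed")
            return not failures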

    If one of the tests fails, then that test case fails. If some of the tests fail, then that test case fails. (If they all fail, the project is probably in big trouble!)

    For example, let’s assume the edit control isn’t Unicode enabled. Some of the ‘tests’ might pass (those with characters limited to the underlying supported ANSI character set), but the vast majority of the ‘tests’ will fail. Let’s say there are 250 unique static strings we are going to use in our test case, and 225 strings contain characters that are outside the range of supported ANSI characters for that test environment. Do I write 225 bugs, one for every failure?

    Absolutely not! That is simply a misrepresentation and artificial inflation of the number of defects. The root cause is that the edit control is not Unicode enabled. We don’t need to tell the developer the control is not Unicode enabled 225 times, because if we do, the developer will justifiably resolve 224 of the bugs as duplicates (assuming the same root cause). (BTW, this is why I disagree with Kaner’s assertion that we test to “maximize bug count.”)

    A bug can manifest itself in many ways. A professional tester will troubleshoot the problem to isolate the cause and identify different paths, variations, or symptoms, so the defect report provides more detailed information and enables the developer to effect a better fix. Flooding the developer with several bugs all pointing to a single root cause only serves to artificially inflate the defect tracking database (again reinforcing the fallacious idea that testing is valued by the number of bugs discovered), and it discredits professional testers, who understand their overall value to an organization is not determined simply by the number of bugs they write.

  5. Michael Bolton says:

    >(BTW..this is why I disagree with Kaner’s assertion that we test to "maximize bug count.")

    You seem to miss the point that this is a single example on a list of many possible motivations for testing. Sometimes we might test with the goal of maximizing the bug count. Moreover, I don’t think he’s talking at that point about artificially or misleadingly inflating the bug count; I think he’s making a distinction between, given the same amount of time, finding a few really super-important bugs and investigating and reporting them in great detail, versus finding a whole bunch of bugs (with distinct root causes) irrespective of their relative importance and investigating and reporting them in somewhat less detail.

    Note that in some contexts, management does indeed value the fallacious idea that good testing equals the number of bugs discovered.  There is much education to be done there, for sure.  In my view, counting bugs (or test cases) is a practical guarantee that someone will be misled.

    —Michael B.

  6. I.M.Testy says:

    I realize that it is one example of many ‘motivations’. However, as you indicate, “…in some contexts, management does indeed value the fallacious idea that good testing equals the number of bugs discovered.”

    People sometimes assume the entire message is captured in the first sentence, especially in a lengthy paper. This is why outrageous statements simply fuel the ignorance of the misinformed.

    – Bj –