Equivalence Class Partitioning Part 3 – The tests


In the last post we decomposed the set of characters in the ANSI character set into valid and invalid class subsets for use in a base filename component on the Windows XP operating system. The second part of the equivalence class partitioning technique is to use this information in tests that will adequately evaluate the functional capability of the base filename parameter used by COMDLG32.DLL on Windows XP.

(Note: It is appropriate here to differentiate between a test case and a test. A test case is the hypothesis or assertion that I am either trying to prove or disprove. A test is the process or procedure I am using to prove or disprove my hypothesis. In other words, a test case in this example might be "Validate a base filename can be saved using valid characters in the ANSI 1252 character set." In order to prove or disprove this hypothesis, I will execute multiple tests using the various subsets outlined in the previous post, and as described below. If any test fails, the test case fails.)
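To make the distinction concrete, here is a minimal sketch (in Python, using pytest) of one test case realized as multiple tests. The subset members and the file-system stand-in for the Save As... dialog are illustrative assumptions, not the actual harness:

```python
import pathlib
import tempfile

import pytest

# One representative string per valid class subset from the previous post.
# The subset names (V1, V2...V8) follow that post; the strings are examples.
VALID_SUBSET_EXAMPLES = {
    "V1": "ETX",
    "V2-V7": "y\u00e6B1% 9!.\u00c5xK",
    "V8": "CLOCK$",
}

def save_file(base_name: str) -> pathlib.Path:
    """Stand-in for driving the Save As... dialog: create the file through
    the file system API in a fresh temp directory. A real harness would
    automate the common dialog instead."""
    target = pathlib.Path(tempfile.mkdtemp()) / f"{base_name}.txt"
    target.write_text("payload")
    return target

# One test case ("a valid base filename can be saved") realized as several
# tests -- one per subset. If any parameterized instance fails, the test
# case fails.
@pytest.mark.parametrize("subset,name", VALID_SUBSET_EXAMPLES.items())
def test_valid_base_filename_is_saved(subset, name):
    path = save_file(name)
    assert path.exists(), f"subset {subset}: {name!r} was not saved"
```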

There are basically two procedures for defining the tests that use the data from an equivalence class table. Paul Jorgensen describes a procedure for robust equivalence class testing in which each valid class subset is tested individually. Glenford Myers suggests a procedure in which valid class subsets are combined in a single test until all valid class subsets have been covered (Jorgensen refers to this as weak equivalence class testing). Both authors agree that invalid class subsets must be tested one at a time, with every other parameter held valid, so that a failure can be attributed to the invalid value. Using Myers' approach (or what Jorgensen describes as weak equivalence class testing), the tests for the base filename parameter are illustrated in the table below.
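As a rough illustration of the two procedures, the sketch below composes one valid test input that covers all valid class subsets at once (Myers' approach) and generates invalid inputs one subset at a time. The pool members are hypothetical stand-ins for the subsets defined in the previous post:

```python
import random

# Hypothetical pools: a few member characters per valid class subset, and one
# representative string per invalid class subset (IDs follow the table below).
VALID_POOLS = {
    "V2": "abcxyz",     # illustrative members only
    "V3": "ABCXYZ",
    "V4": "0159",
    "V5": "%!$",
}
INVALID_REPS = {"I6": ":", "I7": '"', "I13": "\\"}

def weak_valid_test_input(rng: random.Random) -> str:
    """Myers' approach: one test covers ALL valid subsets by drawing at
    least one element from each pool."""
    chars = [rng.choice(pool) for pool in VALID_POOLS.values()]
    rng.shuffle(chars)
    return "".join(chars)

def invalid_test_inputs() -> dict:
    """Invalid subsets are exercised one per test, everything else held
    valid, so a failure is attributable to that single subset."""
    return {subset: f"my{bad}file" for subset, bad in INVALID_REPS.items()}

rng = random.Random(42)
print(weak_valid_test_input(rng))   # e.g. 'c5Z%' -- V2..V5 covered in one test
print(invalid_test_inputs())        # {'I6': 'my:file', 'I7': 'my"file', ...}
```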


[Edited 11/30] I apparently made too many assumptions about conducting these tests, so I must clarify some of those assumptions.

  • Any time a key is pressed there is essentially a state change. The primary state change we are concerned with for this particular functional test is whether or not a file is saved to the file system. However, in cases of invalid input there are other state changes that may be interesting. In most well-designed Windows applications erroneous input in an input control will be highlighted (usually after an error message is displayed). This is indeed an expected state change. The only time no noticeable state change occurs (there is still a state change, because the machine is processing keyboard messages such as WM_KEYDOWN) is with Test #5. But another noticeable state change (the one primarily identified in the table below) is the state change to the listview control.


  • There are three ways to invoke the Save button. One way is to press Enter, another is to press the Alt + S key mnemonic, and a third is to left-click the button control (including simulating any of those actions via automation). In some cases the visible state changes may vary; however, the purpose of these tests is to verify the existence of a file of the given name in the file system for valid cases, and its non-existence (with no aberrant side effects) for members of the invalid class subsets. So, regardless of the visible state changes, the tester can use any of these procedures to invoke the Save button.


  • Application of techniques (systematic procedures) at a granular level is very different from simply trying things to see what happens, exploratory testing, guessing, or wild speculation. I made the assumption that readers are familiar with designing and executing atomic functional tests at a granular level in which certain variables are controlled. For example, once we enter an invalid string in the File name textbox and press the Save button, the Save As... dialog is dirty, meaning that the next input could produce an error, but I would not know whether the error is a result of the base filename parameter or of the Save As... dialog state. Equivalence class partitioning is a low-level functional test of a single parameter, so in order to test the base filename parameter the tester (or automated test) should minimally close and reinstantiate the Save As... dialog on each test (see the sketch below).
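Here is a minimal sketch of that atomic-test discipline. The dialog helpers (open_save_as_dialog, type_filename, press_save, close) are hypothetical hooks into a UI automation layer, not a real API; the point is the structure, a fresh dialog per test:

```python
def run_atomic_test(filename: str, open_save_as_dialog, oracle) -> bool:
    """Run one atomic test against a fresh Save As... dialog so a failure
    can be attributed to the base filename parameter, not to dirty dialog
    state left over from a previous input.

    open_save_as_dialog and the dialog methods used here are hypothetical
    automation hooks; oracle(filename) checks the file system afterwards."""
    dialog = open_save_as_dialog()        # fresh instance for every input
    try:
        dialog.type_filename(filename)
        dialog.press_save()               # Enter, Alt+S, or a click all work
        return oracle(filename)
    finally:
        dialog.close()                    # never reuse a dirty dialog
```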


| Test | Data Subset | Example | Expected Result |
| --- | --- | --- | --- |
| 1 | V1 | ETX | File saved to disk |
| 2 | V2, V3, V4, V5 ∪ V6, V7 | yæB1% 9!.ÅęxK | File saved to disk |
| 3 | V8 | CLOCK$ | File saved to disk |
| 4 | V9, V2, V3, V4, V5 ∪ V6, V7 | "myFileName" | File saved to disk, but no file association |
| 5 | I1 | Ctrl + B | No error message, no listview state change, no File name textbox state change |
| 6 | I2 | NUL or nul | Error message, reserved device name |
| 7 | I3 | [tab] | Error message, file name invalid |
| 8 | I4 | lpt3 | Error message, reserved device name |
| 9 | I5 | com7 | Error message, file name invalid |
| 10 | I6 | : or ht/7g\| | Error message, file name invalid |
| 11 | I7 | " or """" | Error message, file name invalid |
| 12 | I8 | \\\\\ | Error message, file name invalid |
| 13 | I9 | "" | No error message, state change |
| 14 | I10 | . | No error message, no listview state change |
| 15 | I11 | .. | No error message, listview state change |
| 16 | I12 | ...... | No error message, listview state change |
| 17 | I13 | \ | No error message, listview state change |
| 18 | I14 | [space] | No error message, listview state change |
| 19 | I15 | [space]myfile | File saved to disk, leading space truncated |
| 20 | I16 | myfile[space] | File saved to disk, trailing space truncated |
| 21 | I17 | * | No error message, listview state change |
| 22 | I18 | my*file | No error message, listview state change |
| 23 | I19 | myf\ile | Error message, invalid path (assumes directory does not exist) |
| 24 | I20 | strlen > 251 | Error message, file name invalid |
| 25 | I21 | myfile and MyFile | Error message, file already exists |
| 26 | I22 | [empty] | No error message, listview state change |
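For automation, the table can be expressed as data. A fragment of such a fixture follows; the inputs and expected outcomes are transcribed from the rows above, trimmed for brevity:

```python
# A fragment of the table above as a data-driven fixture; the remaining rows
# follow the same (example input, expected outcome) pattern.
INVALID_CASES = {
    "I2":  ("nul",     "error message, reserved device name"),
    "I3":  ("\t",      "error message, file name invalid"),
    "I10": (".",       "no error message, no listview state change"),
    "I14": (" ",       "no error message, listview state change"),
    "I15": (" myfile", "file saved to disk, leading space truncated"),
    # ... rows I4 through I22 continue in the same pattern
}
```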

Reducing tests while increasing breadth of coverage


The total number of possible valid string combinations for just the base filename parameter, using only characters within the ANSI 1252 character set, is 214^251 + 214^250 + 214^249 + … + 214^2 + 214^1. Executing this number of tests is, of course, a virtual impossibility, so by employing the equivalence class partitioning technique we are able to systematically produce a minimum baseline set of tests that has a high probability of proving or disproving our hypothesis (test purpose), while also providing great variability in the test data to increase breadth of data coverage. The minimum possible number of valid tests, determined by combining at least one element from each valid class subset, is only 4. But let's look at each valid test a little more closely.
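That figure is easy to sanity-check: it is a geometric series, assuming 214 valid ANSI 1252 characters per position and a maximum base filename length of 251 (both counts taken from the decomposition in the previous post):

```python
# Sum of 214**n for filename lengths n = 1..251.
total = sum(214 ** n for n in range(1, 252))

# Closed form of the geometric series: 214 * (214**251 - 1) / 213.
assert total == 214 * (214 ** 251 - 1) // 213

print(len(str(total)))   # 585 -- the count is a 585-digit number
```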


Test #3 is probably a red herring! This is an invalid filename only on Windows NT 4.0 and below. So, if your application is ported from that era and uses a custom function for its file save functionality rather than the Windows APIs, you might consider running this test once. If it passes, you can probably forget running it ever again on that product. Test #1 evaluates the literal strings in valid subset V1; these can be listed in an array or enumeration and one element selected at random throughout the development lifecycle, or each literal string can be tested once (the probability of a failure in a later build is most likely less than 0.001%). Test #4 is also a test that probably doesn't require a great deal of retesting with various combinations of elements from subsets V2, V3, V4, V5, and V7. Elements from the valid class subsets described in Test #2 are the most interesting, and this is the test we will probably want to automate and run repeatedly throughout the development lifecycle because it provides great breadth of coverage. Remember, this is the minimum number of valid tests. What isn't covered in this set is common or 'real-world' data, which we would certainly want to include. Additionally, Test #2 relies on at least one element from each indicated subset; we might also want to consider additional tests that focus on subsets V4 and V5 only. We might also consider testing valid class subset V6 as a special case if we suspected a private function excluded code point values that were not assigned character glyphs. However, if these 4 valid tests pass, the probability of failure for any combination of these data sets used in this parameter is minimal. Random selection of elements for Test #2 (and possibly Test #4) may slightly increase the probability of exposing a defect in the base filename parsing routine, as sketched below. Tests #5 through #26 are tests for invalid filenames, except Tests #19 and #20, where the expected result is a saved file with the leading or trailing space character truncated.
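A sketch of that random selection for Test #2, with hypothetical pool members standing in for subsets V2 through V7; composing a fresh string on each run varies the data across the development lifecycle:

```python
import random
import string

# Hypothetical members of the valid class subsets used by Test #2; the real
# pools come from the decomposition in the previous post.
TEST2_POOLS = {
    "V2": string.ascii_lowercase,
    "V3": string.ascii_uppercase,
    "V4": string.digits,
    "V5": "%!$&()",
    "V6/V7": "\u00e6\u00c5\u00f1\u00fc",   # a few extended ANSI 1252 characters
}

def test2_filename(rng: random.Random, extra_len: int = 8) -> str:
    """Compose a Test #2 input: at least one element from every subset,
    plus a few extra random draws so successive runs vary the data."""
    chars = [rng.choice(pool) for pool in TEST2_POOLS.values()]
    everything = "".join(TEST2_POOLS.values())
    chars += [rng.choice(everything) for _ in range(extra_len)]
    rng.shuffle(chars)
    return "".join(chars)

rng = random.Random()       # unseeded: different data on every run
print(test2_filename(rng))  # e.g. 'q7%K\u00e6c2ZP9xm\u00c5'
```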


This of course analyzes (tests) only the base filename parameter, and assumes a nominal valid 3-letter extension, that valid filenames do not preexist on the file system, and the specific context described in the first post. If the context changes (e.g., Unix platforms, Windows Vista, or other parameters), this example does not apply. Within that context, however, this set of tests (assuming we include at least Test #2 and Tests #5 through #26 in a regression suite run on each new build) provides a high level of confidence in the functional capability of this specific parameter.


Next, we would decompose the data for the file extension parameter (which is different from the base filename parameter), because in the File name textbox we can enter either the base filename alone or the base filename plus an extension. Once we verify the functionality of the base filename component, we can proceed to the next step in analyzing the functionality of the arguments passed to the File name textbox parameter, which we shall examine at a later date.


It is important to note that this and any other technique are simply systematic procedures designed to help us wrap our minds around complex problems. They are not the only approach to testing (only a fool would advocate a single approach to testing), but when used in conjunction with various other techniques, methods, and approaches, equivalence class partitioning can help to establish a strong foundation of confidence in low-level operations. Of course, as has been previously stated, the limiting factor of this and other functional techniques is the ability of the professional tester to think critically and rationally analyze the "overall system" within the appropriate context.

Comments (6)

  1. Shrini says:

    I can think of another way to think about eq classes. In the table you have listed a few possible expected results.

    Let us take one, say "No error message and no state change". How about thinking about all those values for the base filename that produce this result? Let us say this subset has a million values; from the principle of equivalence classes, the system (Windows) treats them all identically, producing the result – "No error message and no state change".

    In this case, using ECP, I am reducing (asserting that) a million values to one test that produces an expected result …

    What do you say?

  2. I.M.Testy says:

    First, let’s base our example in reality and not in wild speculation, because there are not millions of values which would produce an expected result of no error message and no state change. (In fact, state changes occur constantly, but I will discuss that later.)

    So, let’s take the case of characters used in a valid file name. In that case (in reality) there are more than 100,000 valid characters which can be used in a valid filename on Windows XP, and literally hundreds of trillions of combinations of those characters in a valid file name.

    The purpose of this verification test case is to evaluate whether or not a file composed of a string of specified characters representing the base filename is saved to the hard disk.

    Based on the equivalence theory that any valid character is handled or processed just like any other, the expected result from using any single valid character, or any combination of valid characters, is exactly the same; that is, a file should be saved to the hard disk. So, my oracle will check not only for the existence of a new file, but will also check for consistency in the file name. (I won’t check file contents in this case because that is not within the scope of this specific functional test.) However, because of our understanding of character encoding and complex string parsing issues when thunking between ANSI and Unicode, and parsing Unicode surrogate pairs, we must have multiple subsets of even valid characters…right? That is why I limited the range of characters in this example.

    So, in this specific context where I limited our set of characters, I am asserting that any valid character or any combination of valid characters in valid class subsets V4 union V5 will produce the same expected result (a file of the specified file name is saved to disk). There indeed may be some probability that a combination of characters may cause some errant behavior, which is why I stated that I would not run only one test and call it good. But since I can’t test everything, I need to be smart about what I do test, and by choosing various elements from those sets each time the test is run I increase my coverage (and the probability of exposing errant behavior). A sketch of such an oracle follows.
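    A minimal sketch of that oracle in Python (the directory snapshot and the 3-letter extension are assumptions; driving the actual save is left to a hypothetical automation step):

    ```python
    import os

    def filename_oracle(target_dir: str, expected_base: str,
                        before: set, extension: str = "txt") -> bool:
        """Verify not only that a new file exists, but that its name is
        exactly the base filename that was specified. File contents are
        out of scope for this functional test."""
        after = set(os.listdir(target_dir))
        new_files = after - before       # 'before' is a pre-save snapshot
        return new_files == {f"{expected_base}.{extension}"}

    # Usage sketch:
    # before = set(os.listdir(target_dir))
    # save_via_dialog(expected_base)     # hypothetical automation step
    # assert filename_oracle(target_dir, expected_base, before)
    ```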

    If you can conclusively demonstrate where any of the assertions are incorrect or faulty within this specific context, (in other words, if you can expose a problem or a code path that is not covered within the context of testing the base filename component) then please let me and the other readers know, because I love to learn new things.

    Now, regarding state changes. I made some assumptions in my post.

    • First, I assumed individuals would be knowledgeable about how most well-designed Windows applications will highlight erroneous input in a textbox control. This, in fact, is a state change that I would codify an oracle to look for.
    • Secondly, I made an assumption that readers would understand that each test at this granular level of functional testing is an atomic test that requires closing and reinstantiating the Save As dialog, not keeping the dialog alive on the desktop and pounding in erroneous input values. (Those who understand Windows system internals will understand why.) Keeping the dialog instantiated and pounding away with erroneous values may reveal unexpected results, but that is a different test case.
    • Thirdly, I assumed individuals would understand there are different state changes in the textbox control and the Save As… dialog depending on whether the tester or automated test issues an Alt + S key mnemonic, an [enter] key, or whether they actually click (or simulate clicking) the Save button control.
    • Finally, I made a faulty assumption by not specifying that the state change I was referring to was in the listview control.

    Now, if you were really on your toes you would have discovered that, in fact, I swapped the expected results for the space character and the period character. So, I have clarified these assumptions in the post above and corrected the table of expected results.

  3. Shrini says:

    Just few words about "reality" and "wild speculation" —

    In my opinion all of software testing starts with some speculation or hypothesis, and you can never be sure of "reality" in software. No one has seen the "reality" of software – what you see as reality (e.g., the character "A" appearing on screen in a Notepad file) is a manifestation of one or more interacting processes. We can only attempt to interpret possible manifestations and link them to one or more possible functional components.

    At times software testing is more like the "new drug trials" run by pharmaceutical companies. New drugs are administered to a group of subjects and the effects are studied for possible outcomes. So there is a speculation that a new drug will cure a specific disease and have no side effects.

    Now come to software testing: while testing, you attempt a few values against a software feature, observe the behavior, and verify that the results are in agreement with specifications – pretty much like a drug trial. You may say that compared to drug trial practice, software testing is a more evolved field and has many *well* established techniques. You may say that it is easier to predict software behavior than the human body's behavior in response to a new drug.

    I think that in terms of complexity, the human body and software are somewhat comparable.

    A drug meant for heart disease can have a long-term impact on vision or hearing related functions in the human body. A local hard disk search on a computer might be draining resources on a server and sending confidential information to a hacker sitting remotely.

    We never know what will happen if we do "X" with the software and can never be sure about all possible outcomes. That is why we test.

    Shrini

  4. I.M.Testy says:

    Please do not take my words out of context. (Since you proclaim to be of the “context-driven” school, I am certain that you understand that the context of spoken or written words is often dependent upon the words preceding or following specific words that influence their meaning.)

    Context (def. – the parts of a written or spoken statement that precede or follow a specific word or passage, usually influencing its meaning or effect) is important. It is often foolhardy to simply take one word out of a sentence (especially if you remove its adjective) and argue its merit (unless its denotative use is incorrect). There is a world of difference between the denotation of the word speculation and the connotation of the adjectival phrase wild speculation.

    I think I have already established with readers that designing a test is based on some rational hypothesis.

    So, in the context of my sentence in the previous reply, the adjectival phrase wild speculation implies random guessing or irrational thought. In this specific context (the facts or circumstances for a particular event or situation) there are not one million values that would produce a result of no error message and no state change, so your example is derived from either irrational thought or perhaps an incomplete understanding of the context of this situation. (I certainly hope the latter.)

    As I also stated previously, any glyph used to represent a spoken language is an abstraction of that language. The materialized form (manifestation) of the letter A as it appears in Notepad does require several interacting processes, but it is certainly not magic.

    From a high-level perspective: the user presses the A key on a keyboard, the keyboard generates a scan code, that scan code is sent to the computer and then to a keyboard driver which converts it to a keycode, the keycode is processed as a Windows message, and it is finally displayed in the edit control of Notepad. I can track the signal anywhere along the route from the moment of the key press to its final manifestation as the character A (its material form). Of course, along the way the code may be processed through a character mapping table, and if the user is inputting in an East Asian language it may also go through an input method editor (IME). I can even grab the letter A out of Notepad’s edit control and analyze its binary structure to make sure it is the glyph assigned to the character code point for the Latin character A, and not some other sequence of bits pretending to be a letter A.

    Again, your analogy with drugs is oversimplified. Before a drug is even considered for human testing, pharmacists and scientists who possess in-depth knowledge of chemicals and chemical interactions form a hypothesis around a proposed drug. Then that drug goes through in-depth computer simulations followed by years of controlled analysis on mice or other animals. Finally, some drugs might make it to human testing, in which the subjects are closely monitored. Yes, the drug manufacturers start with a hypothesis. Sometimes it bears out, sometimes it fails, and sometimes there are unexpected outcomes. (For example, Viagra was originally developed to deal with blood pressure issues. However, in the end it produced an unexpected outcome.)

    The difference here is that the pharmacists who envision new drugs, and the doctors who diagnose maladies in a patient, have in-depth knowledge of chemicals and the human body (their trade), are constantly educating themselves about advances in their specific trade, and rarely simply guess randomly or propose some irrational hypothesis or diagnosis.

    Context (the set of circumstances or facts that surround a particular event or situation). Your example above, “We never know what will happen if we do “X” with the software…”, is true. Of course we won’t. Because there is NO CONTEXT!

  5. Shrini says:

    >>> In your example above "We never know what will happen if we do "X" with the software…" is true. Of course we won’t. Because there is NO CONTEXT!

    Here is one possible rephrase: "We never know what will happen if we do "X" with the software…" is true. Of course we won’t. Because there are so many contexts possible; hence the answer would depend upon who is asking and in what context. Context-driven software testers are aware of this and constantly practice probing around context. Welcome to context-driven testing!

    Shrini

  6. I.M.Testy says:

    Shrini, I know you have been taught to try to rephrase other people’s words; however, I usually choose my words very carefully, with clear intent and purpose. They have definitive and specific meaning. There is no need to rephrase them.

    Thus, when I state “there is no context” in your given example, I mean exactly that you have provided a vague and arbitrary example without any context (a set of circumstances or facts that surround a particular event or situation) whatsoever.

    Since you made the statement “We never know what will happen if we do “X” with the software…”, I am asking you to define the context of “X” and the software.

    One more time, since you’ve obviously missed it: context is the set of circumstances or facts that surround a particular event or situation. So, when you put something in context, it actually means to define the set of circumstances that are possible.

    Your explanation of “constantly practice probing around context” does not actually imply putting things into context (the denotative meaning), but rather suggests guessing about random possibilities in the vicinity of, or near, the set of circumstances (context) for a particular event or situation.

    So, if you can come back to reality and provide a context for your example then we can have a meaningful conversation. If you can’t, then this conversation is pointless psycho-babble.

    Also, I really hate to burst your bubble, but software testing has been contextually based since well before the neologism “context-driven.”
