Equivalence class partitioning


I have been teaching formal testing techniques for several years at Microsoft and University of Washington Extension. Techniques are systematic procedures to help solve a complex problem. A technique does not find all types of problems; techniques are generally very good at finding very specific classes of defects. But the usefulness or effectiveness of any particular technique relies on the tester's in-depth system and domain knowledge, and on the tester's skill and ability to apply the correct technique to a problem space.


One of the techniques I teach is equivalence class partitioning (ECP). On the surface ECP appears rather simple, but applying it effectively relies on the tester's ability to decompose input and output variables into discrete subsets of valid and invalid classes.


Over-generalization of the data reduces the number of subsets in a specific class and increases the likelihood of under-testing. Hyper-decomposition of the variables may increase the number of subsets in a specific class without providing additional benefit (although it may potentially increase the number of tests). Based on this observation I have proposed the following hypothesis or Bj's Theory of Equivalence Class Partitioning Data Decomposition: Over-generalization of data reduces the baseline tests and increases the probability of missing errors or generating false positives. Hyper-analysis of data increases the probability of redundancy and reduces the overall effectiveness of each test.
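
The trade-off can be sketched in code. The example below is only an illustration, not the method from the article: it assumes a hypothetical name field that accepts 1 to 25 upper case Latin letters A to Z (the minimum length of 1, the sample values, and all function names are assumptions), and compares three ways of partitioning the same sample inputs.

```python
import string

MAX_LEN = 25  # assumed maximum length of the hypothetical field
VALID_CHARS = set(string.ascii_uppercase)


def over_generalized(value: str) -> str:
    """Two classes only: every input is lumped into 'valid' or 'invalid'."""
    ok = 1 <= len(value) <= MAX_LEN and set(value) <= VALID_CHARS
    return "valid" if ok else "invalid"


def balanced(value: str) -> str:
    """One valid class plus one invalid class per distinct rule that can be broken."""
    if value == "":
        return "invalid: empty"
    if len(value) > MAX_LEN:
        return "invalid: too long"
    if not set(value) <= VALID_CHARS:
        return "invalid: character outside A-Z"
    return "valid"


def hyper_decomposed(value: str) -> str:
    """Splits the valid class by first letter, which adds subsets but
    (under the assumptions above) no new expected behavior."""
    label = balanced(value)
    return f"valid: starts with {value[0]}" if label == "valid" else label


samples = ["ABCDEF", "ZXYVUTSRQPONM", "", "A" * 26, "abc", "AB1"]

# Count how many distinct classes each scheme produces for the same samples.
class_counts = {}
for scheme in (over_generalized, balanced, hyper_decomposed):
    classes = {scheme(s) for s in samples}
    class_counts[scheme.__name__] = len(classes)
    print(scheme.__name__, len(classes), sorted(classes))
```

With these samples the first scheme yields two classes, the second four, and the third five. The two-class scheme cannot distinguish an empty string from an over-long one, so a defect specific to either could be missed; the first-letter split adds tests without adding a new expected result unless the tester has a reason to believe the system treats letters differently.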


ECP is not some rote activity; for it to be most effective, the tester requires a lot of in-depth knowledge about the overall system (hardware, operating system, environment, etc.), the domain (client program, programming language used to develop the application, API interfaces, etc.), and the data sets. Limited knowledge in one or more of these areas, or an inability to adequately analyze the data in the context of the system and domain, is an impediment to the effective application of this technique. I suspect that failing to perform an in-depth analysis of the system, domain, and data interaction is what leads some people to disregard this technique, or to become hyper-critical of its usefulness.


In this month's issue of Software Test & Performance magazine I discuss the theory and application of the equivalence class partitioning technique as a viable and extremely useful tool used by professional testers. The magazine is free to download, and the article goes into great depth explaining when and how to apply the technique of equivalence class partitioning. Let me know if you have any questions, and I would appreciate your comments or feedback.

Comments (10)

  1. Shrini says:

    You might find some arguments and discussion about ECP at my blog ….

    As you can see, there are lots of questions and fewer answers … I would appreciate it if you could pick up some of the unanswered issues in my blog so that this post of yours can have some involved discussion … (or I will come back with my questions after reading your STP article)

    http://shrinik.blogspot.com/2007/06/testers-quiz-what-is-equivalence.html

    Alan Page dedicated a post to this topic at my request (when I met him at STAR EAST)

    http://blogs.msdn.com/alanpa/archive/2007/06/15/equivalence-class-partitioning.aspx

    this is an interesting discussion …

    Thanks for starting this topic

    Shrini

  2. I.M.Testy says:

    Hi Shrini,

    Yes, I read your blog, but did not respond because this article was pending publication. On the surface ECP seems rather simple, but in fact its viability depends on the tester’s knowledge and ability to analyze data. Please read the article, and I would be more than happy to answer any specific questions. It is indeed an interesting topic of discussion.

    – Bj –

  3. Shrini says:

    Some thoughts about ECP here

    http://shrinik.blogspot.com/2007/10/types-of-equivalence-equivalence-class.html

    I read your article … one thing that the article did not seem to address is types of equivalence. Your example – Next Date – uses what I call “universal equivalence”.

    Coming to the other example, the Name edit control – you mentioned an ASSERTION of the same result for both “ABCDEF” and “ZXYVUTSRQPONM”. What is the basis of such an assertion? Which result are we referring to? Will the assertion hold good if the Name control is a plain HTML edit box, a VB control in a Windows application, or a command line parameter for an API function? Key items to consider here are “Type of Equivalence”, “Basis of equivalence” and “Type of result”

    Do you think it will be useful to classify and dwell deeply on types of equivalence?

    Another question I have about ECP – is this a black box technique or a white box technique?

    Traditional literature (including Lee Copeland’s book on test design) calls this a black box technique.

    You mentioned in the article knowing data types, programming languages, operating systems etc. … Are we not entering white box territory?

    Shrini

     ——————————————————————

    Bj’s response – 

    Equivalence class partitioning is a functional testing technique that involves in-depth analysis of variable data applied to a parameter within the context of a specific domain.

    I don’t know what you mean by ‘universal equivalence,’ but I suspect it is similar to the neologistic ‘types’ suggested by Kaner. I don’t think there are ‘types’ of equivalence, and I think it only provides a feel-good explanation of this difficult technique to individuals whose only contribution to testing is finding bugs from the user interface. So, I don’t think it is useful to classify or dwell deeply on mythical ‘types’ of equivalence. But, if that helps someone understand the technique better then good for them. Personally, I think it just muddies the water of rational, logical thought.

    The basis of my assertion (declaration) is found in the stated example. The example clearly stated that an edit control accepts a string of upper case Latin characters between A and Z with a max length of 25 characters. Also, since I am familiar with ASCII, ANSI, and Unicode encodings, standard string parsing algorithms, and how Windows, Unix, and Mac environments process characters within this range I can logically assert (declare) with a high degree of confidence and probability that any combination of these 26 characters of any length string will produce the same expected result (whatever that result may be, since this was a simple leading example).

    Will this assertion hold true if the control is an HTML edit box or a VB control on a Windows form? If that standard HTML edit box or VB control is also limited to a string length of 25 characters the answer is yes. This is because upper case A – Z is such a limited set of characters and the code points for the stated set are identical regardless of whether the encoding is ASCII, ANSI, OEM, or Unicode, or any transformation format of Unicode, and regardless of the operating environment or system, and assuming the developer is competent and able to parse standard textual strings. (Of course, this is where in-depth domain and system knowledge are advantageous and why ‘subjective equivalence’ is merely guessing.)

    However, if the range of characters was expanded to include the entire Unicode repertoire then the answer would be no, and the professional tester knows that he/she must analyze the data for each parameter based on his/her in-depth understanding of the system. Unfortunately there is not a one size fits all, and as I stated, the less the tester understands the whole system, the less effective his or her ability to use this technique.
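
The encoding claim above is easy to check. The following is a minimal sketch, assuming Python's `cp1252` and `cp437` codecs as stand-ins for "ANSI" and "OEM" code pages (that mapping is an assumption; the actual code page depends on the system locale). The helper name `encoded_forms` is hypothetical.

```python
import string

UPPER = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encoded_forms(text, encodings):
    """Return the byte sequence produced by each named encoding."""
    return {name: text.encode(name) for name in encodings}


# A to Z: the single-byte code pages listed here and UTF-8 give the same bytes.
az_forms = encoded_forms(UPPER, ["ascii", "cp1252", "cp437", "utf-8"])
az_identical = len(set(az_forms.values())) == 1

# UTF-16 uses two bytes per character, but the code points are unchanged.
utf16_roundtrip = UPPER.encode("utf-16-le").decode("utf-16-le")
az_code_points = [ord(ch) for ch in utf16_roundtrip]

# Outside that range the byte values diverge, e.g. U+00C9 (E with acute accent).
accented_forms = encoded_forms("\u00c9", ["cp1252", "cp437", "utf-8"])
accented_identical = len(set(accented_forms.values())) == 1

print(az_identical, az_code_points[0], az_code_points[-1], accented_identical)
```

For A to Z the byte sequences agree across the four encodings and the code points run from 65 to 90, while the accented character encodes differently in each of the three encodings tried. That is one concrete reason a partition that holds for A to Z does not carry over to a wider character repertoire.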

    As I stated above, I view ECP as a functional testing technique because it is designed to evaluate the functional attributes of a parameter. A professional tester can design an effective functional test from either a ‘black box’ or ‘white box’ test design approach.

    The more knowledge a tester has of the system the more effective the tester is in performing his/her job. It is the in-depth knowledge of programming concepts, operating systems, data types and encoding, hardware platforms, etc. that separates professional testers from those who simply view the role of testing as simple bug finding.

    The Next Date program used in the article is used as a simple example. In practice the effective application of the technique requires a great deal of skill, and both in-depth and broad knowledge.

    Over the next few weeks I will discuss the application of ECP applied to COMDLG32.DLL, and perhaps that will provide a greater understanding of this technique.

    – Bj –

  4. Shrini says:

    Here is another part of your article that seems to oversell or glorify the ECP technique. I do understand the importance of in-depth analysis of the system, domain, and data interaction – but I am of the opinion that these are generic tester skills and do not belong to ECP as a technique.

    >>>> First, it systematically reduces the number of tests from all the possible data inputs and/or outputs, and it provides a high degree of confidence that any other data in one particular subset will repeat the same result.

    ECP does not “reduce” ANYTHING by itself, systematically or otherwise. It is the tester’s hypothesis, judgement and assertion that certain sets of data values CAN be treated equivalently. ECP’s principle per se is that there are groups of data that can be modeled to be treated identically by the AUT.

    The actual hard work of theory, modeling, hypothesizing about data types, establishing equivalence and creating the “big picture” is the forte of a skilled tester. ECP is just a wrapper or entry point to such a deep intellectual process. IMHO, touting ECP as a systematic technique to REDUCE the number of tests (actually data values fed to a SINGLE field) is an oversell.

    I hope you agree with my assertion that “under the hood” of ECP is the core tester’s skill of modeling the big picture. One way to deploy ECP efficiently would be to “downplay” ECP as a technique and instead focus on the tester’s skill in identifying data sets and the different types and nature of equivalences. If one were to just start with data domains, checking equivalences etc. without even mentioning the name of ECP, the effect would still be the same.

    Shrini

     ———————————————————-

    Bj’s response –

    I am sad that you assume my article seems to glorify or oversell equivalence class partitioning. My goal was the opposite. I had hoped that readers would understand that this functional technique was not simple, or an easy application of some rote process.

    As I read through this comment I almost hesitated to publish it because although I have never met you, I think you are much brighter than to suggest such simple-minded rhetoric. Certainly a technique does not do anything! Absolutely a tester’s skill is imperative!

    DUH!

    Nor can a claw hammer pound in a nail by itself.

    I assume that most readers of the magazine (and this blog) can think rationally and logically for themselves, and they understand that inanimate objects (such as tools) do not / cannot effect some magical action on their own.

    So, now that we have that out of the way…let’s try to focus on the topic of software testing, and submit our philosophical debates on the right color of the user interface, and other trite thoughts to yahoo groups or blogs that cater to such discussions.

    – Bj –

  5. Shrini says:

    BJ,

    Can you explain in some detail the mathematics behind ECP? I believe a data variable in ECP (input or output) is mathematically modelled as a “set”. I read about “equivalence relations” in sets here ..

    http://www.iscid.org/encyclopedia/Equivalence_Relation

    In this article the author talks about equivalence relationships between various elements of a set in terms of reflexive, symmetric and transitive properties …

    Any views?

    Shrini

    ———————————————————

    Bj’s response –

    The ISCID is the International Society for Complexity, Information, and Design which is primarily an organization of members engaging mainly in philosophical discussions of complex systems.

    The paper you cite by Jimmy Tseng is a discussion of a mathematical concept of relationships, and interestingly enough provides a complex solution that explains why, in a set of Latin upper case characters A – Z,

    A is equivalent to A for all A in the set of upper case Latin characters A – Z (reflexive property)

    and if A is equivalent to B then B is equivalent to A (symmetric property)

    and if A is equivalent to B and B is equivalent to C then A is equivalent to C (transitive property).

    For those who are really interested in an in-depth understanding of equivalence concepts, and why the technique of equivalence class partitioning (ECP) is valuable from a mathematical perspective this is an interesting read.
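
The three properties listed above can be checked mechanically for a small set. The sketch below is illustrative only: the element set (A to Z plus the digits), the two example relations, and the function names are assumptions, not taken from the cited paper.

```python
import string
from itertools import product

UPPER = string.ascii_uppercase
ELEMENTS = list(UPPER + string.digits)


def is_equivalence(elements, related):
    """Check the reflexive, symmetric and transitive properties by brute force."""
    reflexive = all(related(a, a) for a in elements)
    symmetric = all(
        related(b, a) for a, b in product(elements, repeat=2) if related(a, b)
    )
    transitive = all(
        related(a, c)
        for a, b, c in product(elements, repeat=3)
        if related(a, b) and related(b, c)
    )
    return reflexive and symmetric and transitive


def partition(elements, related):
    """Group elements into classes; only meaningful if `related` is an
    equivalence relation, because each class is represented by its first member."""
    classes = []
    for item in elements:
        for cls in classes:
            if related(item, cls[0]):
                cls.append(item)
                break
        else:
            classes.append([item])
    return classes


def same_category(a, b):
    """Related when both are upper case A-Z, or both are not."""
    return (a in UPPER) == (b in UPPER)


def adjacent(a, b):
    """Related when code points differ by at most 1 (not transitive)."""
    return abs(ord(a) - ord(b)) <= 1


print(is_equivalence(ELEMENTS, same_category), is_equivalence(ELEMENTS, adjacent))
print([len(cls) for cls in partition(ELEMENTS, same_category)])
```

The first relation satisfies all three properties and splits the elements into two classes (26 letters and 10 digits). The second fails transitivity (A is adjacent to B and B to C, but A is not adjacent to C), so it does not partition the set, which is why an arbitrary notion of "similar" inputs is not automatically a basis for equivalence classes.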

     

  6. I.M.Testy says:

    See my responses above in the comments

  7. Shrini says:

    BJ,

    This discussion is getting interesting …

    I would like to get further clarifications on …

    [BJ]

    an edit control accepts a string of upper case Latin characters between A and Z with a max length of 25 characters. Also, since I am familiar with ASCII, ANSI, and Unicode encodings, standard string parsing algorithms, and how Windows, Unix, and Mac environments process characters within this range I can logically assert (declare) with a high degree of confidence and probability that any combination of these 26 characters of any length string will produce the same expected result (whatever that result may be, since this was a simple leading example).

    [/BJ]

    You seem to be suggesting that for a field that accepts only upper case characters [A-Z] there can be only one equivalence class (“A” is the same as “ABC” is the same as “ABCDEF”) on ASCII/ANSI/Unicode encodings on Windows, Unix and Mac. Does it mean I can just use one value (from an ECP perspective) and assert that the rest of the values do not matter?

    Let us say I apply this to the “google” search feature. If I am restricting search inputs to only [A-Z], can I assert that any string containing upper case Latin letters [A-Z] pretty much covers the entire domain and declare that if I test with either “A” or “ABC” or “DEFEGERERERREEEE”, I am done with ECP?

    Here is where types of equivalence come into the picture. Your assertion that “A” or “ABC” or “ADCNERR” are identical is based on your knowledge about “data types”, “platforms”, encoding schemes etc. But this assertion needs to be supplemented by “Functional Logic” based equivalence (what Kaner calls “specified” equivalence). Google may treat “A” and “Z” in different ways depending upon its processing logic or the business rules implemented in code.

    Now a bit of detail about what constitutes input and output, which will help me articulate why I am having challenges in understanding simple input and output concepts.

    Doug Hoffman presents a diagram depicting “how a system can fail”, built around the concepts of “intended inputs” and “monitored outputs”. This diagram helped me understand that when someone says input, it is not only the “explicit input” that the user supplies to the program; the entire state of the system, platform and related entities forms the “complete” input. Similarly, there can be multiple outputs in response to an explicit input. In a nutshell, when software is in operation, there are infinite inputs that the software uses, and it responds (continuously) with infinite outputs. For the sake of simplicity, testers model the system and choose to focus on a few inputs and monitor a few outputs.

    It is in this light that a software application’s response/behaviour to a set of inputs is a complex thing. If I treat “AB” and “ABCEDEDD” as identical inputs for a program – I am highly simplifying the whole scenario. This is because there are multiple outputs possible for these inputs and for some of the outputs, the program might be behaving “real” differently.

    My intention in posting this comment is to express my opinion that the analysis that follows ECP (the wrapper) is deeper than what is being depicted (a string of upper case Latin characters [A-Z] of any length is treated identically on Windows/Mac/Unix), and one would need to refine the simple analysis with application-specific logic (that is another dimension to ECP – “Functional logic equivalence”)

    ——————————————————————————

    Shrini,

    You are correct. As I said in the article, and as I stated before, there is not a one size fits all equivalence class. In my simple example my context is an input textbox that simply parses a string of characters.

    When you change the context of the test to evaluate the output results of a search algorithm then the equivalence class partitions become very different as well.

    This is not a different ‘type’ of equivalence class, it is simply a different partition of data based upon the testing hypothesis.

    Doug Hoffman’s example is good, but it does not relate to the application of equivalence class partitioning. ECP is a systematic procedure to evaluate the functionality of a discrete input or output parameter in isolation. Perhaps you can think of it as a low-level test or a type of unit test applied to each parameter. The technique is not necessarily intended to evaluate combinations of input parameters applied to various state machines. This is a different type of testing.

    It appears we have the same intentions. I used a simple example of string parsing to explain the basic concept of ECP, followed by a relatively more complex example of ECP in the article. I also stated several times that the practical application of ECP is much more difficult and is heavily dependent on the skill and knowledge of the professional tester.

  8. Shrini says:

    >>I am sad that you assume my article seems to glorify or oversell equivalence class partitioning. My goal was the opposite. I had hoped that readers would understand that this functional technique was not simple, or an easy application of some rote process.

    I appreciate that you published this comment and opened the thread to express/clarify my views.

    I see the whole discussion around ECP in two parts. One – a highly simplistic concept (at the logical level) of ECP, that there are classes of data that can be assumed to be treated identically; and the other – an in-depth analysis (that has nothing to do with the concept of ECP) that follows, of identifying the data space, modeling it as a domain, analysing the data space, hypothesizing etc.

    My view is that it is the second part that needs more attention (tester’s experience, as you refer to it) than the first. You seem to be mixing both and saying that the concept of ECP and the analysis that follows it are “inseparable”. This is where you and I differ.

    Let us agree to disagree.

    >>>As I read through this comment I almost hesitated to publish it because although I have never met you, I think you are much brighter than to suggest such simple-minded rhetoric.

    Thanks for publishing the comment. We have exchanged views/mails/comments so many times – our meeting is just a formality. This is not meant to be rhetoric but an explanation of how I view the whole thing. If you find it simple enough – it is fine. Sometimes it helps to revisit things that are taken for granted. I just thought I would make myself clear.

    Shrini

    ———————————————————-

    Sure we can agree to disagree if you’d like; however, the ultimate effectiveness of the ECP technique is directly related to the professional tester’s ability to decompose and perform an in-depth analysis of the data within the context of the functionality of a discrete parameter. Thus, it has everything to do with ECP, which is another point discussed in the article.

    The inability of the tester to decompose input and output data into discrete subsets simply leads to guessing and (usually) inadequate testing. Individuals who are only capable of what Kaner refers to as ‘subjective equivalence’ are merely playing or guessing. My mother often does things she assumes to be equivalent that sometimes result in different behavior and sometimes even a bug! Thus, even my mother finds (stumbles upon) bugs, but my mother is not a tester.

    – Bj –

  9. Shrini says:

    >>>I don’t know what you mean by ‘universal equivalence’

    By universal equivalence I mean a notion of equivalence that can be arrived at WITHOUT any reference to the software system under test, the specification we might use, or the programmatic implementation (language, data type and platform). Most of the ECP examples quoted in textbooks and testing literature use this kind of equivalence.

    A context-free discussion (application logic, programmatic implementation, and usage profile are some key contexts that are important) about equivalence classes for a given field or variable is what I call “Universal Equivalence”

    >>> Over the next few weeks I will discuss the application of ECP applied to COMDLG32.DLL, and perhaps that will provide a greater understanding of this technique.

    That would be really great. I eagerly look forward to that

    Shrini

    ——————————————————————-

    Actually, Beizer, Myers, Jorgensen, and others assume that the readers of their books have an understanding of programming concepts and are familiar with computer systems.

    It is only some of the more recent books that discuss ECP and other techniques from an overly simplistic perspective. The primary audience of these books is novice testers, or individuals who lack a strong math, engineering, or computer science background.

    It is a ridiculous assumption to think that a person can reach some logical and rational hypothesis or design a test “without any reference to the software system that is under test!”

    Of course, perhaps that is the difference between the professional tester who has an understanding of programming concepts, and in-depth domain and system knowledge, and those who dabble in the discipline of testing and simply guess at what to do next based on the behavior of the software system under test (which to me is mostly irrational behavior) and hope to find a bug.

    Similarly, a context free discussion about EQ classes for a given parameter is simply wild speculation without relative value towards the purpose. For example, in a context free discussion of my overly simple input parameter that takes only 25 upper case Latin characters between A and Z I could suggest laying a brick on the escape key for 25 minutes may be something to ‘try’. It simply adds no value to an analysis of the parameter from an ECP perspective.

    – Bj –
