Equivalence class partitioning - Part 2: Character/String data decomposition

11/15/2007

Again, I am remiss in my postings... too many irons in the fire these days. Two weeks ago, I posted a challenge to decompose a set of character data (the ANSI Latin 1 character set) into valid and invalid equivalence class subsets in order to test the base filename parameter of a filename passed to COMDLG32.DLL on the Windows XP platform from the user interface, using the File Save As... dialog of Notepad.

As illustrated below, the filename on a Windows platform is composed of two separate parameters. Although the file name parameter of the Save As... dialog will accept a base filename, a base filename with an extension, or a path with a filename with or without an extension, the purpose of the challenge was to decompose the limited set of characters into equivalence class subsets for the base filename component only (the part outlined with green). (Of course, complete testing will include testing with and without extensions, but let's first focus on building a foundation of tests to adequately evaluate the base filename parameter; then we can expand our tests from there to include extensions.)

As suggested in the earlier post, in order to adequately decompose this set of data within the defined, real-world context (and not in alternate philosophical universes), a professional tester would need to understand programming concepts, file naming conventions on a Windows platform, the Windows XP file system, the default character encoding on the Windows XP operating system (Unicode), some history of the FAT file system, and even a bit of the PC/AT architecture. The following table illustrates how I would decompose the data set into equivalence class subsets.

Input/Output Parameter: Filename

Valid Class Subsets

V₁ – escape sequence literal strings (SOH, STX, ETX, EOT, ENQ, ACK, BEL, BS, HT, LF, VT, FF, CR, SO, SI, DLE, DC1, DC2, DC3, DC4, NAK, SYN, ETB, CAN, EM, SUB, ESC, FS, GS, RS, US, DEL)

V₂ – space character (0x20) (but not as only, first, or last character in the base file name)

V₃ – period character (0x2E) (but not as only character in the base file name)

V₄ – ASCII characters: punctuation (0x21, 0x23 – 0x29, 0x2B – 0x2D, 0x3B, 0x3D, 0x40, 0x5B, 0x5D – 0x60, 0x7B, 0x7D, 0x7E), numbers (0x30 – 0x39), alpha (0x41 – 0x5A, 0x61 – 0x7A)

V₅ – 0x80 through 0xFF

V₆ – 0x81, 0x8D, 0x8F, 0x90, 0x9D

V₇ – Component length between 1 – 251 characters (assuming a default 3-letter extension and a maximum path length of 260 characters)

V₈ – Literal string CLOCK$ (NT 4.0 code base)

V₉ – a valid string with a reserved character 0x22 as the first and last character in the string

Invalid Class Subsets

I₁ – control codes (Ctrl + @, Ctrl + B, Ctrl + C, Ctrl + ], Ctrl + N, etc.)

I₂ – escape sequence literal string NUL

I₃ – Tab character

I₄ – reserved words (LPT1 – LPT4, COM1 – COM4, CON, PRN, AUX, etc.)

I₅ – reserved words (LPT5 – LPT9, COM5 – COM9)

I₆ – reserved characters (/ : < > |) (0x2F, 0x3A, 0x3C, 0x3E, 0x7C) by themselves or as part of a string of characters

I₇ – reserved character 0x22 as the only character or > 2 characters in the string

I₈ – a string composed of > 1 reserved character 0x5C

I₉ – a string containing only 2 reserved characters 0x22

I₁₀ – period character (0x2E) as only character in a string

I₁₁ – two period characters (0x2E) as only characters in a string

I₁₂ – > 2 period characters (0x2E) as only characters in a string

I₁₃ – reserved character 0x5C as the only character in the string

I₁₄ – space character (0x20) as only character in a string

I₁₅ – space character (0x20) as first character in a string

I₁₆ – space character (0x20) as last character in a string

I₁₇ – reserved characters (* ?) (0x2A, 0x3F) by themselves

I₁₈ – a string of valid characters that contains at least one reserved character (* ?) (0x2A, 0x3F)

I₁₉ – a string of valid characters that contains at least one reserved character 0x5C but not in the first position

I₂₀ – string > 251 characters

I₂₁ – character case sensitivity

I₂₂ – empty
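
The single-character rows of the table can be modelled as a lookup over the code points 0x00 through 0xFF. The following is only a minimal sketch: the code point values are copied from the table, but the function name, the constant names, and the label strings are hypothetical and chosen for illustration. It says nothing about how Windows itself classifies characters.

```python
# Hypothetical model of the single-character rows of the table.
# Code point values come from the table; all names are illustrative.

RESERVED = {0x22, 0x2A, 0x2F, 0x3A, 0x3C, 0x3E, 0x3F, 0x5C, 0x7C}
UNASSIGNED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}  # V6


def classify_code_point(cp: int) -> str:
    """Return a coarse label for one code point in 0x00-0xFF."""
    if not 0x00 <= cp <= 0xFF:
        raise ValueError("outside the Latin 1 data set used in this post")
    if cp <= 0x1F or cp == 0x7F:
        return "control"      # I1 when typed, I3 for a pasted tab
    if cp == 0x20:
        return "space"        # V2 or I14-I16, depending on position
    if cp == 0x2E:
        return "period"       # V3 or I10-I12, depending on the string
    if cp in RESERVED:
        return "reserved"     # I6-I9, I13, I17-I19
    if cp < 0x80:
        return "ascii"        # V4: punctuation, numbers, alpha
    if cp in UNASSIGNED:
        return "unassigned"   # V6
    return "extended"         # V5


print(classify_code_point(0x41))  # prints "ascii"
```

Because every code point receives exactly one label, a test generator built on such a lookup could pick at least one representative per label without overlap.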

Discussion of valid equivalence class subsets

Valid subset V₁ is composed of the literal strings for control characters (or escape sequences) between 0x01 and 0x1F, plus 0x7F. These literal strings may cause problems under various configurations or unique situations. The book How to Break Software: A Practical Guide to Testing goes into great detail explaining fault models for these character values. The literal strings in this subset should be tested as the base filename component and possibly, in a separate test, as an extension component. However, on the Windows platform the probability that one string in this subset is handled differently from any of the others is very low, which negates the need to test every string in the subset. That said, the overhead of testing them all would be minimal, and once done it would not likely need repeating during a project cycle.
Valid subset V₂ provides guidance on the use of the space character in valid filenames. On the Windows operating system a space character (0x20) is allowed in a base filename, but not as the only character in a file name. Typical behavior on the Windows platform also truncates the space character if it is the first or last character of a base filename. However, if the extension is appended to the base filename in the Filename edit control on the Save or Save As… dialog, a space character can be the last character in the base filename. Note also that a space character by itself or as the first character in a filename is acceptable on a UNIX-based operating system. Finally, although we can force the Windows platform to save a file name with only a space character by typing “ .txt” (including the quotes) in the Filename edit control on the Save/Save As… dialog, this practice is not typical of reasonable Windows users’ expectations.
Valid subset V₃ is the period character (0x2E), which is allowed in a base filename but is not a valid filename if it is the only character in the base filename (see invalid subset I₁₀).
Valid subset V₄ is composed of the ‘printable’ ASCII characters that are valid in a Windows filename. The subset includes punctuation characters, numeric characters, and alpha characters. We could also decompose this subset further into valid punctuation characters, numbers, upper case characters, and lower case characters if we wanted to ensure that we test at least one element from each of those groups.
Valid subset V₅ is the set of character code points between 0x80 and 0xFF.
Valid subset V₆ is a subset of V₅. Its members are separated out only because they are code points that have no character glyphs assigned to them. These would be especially interesting if we needed to test filenames for backwards compatibility on Windows 9x platforms.
Valid subset V₇ is the minimum and maximum component length assuming the filename is saved in the root directory (C:\).
Valid subset V₈ is probably a red herring. On the NT 4 platform the string CLOCK$ was a reserved word. For an older application first created for the Windows NT 4 platform that does not use the system Save/Save As dialog, we might want to test this string just to make sure the developer did not hard code the string in an error handling routine.
Valid subset V₉ is an interesting case because the otherwise invalid reserved character (0x22) is handled differently when it is used in both the first and last character positions of a base filename. In that case the two characters are stripped, and if the remaining string is valid the filename is saved. If only one 0x22 character is used, or if two or more 0x22 characters appear in positions other than the first and last, the result is an error message.
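
The five code points in V₆ can be cross-checked mechanically. The sketch below assumes that the "ANSI Latin 1" set corresponds to what Python calls the `cp1252` codec, and that "no glyph assigned" corresponds to the codec refusing to decode the byte. Both are assumptions made for illustration, and the helper name is hypothetical.

```python
# Assumption: Python's "cp1252" codec stands in for ANSI Latin 1.

def undefined_code_points(encoding: str = "cp1252") -> list:
    """List the byte values in 0x80-0xFF that the codec cannot decode."""
    missing = []
    for value in range(0x80, 0x100):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            missing.append(value)
    return missing


print([hex(v) for v in undefined_code_points()])
# prints ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```

Under those assumptions the result matches the V₆ row of the table, which is a cheap way to confirm that the subset was transcribed correctly.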

Discussion of invalid equivalence class subsets

Invalid subset I₁ consists of the control code inputs for the escape sequences in the range 0x01 through 0x1F, plus 0x7F. Pressing the control key (CTRL) together with any of the control code keys causes a system beep.
Invalid subset I₂ is the literal string “nul”. NUL is a reserved word, but it could be processed differently from other reserved words on the Windows platform because it is also used in many programming languages as the string termination character.
Invalid subset I₃ is the tab character, which can be copied and pasted into the Filename textbox control. Pasting a tab into the control and pressing the Save button will generate an error message.
Invalid subset I₄ includes the literal strings for reserved device names on the PC/AT machine and the Windows platform. Using any string in this subset results in an error message indicating the filename is a reserved device name.
Invalid subset I₅ also includes reserved device names, in this case LPT5 – LPT9 and COM5 – COM9. However, these must be separated into a unique subset because using these specific device names as the base filename on the Windows XP operating system results in a different error message, one indicating the filename is invalid.
Invalid subsets I₆, I₇, and I₈ include reserved characters on a Windows platform. When characters in these subsets are used by themselves or in any position in a string of characters, the result is an error message indicating the above file name is invalid.
Invalid subsets I₉, I₁₀, and I₁₃ also include reserved characters and the space and period characters. When these subsets are tested as defined, no error message is displayed and focus is restored to the File name control on the Save/Save As… dialog.
Invalid subsets I₁₁ and I₁₂ also cover the period character (0x2E), as exactly 2 characters in the string and as more than 2 characters in the string. The state machine changes are different for these cases.
Invalid subsets I₁₅ and I₁₆ define the space character when used in the first or last character position of a string. These are placed in the invalid class because normal Windows behavior is to truncate a leading or trailing space character in a file name. If the leading or trailing space character were not truncated and were saved as part of the file name on a Windows platform, that would constitute a defect.
Invalid subsets I₁₇ and I₁₈ contain two additional reserved characters: the asterisk and the question mark (0x2A and 0x3F respectively). If these characters are used by themselves or as a character in a string of other valid characters, a file will not be saved and no error message will occur. However, the state of the Save/Save As… dialog does change. If the default file type is .txt and there are text files displayed in the Folder View control on the Save As… dialog, the files with the .txt extension will be removed after the Save button is pressed. If the default file type is All files, then all files will be removed from the Folder View control on the Save As… dialog after the Save button is pressed.
Invalid subset I₁₉ is a string of valid characters which contains at least one backslash character, but not as the lead character in the string. (Of course, this assumes the string is random and the backslash character is not in a position which would resolve to a valid path.) The backslash character is a reserved character for use as a path delimiter in the file system. An error message will appear indicating the path is invalid.
Invalid subset I₂₀ tests for extremely long base file name lengths of greater than 251 characters. Note that an interesting anomaly occurs with string lengths. A base file name string length which tests the boundaries of 252 or 253 valid characters will cause an error message to display indicating the file name is invalid. However, a base file name string length of 254 or 255 characters will actually get saved as a file name, but it is not associated with any file type. Any base file name string longer than 255 characters again produces an error message.
Invalid subset I₂₁ describes the tests for case sensitivity. The Windows platform does not distinguish the case of characters that have both an upper case and a lower case representation. For example, a file name with the lower case Latin character ‘a’ is considered the same as a file name with the upper case Latin character ‘A’.
Invalid subset I₂₂ is, of course, an empty string.
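
Several of the invalid subsets above can be expressed as a small classifier that maps a base filename string to a subset label. The following is a hedged sketch of the decomposition as written in the table, not of the actual Save As… dialog logic. The function name, the constants, and the `"valid"` and `"not modelled"` labels are hypothetical. The 0x22 and 0x5C subsets are deliberately left out, the "etc." in I₄ is not expanded, and an input with several faults simply receives the first label that matches.

```python
# Hypothetical classifier for some of the invalid subsets in the table.
RESERVED_CHARS = set("/:<>|")   # I6
WILDCARDS = set("*?")           # I17, I18
NOT_MODELLED = set('"\\')       # 0x22 and 0x5C subsets are skipped here
DEVICES_I4 = {"CON", "PRN", "AUX"} | {
    f"{prefix}{n}" for prefix in ("LPT", "COM") for n in range(1, 5)
}
DEVICES_I5 = {f"{prefix}{n}" for prefix in ("LPT", "COM") for n in range(5, 10)}
MAX_BASE_LENGTH = 251           # upper bound of V7 in the table


def classify_base_name(name: str) -> str:
    """Return the first matching subset label for a base filename."""
    if name == "":
        return "I22"
    if name == " ":
        return "I14"
    if set(name) == {"."}:
        return {1: "I10", 2: "I11"}.get(len(name), "I12")
    upper = name.upper()        # names are compared without regard to case
    if upper == "NUL":
        return "I2"
    if upper in DEVICES_I4:
        return "I4"
    if upper in DEVICES_I5:
        return "I5"
    if "\t" in name:
        return "I3"
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in name):
        return "I1"
    if any(ch in NOT_MODELLED for ch in name):
        return "not modelled"
    if any(ch in RESERVED_CHARS for ch in name):
        return "I6"
    if any(ch in WILDCARDS for ch in name):
        return "I17" if all(ch in WILDCARDS for ch in name) else "I18"
    if name.startswith(" "):
        return "I15"
    if name.endswith(" "):
        return "I16"
    if len(name) > MAX_BASE_LENGTH:
        return "I20"
    return "valid"


for sample in ["report", "com7", "..", "a*b", " draft"]:
    print(repr(sample), classify_base_name(sample))
```

A table-driven oracle of this kind is mainly useful for labelling generated test data; the expected dialog behavior for each label still has to come from the discussion above.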

Of course, this is only a partial list of the complete data set, since a filename on the Windows XP operating system can contain any valid Unicode value, of which there are several thousand character code points, including surrogate pair characters.

The first and by far the most complex step in the application of the functional technique of equivalence class partitioning is data decomposition. This requires an incredible amount of knowledge about the system. Data decomposition is an exercise in modeling data. The less one understands the data set or the system under test, the greater the probability of missing something. Next week we will analyze the equivalence class subsets to define our baseline set of tests to evaluate the base filename component.
