Random string generation…Update!

02/17/2009

One of the biggest challenges in input testing is the sheer number of potential characters and the virtually infinite number of permutations of those characters across the character positions in a string. Even if we know about the myriad language scripts used throughout the world, manually generating characters from multiple language groups would be excruciatingly inefficient.

Since any modern application should support Unicode characters, we can assert that the strings “abcdefg” and “ڄƥ藖꼩昨” are equivalent for most input tests that require a Unicode string. Random string test data generation is therefore useful for easily increasing the breadth of test data, and also for testing the robustness of the application’s ability to process complex data streams.

Babel 2.0 is a free test tool, and one of the few random string generators that can generate a string of characters across the entire Unicode spectrum. It has been widely popular since its initial release in 2006, so I am happy to announce that an updated Babel 2.0 has been released! I know this constitutes a shameless plug…but sometimes it helps to plug tools we’ve made that can benefit other testers or developers.

Unlike many string generators that only produce a string of random ASCII characters, Babel can produce a string of random Unicode characters defined in the Unicode 5.1 specification, including surrogate pair characters (which often expose problems in various text boxes…hint, hint). Additional updates to Babel 2.0 include:

Updated to the Unicode 5.1 spec (including new script groups and character code points)
Ability to include/exclude combining character code points
Ability to include/exclude reserved NetBIOS characters
Custom range allows character generation from 0x01 through 0xFFFF
Ability to generate strings with a max length of 100,000 characters
Improved distribution of characters from the selected language script groups
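
To make a few of these options concrete, here is a minimal Python sketch. It is not Babel's code (Babel is a .NET tool), and the names `random_from_range` and `utf16_units` and their parameters are hypothetical. It draws code points from a custom range, optionally skips combining characters using the standard `unicodedata` module as a rough proxy, and always skips the surrogate block, since a lone surrogate is not a character. Python's Unicode tables are also newer than Unicode 5.1, and this sketch does not filter out unassigned code points.

```python
import random
import unicodedata

SURROGATES = range(0xD800, 0xE000)  # lone surrogates are not characters

def random_from_range(length, low=0x01, high=0xFFFF,
                      include_combining=True, seed=None):
    """Hypothetical helper: pick `length` code points from [low, high]."""
    rng = random.Random(seed)
    candidates = [
        cp for cp in range(low, high + 1)
        if cp not in SURROGATES
        # Rough proxy: a nonzero canonical combining class marks a combining character.
        and (include_combining or unicodedata.combining(chr(cp)) == 0)
    ]
    if not candidates:
        raise ValueError("range contains no usable code points")
    return "".join(chr(rng.choice(candidates)) for _ in range(length))

def utf16_units(text):
    """Number of 16-bit code units the text needs when encoded as UTF-16."""
    return len(text.encode("utf-16-le")) // 2
```

The second helper shows why surrogate pair characters tend to expose bugs: a single character above U+FFFF, such as U+10400, needs two UTF-16 code units, so code that equates "one character" with "one 16-bit unit" miscounts it.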

The following illustration provides a basic flow diagram of how Babel generates random strings. Essentially, one script group is randomly selected from all selected script group nodes, and all code points assigned to that script group are put into a collection. Next, one character is randomly selected from that collection and is appended to a string. This process continues until the string length equals a specified number of characters.
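
The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Babel's implementation: the `SCRIPT_GROUPS` table, its four small ranges, and the `generate` function are all hypothetical stand-ins for the script group nodes and code point data that Babel takes from the Unicode specification.

```python
import random

# Hypothetical script groups: name -> inclusive (first, last) code point range.
SCRIPT_GROUPS = {
    "Basic Latin letters": (0x0041, 0x005A),
    "Greek": (0x03B1, 0x03C9),
    "Cyrillic": (0x0410, 0x044F),
    "Hiragana": (0x3041, 0x3096),
}

def generate(length, selected, seed=None):
    """Build a string of `length` characters from the selected script groups."""
    rng = random.Random(seed)
    names = list(selected)
    if not names:
        raise ValueError("select at least one script group")
    chars = []
    while len(chars) < length:
        name = rng.choice(names)                   # 1. pick one selected script group
        first, last = SCRIPT_GROUPS[name]
        collection = range(first, last + 1)        # 2. collect its code points
        chars.append(chr(rng.choice(collection)))  # 3. pick one character and append it
    return "".join(chars)
```

For example, `generate(10, ["Greek", "Hiragana"])` returns a ten-character string that mixes Greek and Hiragana letters.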

Babel improves the distribution of characters across multiple script groups by preventing the same script group from being selected again until at least ½ of the other specified groups have been selected. In other words, as long as more than one script group node is selected, a group that has just been chosen is removed from the random selection process until at least half of the other script groups have been chosen. This gives a more even distribution than simple random generation.
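
One way to get this behavior is a short "cooldown" queue of recently used groups. The Python sketch below is my reading of the rule, not Babel's actual code; `pick_groups` and its parameters are hypothetical, and it assumes the group names are distinct. With n groups, a group that was just picked is ineligible for the next ceil((n - 1) / 2) picks, and because every pick in that window is itself excluded once chosen, those picks are all different groups.

```python
import math
import random
from collections import deque

def pick_groups(names, count, seed=None):
    """Return `count` group picks, keeping each pick out of the draw for a while."""
    rng = random.Random(seed)
    names = list(names)
    if not names:
        raise ValueError("select at least one script group")
    # At least half of the *other* groups must be chosen before a repeat.
    cooldown = math.ceil((len(names) - 1) / 2)
    recent = deque(maxlen=cooldown)  # the oldest entry drops off automatically
    picks = []
    for _ in range(count):
        eligible = [n for n in names if n not in recent]
        choice = rng.choice(eligible)
        picks.append(choice)
        recent.append(choice)
    return picks
```

With five groups the cooldown is two, so any three consecutive picks are distinct. With two groups the picks simply alternate, and with one group the rule has no effect.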

The download also includes Babel.DLL (and the dependent UnicodeData.DLL) for test automation. The older methods are deprecated and no longer supported. The new methods have been simplified and now include:

public static string Polyglot (int, int, bool, bool, bool, bool, bool)
Returns a string of random Unicode characters in all Unicode script groups based on a specified seed value.

public static string Polyglot (int, bool, bool, bool, bool, bool, out int)
Generates a random seed value, returns a string of random Unicode characters in all Unicode script groups, and passes the seed value back through the out parameter.

public static string Polyglot (int, int, bool, bool, bool, bool, bool, char, char)
Returns a string of random Unicode characters in all Unicode script groups based on a specified seed value.

public static string Polyglot (int, bool, bool, bool, bool, bool, char, char, out int)
Generates a random seed value, returns a string of random Unicode characters in all Unicode script groups, and passes the seed value back through the out parameter.
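
The two styles of overload (seed supplied by the caller versus seed generated and handed back) matter because a seed lets you regenerate the exact string that triggered a failure. The Python sketch below illustrates that pattern only. It does not call or mirror the Babel API: `random_string`, `random_string_new_seed`, and the code point range are assumptions made for the example, and since Python has no out parameters the generated seed is returned as part of a tuple.

```python
import random

# Assumed range for illustration; it stays below the surrogate block (0xD800).
FIRST, LAST = 0x0020, 0xD7FF

def random_string(seed, length):
    """Seed supplied by the caller: the same seed always gives the same string."""
    rng = random.Random(seed)
    return "".join(chr(rng.randint(FIRST, LAST)) for _ in range(length))

def random_string_new_seed(length):
    """Seed generated here and returned so the caller can log it."""
    seed = random.SystemRandom().randrange(2**31)
    return random_string(seed, length), seed
```

A test would typically call `random_string_new_seed`, log the returned seed alongside the result, and pass that seed to `random_string` later to replay a failure.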

Get the new release of Babel 2.0!
