A different perspective on random name generation

08/15/2009

My daughter made me laugh today when she offered a bit of her philosophy. She told me that her favorite candy is gummy bears “because gummy bears get stuck between your teeth, and then you can dig out a second helping with your tongue.” I never really thought of it that way, but how many of us have not picked at a piece of licorice stuck between our teeth with our tongue (or a toothpick) and savored that last little bit? Ummm….

Perhaps it is my own twisted logic, but as I started writing this post I thought about my daughter’s predilection for gummy bears and somehow made a connection to static test data used in tests. Static test data that is simply reused over and over in a test is similar to that last little bit of licorice we dig out of our teeth. The last bit tastes just like the first bite, and all the other bites between. This may be good for those who like the flavor of licorice, but it is not so good for hard-coded test data in rudimentary test scripts, especially in automated tests.

If you have followed my posts or my personal website then you know that I am a big proponent of probabilistic stochastic test data (statistically unbiased, parameterized randomly generated test data that is representative of the population of possible inputs for a specific variable). The latest addition to my random test data generator toolbox is PseudoName, a random name (pseudonym) generator library for automated testing.

Before designing and developing PseudoName I researched the plethora of random name generators currently available, because I am not a big fan of reinventing the wheel. In fact, there are many very good online HTML-based random name generators. For example, Fake Name Generator not only generates a pseudonym, but also generates an address, phone number, etc., essentially creating a fictitious persona. However, while this tool is useful for manual testing, it is not so useful for automated tests. The Automated Testing Institute website provides code samples in VBScript and Ruby for generating random names from a built-in collection of names stored in an array. These examples are also useful, and the collections can certainly be expanded to include a greater variety of names, but they are still limited in scope.

A common problem that I noticed among the random name generators I reviewed is the Romanization (representing a written language with the Latin alphabet) of the pseudonym. Basically, this means the random names are always represented with ASCII characters. Romanization may be satisfactory for those who only know the letters “A” through “z” or for those whose eyes glaze over when the displayed character glyphs are in a foreign language. But those of us who deal with modern software or services that support Unicode, and that may be adapted (or localized) or used in locales where supporting the native language is important, soon realize that Romanization using simple ASCII characters is simply not enough for effective globalization testing.

Unlike most random name generators, PseudoName generates a random name (pseudonym) from columns of name data in an Excel spreadsheet. The name data in the Excel spreadsheet is stored as Unicode, so the characters can be the same as those used in the desired region or locale. For example, to generate a random female Chinese name, most name generators would produce a string such as “Dongyi Li.” However, PseudoName can randomly generate a name using Chinese characters such as “冬怡李.” (Actually, Dongyi Li is not a pseudonym. Dongyi is my friend, and she was kind enough to produce the Chinese lists of female names, male names, and surnames, and she also helped me refactor the code used in this tool.)
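To make the difference concrete, here is a small Python sketch (not part of PseudoName) that compares the Romanized and native forms of the example name above. The `describe` helper is a hypothetical name introduced only for illustration; it reports whether a string is pure ASCII, how many code points it contains, and how many bytes it occupies in UTF-8.

```python
def describe(name: str) -> dict:
    """Summarize string properties that matter in globalization tests."""
    return {
        "is_ascii": name.isascii(),               # True only if every character is ASCII
        "code_points": len(name),                 # number of Unicode code points
        "utf8_bytes": len(name.encode("utf-8")),  # storage size when encoded as UTF-8
    }


romanized = describe("Dongyi Li")
native = describe("冬怡李")

print("Romanized:", romanized)
print("Native:   ", native)
```

A test that only ever feeds the Romanized form into the system under test never exercises the code paths where the character count and the byte count differ, which is one of the places where globalization bugs tend to hide.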

The PseudoName library is simple to use in an automated test. The PseudoName members page also includes simple examples, and the NameInfo properties allow customization of the pseudonym output. If additional properties are necessary to generate reasonably realistic names in different locales, please let me know. (Also, if there is enough demand I might consider slapping on a GUI.)

The format for the Excel sheet is simple. The first column is female names, the second is male names, and the third is surnames. The names listed in the currently available US and Japanese names data files are the most common names in those countries according to census data. The names in the Chinese data file are the characters used for feminine and masculine names, as well as the most common surnames used in China. (I could really use some help collecting name lists using Unicode character scripts from other countries around the world. If you want to contribute please send me a name list in Excel and I will post it on the tool website for other testers to use.)
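The three-column layout lends itself to a very simple selection routine. The following Python sketch is not PseudoName's implementation or API; it only illustrates the idea under a few assumptions. The rows are hard-coded placeholder values standing in for a spreadsheet (they are not taken from the PseudoName data files), and `random_name` is a hypothetical helper. A seeded generator is used so that a failing test can be replayed with the same name.

```python
import random

# Placeholder rows mirroring the sheet layout: (female, male, surname).
# In practice these would be read from the Unicode name data file.
ROWS = [
    ("Mary", "James", "Smith"),
    ("Patricia", "John", "Johnson"),
    ("Linda", "Robert", "Williams"),
]

# Column positions as described in the post.
FEMALE, MALE, SURNAME = 0, 1, 2


def random_name(rows, column, rng):
    """Pick a given name from the requested column and a surname independently."""
    given_names = [row[column] for row in rows if row[column]]
    surnames = [row[SURNAME] for row in rows if row[SURNAME]]
    # Return the parts separately so the caller can order them per locale.
    return rng.choice(given_names), rng.choice(surnames)


rng = random.Random(2009)  # fixed seed makes the sequence reproducible
given, surname = random_name(ROWS, FEMALE, rng)
print(given, surname)
```

Choosing the given name and the surname independently means even a short list yields many combinations, and returning the two parts separately leaves the ordering (given name first or surname first) to the caller.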
