Randomizing static test data in automated tests

10/10/2009

A significant percentage of static test data is stored in tabular comma-delimited or tab-delimited formats or saved in Excel spreadsheets. Reading comma-delimited or tab-delimited static test data into an automated test is pretty straightforward, and there are numerous examples in many programming languages that illustrate how to read these types of test data repositories. Reading in rows of data is the foundation of data-driven automation and definitely has its place in any automation project.

I am a big proponent of stochastic (random) test data generation that is customized to the context, but I also know that static test data is sometimes useful for establishing baselines and for more exactly emulating ‘real-world’ customer-like inputs. However, if the automated test simply passes the same arguments to the same input parameters in the same order over and over again, the value of subsequent iterations of that test with that static data set diminishes rather quickly. So how can we use static test data more effectively in our automated tests?

One possible solution is to randomly select the argument passed to each input parameter from a collection of static values. The advantage of this approach is that it effectively increases the number of test data combinations available to each iteration of the test case. For example, let’s consider 2 input parameters: one for a given name and one for a surname. In a traditional data-driven approach, in which the static test data is read in by rows, our test data file might be similar to:

Bob,Smith
John,Johnson
Roger,Williams
Steve,Abbot

This static data file gives us 4 sets of test data, but each time the test data is read into the test case, each given name is always paired with the same surname.

However, if we read the given names and surnames into 2 collections, and then randomly select a given name and a surname from the appropriate collection to pass to the respective parameter, we effectively have 16 (4 × 4) possible combinations of static test data to work with. An advantage of this approach is that our ‘collections’ of given names and surnames can contain differing numbers of elements, in which case the number of possible combinations is the size of the Cartesian product of the collections, that is, the product of the number of elements in each collection.
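The random selection described above can be sketched in a few lines of C#. This is a minimal illustration rather than the code used in any particular tool: the `StaticDataRandomizer` class and its method names are hypothetical, and logging the seed is an assumption added here so that a failing iteration can be replayed.

```csharp
using System;

// Hypothetical helper for illustration; not part of the original article.
public static class StaticDataRandomizer
{
    // Returns one randomly chosen element from a collection of static test data.
    public static string PickRandom(string[] values, Random random)
    {
        if (values == null || values.Length == 0)
        {
            throw new ArgumentException("The collection must contain at least one element.", "values");
        }

        // Random.Next(maxValue) returns a value in the range [0, maxValue).
        return values[random.Next(values.Length)];
    }

    // The number of distinct argument combinations is the product of the collection sizes.
    public static long CountCombinations(params string[][] collections)
    {
        long count = 1;
        foreach (string[] collection in collections)
        {
            count *= collection.Length;
        }

        return count;
    }
}

public static class RandomNameExample
{
    public static void Main()
    {
        string[] givenNames = { "Bob", "John", "Roger", "Steve" };
        string[] surnames = { "Smith", "Johnson", "Williams", "Abbot" };

        // Assumption: record the seed so the same sequence can be reproduced later.
        int seed = Environment.TickCount;
        Random random = new Random(seed);
        Console.WriteLine("Seed: {0}", seed);
        Console.WriteLine(
            "Possible combinations: {0}",
            StaticDataRandomizer.CountCombinations(givenNames, surnames));

        // Each iteration passes an independently chosen given name and surname.
        for (int iteration = 0; iteration < 5; iteration++)
        {
            string givenName = StaticDataRandomizer.PickRandom(givenNames, random);
            string surname = StaticDataRandomizer.PickRandom(surnames, random);
            Console.WriteLine("{0} {1}", givenName, surname);
        }
    }
}
```

Because each value is drawn independently, the same pairing can occur more than once across iterations; if every combination must be covered exactly once, enumerating the Cartesian product would be a better fit than random selection.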

Of course, there are many ways to accomplish this. For example, one approach is to continue to use a comma-delimited or tab-delimited file format and list the given names in one row and the surnames in a second row. Another approach is to list the given names and surnames in columns in a spreadsheet and read each column into a collection of some sort. The latter is the approach I used in developing my PseudoName test data generator tool. I chose this approach for 2 reasons: first, an Excel spreadsheet is a simple yet powerful file format for storing static test data, and second, lists of test data are sometimes better represented in columns than in rows.
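For comparison, the first approach (one collection per row of a delimited file) might look something like the sketch below. The `DelimitedRowReader` class is hypothetical, and the parsing is deliberately naive: it assumes that values never contain the delimiter and are not quoted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of the original article.
public static class DelimitedRowReader
{
    // Reads each non-empty line as one collection of values split on the delimiter.
    public static string[][] ReadRows(TextReader reader, char delimiter)
    {
        List<string[]> rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Skip blank lines so they do not become empty collections.
            if (line.Trim().Length == 0)
            {
                continue;
            }

            string[] values = line.Split(delimiter);
            for (int i = 0; i < values.Length; i++)
            {
                values[i] = values[i].Trim();
            }

            rows.Add(values);
        }

        return rows.ToArray();
    }
}

public static class DelimitedRowExample
{
    public static void Main()
    {
        // Row 1 holds the given names and row 2 holds the surnames.
        string data = "Bob,John,Roger,Steve\nSmith,Johnson,Williams,Abbot";

        // A StringReader stands in for a file here; File.OpenText would supply a real one.
        using (StringReader reader = new StringReader(data))
        {
            string[][] collections = DelimitedRowReader.ReadRows(reader, ',');
            for (int i = 0; i < collections.Length; i++)
            {
                Console.WriteLine("Collection {0}: {1}", i, string.Join(", ", collections[i]));
            }
        }
    }
}
```

Passing `'\t'` as the delimiter would handle a tab-delimited file in the same way, and the rows are free to contain different numbers of values.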

The following code shows one way to read in test data by columns from an Excel spreadsheet.

Code Snippet

// <copyright file="datareader.cs" company="TestingMentor">
// </copyright>
namespace TestingMentor.TestTool
{
    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Threading;
    using Excel = Microsoft.Office.Interop.Excel;

    /// <summary>
    /// This class contains methods for reading test data from Excel spreadsheets
    /// </summary>
    public class TestDataReader
    {
        /// <summary>
        /// This method reads all the data elements in the specified number of
        /// columns in the specified Excel spreadsheet containing the test data
        /// and copies the data into a jagged array (one string array per column)
        /// </summary>
        /// <param name="dataFileName">The filename containing the test data</param>
        /// <param name="columnCount">The number of columns in the Excel
        /// spreadsheet to read</param>
        /// <returns>A jagged array containing the data elements for
        /// each column</returns>
        public static string[][] ExcelColumnReader(string dataFileName, uint columnCount)
        {
            Excel.Application excelApp = null;
            Excel.Workbook excelWorkbook = null;
            Excel.Worksheet excelActiveWorksheet = null;
            string[][] testData = new string[columnCount][];

            // Run the Excel calls under en-US and restore the original culture afterwards
            CultureInfo originalCulture = Thread.CurrentThread.CurrentCulture;
            Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
            try
            {
                excelApp = new Excel.Application();
                excelWorkbook = excelApp.Workbooks.Open(
                    dataFileName,
                    0,
                    false,
                    5,
                    String.Empty,
                    String.Empty,
                    false,
                    Type.Missing,
                    String.Empty,
                    true,
                    false,
                    0,
                    true,
                    false,
                    false);
                excelActiveWorksheet = (Excel.Worksheet)excelWorkbook.ActiveSheet;
                for (int i = 0; i < columnCount; i++)
                {
                    // Start at column 1
                    int columnIndex = i + 1;

                    // Row 1 is the column title; test data starts on Row 2
                    int rowIndex = 2;
                    List<string> columnValues = new List<string>();
                    object cellValue;

                    // Read down the column until the first empty cell
                    while ((cellValue = ((Excel.Range)
                        excelActiveWorksheet.Cells[rowIndex, columnIndex]).Value2) != null)
                    {
                        // Value2 is a double for numeric cells, so convert rather than cast
                        columnValues.Add(
                            Convert.ToString(cellValue, CultureInfo.InvariantCulture));
                        rowIndex++;
                    }

                    testData[i] = columnValues.ToArray();
                }
            }
            finally
            {
                // Clean up, even if the read failed part way through
                excelActiveWorksheet = null;
                if (excelWorkbook != null)
                {
                    excelWorkbook.Close(false, Type.Missing, Type.Missing);
                    excelWorkbook = null;
                }

                if (excelApp != null)
                {
                    excelApp.Quit();
                    excelApp = null;
                }

                // Garbage collection is not pretty, but necessary to release Excel proc
                GC.Collect();
                GC.WaitForPendingFinalizers();
                Thread.CurrentThread.CurrentCulture = originalCulture;
            }

            return testData;
        }
    }
}

I must tell you that performance can be an issue, especially if the columns contain a lot of data. For example, reading approximately 700 elements of test data in 3 separate columns took slightly less than 1 second, and reading 1800 elements in 3 columns required just over 4 seconds. Unfortunately, I didn’t compare total byte counts, but it is pretty obvious that the more test data elements there are to read, the longer the read operation takes, so you will certainly have to take the read time into consideration in your automated test case.
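One way to account for the read time is to measure it and to load the data once, before the test iterations begin, rather than inside the loop. The sketch below is only an illustration: `ReadTimer` and `LoadStandInData` are hypothetical names, and the stand-in loader returns hard-coded values so that the example runs without Excel installed.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of the original article.
public static class ReadTimer
{
    // Runs the supplied load operation once and reports how long it took.
    public static T Measure<T>(Func<T> load, out TimeSpan elapsed)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        T result = load();
        stopwatch.Stop();
        elapsed = stopwatch.Elapsed;
        return result;
    }
}

public static class ReadTimerExample
{
    // Stand-in for a real read such as a call to an Excel column reader.
    private static string[][] LoadStandInData()
    {
        return new string[][]
        {
            new string[] { "Bob", "John", "Roger", "Steve" },
            new string[] { "Smith", "Johnson", "Williams", "Abbot" }
        };
    }

    public static void Main()
    {
        // Load once, outside the test loop, so that each iteration reuses the data.
        TimeSpan elapsed;
        string[][] testData = ReadTimer.Measure<string[][]>(LoadStandInData, out elapsed);

        Console.WriteLine(
            "Read {0} columns in {1} ms",
            testData.Length,
            elapsed.TotalMilliseconds);
    }
}
```

In a real test, the stand-in loader would be replaced with the actual read of the data file, and the elapsed time could be logged or subtracted from any timing the test case itself reports.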

Reading static test data line by line from a data file while looping through a data-driven automated test case is a useful test design approach in some situations. Reading the data by columns is another useful approach, one that allows the test designer to randomize the combinations of static test data values applied to multiple input parameters across multiple iterations of an automated test case.
