A better approach to test data generation

Anytime you are doing any sort of serious database development, you’ll need realistic test data to verify your system. There are several commonly used approaches to test data generation:


§         Use production data for testing purposes. Typically you do this either by using old production data or by transforming production data so as to replace actual values with somewhat-equivalent values. The advantage of this approach is there is nothing more representative than your production data. Of course the big downside here is that, well, you are using production data. This clearly has large privacy implications, especially in today’s highly regulated environment.


§         Come up with test data from scratch. Oftentimes when you are working on a new development project, there will of course not be any production data to mimic. While this approach is free of all the privacy issues, it is very painful to actually come up with test data easily that actually mimics of your production environment.


Now, let me offer you a better approach to test data generation. That is to use the new test data generation tool that ships with Team Edition for Database Professionals.


The premise of this tool is centered around approach #2 – building test data from scratch. We went down this path because we wanted to completely avoid the privacy\regulatory issues of munging production data. The power of this tool is in its ability to efficiently generate test data from scratch that is representative of your production environment. We facilitate this in several ways.


§         Highly configurable value generators. Each table column can be assigned a value generator. We have value generators for each column data type. Each generator has a set of properties which let you fully control the parameters of the generated values.


§         Additional powerful generators. These include a regular expression generator, which generates values based on the specified regular expression (think phone numbers), as well as a data bound generator, which allows you to pull values for data generation from a known data source.


§         Smart default assignment of generators. We automatically assign by default the appropriate value generator to each column based on the column’s data type. This way as soon as you use this tool, you are generating values of the appropriate data type.


§         Recognition of database constraints. We understand some of your database constraints and automatically configure the assigned generator to adhere to these constraints. For example, if your column has defined on it a check constraint that specifies a column must be greater than some value, we will configure the minimum value for data generation to be greater than that value.


§         Support for data distributions. We provide the ability to apply a distribution on a value generator to generate values according to the distribution. For example, you can generate values according to a normal curve, or say an exponential curve.


§         Generator & distribution extensibility. We allow you to extend the library of generators and distributions to include any kind of value generation that would be appropriate for your test data needs.


§         Data generation is repeatable. The values generated are based on a seed value, ensuring that the data generated is repeatable. This makes it ideal for use with database unit testing.


§         Enforcement of table ratios. We provide the ability for you to model certain relationships between your tables and enforce specified ratios between them. For example, lets say you had an Auctions and Bids table in your database. You can setup a 1:25 ratio between the two tables such that for every auction generated, 25 associated bids will be generated.


Are you convinced yet that this is a better approach? 🙂 If so, please try out the tool! If not, also check out the tool and hopefully then you’ll understand.


It’s available here:



And of course, I’d love to hear your feedback. Any thoughts you have on the functionality would be greatly appreciated.


In future posts I’ll be drilling into specifics, even showing you how to create your own custom generator! So stay tuned...


Sachin Rekhi

Comments (6)

  1. One important consideration in database unit testing is managing the database state. This aspect of database…

  2. Rich Alberth says:

    I've been working on a similar tool to generate test data, but DataRoller comes at the problem from the JAva space.  If you're interested, take a look at DataRoller – a new project just released at dataroller.sourceforge.net.

    From the user manual: "DataRoller is a Java application that will insert piles of good looking data into your favorite databases so you can move in and be happy."

    DataRoller inserts rows into a database based on a file describing your tables and columns.

    Sample DataRoller file to populate an invoices table:

    table invoices

       insert 4000 rows


       invoice_id    sequence(),

       alt_key_uid   guid(),

       name          lorem(20..80),

       label         pattern("UU-NNNN"),

       status        choice("UNK","SHP","CLS", "OPN","INP")


    Run this script via the DataRoller program and it will populate the invoices table with 40,000 rows of data.

  3. Kfr Janssens says:

    Since some time I am working on a test data generation tool aimed at testers and QA people, not aiming at technical DBA's and developers like this one.

    The advantage over other tools is that it uses the javascript engine from the browser to generate lots of data, so you don't need to install any software.

    Any comments or suggestions are welcome.

    http://tedagen.com – TEst DAta GENerator

  4. Lbien says:

    http://www.yandataellan.com … Best data generator tool and 100% free. Generate up to 10.000 rows of data in several formats (CSV, Excel, SQL, JSON, HTML, and XML). Take a look…

  5. Lbien says:

    http://www.yandataellan.com … Best data generator tool and 100% free. Generate up to 10.000 rows of data in several formats (CSV, Excel, SQL, JSON, HTML, and XML). Take a look…

Skip to main content