Creating Complex Test Databases - Creating a Database with 1 Billion Random Rows

As part of my series on creating databases with interesting characteristics for testing purposes, today we'll create a database containing a large number of tables, each with a large number of rows inserted.

Goal: Create a database with ONE BILLION ROWS. The database should contain 1,000 tables with each table containing 5 integer columns and 1,000,000 random rows of data.  

1,000 tables x 1,000,000 rows/table = 1,000,000,000 rows of data

One billion rows?  Really?  Yes, 1 billion rows!  And random data, please.

 

For this task, we can break it down into two steps.  The first is to create the 1,000 tables, which we can easily accomplish through a TSQL script.  The next part is to populate to the tables with data.  For this task, we will leverage the Data Generation tool in Visual Studio 2010.

Task 1: Create 1,000 tables, each with 5 int columns

Create an empty database, open a new query connected to that database, and run the following TSQL script:

DECLARE @counter INT
DECLARE @tablename VARCHAR(100)
SET @counter = 0

WHILE @counter < 1000 -- change this parameter value to create the desired number of tables

BEGIN

SET @counter = @counter + 1
SELECT @tablename = 'table' + cast(@counter as CHAR)
EXEC('CREATE TABLE ' + @tablename + '(
[int1] [INT] NULL,
[int2] [INT] NULL,
[int3] [INT] NULL,
[int4] [INT] NULL,
[int5] [INT] NULL
)')

END

Keep in mind that you can easily edit this script to change data types, table relationships, etc., based on your testing needs.

 

Task 2: Insert data into our tables with Visual Studio 2010's Data Generation feature

Launch Visual Studio 2010 and create a new project.  (New Project -> Database -> SQL Server -> SQL Server 2008 Database Project)

In Solution Explorer, right click on the project name and "Import Database Objects and Settings"

Click on New Connection and supply the server name, authentication, and database name that you created in step 1.

Click Start to import the DB objects.

When finished, right-click in Solution Explorer on Data Generation Plans and Add -> New -> Data Generation Plan.

This will load the database schema into Visual Studio.  The default for the data generation tool is to insert 50 rows into each table, but this completely configurable.  You can set different values for the rows to insert, create related tables for PK/FK constraints, generate different data types or data bound values, and many other nice features. 

For our exercise, we are going to modify the plan to generate 1,000,000 rows per table instead of the default of 50.  To do this, save your plan and close it in VS.  Right-click on the plan file in Solution Explorer and Open With -> Source Code (Text) Editor to get a text version making the replacement easier.  Perform a Find and Replace to update the NumRows values to our desired value of 1,000,000.
 
Find:  Name="NumRows" Value="50"
Replace with:  Name="NumRows" Value="1000000"

Save and close the text view and double-click on the .dgen file again to open the standard view.  Here you will see the Rows to Insert was updated to 1,000,000 for each table.

Now we're ready to perform the data generation. Click on Data -> Data Generator -> Generate Data from the toolbar.  Select the database we just created in step 1 and click OK.  A dialog will ask if you want to delete existing data in the tables before generating.  Since our table is empty, we can say no.  If you play around with the tool, you can use this feature to clean up the tables before inserting.

Generating 1 billion integer rows in this exercise requires roughly 28 GB of free space and 5-10 minutes of time, so make sure you plan accordingly.

Hope you enjoy,
Sam Lester (MSFT)