Creating Complex Test Databases - Intro

As a very brief intro, I have worked as a tester in SQL Server for the past 10+ years on many different features.  Along the way, we develop and test features and release them to the public only to discover some customers inevitably encounter bugs when they run the features against their databases.  How can this happen when we have amazing PMs and developers, devoted and talented test teams, and thousands of automated test cases per feature?  The answer often lies in the incredible complexity of customer databases running on SQL Server and the evolution of those databases as they have grown from small to very complex databases over the years.  As testers, we have a few different options to try to mitigate this problem and represent "all possible databases" in our testing, but it is impossible to test every possible permutation of databases based on this complexity.  In practice, we do all of these to an extent and are constantly working on improving each of them. 

Some options are to:

  • Acquire real customer databases - these are often the best test databases, but pose many challenges to acquire due to size, network, security, PII, NDAs, etc.  We often work with our internal Microsoft product teams who run large scale database applications to leverage their DBs.  (Dreaming out loud:  I'd love to try to work with customers to figure out a way to get more "privacy scrubbed" customer DBs into our test environment.  Microsoft products get better coverage, the customer applications are guaranteed to work, and we all win.  I'll blog more on this later, but send me a message if you're interested in working together to get your scrubbed databases in our test bed.)
  • Programmatically write tools that can create many permutations of databases with various objects, properties, relationships, etc.  Feed various inputs into this tool to create different test databases.  We have had pretty good success with this model as we're able to use many smart testing techniques and create some great test databases that uncover some great bugs.
  • Maintain a database of "interesting" syntax and write automated data-driven test cases based on this object level syntax.  As we encounter any new bug, distill the bug down to the problematic syntax and add that to our existing syntax database.
  • Handcraft complex databases with very specific requirements based on the testing needed for a particular feature/sign-off.

The last option (handcrafted databases) are often our last resort, but result in the most effective method for ensuring that specific features work for specific test cases.  Our dev, test, and PM team spent some time recently for a feature we are working on to come up with the list of "complex" databases that we do not have in our test environment, but would like to add.  Over the next few blog posts, I'll cover some of the interesting databases and the techniques I used to create them.  Here are a few of the DBs we had in mind:

  • DB with a large number of tables (20k), each with 20 columns and 200 rows of data
  • DB with a single table, large amount of data (100 GB+)
  • DB with tables containing all possible data types (1 table with NULLABLE, 1 table with NOT NULLABLE)
  • DB with 1 table per possible collation (~2400 of them).  Collation can be set at many different levels in the DB and creates different challenges at each. 
  • DB with many other interesting properties
    • using column set / sparse columns to exceed the 1024 column limit per table
    • testing the limits of the sql_variant data type by creating data of all types supported by sql_variant, including data greater than 8k (supported page size)
    • tables consisting of UDTs
    • tables consisting of UDDTs
    • tables with varbinary(max) data > 64 MB

Keep in mind that there are MANY more interesting DBs based on scale, max limits, relationships, etc., but for our intended goal, this list covers the P1 cases. 

Hope you enjoy,
Sam Lester (MSFT)