Testing DBMS system: what should be the language?

   Yesterday, I had chat with a colleague about  testing database system.  He raised a ery interesting question: what should be your language? 

   Any test can be write in the ways of: Test(X), while X is the "thing" we want to test or verify. In additional, a test need 1) SUT: system under test,  it is the SQL Server in our engine testing case  2) pre-condition, it is the state of the SUT before testing. We may create some pre-defined objects or change some SQL Server setting before tests. These actions are the setup step which change the SUT to a state which we want to run test against.  3) post-condition, it is the state of SQL Server after the tests.  We can check the post-condition to verify whether the test pass/failed.  4) input variable. It is the input we called into SUT which trigger SUT do some actions internally, and change it's state.  T-SQL script is the straight-forward input variables for SQL Server case. Other input variables can be using SMO programming interface,  calling specific UI tool to do some action, etc.
   X, the "thing" we want to test is different with the "input variable". X can be a join of two tables, X can be a feature, such as data compression or X can a SQL Server's setting.  SQL Server itself is not X, but SUT.
   until now, T-SQL script is still our main language of writing SQL Server Engine tests. So why not use T-SQL as the language of "X"?  Using T-SQL as the language as several benefits:

  1. We use the same language as our customer does.
  2. T-SQL is the interface of SQL Engine, and nearly all features are exposed through T-SQL Script.

  What is the disadvantage of using T-SQL script? 

  1. The syntax of T-SQL is subject to change and it is sensitive to the context, so we will have high chance of not being to reuse the tests once the context is changed.  For example, when we introduce new table options, such as table compression. Existing tests can not test this feature without changing the creating table statement. 
  2. Our tests will tie to the product we are testing against. It looks good, but it is actually bad?  We will unable to re-use these tests for testing Matrix, Cloud-DB or column store. We will have high migration cost of migrating these tests.

 So what is the possible language we can use to describe our database tests?  I don't have an answer, but I will list several possible ways:

  1. Objects.  Can we use PIMOD tables, columns which correspond to SQL Server's table, columns.  Question, are these objects are the scenario we are testing against or the input values of our tests or testing demension?
  2. Other high level language, such as LinqToSQL or EDM. Question, can these languages represent all features SQL Server provided? these features internally transform the language into T-SQL language, can we assume the transformation is correct?
  3. Relational Algebra.  I think it is the best abstract language which can describe "any" SQL language (it is product independent). Question, who will write Relational Algrbra to T-SQL transformation, whether we need SQL Server specified algrebra to decribe the new features.
  4. SQL Server's parse/algrbrea tree.  Question, it is SQL Server's internal struture, we might not be able to use externally.
  5. PIMOD_Language.  I think we have possiblity to use it to represent the query tree what we want to be tested with.  Question,  PIMOD_Language has two parts, abstract representation of the query we want to tested, and the transform from the tree to T-SQL.  How hard to separate these two thing?
  6. Other declartive language?