Getting to the Crown Jewels

The crown jewels of the Database Edition product are the SQL parsers and script generator, these two pieces form the foundation of what the database project system does internally.


The parser, parses T-SQL code and turns it in to a script fragment, where the script generator takes a script fragment and turns it in to T-SQL code. This way we can roundtrip code between the two components. The script generator has the ability to format code on the way out, however there is no mode to preserve formatting or whitespace by the script generator at this point in time.

Per CTP16 of the GDR release the parser and script generator are public classes, so if you have any need to parse some SQL code, here is a solution.

The parser and script generator live in two assemblies. The Microsoft.Data.Schema.ScriptDom contains provider agnostic classes and the Microsoft.Data.Schema.ScriptDom.Sql assembly contain classes for the parser and script generator that are SQL Server specific.

Let’s look at a very simple example on how to use the two components. The following screenshot shows the end result. A simple WinForms application that takes in a piece of T-SQL in this case the create statement of the authors table from the pubs database. The second text box shows the formatted result coming out of the script generator.


Lets digest what was needed to achieve this. I will not go in to the detail of the WinForms code since it is trivial and not that interesting, you can look at the sample code if you need to for that. I am going to focus on the usage of the parser and the script generator code.


The first step is to add the references to the required assemblies:


We first need to add two assembly references two:

  • Microsoft.Data.Schema.ScriptDom.dll
  • Microsoft.Data.Schema.ScriptDom.Sql.dll

Then we add two variables to the WinForms class to hold an instance of the parser and the script generator.

   1: private TSql100Parser _parser;
   2: private Sql100ScriptGenerator _scriptGen;

As you can see the parser and script generator are version specific implementations, there exists a TSql80Parser, TSql90Parser and TSql100Parser, the same hold true for the Sql<>ScriptGenerator classes. Normally you would want to instantiate the classes through a factory pattern, to abstract the version away, but in this simple example we will instantiate the versioned instances of the classes directly.  This happens in the form Load method.

   1: private void Form1_Load(object sender, EventArgs e)
   2: {
   3:     bool fQuotedIdenfifiers = false;
   4:     _parser = new TSql100Parser(fQuotedIdenfifiers);
   6:     SqlScriptGeneratorOptions options = new SqlScriptGeneratorOptions();
   7:     options.SqlVersion = SqlVersion.Sql100;
   8:     options.KeywordCasing = KeywordCasing.UpperCase;
  10:     _scriptGen = new Sql100ScriptGenerator(options);
  11: }

The only thing the parser needs to know is how to interpret string inside double quotes, are they strings or object identifiers? This is indicated when the parser class is constructed.

The script generator has many more options and takes and SqlScriptGeneratorOptions class instance as the constructor argument. The options class lets you define things like the version of the language and how to case keywords for examples. You will also find a set of indention properties that you can use to change the formatting results.

Now we are at the point we can do the hard work. When the user clicks the button the Click method will parse the SQL and script it back out to the second text box if there are no errors parsing the input code.

Lets step through the Click method:

   1: private void button1_Click(object sender, EventArgs e)
   2: {
   3:     textBox2.Text = String.Empty;
   4:     textBox3.Text = String.Empty;
   6:     string inputScript = textBox1.Text;
   7:     IScriptFragment fragment;
   8:     IList<ParseError> errors;
   9:     using (StringReader sr = new StringReader(inputScript))
  10:     {
  11:         fragment = _parser.Parse(sr, out errors);
  12:     }
  14:     if (errors != null && errors.Count > 0)
  15:     {
  16:         StringBuilder sb = new StringBuilder();
  17:         foreach (var error in errors)
  18:         {
  19:             sb.AppendLine(error.Message);
  20:             sb.AppendLine("offset " + error.Offset.ToString());
  21:         }
  22:         textBox3.Text = sb.ToString();
  23:     }
  24:     else
  25:     {
  26:         String script;
  27:         _scriptGen.GenerateScript(fragment, out script);
  28:         textBox2.Text = script;
  29:     }
  30: }

We first define to variables to hold the output of the parse operation, the IScriptFragment will hold the script fragment, this is the most important output of the parser, and it will also output an IList of ParseError objects when the parse operations ran in to problems.

Second we need to make the input to the parse available as a StreamReader, so we are converting the input string in to a StringReader class. Line 11 shows the actual call that parses the input stream and outputs the fragments and the errors.

Then we check if any errors occurred during the parse of the input. If so we iterate through the list of error objects and print out the message and offset information.

When there are no errors, we take the resulting script fragment and feed it back to the script generator. Line 27 shows the actual call. The output will be a formatted script.


You can download the sample code from the DataDude SkyDrive: 


Neat he? I hope this is useful!


Comments (13)

  1. onovotny says:

    The classes may be public, but what is the redist limitations on the assemblies?  I would assume that they’re not redistributable and that any app that uses them would only run on a machine with Team Database on it…

    I would like to be wrong though :)

  2. Data Dude – if you know about these 2 words then you know what I’m talking about. Visual Studio Team

  3. onovotny says:

    Thanks, this is really cool and useful stuff.  I would like to request that the type, TSqlFragmentVisitor, be made public too as that will make it much easier for us to write our own visitors on top of the sql expression tree.  

    Also, While most of the sql expression tree is serializable, it appears that one of the member variables, which is a list of SqlParserTokens is not (the SQL Parser Token isn’t marked as serializable).  Could this be fixed?

    The ability to serialize a SQL Expression tree would be very useful.

    One thing that I can see a use for is a set of factory methods built on top of the expression tree such that statements could be more easily constructed in code (instead of parsed from a string).  Very much like SqlOM (  We’d like to replace SqlOM with these libraries once RTM happens…  

    In our case, we prefer SqlOM as a way of generating "safe" queries from a client machine and then sending the object graph on the wire for our servers to parse and extract a safe select query (to prevent injection, etc).

    Another interesting use would be a way to translate the sql expressions (or at least the DML part) into a standard .NET Expression Tree (and vise-versa).  

    Very cool stuff…

  4. Kirstin Juhl on Everything old is new again Martin Hinshelwood on Heat ITSM GertD on Getting to the…

  5. Data Dude says:

    Finally the moment is there, the final version of the Visual Studio Team System 2008 Database Edition

  6. Ca n’est pas encore Visual Studio 2010, mais pour la partie base de données, on s’en approche à grand

  7. [ Nacsa Sándor , 2009. február 8.] Ez a Team System változat az adatbázis változások kezeléséhez és teszteléséhez

  8. cmille19 says:

    Thanks for sharing. Based on your example  I created Powershell cmdlet implementations!EA42395138308430!296.entry

  9. chhapia_chintak says:

    Is there a any way to preserve comments

  10. says:

    Hi all,

    I use Oracle. Is it possible use it ?? Thanks.

  11. kiquenet says:

    Any script generator for Oracle that can I use with VS Database Edition GDR ?? thanks