When to use SQL Server 2005's XML data columns

SQL Server 2005’s support for XML has improved substantially. The new internal representation of XML and XML indexes supports much faster queries and updates. SQL Server 2005 supports querying and updating XML with XQuery, a powerful second-generation DML language for XML. Finally, SQL Server provides good support for converting between and relating relational and XML based data. Now that XML data has such good support in, the choice of when to use an XML column and when to use standard relational structures becomes more difficult.

In my experience, the choice between XML data columns and standard relational representations comes down to the structure of the relationships between objects (i.e. entities) in the data. Structural aspects of the data like how strongly hierarchical, ordered, and known beforehand affect the relative queriablility and update performance of XML columns and relational structures. The structure of data will often vary across the schema, so I mix XML and relational representations as appropriate. XML columns are generally best for data with unknown structure or that is strongly hierarchical or ordered. Relational structures are generally best for highly structured data with variable navigation paths (i.e. join paths).

SQL server makes it easy to convert between relational and XML representations of data. But if you never query the XML data in the database you should probably just store it using the new nvarchar/varchar/varbinary(max) types as this will reduce the cost of converting to and from the internal XML representation.

Structured data can also be stored as CLR UDTs but query and update support is weak (e.g. no specialized indexes, incremental on disk structures or query language). However, queriability may improve in a future version of SQL Server due to planned queriability improvements by the CLR and .Net language teams (i.e. LINQ). However, the ability to call methods on CLR UDTs in queries is a strong advantage.

When to use XML columns

One of the great things about working at Microsoft is that you can easily find people who have a deep understanding of the technology. Therefore, I asked the people who implemented the code what they thought were the relative merits of XML columns and relational structures. Based on their feedback augmented by practical experience I thing there are three cases where XML columns instead of standard relational structures make sense: end-user specified data, ordered data and strongly hierarchical data.

When you know that data will have structure but you do not know the structure of the data at schema design time then it makes sense to use an XML column to hold the data. For example, in a program database, programming language specific metadata will have a compiler specified structure that custom code will interpret to produce language specific reports. By using an XML column, language vendors can add structured data that is queriable with XQuery.

 

Relational structures have poor performance for inserting and deleting child nodes where child nodes are ordered within a parent node. When updates to a child’s position number is likely, an XML column is often a good choice since it simplifies data operations and has better performance when the number of children is large.

Hierarchical data where queries will usually navigate toward children (e.g. methods and their statements in a program database) is often best handled by XML columns. Because the child nodes are stored together with parent nodes, such queries can be a lot faster for XML columns than standard relational structures.

Use schema collections

There are two good reasons to use schema collections: data integrity and performance. Basing XML columns on schema collections allows SQL Server to do some basic type checking of XML data thus reducing the chance of inserting erroneous data. In SQL Server 2005, XML is stored as a binary stream. Basing an XML column on a schema collection causes XML data to be stored in a format that is very close to the binary record structure you might create by hand. For example, integers are stored as 4 byte values. In addition, schema collections allow XML indexes to be more efficient. The schema collection based binary format is more compact and more easily parsed than generic XML leading to better performance.

This posting is provided "AS IS" with no warranties, and confers no rights.