Efficient Property Promotion from XML Instances

Article
05/24/2005

In many cases, when storing XML data, some of the elements/attributes will be more important for storage/querying than others. Access to values inside XML data-type instances requires additional overhead when compared with just selecting relational data from a table. Thus, there are many times when we would like to take some of the data out of an instance and promote this into relational tables – this is often referred to as ‘property promotion’. SQL Server 2005 allows the user to define views and computed columns over XML data-type instance. These views/computed columns are useful for simplifying queries, but they will not provide any additional performance unless their contents are materialized somehow. SQL Server 2005 allows us to create an index over a view which will materialize the values stored and for computed columns, we can mark them as ‘persisted’ in order to have their values materialized in the database. In the case of an index view or a persisted computed column, queries of this data will simply retrieve the materialized values instead of re-executing the view/column definition, thus providing a performance benefit. Unfortunately, due to the complex nature of the XML data-type inside SQL Server 2005, there are several restrictions and caveats when it comes to using XML data-type methods inside view and/or computed column definitions. For this posting, I will walk the user through the steps required to create a materialized view over a set of XML data-type instances. For details on computed columns, Michael Rys has an article on his blog (Using XQuery Methods in CHECK Constraints) which deals specifically with check constraints on XML data-type columns, but the content is also applicable to computed columns.

Firstly, we will create a database with some stored XML data, typed according to the following schema:

 create xml schema collection EmployeeDataXSD as N'
<xs:schema xmlns:xs="https://www.w3.org/2001/XMLSchema" xmlns:company="https://www.example.com/mycompany" 
    targetNamespace="https://www.example.com/mycompany" elementFormDefault="qualified">

   <xs:element name="employee">
      <xs:complexType>
          <xs:all>
              <xs:element name="name" type="xs:string" />
           </xs:all>
         <xs:attribute name="id" type="xs:ID" />
       </xs:complexType>
 </xs:element>
</xs:schema>'

This is a simple schema, concocted to store some arbitrary data about employees at a company. Adding this XSD to an XML schema collection called ‘EmployeeDataXSD’, we then create the following table:

 create table EmployeeData( id int primary key identity, xml_data xml( EmployeeDataXSD ))

The following pieces of XML are then inserted:

 insert into EmployeeData values( '<employee id="EMP1" xmlns="https://www.example.com/mycompany"><name>Joe</name></employee>')
insert into EmployeeData values( '<employee id="EMP2" xmlns="https://www.example.com/mycompany"><name>Fred</name></employee>')
insert into EmployeeData values( '<employee id="EMP3" xmlns="https://www.example.com/mycompany"><name>Tom</name></employee>')

Within our hypothetical company, the important information for querying over our employee’s is their name and employee identification number. Instead of forcing us to use an XML data-type method to extract this data each time we need it, we can instead define a view over this XML data that automatically extracts this. Most people’s initial reaction will be to define a view that uses the XML data-type methods directly, like the following:

 create view EmployeeDataProperties
with schemabinding
as
      with xmlnamespaces( default 'https://www.example.com/mycompany' )
      select 
            id as [Database ID], 
            xml_data.value( '/employee[1]/name[1]', 'nvarchar(100)' ) as [Employee ID],
            xml_data.value( '/employee[1]/@id', 'nvarchar(10)' ) as [Name]
      from dbo.EmployeeData

In fact, this works fine and we are able to query this view to find the names and employee ID’s. All well and good, until we try and materialize this view by creating the following index:

 create unique clustered index EmployeeDataProperties_IDX on EmployeeDataProperties( [Database ID], [Employee ID], [Name] )

Attempting to create this index results in the following error:

 Msg 1985, Level 16, State 1, Line 1
Cannot create index on view 'dbo.EmployeeDataProperties'. It contains one or more xml data type methods.

At first glance this appears to be a dead end solution as you are not allowed to have any XML data-type methods inside a view definition! But, we can get around this by encapsulating the XML data-type method calls into UDF functions and using these as part of the view definition. To this end, I define the following two functions:

 --- UDF to extract an employee name from an XML fragment
create function GetNamePropertyUDF( @input xml )
      returns nvarchar(100)
with schemabinding
as
begin
      return @input.value( 'declare default element namespace "https://www.example.com/mycompany"; /employee[1]/name[1]', 'nvarchar(100)' )
end
go
 
--- UDF to extract an employee ID from an XML fragment
create function GetIDPropertyUDF( @input xml )
      returns nvarchar( 10 )
with schemabinding
as
begin
      return @input.value( 'declare default element namespace "https://www.example.com/mycompany"; /employee[1]/@id', 'nvarchar(10)' )
end
go

Notice that the input XML is provided as an argument to the function, rather than selecting directly from the base table. This is to avoid another caveat of views which does not allow you to use UDF’s that perform data access. Also notice that we pass untyped XML into these functions. There is nothing to stop us from associating an XML schema collection with the input parameter, but this will add needless overhead since the original storage already performed the required validation. Now that we have our two UDF functions, we redefine our view in terms of these:

 create view EmployeeDataProperties
with schemabinding
as
      select 
            id as [Database ID], 
            dbo.GetIDPropertyUDF( xml_data ) as [Employee ID],
            dbo.GetNamePropertyUDF( xml_data ) as [Name]
      from dbo.EmployeeData

This time, we are able to successfully materialize the view by adding the index specified above. Querying this view provides us with the following list of employee names and ID’s:

 Database ID Employee ID Name
1       EMP1        Joe
2        EMP2        Fred
3       EMP3        Tom

To demonstrate that the indexed view is actually being kept up to date with the data in the underlying XML data-type instance, we can run the following XML DML statement to update the value of the employee ID for the employee named ‘Joe’:

 with xmlnamespaces( default 'https://www.example.com/mycompany' )
update EmployeeData set xml_data.modify( 'replace value of /employee[1]/@id with xs:ID("EMP4")' )
where xml_data.exist( '/employee[name="Joe"]' ) = 1

If we now query the data from the materialized view again, we get the following:

 Database ID    Employee ID Name
1       EMP4        Joe
2        EMP2        Fred
3       EMP3        Tom

-
Disclaimer:
This posting is provided “AS IS” with no waranties, and confers no rights.
Use of included script samples are subject to the terms specified at https://www.microsoft.com/info/cpyright.htm.

Efficient Property Promotion from XML Instances

-Disclaimer:This posting is provided “AS IS” with no waranties, and confers no rights.Use of included script samples are subject to the terms specified at https://www.microsoft.com/info/cpyright.htm.

Additional resources

-
Disclaimer:
This posting is provided “AS IS” with no waranties, and confers no rights.
Use of included script samples are subject to the terms specified at https://www.microsoft.com/info/cpyright.htm.