Transforming XML data into a rowset using the nodes() method

Usually there are a couple of scenarios in which you want to take XML data and convert it into a regular relational rowset.  In SQL 2000, you probably used OpenXML for this, which works pretty well if you are only working on a single piece of XML data, but as soon as you want to convert a rowset of XML instances into a rowset of relation columns things begin to break down pretty quickly.

In SQL Server 2005, the solution is to use the XML datatype's nodes() method.  The method works like a table valued function that accepts a single parameter, which is an XQuery statement which will result in a set of nodes.  We then return references (basically a pointer) from the nodes() TVF to each node which is selected by the XQuery statement.  You can then apply the XML datatype "scalar" functions such as exist(), value(), and query() to these references.

By way of an example, lets unpack some of this.  Lets say I have a piece of XML which represents some kind of contact list.  It has a bunch of people, their various phone numbers, and some metadata about the people (in this case, their age). 

What I want to do is query out the names and ages of each person.  I will use the nodes() method to get a reference to each <Person /> element, and then use value() method on each of these to pull out the properties that I am interested in.

create table contacts (pk int primary key identity(1,1), contactXml xml)

insert into contacts values ('
<People>
<Person age="35">
<Name>Pete</Name>
<Phone>
<Mobile>555-555-1234</Mobile>
<Home>555-555-0001</Home>
</Phone>
</Person>
<Person age="40">
<Name>Paul</Name>
<Phone>
<Mobile>555-555-4567</Mobile>
</Phone>
</Person>
<Person age="24">
<Name>Susan</Name>
<Phone>
<Home>555-555-2323</Home>
</Phone>
</Person>
</People>
')

select
pk,
person.ref.value('(Name/text())[1]', 'nvarchar(30)') as [name],
person.ref.value(<'@age'>, 'smallint') as [age]
from contacts c
cross apply c.contactXml.nodes('/People/Person') person(ref)

The cool thing about nodes() method is that it is also composable.  I can continue to drill into the data to get the phone numbers for each contact as well by applying the nodes() method again to the references which are returned by the first nodes() invocation.

select
pk,
person.ref.value('(Name/text())[1]', 'nvarchar(30)') as [name],
person.ref.value(<'@age'>, 'smallint') as [age],
phoneNumbers.ref.value('local-name(.)', 'nvarchar(30)') as [numberType],
phoneNumbers.ref.value('text()[1]', 'nvarchar(30)') as [number]
from contacts c
cross apply c.contactXml.nodes('/People/Person') person(ref)
cross apply person.ref.nodes('Phone/*') phoneNumbers(ref)

The result of this query is the following rowset:

pk name age numberType number
----------- ------------------------------ ------ ------------------------------ ------------------------------
1 Pete 35 Mobile 555-555-1234
1 Pete 35 Home 555-555-0001
1 Paul 40 Mobile 555-555-4567
1 Susan 24 Home 555-555-2323

The nodes() method also allows us to filter, and since we are doing a cross apply, this filter will reduce the total number of rows that we see output (as opposed to outer apply, which is more like an outer join).  So if we only want to see people that have Mobile numbers, then we can alter the XQuery statement in the second nodes() invocation from 'Phone/*' to 'Phone/Mobile'. 

In general, nodes() method is a great way to manipulate XML data both when it comes into the server for persisting as a relation rowset, or for converting your already stored XML data into a rowset for returning to the client.  Using XML Indexes are also a great way to increase nodes() performance since the reference that is returned via the nodes() method is part of the primary key (along with the primary key of the base table) of our clustered XML Index.  As a consequence of this, if you are using nodes() heavily in your application, you definitely want to look into utilizing an XML Index to increase query performance.