Flattening XML Data in SQL Server


If you’ve got some XML data in a SQL Server column, how can you flatten it out and query it or present it as though it was relational data? It turns out this is quite easy...


Setup


Let’s create a simple table to hold our data;


CREATE TABLE XmlSourceTable


(


      RecordId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,


      XmlData XML NOT NULL


)


GO


And we’ll define some straightforward XML to import;


<?xml version="1.0" ?>


<Root>


      <Person>


            <Name>Simon</Name>


            <Age>20</Age>


            <Skills>


                  <Skill>Cooking</Skill>


                  <Skill>Cleaning</Skill>


            </Skills>


      </Person>


      <Person>


            <Name>Peter</Name>


            <Age>21</Age>


            <Skills>


                  <Skill>Ironing</Skill>


            </Skills>


      </Person>


</Root>


Ages may have been changed to protect the innocent J


Next, we’ll import it into my table using one of the mechanisms SQL Server provides – each XML file will be imported into a single row in the target table.


INSERT INTO XmlSourceTable(XmlData)


SELECT *


FROM OPENROWSET(


   BULK 'C:\XmlSource.xml', SINGLE_BLOB)


AS ImportSource


GO


After this, if we do a quick query...


SELECT * FROM XmlSourceTable


... we can see that we get a single row back containing an ID and some XML;


RecordId    XmlData


----------- -------


1           <Root><Person><Name>Simon</Name><Age>20</Age... (snip)


(1 row(s) affected)


Queries


The simplest way to extract this data is to use the CROSS APPLY keyword, as this executes a function against each row and then adds the returned data to the result set. Combining this with a method that can be called on the XML data type called nodes, we get some great results. A quick query like this;


SELECT


      pref.value('(Name/text())[1]', 'varchar(50)') as PersonName,


      pref.value('(Age/text())[1]', 'int') as PersonAge,


      pref.query('Skills') as PersonSkills


FROM  


      XmlSourceTable CROSS APPLY


      XmlData.nodes('/Root/Person') AS People(pref)


GO


... yields a completely different result set to our last query;


PersonName PersonAge PersonSkills


---------- --------- ----------------------


Simon      20        <Skills><Skill>Cooking</Ski... (snip)


Peter      21        <Skills><Skill>Ironing</Ski... (snip)


(2 row(s) affected)


We can see this query has flattened my single row of relational data with embedded hierarchical XML into two rows and columns of relational data. I’ve also included a subset of the XML as a column, just to show I can! Of course, if I wanted to I could modify this to get a list of people and their skills;


SELECT


      pref.value('(Name/text())[1]', 'varchar(50)') as PersonName,


      sref.value('(text())[1]', 'varchar(50)') as PersonSkill


FROM  


      XmlSourceTable CROSS APPLY


      XmlData.nodes('//Person') AS People(pref) CROSS APPLY


      pref.nodes('Skills/Skill') AS Skill(sref)     


GO


What is it Doing?


This SQL can be difficult to understand when you first look at it, but it isn’t really that complex. Breaking it down there are three key concepts that we’ll cover below – using my first query above that fetches PersonName, PersonAge, and PersonSkills as an example.


CROSS APPLY


The first concept is the use of CROSS APPLY, which many people haven’t used before. What this is doing is roughly equivalent to the following steps (note: this is my idiots guide to how I think about it, not a description of how the query optimiser does it);


1.       Fetch the rows from the XmlSourceTable table.


2.       For each row, call the “nodes” function on the XmlData column. This could be some other function too – it needn’t be on XML data. See the docs on APPLY for more info.


3.       Duplicate the XmlSourceTable row once for every row returned by the table valued function “nodes”.


4.       Add the columns returned by the “nodes” function to the columns in the result set.


5.       Continue doing filtering, joining, and column selection as for any other SQL query.


I hope that makes it a little clearer.


nodes() function


The XML data type in SQL Server defines a number of methods that can be called on it. One of these is “nodes” – and what this does is basically select a list of XML nodes that match an XQuery expression. Knowing this, look at the statement in my SQL;


XmlData.nodes('/Root/Person') AS People(pref)


This is using the path “/Root/Person” to ensure that all Person nodes that exist under the Root are selected. The result is aliased as a table named People, and each resulting XML node will be output as a separate row, in the “pref” column.


Plenty of alternative syntaxes are available for selecting this list of nodes, and this really is the core of how to flatten out the XML. I’ve also used “//Skill” syntax in my second query, for example, which selects every Skill node found in the XML it is used on.


Other XML functions


The last concept is the use of the selector XML functions – in my example I’ve used “value()” and “query()”. Both of these use XQuery expressions to select XML data.


Firstly, I’ve used the value() function to extract a specific value from the XML, and convert it to a SQL Server data type;


pref.value('(Name/text())[1]', 'varchar(50)') as PersonName


The “text()” function here retrieves the inner text from within the XML “Name” node. The “[1]” suffix acts as an indexer, and fetches the first result matched. I know there’s only one name node per person, but I need to make sure SQL Server knows this too. Finally, the second argument is the data type it should be converted to. For an Age, I’ve used ‘int’, but here we’re converting to a varchar type.


The “query()” function allows me to return an XML fragment;


pref.query('Skills') as PersonSkills


This returns the XML that matches the “Skills” node, and is found underneath the current Person element held in “pref” (i.e. the search is relative to the contents of pref). This means it returns fragments such as;


<Skills><Skill>Ironing</Skill></Skills>


Conclusion


None of this is all that difficult once you know how – so I hope this has given you a quick start to using XML data in SQL! Don’t forget to read up more generally on the XML data type and XML indexes.

Comments (33)

  1. Gaurav says:

    Excellent …this is what i was looking for…Thanx a lot …gr8 job

  2. Doug says:

    This could not have come to my attention at a better time!  PERFECT – THANK YOU!!!

  3. Simon J Ince says:

    @ Gaurav, Doug

    No problem, really pleased it is useful. Thanks for the feedback!

    Simon

  4. If you’ve got some XML data in a SQL Server column, how can you flatten it out and query it or present

  5. David Ackroyd says:

    Something is wrong here – the result set of the second query is:

    Simon Cooking

    Simon Cleaning

    Simon Ironing

    Peter Cooking

    Peter Cleaning

    Peter Ironing

    but the xml says Peter only does the Ironing and Simon only does the Cooking and Cleaning

  6. David Ackroyd says:

    To fix the second query, use this nested SQL:

    SELECT

     PersonName,

     sref.value(‘(text())[1]’, ‘varchar(50)’) as PersonSkill

    FROM

    (

    SELECT

     pref.value(‘(Name/text())[1]’, ‘varchar(50)’) as PersonName,

     pref.query(‘Skills’) as PersonSkills

    FROM  

     XmlSourceTable

     CROSS APPLY XmlData.nodes(‘//Person’) AS People(pref)

    ) a

       CROSS APPLY PersonSkills.nodes(‘//Skill’) AS Skill(sref)

  7. Simon J Ince says:

    @ David,

    Well spotted! You should win a prize!

    This was a classic simplification-for-a-blog-post mistake… and it comes down to the fact that "//Skill" matches *any* skill, which of course is not what we want. Instead, it should be as follows;

    SELECT

         pref.value(‘(Name/text())[1]’, ‘varchar(50)’) as PersonName,

         sref.value(‘(text())[1]’, ‘varchar(50)’) as PersonSkill

    FROM  

         XmlSourceTable CROSS APPLY

         XmlData.nodes(‘//Person’) AS People(pref) CROSS APPLY

         pref.nodes(‘Skills/Skill’) AS Skill(sref)      

    GO

    For comparison, the old version that doesn’t work is below (I’ve corrected my post):

    SELECT

         pref.value(‘(Name/text())[1]’, ‘varchar(50)’) as PersonName,

         sref.value(‘(text())[1]’, ‘varchar(50)’) as PersonSkill

    FROM  

         XmlSourceTable CROSS APPLY

         XmlData.nodes(‘//Person’) AS People(pref) CROSS APPLY

         pref.nodes(‘//Skill’) AS Skill(sref)      

    GO

    Hope that helps – let me know if fixes your issue.

    Simon

  8. Andrew says:

    So just to confirm if you want a Cross Apply to "Correctly" associate the child data then you should use the syntax of

    parent.node(‘ChildTag) c(child)

    child.node(‘NextChild’) nc(nextChild)

    etc…

  9. Simon J Ince says:

    @ Andrew,

    that looks about right to me – although note it is "nodes" not "node" (i.e. plural).

    Simon

  10. BG says:

    Thanks a lot. I saw light at the end of the tunnel after 2-3 days

  11. Dcip says:

    Simon,

    How would I extract

    <Finding Default="Same" RetrievalLimit="Hours" LimitToVisit="Yes" RetrieveLatest="Never" RetrievalNo="12">

  12. Simon J Ince says:

    @ Dcip,

    the above should get you started if you also realise that you can use the ‘value’ function to get attributes, for example;

    pref.value(‘./@PersonID’,’int’)

    … would get a PersonID attribute if my XML had entries like this;

    <Person PersonID="5"><Name>… etc

    Hope that helps.

    Simon

  13. Simonn says:

    How would I go about working with the below XML as I have tried and tried but cant seem to get it right

    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/&quot; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance&quot; xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body><RefundCancelProduct xmlns="RMCS.Oasis.Integration.Web"><userCredential>CRMUser</userCredential><extendedRMCSID>40000000910-1</extendedRMCSID><refundAmount>19.98</refundAmount></RefundCancelProduct></soap:Body></soap:Envelope>

  14. boer86 says:

    How would i extrac the below?

    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/&quot; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance&quot; xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body><UpdateCustomer xmlns="RMCS.Oasis.Integration.Web"><userCredentials>CRMUser</userCredentials><customerDetail><User>CRMUser</User><RMCSID>400001171815</RMCSID><Title>Mr</Title><Surname>MOREKI MISS L E</Surname><PromotionalOfferPhone>false</PromotionalOfferPhone><PromotionalOfferSMS>false</PromotionalOfferSMS><PromotionalOfferEmail>false</PromotionalOfferEmail><Language>English</Language><Alternative>0742552878</Alternative><Cell>0742874752</Cell></customerDetail></UpdateCustomer></soap:Body></soap:Envelope>

  15. Tommy says:

    This looks great but the datatype of my XML field has been set to "text" so none of this works…. Any suggestions on how this could be done?

  16. Simon J Ince says:

    @ Tommy,

    I think you're pretty much stuck unless you can convert the text to the XML data type, sorry!

    Simon

  17. Mike says:

    I get this error:

    Column or parameter #2: Cannot find data type XML

    Any suggestions?

  18. Simon J Ince says:

    @ Mike,

    are you using an older version of SQL Server? Apart from that, I stumped I'm afraid.

    Simon

  19. Josh says:

    Tommy, i am pretty certain you can CAST the column to XML.

  20. Haw says:

    Excellent article and sample. I searched a lot of the articles and samples about query xml data. Most of samples weren't work. But your's sample work very well and explianed clearly.

    Thank you so much.

  21. Simon J Ince says:

    @ Haw,

    thanks – and it's my pleasure!

    Simon

  22. Dilip says:

    Nice ! Good reading for Flattening XML in SQL

  23. Sivakumar says:

    This is excellent stuff .. which I understood real quick.. thanks again for the good work !!!!

  24. Tony says:

    You saved my day. Thank you so much for this.

  25. Jeff Moden says:

    Simon,

    Thanks for the nice, simple example of how to shred multip levels of "Element Based" XML.

    –Jeff Moden

  26. jo says:

    no example how to do where clause on xml field

  27. Dan Maslowski says:

    Not sure if anyone is still reading, but I am curious how you would sum child nodes, in this example how could you sum the ages of the two entries?

    Thanks!

  28. prem says:

    here i give a example for XML in SQL

    create xml schema collection cricketschemacollection

    AS N'<xsd:schema xmlns:xsd="http://www.w3.org/…/XMLschema"&gt;

    <xsd:element name="MatchDetails">

    <xsd:complexType>

    <xsd::complexContent>

    <xsd:restiriction base="xsd:anyType">

    <xsd:sequences>

    <xsd:element name="Team" minOccurs="0" maxOccurs="unbounded">

    <xsd:complexType>

    <xsd::complexContent>

    <xsd:restiriction base="xsd:anyType">

    <xsd:sequences/>

    <xsd:attribute name="country"type="xsd:string"/>

    <xsd:attribute name="score"type="xsd:string"/>

    </xsd:restiriction>

    </xsd::complexContent>

    </xsd:complexType>

    </xsd:element>

    </xsd:sequences>

    </xsd:restiriction>

    </xsd::complexContent>

    </xsd:complexType>

    </xsd:element>

    </xsd:schema>'

  29. LA Guy says:

    Better late than never.

    This very well written article got me started with extracting XML columns.

    Thanks, LA Guy 🙂

  30. Sagar says:

    Perfect!! Just what i was looking for.Well could you post some more complex examples on xml.I am trying to query xml of dtsx package to get information about the tables/SP/Queries used in it and as dtsx package is a little bit complex , an example would help me to achieve the result what i seek

    Also would like if you could post selecting nodes with particular conditions/value and XML in where clause

  31. Alex says:

    Can we avoid using CROSS APPLY and instead use JOIN

  32. Munesh Kumar says:

    Excellent Work. I have cleared my all the confusion .

    Thanks A lot.

Skip to main content