Using XML Source


Providing a generic way to extract data from XML documents of any format is pretty complicated. The SSIS XML Source Adapter tries to do just that, to various degrees of success.

XML Source is easiest to use when your input has a simple element/sub-element structure:

<rootgoo>
    <goo>
        <subgoo>value</subgoo>
        <moregoo>1</moregoo>
    </goo>
    <goo>
        <subgoo>value</subgoo>
        <moregoo>2</moregoo>
    </goo>
</rootgoo>

Or when the values are listed as attributes (such as the output you get from SELECT … FOR XML RAW statements in SQL Server).

<root>
   <row CustomerID="1" TerritoryID="1" AccountNumber="AW00000001" />
   <row CustomerID="2" TerritoryID="1" AccountNumber="AW00000002" />
</root>

Both of these XML documents would produce a single output in the XML Source. When your XML data starts to get more complicated (multiple levels of elements / attributes), you start getting more outputs. These outputs are linked with IDs, which you will have to join yourself with a merge join transform.

Dealing with multiple outputs

In this completely made up example. the XML is a little more complex.

<extract date="2007-12-05">
    <counters>
        <counter category="dispatcher" name="server1">
            <runtime>6</runtime>
            <queue>3</queue>
            <maxrequest>8</maxrequest>
            <color>blue</color>
            <host>
                <name>svo2555</name>
                <path>\\dispatcher</path>
                <lastaccessed>2007-02-03</lastaccessed>
            </host>
        </counter>
        <counter category="gateway" name="server1">
            <runtime>1</runtime>
            <queue>10</queue>
            <maxrequest>10</maxrequest>
            <color>purple</color>
            <host>
                <name>svo2555</name>
                <path>\\gateway</path>
                <lastaccessed>2007-02-03</lastaccessed>
            </host>
        </counter>
    </counters>
</extract>

Let’s see how XML Source treats this.

xmlsrc

As you see, it created three outputs – counters, counter, and host – and created _id columns to identify the rows. Our destination table wants all of this data in a single row, which means we’ll have to merge.

image

First we’ll mark that the data has been sorted already, using the generated _id columns as the Sort Key. This will let us merge without having to insert a Sort transform.

– Right click on the XML Source, and bring up the Advanced Editor

– Select the host output, and set the IsSorted property to True.

image

– Expand the host output, and then expand Output Columns. Select the _id field, and set the SortKeyPosition property to 1

image

Follow the same steps for the counter output. Make sure you set the SortKeyPosition value on the "counter_id" column, and not the "counters_id" column – we’re going to ignore the counters output all together.

Add a Merge Join transform. Connect the host and counter outputs from the XML Source to it.

image

You can then open the merge join transform, and take the columns from both inputs.

image

Voila!

Comments (22)

  1. SSIS Stuff says:

    As I wrote in my previous post , complex XML documents will produce multiple outputs when you’re using

  2. SenthilS says:

    How it can done using vb macro? can u explain

  3. Nice. I’ve never read this before. Shows how valuable a bit of out-of-the-box thinking can be.

    -Jamie

  4. kalcee says:

    hello there,

    Just as your example had 3 outputs, one of the XML I am working on produced 5 outputs. So I tried using multiple "merge and joins". The outputs were merged fine, but when I try to give the outputs of the resultant merge and joins to the next one, en error shows up saying that

    "The IsSorted property must be set to true on both sources of this transformation"

    How do I set that on a merge and join transform ??

  5. Hi Kalcee,

    Looks like you might have missed a step. Take a look at the step that starts with "Right click on the XML Source, and bring up the Advanced Editor" … you can force the IsSorted property to true on the "Input and Output Properties" page.

    ~Matt

  6. Telly says:

    Hi,

    Great information, I can remove most of my manual sorts!  It will clear my task up a lot!

    Quick question though, in your XML you have your main tag with a ‘date’ attribute, taking a look at your list of record sets you don’t have a extract set.  I’ve run into the same problem as the parent tag has values in which I need.  Do you know how to access this data?

    In my case it’s for the report date, I need to know for which date the data in the XML is for, unfortunately the date is held within the main tag like in your example.

    <extract date="2007-12-05">

       <counters>

    The date is not accessible according to my SSIS

    Cheers

    Ian

  7. Hi Ian,

    I noticed this too – it looks like our XML Source doesn’t pick up attribute values from the root element.

    Your best bet is to massage the data bit with XSLT before processing. See my other post about this here:

    http://blogs.msdn.com/mattm/archive/2007/12/15/xml-source-making-things-easier-with-xslt.aspx

    ~Matt

  8. KeelioSoftware says:

    As a side note to using XML as a source, there is an XML destination adapter for SSIS available at http://www.keelio.com

  9. Gully Coder says:

    XML source and multiple data outputs for elements.

  10. Arild - IMMIX says:

    Excellent! Looked all over to find a good and simple explanation of how to get the XML files into SQL Server

  11. brenotte@hotmail.com says:

    Hello,

    I have a single XML file and I use your technique (great post by the way..).

    But my problem is that I have more that two outputs coming from my XML Source Adapter.

    How could I deal with this without having to "cascade" the merge join ?

    Any help is welcome.

    Thanks.

  12. rperetz says:

    what if I don’t want to merge all the data in one table, what if I want to have one to many tables with a FK key?

    I did get the keys generated from SSIS but they don’t seem work give right relationship.

    for example I got a table called events which hold the callid and eventid and I was hoping it would have the relationship there, but nope.

    the ids don’t match in both tables, in fact I can’t even find the eventid that is linked to the callid…

    at this point I got my data mapped to my SQL tables fine, but with out a relationship. I can’t tell how many events are for one call.

    I am about to re do everything using C# and xml objects to import the data. it’s hard to believe that this could not be done in SSIS.

    please help.

    Here is the example of the xml I am working with

    – <calls>

    – <call id="@BR8" callkey="1" date="09062008" acd="0" starttime="2118" endtime="2181" type="1" result="1" from="" to="x82185 CUSTOMS" dnis="2623" ansby="x82185 CUSTOMS" requeues="0" intf="0" overflowed="0" transfd="0" held="0" confd="0" timetoabn="0" timetoans="6" tottalktm="57" totconftm="0" totholdtm="0">

    – <events>

    – <event starttime="2118" callkey="1" endtime="2118" type="1" ani="" dnis="" digits="82185" cause="0" montd="0" rqud="0" intf="0" overflowed="0">

     <trnk key="default" id="C04B20" name="" nick="" />

     </event>

    – <event starttime="2118" callkey="1" endtime="2124" type="2" ani="" dnis="2623" digits="82185" cause="22" montd="0" rqud="0" intf="0" overflowed="0">

     <ext key="slap" id="82185" name="CUSTOMS" nick="" vport="0" autoattend="0" qcc="0" />

     <ext key="default" id="82185" name="CUSTOMS" nick="" vport="0" autoattend="0" qcc="0" />

     <trnk key="from" id="C04B20" name="" nick="" />

     </event>

    – <event starttime="2124" callkey="1" endtime="2181" type="3" ani="" dnis="2623" digits="" cause="0" montd="0" rqud="0" intf="0" overflowed="0">

     <ext key="slap" id="82185" name="CUSTOMS" nick="" vport="0" autoattend="0" qcc="0" />

     <ext key="default" id="82185" name="CUSTOMS" nick="" vport="0" autoattend="0" qcc="0" />

     </event>

     <event starttime="2181" callkey="1" endtime="2182" type="22" ani="" dnis="" digits="" cause="0" montd="0" rqud="0" intf="0" overflowed="0" />

     </events>

     </call>

  13. troy.g.a@gmail.com says:

    Ive got the same problem with reading the date attibute from the root element.

    Its a standard, and i have a lot of different sources. Is there an update to fix this to get a 2nd output? or another work around?

    I tried adding an output manually but get this Data Flow Task [XML Source [1]]: An output cannot be added to the outputs collection.

  14. Hi Diwakar – if your XML documents have different schemas (i.e. different columns and/or data types), you will need to use a separate data flow task for each schema.

  15. Hi Matt,

    I am doing impact analysis of ssis package. here i am taking xml file of various .dtsx packages. its structure is changing as per the C# script used or not, SQL task is used or not.

    I want all the xml of different .dtsx should be in a common format so that i can use a common source format and fetch all the xml files one by one using for each loop container.

    Is it possible.

  16. Hi Matt

    Can you provide me a C# script that can read a xml file has a particular node or not so that i can call ssis package acording to that

  17. I am using the following script

    public void Main()

               {

                   XmlDocument oDoc = new XmlDocument();

                   oDoc.Load("C:\xml_with_transf_sql_cSharp_AfterXslt.xml");

                   XmlNode oNode = oDoc.SelectSingleNode("//test");

                   XmlAttributeCollection oAttribCol = oNode.Attributes;

                   foreach (XmlAttribute oAttrib in oAttribCol)

                   {

                       if (oAttrib.LocalName == "ProjectItem")

                       {

                           //Instansiate an Package

                           Microsoft.SqlServer.Dts.Runtime.Application application = new Microsoft.SqlServer.Dts.Runtime.Application();

                           Console.WriteLine("load package");

                           // creating a package container to hold package

                           //And load the package using application object

                           Package package = application.LoadPackage(@"C:test_projfind_table_sql_cSharpPackage.dtsx", null);

                           Console.WriteLine("Execution of package started");

                           DTSExecResult result = package.Execute();

                           Console.WriteLine("package execution result: {0}", result.ToString());

                           //wait to exit

                           Console.Read();

                       }

                      /* else

                       {

                           Microsoft.SqlServer.Dts.Runtime.Application application1 = new Microsoft.SqlServer.Dts.Runtime.Application();

                           Console.WriteLine("load package");

                           // creating a package container to hold package

                           //And load the package using application object

                           Package package = application1.LoadPackage(@"C:test_proj_simplePackagefind_table_sql_cSharpPackage.dtsx", null);

                           Console.WriteLine("Execution of package started");

                           DTSExecResult result1 = package.Execute();

                           Console.WriteLine("package execution result: {0}", result1.ToString());

                           //wait to exit

                           Console.Read();

                       }*/

                   }

               /* //Instansiate an Package

                  Microsoft.SqlServer.Dts.Runtime.Application application = new Microsoft.SqlServer.Dts.Runtime.Application();

                  Console.WriteLine("load package");

               // creating a package container to hold package

               //And load the package using application object

                  Package package = application.LoadPackage(@"C:test_projfind_table_sql_cSharpPackage.dtsx", null);

                  Console.WriteLine("Execution of package started");

                  DTSExecResult result = package.Execute();

                  Console.WriteLine("package execution result: {0}", result.ToString());

                  //wait to exit

                  Console.Read(); */

               // TODO: Add your code here

               Dts.TaskResult = (int)ScriptResults.Success;

           }

       }

    }

    And it is giving the following error

    Error: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. —> System.NullReferenceException: Object reference not set to an instance of an object.

      at ST_8d6e34fe164e41a2b32e91011f84a693.csproj.ScriptMain.Main()

      — End of inner exception stack trace —

      at System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)

      at System.RuntimeMethodHandle.InvokeMethodFast(Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)

      at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)

      at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)

      at System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object[] providedArgs, ParameterModifier[] modifiers, CultureInfo culture, String[] namedParams)

      at System.Type.InvokeMember(String name, BindingFlags invokeAttr, Binder binder, Object target, Object[] args, CultureInfo culture)

      at Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTATaskScriptingEngine.ExecuteScript()

  18. Phil Morris says:

    No Root element in the schemas. Why? No really – WHY? I understand there is a 'rationale' behind the decision but that doesn't reflect what is possible in XML, a very well defined standard in which root elements can contain attributes, or simple elements.

    So, can the 'rationale' be re-examined, and perhaps SSIS users can get at their root node data WITHOUT having to resort to long complicated workarounds?

    Just a suggestion, but as a user of 4 other ETL tools that all will allow this out of the box, you'd think SSIS would do this.

  19. David Boyd says:

    How are the counter ids produced, as they are not produced for us?  Is this because we have a supplied xsd file?  In that case how can we do the merge?

  20. The _Id column will be there automatically for all outputs. If you're not seeing it, make sure that it is not deselected on the "Columns" page.

  21. JThaas says:

    Hi does anyone has a simple example of how to incorporate a xpath query in SSIS Vb script component

  22. Sachin Chauhan says:

    Hi Matt,

    I tried the same and I am able to get simplified xml using xml task and xslt file that I created. But my problem is that if I am dropping a data flow task to control flow and then xml source in xml flow then I am unable to get the data into sql table. I am getting error message

    "the component has detected potential metadata corruption during validation Error at data flow task…..". It seems due to use of generated xml file two times. Can you let me know how to sort it out?

    Thanks,

    Sachin