Getting those XML files into your brand spanking new relational database system

Up until now, XML documents have generally languished as mere files in the lowly file-system, our relational database systems haven’t seen them as important enough to warrant inclusion, up until SQL Server 2005 that is.  In SQL Server 2005, XML documents are now first class (data type) citizens; no more painful shredding required to hammer those triangular trees into oblong tables. 

 

With the new XML data type in SQL Server 2005, the rush is now on to get our XML documents out of the file-system and into the relational database, where they can, at last, feel validated (XML Schema validated that is!).

Let’s take a look at how we can get those XML documents sitting in the filesystem into SQL Server 2005.

A BULKed up OPENROWSET provider We have extended the OPENROWSET provider with the ‘BULK’ option; this allows the provider to read data directly from a file. We then added three optional arguments (SINGLE_CLOB, SINGLE_NCLOB, SINGLE_BLOB) for the BULK option, these arguments enable the contents of a file to be returned as a single-row, single-column row-set of, respectively, VARCHAR(max), NVARCHAR(max) or VARBINARY(max).

 

Example: The SINGLE_CLOB (non-Unicode) and SINGLE_NCLOB (Unicode) arguments enable us to read an XML file and insert it into an XML data type column.

 

[UPDATE 7/19: John Gallardo, rightly points out... "For importing XML data using OPENROWSET we recommend people use SINGLE_BLOB rather than SINGLE_CLOB or SINGLE_NCLOB.". I have updated the usage below. Thanks John!]

CREATE TABLE xml_documents( x XML );

INSERT INTO xml_documents ( x )

   SELECT * FROM OPENROWSET(BULK N'myXmlFile.xml', SINGLE_BLOB) AS x;

GO

A parameterization problem   That was easy, but there is a problem.  I have a lot of .xml files on my machine and they are all called different names.  Unfortunately OPENROWSET does not allow me to parameterize the filename parameter (I have still yet to find out why), i.e. this fails:

DECLARE @filename NVARCHAR(max);

SET @filename = N'myOtherXmlFile.xml';

INSERT INTO xml_documents ( x )

   SELECT * FROM OPENROWSET(BULK @filename, SINGLE_BLOB) AS x;

GO

Msg 102, Level 15, State 1, Line 5

Incorrect syntax near '@filename'.

So unless all my file names are the same, which of course they are not, I am left with building statements at runtime using dynamic SQL -- slow, horrible, and often vulnerable to a sql injection attack -- which we want to avoid.

DECLARE @filename NVARCHAR(max);

SET @filename = N'myOtherXmlFile.xml';

DECLARE @SQLString NVARCHAR(max);

SET @SQLString = N'INSERT INTO xml_documents ( x ) ' +

                              'SELECT * FROM OPENROWSET(BULK ''' +

                                    @filename + ''', SINGLE_BLOB) AS x';

EXEC sp_executesql @SQLString;

GO

SQL/CLR integration to the rescue Here is a great example of how a quick, small and neat use of the CLR integration in SQL Server 2005 can save you from a T/SQL coding mess.

I create a function in C#, let’s say LoadFile( SqlString filename ), that does allow me to parameterize the filename, and then I allow access to that function from T/SQL. Here is how:

Save this file as ‘LoadFile.cs’

using System.IO;

public partial class UserDefinedFunctions

{

[Microsoft.SqlServer.Server.SqlFunction]

public static byte[] LoadFile( string filename )

{

using (BinaryReader binaryReader = new BinaryReader((Stream)File.OpenRead(filename)))

{

byte[] bytes = new byte[binaryReader.BaseStream.Length];

binaryReader.Read(bytes, 0, bytes.Length);

return bytes;

}

}

};

[Update 7/21: This function passes the contents of the file back as binary bytes (original post was returning a string). We return binary bytes instead of reading as text, so we do not lose the XML document encoding]

 

Compile the C# file (just using the standard C# compiler sitting in your Windows\Microsoft.Net directory):

C:\WINDOWS\Microsoft.NET\Framework\v2.0.50215\csc /target:library /out:"c:\temp\SqlClrObjects.dll" "LoadFile.cs"

Import the assembly library into SQL Server and expose the C# function as a TSQL function.

/* First enable SQL CLR functionality (which is off by default) */

      sp_configure "clr enabled", 1

      GO

      RECONFIGURE

      GO

/* Create the assembly. Do a defensive drop of assembly and function (that maybe bound to the assembly) first */

      IF EXISTS (SELECT * FROM sys.objects WHERE name = N'LoadFile')

            DROP FUNCTION LoadFile;

      IF EXISTS (SELECT * FROM sys.assemblies WHERE name = N'SqlClrAssembly')

            DROP ASSEMBLY SqlClrAssembly;

      CREATE ASSEMBLY SqlClrAssembly

            FROM N'c:\temp\SqlClrObjects.dll'

            WITH PERMISSION_SET = EXTERNAL_ACCESS;

      GO

      // Note the PERMISSION_SET = EXTERNAL_ACCESS is required because the C# function we have written accesses the file-system

 

/* Expose the C# function in the Assembly as a T/SQL function */

      CREATE FUNCTION LoadFile ( @filename NVARCHAR(max) )

            RETURNS VARBINARY(max)

            AS EXTERNAL NAME SqlClrAssembly.UserDefinedFunctions.LoadFile;

      GO

/* Test the function */

      SELECT CONVERT( XML, dbo.LoadFile( 'c:\myOtherXmlFile.xml' ) ) AS xml

      GO

xml

-----------------------------------------

<please>Don't forget me</please>

(1 row(s) affected)

So Easy Now, we can use our new parameterized LoadFile function to insert an XML file (where the filename is specified as a variable) into the XML data type column

DECLARE @filename NVARCHAR(max);

SET @filename = N'myOtherXmlFile.xml';

INSERT INTO xml_documents ( x )

   SELECT dbo.LoadFile ( @filename );

GO

This is far neater, safer (and I bet faster) than the horrible dynamic SQL (sp_executesql) above.

If Only If I had a table containing the names of all the .XML files on my file-system, i.e. like this:

CREATE TABLE files ( filename NVARCHAR(max) );

INSERT INTO files VALUES( N'c:\myXmlFile.xml' );

INSERT INTO files VALUES( N'c:\myOtherXmlFile.xml' );

GO

I could write a single T/SQL statement to insert the contents of all my .xml files as rows in the xml_documents table

INSERT INTO xml_documents ( x )

      SELECT dbo.LoadFile ( filename ) FROM files;

GO

(2 row(s) affected)

SELECT * FROM xml_documents;

GO

<please>Put me in the RDBMS</please>

<please>Don't forget me</please>

But that is for another post.

In this post we have seen some of the enhancements to the OPENROWSET provider for loading files, we have seen a glimpse of the XML data type, and we have seen what may be the shortest SQL/CLR function you will ever see. If these help get you started, let me know. If I have left you with more questions than answers, then fire the questions my way and I will get back to you.

In my next post we will look at how we can create a SQL/CLR Table Valued Function (TVF) that returns a table containing a directory listing of all the .xml files on my machine.