Data Integration - Part II - UDTs and OPENROWSET to parse custom file formats

SQLCLR is the Swiss Army knife with 1000s of foldable parts. You cannot beat the ability to plug right in and reuse good set of framework functions from the DBMS. If your DBA is one of those that say; “no way. I am not letting C# into my database”, this is what I’d tell him… I am sure one day someone insane enough will to try to write the entire framework using sql but until then we need SQLCLR. =)

Enough said: Here is a quick example of how to use UDTs as a way to parse custom file formats. I am not sure if this is the killer UDT demo but it certainly is a good example of SQL Server 2005 as a data integration platform.

Let’s say you have a folder full of files that have the following custom format and want to bring this data into your database. You have the ticker, the date and how many transactions are included in the file are all divided by ";". Right next to that part, you have and array of volume+price data delimited by “|”. This format and the data are totally arbitrary but you get the idea.

File1.txt - MSFT;10/10/04;3;10|5.001;20|6.05;30|7.33
File2.txt - ORCL;10/11/04;5;1|1.01;20|5.05;300|4.04;4000|3.03;50000|2.02
File3.txt - IBM;10/10/04;2;100|0.5;10|0.025

You create a user defined type - UDT that can parse this out.

CustomTradeDataStruct: This is the main struct for the whole file.

TradeArrayStruct: This is the struct that represents the variable length array for volume and price data.

[Serializable]

public struct TradeArrayStruct

{

public int volume;

public decimal price;

public TradeArrayStruct(string s){…}

public override string ToString(){…}

void IBinarySerialize.Read(System.IO.BinaryReader r){…}

void IBinarySerialize.Write(System.IO.BinaryWriter w){…}

};

[Serializable]

[SqlUserDefinedType(Format.UserDefined, MaxByteSize = 8000)]

public struct CustomTradeDataStruct : INullable, IBinarySerialize

{

string ticker;

string startdate;

int daterange;

TradeArrayStruct[] trades;

public CustomTradeDataStruct(string s){…}

public override string ToString(){…}

public bool IsNull {…}

public static CustomTradeDataStruct Null {…}

public static CustomTradeDataStruct Parse(SqlString s) {…}

void IBinarySerialize.Read(System.IO.BinaryReader r){…}

void IBinarySerialize.Write(System.IO.BinaryWriter w){…}

}

Ok here is the stuff we have been building up to:

SELECT cast(t1.c1 as CustomTradeDataStruct).ToString()

FROM OPENROWSET(BULK 'C:\File1.txt', SINGLE_NCLOB) AS t1(c1)

Now you have the data in SQL Server, parsed through a UDT. There is a great set of things you can do to improve the access to this data like creating a bunch of properties to allow access to individual fields in the file or creating a table valued function to manipulate the volume+price array like a table etc.

Cool or not so cool? Let me know what you guys think.