Which "where" is better in SQL 2005 and Xml, or the difference between knowing and knowing (posted by Paul)

One of the projects I’m working on is to take performance data generated from our build lab in xml and create nifty visualizations of the data so developers can see what impact the changes they are making have on their product's performance.

SQL Server 2000 had some pretty nice ways to query against xml, like OpenXml (sample code from BOL):

DECLARE @idoc int
DECLARE @doc varchar(1000)

/* A trimmed-down document in the shape the query below expects */
SET @doc = '<ROOT>
<Customer CustomerID="VINET">
   <Order OrderID="10248" CustomerID="VINET" OrderDate="1996-07-04">
      <OrderDetail ProductID="11" Quantity="12"/>
   </Order>
</Customer>
</ROOT>'

EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

SELECT * FROM OPENXML (@idoc, '/ROOT/Customer/Order/OrderDetail', 2)
WITH (OrderID int '../@OrderID',
      CustomerID varchar(10) '../@CustomerID',
      OrderDate datetime '../@OrderDate',
      ProdID int '@ProductID',
      Qty int '@Quantity')

EXEC sp_xml_removedocument @idoc

This is pretty cool when you think about it, but it has some disadvantages. For one, there's no nice way for me to pass along a subset of the data. If the xml document is big I'll either need to pass the entire document between stored procedures (not efficient) or flatten and reconstitute the xml text in every sproc I want to pass data to. Not fun.

With SQL Server 2005 there's a much faster way. Xml is now a datatype and as such can be passed around and queried against like any other datatype. My code can be simplified, improving performance and making it more flexible and easier to maintain. Here's one such example (no, this is not a real sproc but it shows the idea):

Create Procedure pr_ImportXml
(
   @xmlAllData xml,
   @Logon varchar(8)
)
AS
Begin
   declare @xmlNode xml,
           @buildID int,
           @testCaseID int

   /* Send in a subset of data */
   select @xmlNode = @xmlAllData.query('/Run/Build')
   exec pr_InsertBuild @XmlData = @xmlNode, @NewKeyID = @buildID OUTPUT

   /* Send in a different subset of data */
   select @xmlNode = @xmlAllData.query('/Run/TestCase')
   exec pr_InsertTestCase @XmlData = @xmlNode, @NewKeyID = @testCaseID OUTPUT

   /* Here I'm ripping certain attributes out and passing them to a sproc
      someone else owns, combining xml and non-xml data as they wanted only
      one input parameter. */
   set @xmlNode = (
         select @buildID as buildID,
                @testCaseID as testCaseID,
                r.value('@scenarioName', 'varchar(100)') as scenarioName,
                /* presume there are other attributes here */
                @Logon as ownerAlias
         from @xmlAllData.nodes('/Run/TestCase/Scenario') as Run(r)
         for xml auto, type
   )
   exec pr_InsertScenario @xmlNode
End
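To make the document shape concrete, here's a sketch of how pr_ImportXml might be invoked. The element and attribute names are placeholders I've made up to match the paths the sproc queries; a real Run document would of course carry much more data:

   declare @data xml

   /* Hypothetical document matching the /Run/Build and /Run/TestCase/Scenario
      paths queried in pr_ImportXml */
   set @data = '<Run>
      <Build buildNumber="1234" branch="main"/>
      <TestCase name="StartupPerf">
         <Scenario scenarioName="ColdBoot"/>
      </TestCase>
   </Run>'

   exec pr_ImportXml @xmlAllData = @data, @Logon = 'paul'

Because the parameter is typed as xml, SQL Server validates that it's well-formed on the way in, and the sproc can carve it up with query() without any of the sp_xml_preparedocument ceremony.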

This makes working with xml pretty easy, doesn't it? One of the other cool features is being able to select off of the same xml in the receiving sproc. We can even join to a normal SQL table and do a select like this:

   select c.CounterID,
          bld.value('./@iteration', 'int') as iteration,
          bld.value('./@result', 'decimal') as revalue,
          bld.value('./@errorMessage', 'varchar(1000)') as errorMessage
   from @ScenarioXml.nodes('/Scenario/Counter/Result') as b(bld)
   inner join dbo.tbl_Counter c
      on bld.value('../@counterName', 'varchar(100)') = c.CounterName

Very nice. But like every developer I'm always looking for ways to make my code faster. A great doc full of gems on improving xml performance in Yukon that I found is Performance Optimizations for the XML Data Type.

One tip they pointed out was to use the exist() method to check existence instead of the value() method, e.g.

   select c.CounterID,
          bld.value('./@iteration', 'int') as iteration,
          bld.value('./@result', 'decimal') as revalue,
          bld.value('./@errorMessage', 'varchar(1000)') as errorMessage
   from @ScenarioXml.nodes('/Scenario/Counter/Result') as b(bld)
   inner join dbo.tbl_Counter c
      on 1 = bld.exist('../@counterName[. = sql:column("c.CounterName")]')

The reason for this is…ahhh wouldn't it be better if you read the whole document rather than just this tidbit? When you start working with xml datatypes I'd also recommend reading another whitepaper: XML Best Practices for Microsoft SQL Server 2005.
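As a simpler illustration of the same tip, here's a sketch of testing whether a node exists, first with value() and then with exist(). I'm assuming @ScenarioXml holds the same document shape as above:

   /* Slower pattern: force a value conversion just to test for existence */
   if @ScenarioXml.value('(/Scenario/Counter/Result/@errorMessage)[1]',
                         'varchar(1000)') is not null
      print 'has error'

   /* Preferred pattern: exist() returns a bit and skips the conversion */
   if @ScenarioXml.exist('/Scenario/Counter/Result[@errorMessage]') = 1
      print 'has error'

The whitepaper covers when and why the optimizer handles exist() more cheaply, which is exactly the kind of detail worth reading in full.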

In the rush to turn projects out on tight deadlines, it can be very tempting to just get the code working and push it out the door, especially with new technologies where the tips and tricks aren't as well known, and *especially* if you work in a small, fast-paced shop. How often do you get the time to revisit your projects and tune them post go-live, unless a performance problem rears its head (usually after several months, as more and more data accumulates in the system)?

But developing good performance habits early when working with a new technology pays handsome dividends down the line, and it's worth taking that extra time to scour MSDN for these excellent SQL 2005 resources.

Paul