The ‘aggregate concatenation’ T-SQL (anti-)pattern


Some time back, Dimitri Furman who works as a Senior Consultant in the Microsoft Services team shared a very useful tip. It pertains to a pattern of usage wherein strings from multiple rows are concatenated into a single large (typically comma delimited) string. The usual way that we see this being accomplished is by code like the below:

DECLARE @res NVARCHAR(max)

SELECT @res = COALESCE(@res + ',', '') + FirstName + LastName
FROM AdventureWorks2016CTP3.Person.Person

SELECT @res

While the above is quite simple to read, it is quite slow (as you will see soon) and worse – it does not ‘inline’ meaning you cannot use the above inside a inline table-valued function. This last property was very important for us, because in the case we were looking at, the above COALESCE based concatenation code was in a scalar UDF, which clearly is a big ‘no-no’ from a performance perspective. Also, we did not want to utilize SQLCLR so the usage of the GROUP_CONCAT CLR UDA was ruled out.

Now, here is a much better way to refactor the above concatenation ‘loop’ into the below:

SELECT STUFF((
SELECT ',' + FirstName + LastName
FROM AdventureWorks2016CTP3.Person.Person
FOR XML PATH('')
,TYPE
).value('.', 'nvarchar(max)'), 1, 1, '') AS RawResult

On first glance, it may seem counter-intuitive that the second version could be faster, but that is indeed the case. For the above queries, here is the comparison:

COALESCE loop XML Path
CPU time = 7969 ms,  elapsed time = 7984 ms CPU time = 15 ms,  elapsed time = 16 ms

The usage of XML PATH in this way is not exactly ‘breaking news’ but the performance difference is dramatic. And more importantly for the case we were discussing at work, the XML PATH based code can be inlined, making it suitable for use in TVFs. And in our real case, that’s exactly what we wanted to do (refactor the UDF into a TVF.)

Have a great week ahead!

Comments (1)

  1. Martin Smith says:

    You also missed out the most important reason which is that Microsoft have stated many times the variable assignment approach is not guaranteed to work (collated some examples here http://stackoverflow.com/questions/15138593/nvarchar-concatenation-index-nvarcharmax-inexplicable-behavior/15163136#15163136)

Skip to main content