Convert Comma Separated String to Table : 4 different approaches

Recently, I came across a piece of TSQL code that would take a comma separated string as an input and parse it to return a single column table from it. 

Lets’ call this function as Split1. The code is as follows:

 

CREATE FUNCTION [dbo].Split1(@input AS Varchar(4000) )

RETURNS

      @Result TABLE(Value BIGINT)

AS

BEGIN

      DECLARE @str VARCHAR(20)

      DECLARE @ind Int

      IF(@input is not null)

      BEGIN

            SET @ind = CharIndex(',',@input)

            WHILE @ind > 0

            BEGIN

                  SET @str = SUBSTRING(@input,1,@ind-1)

                  SET @input = SUBSTRING(@input,@ind+1,LEN(@input)-@ind)

                  INSERT INTO @Result values (@str)

                  SET @ind = CharIndex(',',@input)

            END

            SET @str = @input

            INSERT INTO @Result values (@str)

      END

      RETURN

END

This is a very old fashioned (but still effective enough) script which does a loop over a string to cut out all possible string values that are separated by a comma.

Let’s see now, how the same could be achieved in modern day TSQL languages (such as SQL 2005 or SQL 2008).

Approach 1: Common Table Expression (CTE)

 

Lets call this function as Split2.  here we are using

CREATE FUNCTION dbo.Split2 ( @strString varchar(4000))

RETURNS  @Result TABLE(Value BIGINT)

AS

begin

    WITH StrCTE(start, stop) AS

    (

      SELECT 1, CHARINDEX(',' , @strString )

      UNION ALL

      SELECT stop + 1, CHARINDEX(',' ,@strString , stop + 1)

      FROM StrCTE

      WHERE stop > 0

    )

   

    insert into @Result

    SELECT SUBSTRING(@strString , start, CASE WHEN stop > 0 THEN stop-start ELSE 4000 END) AS stringValue

    FROM StrCTE

   

    return

end   

 

GO

 

Approach 2: XML (surprise)

XML could be applied to do some type of string parsing (see this) Let’s call this function as Split3.

 

CREATE FUNCTION dbo.Split3 ( @strString varchar(4000))

RETURNS  @Result TABLE(Value BIGINT)

AS

BEGIN

 

      DECLARE @x XML

      SELECT @x = CAST('<A>'+ REPLACE(@strString,',','</A><A>')+ '</A>' AS XML)

     

      INSERT INTO @Result

      SELECT t.value('.', 'int') AS inVal

      FROM @x.nodes('/A') AS x(t)

 

    RETURN

END   

GO

 

Approach 4: Classic TSQL Way

I got this approach from SQL Server Central site. This approach is slightly unusual but very effective. this needs you to create a table of sequential numbers called a Tally Table.

 

SELECT TOP 11000 --equates to more than 30 years of dates

IDENTITY(INT,1,1) AS N

INTO dbo.Tally

FROM Master.dbo.SysColumns sc1,

Master.dbo.SysColumns sc2

 

Lets index the table for better performance.

 

ALTER TABLE dbo.Tally

ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)

WITH FILLFACTOR = 100

 

Finally out Split4 function.

 

 

CREATE FUNCTION dbo.Split4 ( @strString varchar(4000))

RETURNS  @Result TABLE(Value BIGINT)

AS

BEGIN

 

  SET @strString = ','+@strString +','

     

      INSERT INTO @t (Value)

      SELECT SUBSTRING(@strString,N+1,CHARINDEX(',',@strString,N+1)-N-1)

      FROM dbo.Tally

      WHERE N < LEN(@strString)

      AND SUBSTRING(@strString,N,1) = ',' --Notice how we find the comma

 

      RETURN

END   

GO

 

Now, what about the most crucial question: Performance.  so lets put all 4 functions to test.

please note that I am running this on SQL Server 2008. you may need to modify it for SQL 2005.

 

DECLARE @str VARCHAR(4000) = '6,7,7,8,10,12,13,14,16,44,46,47,394,396,417,488,714,717,718,719,722,725,811,818,832,833,836,837,846,913,914,919,922,923,924,925,926,927,927,928,929,929,930,931,932,934,935,1029,1072,1187,1188,1192,1196,1197,1199,1199,1199,1199,1200,1201,1202,1203,1204,1205,1206,1207,1208,1209,1366,1367,1387,1388,1666,1759,1870,2042,2045,2163,2261,2374,2445,2550,2676,2879,2880,2881,2892,2893,2894'

SELECT * FROM dbo.split1 ( @str )

SELECT * FROM dbo.split2 ( @str )

SELECT * FROM dbo.split3 ( @str )

SELECT * FROM dbo.split3 ( @str )

On my laptop, I saw following numbers. I expected the XML to be the fastest.

image

 

so I changed the code slightly to test out XML function directly.

 

 

DECLARE @str VARCHAR(4000)

= '6,7,7,8,10,12,13,14,16,44,46,47,394,396,417,488,714,717,718,719,722,725,811,818,832,833,836,837,846,913,914,919,922,923,924,925,926,927,927,928,929,929,930,931,932,934,935,1029,1072,1187,1188,1192,1196,1197,1199,1199,1199,1199,1200,1201,1202,1203,1204,1205,1206,1207,1208,1209,1366,1367,1387,1388,1666,1759,1870,2042,2045,2163,2261,2374,2445,2550,2676,2879,2880,2881,2892,2893,2894'

Declare @x XML

select @x = cast('<A>'+ replace(@str,',','</A><A>')+ '</A>' as xml)

      

select t.value('.', 'int') as inVal

from @x.nodes('/A') as x(t)

image

 

 

Hmm. so just as I thought, XML is faster on its’ own.

there are some more options that I did not consider such as CLR functions. but we will get to those some other time.