SQL Server 2008 : new binary – hex string conversion functionality can dramatically improve related query performance by orders of magnitude.



In previous SQL Server releases it wasn’t possible to convert binary data to string characters in hex format directly, because SQL Server did not have a built-in Transact-SQL command for converting binary data to a hexadecimal string. The Transact-SQL CONVERT command converted binary data to character data in a one byte to one character fashion. SQL Server would take each byte of the source binary data, convert it to an integer value, and then uses that integer value as the ASCII value for the destination character data. This behavior applied to the binary, varbinary, and timestamp datatypes.


 


The only workarounds were to use either a stored procedure as described in a Knowledge Base Article:  “INFO: Converting Binary Data to Hexadecimal String” (  http://support.microsoft.com/kb/104829   ) or by writing a CLR function.


 


An ISV I work with doesn’t support CLR and therefore they implemented their own version of a custom convert function in form of a stored procedure. This one was even faster than everything else they found on the Internet.


 


NEW – IN SQL SERVER 2008 the convert function was extended to support binary data – hex string conversion. It looks like a tiny improvement almost not worth mentioning.


 


However, for the ISV it was a big step forward as some critical queries need this functionality. Besides the fact that they no longer have to ship and maintain their own stored procedure, a simple repro showed a tremendous performance improvement.


 


Repro:


=====


 


I transformed the procedure described in the KB article mentioned above into a simple function. The stored procedure below will create a simple test table with one varbinary column and insert some test rows in 10K packages ( e.g. nr_rows = 100 -> 1 million rows in the table ).


 


The repro shows two different test cases:


1. insert 0x0 two million times


2. insert 0x0123456789A12345 two million times


 


Depending on the length of the value the disadvantage of the stored procedure solution will be even bigger. On my test machine the results of the test queries below were:


(both tests were done with the same SQL Server 2008 instance – no change of any settings)


 


1. two million times value 0x0


 


    a, using stored procedure : about 3460 logical reads, no disk IO, ~52 secs elapsed time


    b, using new convert feature : about 5200 logical reads,  no disk IO, < 1 sec elapsed time


 


2. two million times value 0x0123456789A12345


    a, using stored procedure : about 3460 logical reads, no disk IO, ~157 secs elapsed time


    b, using new convert feature : about 5200 logical reads,  no disk IO, < 1 sec elapsed time


 


Repro Script:


========


 


create function sp_hexadecimal ( @binvalue varbinary(255) )


returns varchar(255)


as


begin


      declare @charvalue varchar(255)


      declare @i int


      declare @length int


      declare @hexstring char(16)


      select @charvalue = ‘0x’


      select @i = 1


      select @length = datalength(@binvalue)


      select @hexstring = ‘0123456789abcdef’


      while (@i <= @length)


      begin


            declare @tempint int


            declare @firstint int


            declare @secondint int


            select @tempint = convert(int, substring(@binvalue,@i,1))


            select @firstint = floor(@tempint/16)


            select @secondint = @tempint (@firstint*16)


            select @charvalue = @charvalue +


            substring(@hexstring, @firstint+1, 1) +


            substring(@hexstring, @secondint+1, 1)


            select @i = @i + 1


      end


return ( @charvalue )


end


 


 


create procedure cr_conv_test_table ( @value varbinary(16), @nr_rows int )


as


begin


      declare @exist int


      declare @counter int


      set NOCOUNT ON


      set statistics time off


      set statistics io off


      set statistics profile off


      set @exist = ( select count(*) from sys.objects


                      where name = ‘conv_test_table’ and


                            type = ‘U’ )


      if( @exist = 1 )


            drop table conv_test_table


 


      set @exist = ( select count(*) from sys.objects


                      where name = ‘conv_test_table_temp’ and


                            type = ‘U’ )


      if( @exist = 1 )


            drop table conv_test_table_temp


 


      create table conv_test_table ( varbincol varbinary(16) )


      create table conv_test_table_temp ( varbincol varbinary(16) )


      set @counter = 10000


      while @counter > 0


            begin


                  insert into conv_test_table_temp values ( @value )


                  set @counter = @counter 1


            end


      set @counter = @nr_rows


      while @counter > 0


      begin


            insert into conv_test_table select * from conv_test_table_temp


            set @counter = @counter 1


      end


end


 


— create 2 million test rows


execute cr_conv_test_table 0x0, 200


 


set statistics time on


set statistics io on


 


— compare runtime of stored procedure with new convert feature


select count(*) from conv_test_table


 where dbo.sp_hexadecimal(varbincol) = ‘0x00’


select count(*) from conv_test_table


 where CONVERT(varchar(255),varbincol,1) = ‘0x00’


 


— create 2 million test rows


execute cr_conv_test_table 0x0123456789A12345, 200


 


set statistics time on


set statistics io on


 


— compare runtime of stored procedure with new convert feature


select count(*) from conv_test_table


 where dbo.sp_hexadecimal(varbincol) = ‘0x0123456789A12345’


select count(*) from conv_test_table


 where CONVERT(varchar(255),varbincol,1) = ‘0x0123456789A12345’


 


 


 


Comments (1)

  1. jlcard says:

    I like that sql 2008 has support for converting binary into hex strings, that’s great !!!, however i still wish it would also support operations on varbinary columns like BITWISE AND/OR/XOR, do you know if that is in the plan for future?

    Also the last 2 queries in your post got me thinking a little bit.

    Looks like the "WHERE" clause in the query was first converting a varbinary field into a string then doing the comparison, and that conversion is ging to be done for every row in the table.

    I think it would be more performant to convert once the string to match into a varbinary variable, then use the varbinary variable to use in the select statement to compare direclty varbinary with varbinary, or is it not possible to do that with varbinary fields?