SQL Collation and related performance impact, viewing collation in query plans

It has been a while since there has been activity on this blog, we as a team will be trying to post weekly going forwards so as to share what we do on a regular basis – happy reading!

I was posed a question by a fellow PFE during his performance `engagement. The specific question was whether a Literal predicate which had different accent sensitivity or collation when compared to a column would trigger an implicit conversion resulting in potential performance degradation and more importantly where in the plan would you see this.

Collation settings exist at the Server level, the Database Level and potentially defined at the column level as well. By default if no collation is specified at the column level when creating the table, database collation is assumed

To check the collation of a database:

Select DATABASEPROPERTYEX('TEMPDB','COLLATION')

clip_image001 

And further to see the collation of a column, we can use the query below

select object_name(object_id) as ObjectName,name As ColName,collation_name

from sys.columns where object_id = object_id('testcollate')

clip_image002 

 

Now moving on to more of the Performance aspects of the question:

 

a. Same Collation comparison – If the literal or columns being compared are the same collection, we have no problem as we can see below

set nocount on

use tempdb

go

drop table testcollate

go

create table testcollate( myid int identity, myname varchar(810))

go

insert into testcollate values(replicate('a',800))

go 10000

insert into testcollate values('Denzil')

go

create index myind on testcollate(myname)

go

set statistics io on

go

select myname from testcollate where myname = 'Denzil'

 

clip_image001

Table 'testcollate'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

b.  If the literal being compared has a different collation or is explicitly converted to a different collation, collation precedence kicks in- https://msdn.microsoft.com/en-US/library/ms179886(v=SQL.90).aspx

If the literal has an explicit collation, we will get a plan with a CONVERT/SCAN. The CONVERT_IMPLICIT by itself gives no indication this is due to a Collation difference per say, in fact almost looks like it could be some data type mismatch which it is not and on the constant side, there is a CONVERT given that we were explicitly collating it to a particular collation.

 select myname from testcollate

where myname = 'Denzil' collate SQL_Latin1_General_Cp437_CS_AS

 

clip_image002

 

Table 'testcollate'. Scan count 1, logical reads 1240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

You will have to look at the input/output trees to actually see where the change in collation is happening as that is not exposed in the query plan itself as far as I know. I am using QUERYTRACEON which is an undocumented command in order to demonstrate collation related converts. QueryTraceON is blogged about in several places – See Benjamin blog (Query Optimizer Trace Flags )

select myname from testcollate

where myname = 'Denzil' collate SQL_Latin1_General_Cp437_CS_AS

option (recompile,QueryTraceon 8606)

go

 ****************************************

*** Input Tree: ***

        LogOp_Project QCOL: [tempdb].[dbo].[testcollate].myid QCOL: [tempdb].[dbo].[testcollate].myname

            LogOp_Select

                LogOp_Get TBL: testcollate testcollate TableID=773577794 TableReferenceID=0 IsRow: COL: IsBaseRow1001

                ScaOp_Comp x_cmpEq

                    ScaOp_Convert varchar collate 520142856,Null,Var,Trim,ML=810

                        ScaOp_Identifier QCOL: [tempdb].[dbo].[testcollate].myname

                    ScaOp_Const TI(varchar collate 520142856,Var,Trim,ML=6) XVAR(varchar,Owned,Value=Len,Data = (6,Denzil))

            AncOp_PrjList

 

In order to get the Collation Name of that ID:

select convert(sysname, collationpropertyfromid(520142856, 'name'))

clip_image003 

 

c. If the Column has different collation than the database would we need to collate the literal to the column itself?  Is the constant or literal collated to the collation of the Database or the collation of the column given they both are different?

use tempdb

go

drop table testcollate

go

create table testcollate( myid int identity primary key, myname varchar(810) collate SQL_Latin1_General_Cp437_CS_AS)

go

insert into testcollate values(replicate('a',800))

go 10000

insert into testcollate values('Denzil')

go

create index myind on testcollate(myname)

go

--As you can see below, the Database collation is different than the Column collation

Select DATABASEPROPERTYEX('TEMPDB','COLLATION') as DBCollation, object_name(object_id) as ObjectName,name As ColName,collation_name as ColumnCollation

from sys.columns where object_id = object_id('testcollate')

 clip_image004

 

select * from testcollate where myname = 'Denzil'

 

We actually get a Seek here, which means the literal here was converted to the collation of the column and not the database.

clip_image005

 

select * from testcollate where myname = 'Denzil'

option (recompile,QueryTraceon 8606)

 ****************************************

*** Input Tree: ***

        LogOp_Project QCOL: [tempdb].[dbo].[testcollate].myid QCOL: [tempdb].[dbo].[testcollate].myname

            LogOp_Select

                LogOp_Get TBL: testcollate testcollate TableID=741577680 TableReferenceID=0 IsRow: COL: IsBaseRow1001

                ScaOp_Comp x_cmpEq

             ScaOp_Identifier QCOL: [tempdb].[dbo].[testcollate].myname

                    ScaOp_Const TI(varchar collate 520142856,Var,Trim,ML=6) XVAR(varchar,Owned,Value=Len,Data = (6,Denzil))

            AncOp_PrjList

 

 

 

Here you can see the Constant is being collated to the Column collation and not the database collation.

select convert(sysname, collationpropertyfromid(520142856, 'name'))

clip_image003 

 

There have been several blogs on Collation conflict and how to resolve that so I intentionally stayed away from that. Arvind also has a blog on some collation and performance scenarios which is a great read - SQL collation and performance

-Denzil Ribeiro, SQL Dedicated Premier Field Engineer