Be aware of comparing unicode constant with varchar column (updated)


    Today,  I looked for SQL Server JDBC Driver's  sendStringParametersAsUnicode, this parameter controls how Unicode String (Java only have Unicode string which is UTF-16 encoding) will be sending to SQL Server.   As the name of this parameter indicate,  when the parameter equals to true, which is the default setting,  we always send unicode data, i.e., the string with UTF-16 encoding, to SQL Server.  When it is set to false,  it also send ANSI data, i.e., the string encoded with a code page to SQL Server. In this case, it uses the code page of the current connected SQL Server database to convert  the Unicode String to a codepage encoded string.


     I am a Unicode Fan, and always want recommend people using Unicode. So when I knew this parameter,  my first question is that WOW, set the value to false has potential data corruption. When we convert the UTF-16 encoded string into a codepage encoded string, we may face the issue of some character is not defined in the code page, and will be replace as question mark (?).  Then, after I read this Article, I knew the reason of doing so is to enabling index seek when the target data type in SQL Server is varchar type.  Note, the Article ONLY apply when the collation of your column is a SQL Colaltion, but NOT Windows collation.  In most of case, people will use Windows collation as the collation, except us-English which still use sql_latin1_general_cp1_ci_as.  So If the target type is nvarchar type or your collation is Windows Collation,  don't set the parameter to false. 


     Come to conclusion,  for JDBC application,  please use nvarchar data type, and you will have full Unicode support, no data lose, no issue of not being able to use index, and better performance.  If you are using Windows Collation, don't set the flag to false.


 



 

Comments (0)

Skip to main content