SQL Server and UTF-8 Encoding (1) -True or False


Today, I will start my series of articles about SQL Server and Unicode UTF-8 Encoding. In many times, I found when people ask me about UTF-8, they actually don’t understand UTF-8.  So today’s talk will be quite short. I just clarify some misunderstand.

1.    SQL Server doesn’t support Unicode, do you mean UTF-8?

    Sometime, people just say “SQL Server doesn’t support Unicode”. Actually, it is wrong, SQL Server support Unicode since SQL Server 7.0 by providing nchar/nvarchar/ntext data type.  But SQL Server doesn’t support UTF-8 encoding for Unicode data, it supports the UTF-16 encoding.  I copy several definitions from Internet for these concepts:

   “Unicode is the universal character encoding, maintained by the Unicode Consortium. Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.  The unique number, which is called the code point of a character, is in the range 0 (hex) to 10FFFF(hex). It is normal to reference a Unicode code point by writing “U+” followed by its hexadecimal number. For example, Character A is defined as “Latin Uppercase Alphabet”, and assigned a code point U+0041.  
      In additional to assign each character to a unique code point, Unicode Consortium also defined several Unicode transformation formats (UTFs) which are the algorithmic mapping from every Unicode code point to a unique byte sequence. Note, the Unicode code point itself has nothing related to how to store/transform in a Computer, but a UTF does. 
      The commonly used UTFs are UTF-16 and UTF-8. UTF-8 is the byte-oriented encoding form of Unicode which commonly used on Unix Operating System, Web Html and Xml File.  UTF-16 uses a single 16-bitcode unit to encode the most common 63K characters, and a pair of 16-bit code unites, called surrogates, to encode the 1M less commonly used characters in Unicode. UTF-16 is commonly used on Windows, .Net and Java. The transform between different UTFs are loseless and round-tripping. “

    In summary, don’t confuse with Unicode and UTF-8 Encoding. They are totally different concepts. 

2.    UTF-8 Encoding is much better than UTF-16 Encoding

There are tons of articles comparing with UTF-8 encoding with UTF-16 encoding.  I will compare these two encoding side by side in my next article.  I can have 100 reasons to say UTF-8 Encoding is better than UTF-16 Encoding, and verse vice.  The correct answer is that no encoding is absolute better than the others.  User should choose the suitable encoding according to your application software requirement.  The operation system, programming language, database platform do matter when choosing the encoding.  UTF-8 is most common on the web. UTF-16 is used by Java and Windows. The conversions between all of them are algorithmically based, fast and lossless. This makes it easy to support data input or output in multiple formats, while using a particular UTF for internal storage or processing.


    So please don’t jeopardize SQL Server’s Unicode support because of it only support one of the UTFs.

3.    SQL Server cannot store all Unicode Characters

You may notice that I say SQL Server support UTF-16 Encoding in previous paragraph, but I also said SQL Server’ nvarchar type encoding is UCS-2.  I intend to do this in this article because I want to discuss the different between these two Encodings in here. Let us describe in details in what area SQL Server support UTF-16 encoding:



  • SQL Server can store any Unicode characters in UTF-16 encoding. The reason is that the storage format for UCS-2 and UTF-16 are the same.

  • SQL Server can display any Unicode characters in UTF-16 encoding. The reason is that we internally call Windows functions to display characters, the Windows functions and fonts can take care of the supplementary character (a character take 4 bytes in UTF-16) correctly.

  • SQL Server can input any Unicode characters in UTF-16 encoding. The reason is that we internally call Windows IMEs (Input Method Editors) to input, the Windows IMEs can take care of the supplementary character (a character take 4 bytes in UTF-16) correctly.

  • SQL Server can sort/compare any defined Unicode characters in UTF-16 encoding.  Note, not all code points are map to valid Unicode character. For example, The Unicode Standard, Version 5.1 defines code points for around 10,000 characters.  All these characters can be compared/sorted in SQL Server latest version: SQL Server 2008.

In contrast, I also list the UTF-16 thing SQL Server doesn’t support:



  • SQL Server cannot detect Invalid UTF-16 sequence. Unpaired surrogate character is not valid in UTF-16 encoding, but SQL Server accept it as valid. Note, in reality, it is unlikely end-user will input invalid UTF-16 sequence since they are not support in any language or by any IMEs.

  • SQL Server treats a UTF-16 supplementary character as two characters. The Len function return 2 instead of 1 for such input.

  • SQL Server has potential risk of breaking a UTF-16 supplementary character into un-pair surrogate character, such as calling substring function. Note, in the real scenario, the chance of this can happen is much lower, because 1)  supplementary character is rare 2) string function will only break this when it happens be the boundary.  For example, when calling substring(s,5,1) will break if and only if the character at index 5 is a supplementary character.

In summary, SQL Server DOES support storing all Unicode characters; although it has its own limitation.  Please refer to my previous blogs to details.

Comments (74)

  1. Ravi says:

    Hi,

    How to set the Collation to UTF-16 in SQL Server 2000 and SQL Server 2005. could you please give the steps to change the collation.

    What is the name of the Collation for UTF-16 in SQL Server 2000/2005.

    Thanks

  2. yaoqs@hotmail.com says:

    Hello, Ravi

     When you choose nvarchar as your data type,  you automatically get UTF-16 Encoding.  Please note,  For nvarchar types, we always use UTF-16 encoding to store the data no matter which collation you use.  The collation is only matter on what method it will use to compare two string.  Since different language has different sorting rule, that is the reason we have many different collations.

     The most common collation is latin1_general_ci_as. Although the name seems indicate it can only sort latin based character correctly, but it actually can sort many or majority of the language correctly.

  3. SharePoint Uploading Files to SharePoint Server 2007 from ASP.NET Web Applications by Using the HTTP

  4. William says:

    Hi Qingsong, I’m dealing with chinese characters. How come when I imported mt utf-8 file with chinese chars in sql server I get shattered non chinese characters even though I use nvarchar? Can you kindly advise? Thank you

  5. Qingsong Yao says:

    Hello, William

     If you are using SQL Server 2005, you can use bcp.exe to bulk load UTF-8 data into your table,  try

    [-C code page specifier] option and use 65001 as code page specifier.  Note,  I never tried this option, but I know it works.

     If you are using SQL Server 2008, the bcp.exe shipped with SQL Server 2008 don’t support import UTF-8 data in.  As a workaround, you can install the SQLNCLI library shipped in SQL Server 2005 (I beleive it is a downable package) on another machine, and bulk load UTF-8 data into SQL Server 2008 table.

     Another workaround is converting your data from UTF-8 to UTF-16 encoding, you can either write a simple C# program or use NotePad.

     Note: the default chinese character’s encoding is GB2313, which is supported as code page 936. which is natively supported.

  6. CJK says:

    So, what you’re saying is that SQL Server doesn’t support UTF-16 in all cases, but does support UCS-2 ? (equivalent to UTF-16 BMP only). Unfortunately that will mean that not all Chinese, Japanese and Korean characters can be stored in SQL Server. A partial implementation is not much use if you are doing global internationalization.

    Thanks for taking the time to write the article.

  7. Qingsong Yao says:

    Hello, CJK

     All Chinese, Japanese and Korean characters can be stored in SQL Server.  Do you have a case that a character failed to store or retrieve from SQL Server?

  8. Mary Ellen says:

    Great Article!! Can you please tell me where the next article is, that you referred to, where you describe the difference between UTF-16 and UTF-8? Thanks for writing the article.

  9. James Brook says:

    Hi,

    I have to create a database (SQL2005) for multinational usage (English and Korean languages included).

    This database will be used by an international application.

    Here is my question :

    when creating a new database I’m not sure what collation option to pick to support the Korean language.

    Thanks in advance for your help

  10. Qingsong Yao says:

    HEllo, James

     The short answer is that a collation can support many languages.  A collation has a major language, such as latin1_general_ci_as has major language us_English.  It  also support Korean language since Korean language is compatiable with us_English.  So this collation is enough for your application.  

     However, the latin1_general_ci_as only support Unicode 2.0 characters, so  if you want more advacned Korean Language support, and Korean is the majority language of your application, you should choose Korean_90_CI_As as your collation.

      Please let me know if you have further question about this.

    Qingsong

  11. Frank Lu says:

    Hi Qingsong,

    Here’s a case need your help.  I created a linked server in SQL Server 2005 connecting to MYSQL.  I need to query the UTF-8 data from MYSQL and then insert them into local table.  After insert, the English chars are ok but Chinese chars are not displayed correctly.  Is there a way to use linked server retrieving data from MYSQL correctly?

  12. Qingsong Yao says:

    Hello, Frank

      First, please make sure that you are using nvarchar type in your SQL Server column.  Second, how you setup your linked server, which OLE-DB provide you are using?

  13. Frank Lu says:

    Hi Qingsong,

    Thanks for the reply.

    For linked server, I use Microsoft OLE DB Provider for ODBC Drivers which connect to a DSN – MySQL ODBC 5.1 Driver.

    Other information:  Database collation – Chinese_PRC_CI_AS;  Coding language for application – Java

    Please advise.

  14. Frank Lu says:

    Hello, Frank

     Are you using nvarchar type in SQL Server? What about you only select the data throught linked server, but not insert into the table.  

      The behavior really depends on how we map the character datatype defined in MySQL to OLE-DB type.    If it maps to varchar type,  you will have data loss, other it maps to nvarchar type,  you should be fine

    Qingsong

  15. BartJ says:

    I have a SQL server 2005 database that is has the default Collation of SQL_Latin1_General_CP1_CI_AS.   I am trying to store Turkish Data (8859/9 and/or codepage 1254).  There are actually only six characters that don’t store correctly in varchar columns.  For example, one of them is "Ş" (LATIN CAPITAL LETTER S WITH CEDILLA, U+015E, decimal 351).   So, for a test, I created new NVarchar(10) column thinking that would be all that would be required.  However when I run the following in SSMS T-SQL command in SSMS:

    INSERT INTO mytable (MynVarchar) Values (‘Ş’)

    the "Ş" gets changed to a Latin1 Captal "S" without the Cedilla–UNLESS I preface the unicode character with an upper-case ‘N’  as below

    INSERT INTO mytable (MynVarchar) Values (N’Ş’)

    I read in the article below that this is a requirement, but I do NOT understand WHY it is this way.   Why can’t I just use NVARCHARS and have the data inserted correctly as you wrote in your article above?  To me this means that it’s not a simple matter of changing my columns from Varchar to NVarchar, but in addition, I have to change *ALL* of my source code to include the prefaced ‘N’.   From your article above, (and many others on the internet, excluding the one above from Microsoft), I keep reading that I just need to use NVARCHAR, but this is not correct?  What am I missing here?   Very confusing…

    http://support.microsoft.com/kb/239530

  16. BartJ says:

    I should add that I also tried change the collation of the Database from the SQL_Latin1_General_CP1_CI_AS to Turkish_CI_AS.   What I found was that the command below worked WITHOUT the prefaced ‘N’:

    INSERT INTO mytable (MynVarchar) Values (‘Ş’)

    So at that point I said to myself "GREAT!!! No problem!–I will simply change the collation to Turkish_CI_AS on all Turkish Servers, and all Varchars to NVarchars and then we are good to go–WITHOUT having to modify all of my source code to include the ‘N’……   But then I connected to server in Turkey to test the same thing, (created a DB with Turkish_CI_AS as the collation, created an nVarchar(50) column and then ran the same insert command, and BOOM, the Ş is screwed up again and is now an "S"   I am extremely puzzled!!!

  17. Qingsong Yao says:

    The reason is that ‘Ş’ is a varchar type which has a code page associate with it.  N’Ş’ is a nvarch type and it is a Unicode character.  the sql_latin1_general_ci_as code page don’t have the character ‘Ş’ defined in the code page, so we will use the best matched charcter for replacement.

    For your secondary question,  I don’t know the reason.  It might be client side issue, can you try to use SQL Server Management Studio to run the query and see the result?

  18. BartJ says:

    I actually remote into their server and have been doing the tests inside of SQL server management Studio.   I just tried it again though, and this time, it worked!  I must have done something wrong when I set the collation the first time (I have been switching back and forth between the default collation and the Turkish and I must have goofed in my testing.)  

    Anyway, is it appears that setting collation to Turkish_CI_AS, I can then issue the Insert command below without the N and the Ş character is preserved.

    There are many articles that never mention collation settings though and seem to suggest that at least in terms of storing data "the collation doesn’t matter as long as you use NVarchar instead of Varchar as your data type"  These articles seem to me to suggest that the collation only has value in regards to sorting and comparing data.  

    In fact I just re-read this page of yours and you write:

    "When you choose nvarchar as your data type,  you automatically get UTF-16 Encoding.  Please note,  For nvarchar types, we always use UTF-16 encoding to store the data no matter which collation you use.  The collation is only matter on what method it will use to compare two string.  Since different language has different sorting rule, that is the reason we have many different collations."

    This again suggests that the collation doesn’t matter.   Can you explain?  Thank you.

  19. sai says:

    I have the following problem:

    server 1 : sql server 2005 sp1 with proper collation supporting UTF 16 for chinese language support

    I wanted to move the entire database to new server as it is , so i took backup and then moved it to new sql 2005 sp1 server and restored the backup , now when i query chinese characters are not displaying

    I have compared each and every setting but no luck.

    please advice what to check.

  20. Qingsong Yao says:

    Hello, Sai

     First,  let us confirm whether it is a display issue or data missing/lose issue. From my experience, it is unlikely the back/restore will cause data lose.  So I guess it might related to data display issue.  Can you confirm that you install the correct chinese fonts.  It happens for Winodw XP/2003 which did not install Chinese Package by default.  You can go to Control Panel -> Language, and install it

  21. sai says:

    hi qingsong,

    thank you very much for replying me back.

    i have confirmed that chinese package is installed , so please assist what would be my next steps

  22. Qingsong Yao says:

    Hello, Sai

     Can you privde some more information on what knind of query you are using,

    and the table structure. Also, what you see on your screen.

  23. BartJ says:

    Hi Qingsong,

    I am hoping you did not forget about me and my last question regarding whether the database collation is essential or not for "storing" unicode.  From my tests, it appear that it is, but as I’ve read from your posts above, and from many others, usually people say that the collation only matters in relationship to sorting and comparing teh stored data.

  24. Qingsong Yao says:

    Hello, BartJ

     For nvarchar type, the collation only matter for sorting and comparing data,

    but for varchar type, the collation also controls which language you can insert into.

    For developing International Application, please use nvarchar type, and the prefix N”,   it will help you reduce a lot of issues later.

  25. Resolved says:

    Hi Qingsong,

    Thank you for your response, it turned out to be chinese font not installed :)

  26. Nelins says:

    Hi Quinsong,

    actually a have a mysql database linked to SQLServer 2000 using oleDb for ODBC and MySQL ODBC 5.1.  The MySql Database is in UTF-8 Charset.  When I run a query, it’s returns a bad word, i mean instead to returns Bogotá, it’s returns Bogotá.

    I can’t change the collation, directly on mySQL database because it’s not mine.  How can I read those fields in the right way?  I’ve been trying to do it using collations syntax but it doesn’t works.

    I’ll wait for your response.

    Thnx.  Nelins

  27. yaoqs@hotmail.com says:

    Hello, Nelins

     I never have experimence using linked server with MySQL.  I will try sometime today.  Can you explore some other way to port data from MySQL to SQL Server

  28. Nelins says:

    Quinsong,

    I don’t know… I’ve been using this way in others linked Database in the same server and it works… just on this one is failing.

  29. foreright360 says:

    Why do I see a string of question marks when I run the following statements in query analyzer?

    create table testutf8(stringval nvarchar(200))

    insert into testutf8 values(‘马英九爱吃的李妈妈’)

    select * from testutf8

    returns:

    ?????????

  30. Qingsong Yao says:

    Hello, Foreight360,

     Please try:

      insert into testutf8 values(N’马英九爱吃的李妈妈’)

    select * from testutf8

  31. Ben Goh says:

    Hello Qingsong, nice article.

    I have worked with Unicode for years now and I do agree with everything you said. I am designing a database for a forum on my site, I would like to store each posted message body in UTF-8 format, so I believe I’ll need to resolve to VARBINARY instead of NVARCHAR. Imagine 700,000 user posts in database, what savings I can achieve with UTF-8 instead of UTF-16. :)

    So all that is fine, my question really is, how do I then get the FTS (Full Text Search) capability enabled for each of these messages? Since it is a forum, it is natural that users would want to search for certain contents.

    Another naive question: what does "N" in front of a string literal mean (I’m more of a C++ guy, SQL isn’t quite my cup of tea). Sorry I could have searched but "N" being a single character is not quite search friendly.

    Thanks in advance,

    Ben.

  32. I am using select … for xml auto

    is the encoding also utf16?

    I am unable to convert it to ascii.

    it is just normal xml, no special chars.

  33. Qingsong Yao says:

    Hello, Martien

     When you use for xml auto, the result is a XML data type, which is not nvarchar or nvarchar type and it has no encoding.  If you want to convert it to ascii, you can use explict convert/cast from xml to varchar type.  

  34. Qingsong Yao says:

    Hello, Ben

     Article Description of storing UTF-8 data in SQL Server at http://support.microsoft.com/kb/232580 is your best reference for Storing UTF-8 data inside SQL Server.

     I strong suggest you store your data in UTF-16 encoding inside of database instead of using varbinary data type. The main reason is that SQL Server don’t have any built-in support for UTF-8, so that you will unable to query/fulltext search UTF-8 data.  

     If you are using ASP or ASP.Net , the client will automaticlly handling coonverting frm UTF-16 data to UTF-8 for you.  So you will not experience on any issue. You can refer to this article http://support.microsoft.com/kb/893663 for details.

     It is true that you may waste some disk space, but you will have less trouble on handiling unicode data in your application.   If you want save disk-space, the data compression feature in SQL Server 2008 can achieve this, and the feature was enhanced in SQL Server 2008 R2 to achieve better compression ratio than UTF-8 encoding.

     The reasn of having N’ is because we have two different kinds of string const:

       ‘abc’  means a  constant with varchar types, since varchar type already have a code page, so that it can only store characteres defined in the code page.

     N’abc’ means a constant with nvarchar type which is UTF-16 encoding.

     Follow statement:

     create table t1(c1 nvarcahr(10)

     insert into t1(‘abc’)

    will insert a varchar ‘abc’ into a nvarchar column.  So even the c1 is nvarchar column, but the constant

    might be converted to ‘???’ before inserting into the table if a,b,c are not in the codepage.

     Please let me know if you have further questions.

  35. W Wong says:

    I’m in headache to think about a best match of ‘Collation in MSSQL’ and ‘Encoding (charset) in XML’ for Korean Language. Initially I pick ‘Korean_90_CI_As’ for testing. What is your recommendation for me? Thanks in advance.

  36. Hu says:

    Hi, thanks for the good article first.

    I have a problem, How can I use the prefix ‘N’ in store procedure in MS SQL Server to convert string to UTF-16 for storing?

    Thanks!  

  37. yaoqs@hotmail.com says:

    Hello, Hu

     If you are using SqlParamter in C#, be sure to use SqlDataType.NVarChar type instead of SqlDataType.VarChar, the system will pass the correct the value to the server.

     If you are building T-SQL query in C# or other language, you need handle N” by yourself when you construct the query

  38. Deepak says:

    Hi,

    I am trying to send multibyte data from oracle to sql server usign java program.

    It is arabic data that I need to fetch from oracle and put in sql server DB.

    When I store data in sql server DB and fetch it from there, it is all junk.

    Please help regarding what can be done to achieve consistency?

    Below is the process that I am following.

    1. fetch data from blob field of oracle DB.

    2. Create a String from it using charset UTF-8

    3. write this data in one txt file.

    4. Call a stored procedure on SQL server with this string as a parameter.

    This stored procedure is inserting data in column of type varachar.

    When I open txt file where I have dumped the data, it is showing proper arabic data.

    when same stored procedure is executed from sql server prompt with Arabic data, it is inserting data properly.

    Please help, as this is very urgent.

    -Deepak.

  39. Qingsong Yao says:

    Hello, Deepak

      You need nvarchar type of SQL Server to insert Unicode strings.  Also, please use nvarchar type as the parameter of your stored procedure.  Please let me know if you still have trouble on this. You can e-mail me at qyao@microsoft.com for further question.

  40. Alomari says:

    I am trying to send email using sp_send_dbmail, it works fine but if the email contains any non-English letter (Arabic letters) it will appear in the email as ????

    i tried to "@messageBody=TEXTTEXT collate arabic-bin" but i got the same result

    any idea is highly appreciated

    Thanks

  41. Matt says:

    Hello, thx for article

    I have a hube problem with correct display of polish characters on PHP application.

    The collation of table is set to default SQL_Latin1_General_CP1_CI_AS. Data type of column is nchar.

    When I insert records manualy with polish chars and then display them using select statement then all it's ok, i'm talking about using select query directly in MSSQL Server 2005. Problem appears when I want to display rows in PHP app. Charset of html document is set to UTF-8. I was trying to display chars correctly using utf8_encode function but no effect, all polish chars like "ąęłźćś" are auto converted to "aelzcs". Any ideas?

  42. Qingsong Yao says:

    Hello, Matt

      Can you see whether this article msdn.microsoft.com/…/cc626307(SQL.90).aspx can help you with the PHP scenario.

  43. Sankar says:

    Hi,

    In fact it was a great read. Thanks for the article. I have an issue with japanese characters. I am using SQL server to connect to a Progress database through ODBC. I have the fields in table set to nvarchar, and DB collation is French_CI_AS. The issue is that I am not able to correctly store japanese characters in the table. WHen I am inserting data manually using the N'' prefix, its working fine. How can I use N'' while selecting? For eg: if I have a query, insert into tab1 select field1 from tab2. How can I prefix N'' to field1?

  44. Qingsong Yao says:

    Hello, Sankar

     For select, you don't need the prefix of N'. For your query:

         Insert into tab1 select field1 from tab2

     it should work fine.  Can you make sure your column is nvarchar types. and give me more information about your issue, and I can help you further.

  45. Hi,

    Good article in an important but lightly covered area.

    My question is how can one migrate data on equivalent schema from SQL Server 2005 to PostgreSQL 8.3.

    I am using the simple method of saving SQL data into a csv file, transferring it to a Linux box, then using the "copy …from …with csv" command to load the data into the PostgreSQL table.

    Unfortunately, this throws the error:

    ERROR:  invalid byte sequence for encoding "UTF8": 0xff

    HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

    What needs to be done on the SQL Server side to make this data importable to a UTf-8 database?

    Thank you

  46. Qingsong Yao says:

    Hello, Larry

     Can you try to use notepad to use the csv file. and save as [UTF-Encoding] to the file.

  47. Oidon says:

    > SQL Server treats a UTF-16 supplementary character as two characters.

    > The Len function return 2 instead of 1 for such input.

    I have regular need for characters above U+FFFF (the extended Han characters, specifically).

    The fact that these characters are counted as two characters is quite problematic.

    For example, in a user interface, a user is allowed to enter a field up to 25 characters.

    This will be a mixture of characters below U+FFFF (1 "character") and a few above U+FFFF (2 "characters"). To the user, the total number of characters that one can actually enter is thus less than the defined 25.

    There are no workarounds so we often need to make the width 15% larger than we actually say is possible.

    Rather, the length should be counted as UTF-32 units.

  48. In next version of SQL Server, we will have new UTF-16 collation. The length issue will be address. However, when you declare a field/column as nvarchar(10), it might still hold 5 characters due to the Extended Han characters.  The reason is that the length semantic for data type is not changed.

    Oidon,

     Please keep in mind that the Extended Han characters is not frequently used in many apps, so you can include a small number to the total numbers, it should work in many cases.

    Qingsong

  49. Mukesh Bhawnani says:

    Hi

    I am trying to use CDOSYS for sending out mails in SQL server 2005. In  a recent requirement, I need to send out mails in Korean language as well. When tried, the subject line and the body is not getting displayed properly (displaying "???" instead of Korean characters). I have used nvarchar data type wherever required.  For string constants I have preceded N'.

    Can you please advice.

  50. Mubashir says:

    Would uou please tell what collation i may use for Thai language for compatibality with java

  51. You can use any collation with Thai language if you use nvarchar data type.  If majority of your data is in Thai lanaguage, you might choose collation Thai_CI_AS or Thai_100_CI_AS.

    Qingsong

  52. QingsongYao please help me simplify

    1. In regards to XXX_CI and XXX_CS do I undestand correctly that with XXX_CI “Three = ThREE” would be true.

    2. Respevtively with XXX_CS “three = ThRee” it would be false and only “Three = Three” it would be true?

    3. Also is this true in the cases of IF compare, Where clause, Join predicates etc.

    You can just use "yes" or "no" 3 times and that would be all.

    Thank you.

  53. Atari says:

    I am trying to send email using sp_send_dbmail, it works fine but if the email contains any non-English letter (Arabic letters) it will appear in the email as ????

  54. hari says:

    I want to insert tamil language.but sql server 2005 does not insert properly it had inserted  in this format only ஹரி and i have set in nvarchar …my query is insert into tamil values(N'ஹரி');

    please help me

  55. QS says:

    what is your data type, and how you insert the data, using Sql server management studio or other tools.

  56. Sunil says:

    Hi QingsongYao

    Could you please help me on below:

    I have a user who is pulling data from a SQL Server that we do not support anymore.  The db is on SQL 2008 and the file that the user receives in is being set as UTF6.  What the user needs is the data to come through as UTF8 (Unicode).  What this tells me is that the SQL Server is set to UTF6 or the DB is set to UTF6.  What they would like to do is to take the data in the UTF6 db and have it converted to UTF8.  Is there a way to write a SQL syntax that will allow to specify UTF8 and have the data converted in that format?

    Thank You

    Sunil

  57. QingsongYao says:

    Hello, Sunil

     I remember we block bcp in/out UTF-16 to UTF8 encoding during SQL Server 2008, I guess your customer is hitting this issue. The workaround is a encoding transform at client side. You can use Win32API, .Net, Java, Powershell or many tools to achieve this. Note,  I mean when your bcp out your data, you must specify that you want Unicode format.

  58. Sunil says:

    Thank You so much for your reply QingsongYao.

  59. Chandrakala says:

    Hi QingsongYao….

    Can we turn on UTF8 encoding for the database in SQL Server 2008 (SP3) Standard Edition (64-bit) ?

  60. Qingsong Yao says:

    This is no UTF-8 support in SQL Server 2008 and SQL Server 2012.

  61. Murali says:

    HI Iam passing query parameter string value . It works fine for other special chars except japanese and chinese. Iam using sql server db with weblogic12c AND JAVA1.6

  62. Qingsong Yao says:

    what is the data type,  how you send command or queries to SQL server.

  63. Qingsong says:

    what is the data type in Sql server, and how you send data to sql server.

  64. Dule says:

    I need to compare two MD5 Hashes.

    The first is created in C# with MD5CryptoServiceProvider and because the code is not in our control it must be UTF8 Encoding.

    The second in SQL, here is an example:

    select convert(varchar(32), hashbytes('MD5', N'text :; 134 текст'), 2)

    1. If I use Unicode Encoding in C# it works with all the characters.

    2. If the text is in Latin and the Encoding is UTF8 the hash match, but if there is other type of characters like Cyrillic then i cant get them to match.

    I used collations, different data types.. but noting up till now works.

    I'm not that comfortable working with Encodings, so I must be missing something…

  65. Bianca says:

    Hi QingsongYao,

    I am using SQL Server 2008 and I encountered a problem.

    I created a table and an index on a column defined as nvarchar:

    CREATE TABLE dbo.TestTable

       (Id int NOT NULL,

        Name nvarchar(50) NULL);

    GO

    — Create a clustered index called Index_On_TestTable

    — the Name column

    CREATE UNIQUE CLUSTERED INDEX Index_On_TestTable

       ON dbo.TestTable (Name);

    I introduced into table the values (1, Ecosyst), (2, Ecosyst☆) and (3, Ecosyst抎).

    But I can't introduce the value (4,Ecosyst☆). I get this error:

    Cannot insert duplicate key row in object 'dbo.TestTable' with unique index 'Index_On_TestTable'.

    Why? The last name doesn't exist in my table.

  66. Qingsong Yao says:

    We do linguistic comparison, it is possible that the two last rows are comparing equal.  Can you try to verify whether they are equal:  such as search the third string in the table.

  67. Bianca says:

    Sorry, I have a syntax mistake in the first comment. So, I have only this three values in the table:

     (1, Ecosyst), (2, Ecosyst☆) and (3,Ecosyst抎).

    But, I can't add the value (4,Ecosyst☆) although it doesn't exist in my table.

  68. Bianca says:

    Some special characters don't appear in the comment…

  69. Qingsong says:

    I think it is still issue related to linguistic comparison. The symbols of your last character are treated as equal. If you use latin1_general_100_bin2 collation, you will not see this issue.

  70. Bianca says:

    Yes, it works using latin1_general_100_bin2 collation. Thanks for your help!

  71. Jay says:

    I'm trying to retreive Russian Characters from sql server.

    It' showing as ????.

    What the correct Collation for nvarchar for multi language including Russian ?

    nd a solution asap..

  72. QingsongYao says:

    Jay,  I think the problem is the insertion part, not storage part.  Please always use N'' to insert your data.

    any collation with nvarchar data type can store any Unicode characters.

  73. Jay says:

    Hi QingsongYao,

    Thanks a lot for quick reply. :) :)

    I have tested with ur comment.

    But to retrieve the particular Russian value, I had to put prefix 'N' also.

    Our application is already developed one and is there any other solution to do so without using 'N' ??

    Any other collation or any other way to get solved this issue ??

  74. Qingsong says:

    In this case, choosing a Russian collation for your database should resolve your issue.

    Note,  it will only support Russian characters + Latin /ASCII characters since your app use '' instead of N''.