SQL Server 2008 (Katmai)'s new collations

A number of new collations were added in the SQL Server 2008 release. You can list the 79 new collations (showing only the _CI_AS variant of each) with:

        -- [_] matches a literal underscore; a bare _ in LIKE matches any single character
        SELECT * FROM fn_helpcollations() WHERE name LIKE '%[_]100[_]CI[_]AS';
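
Another way to pick out the new collations is to filter on the collation version instead of the name. The following is a minimal sketch, assuming the 'Version' property of COLLATIONPROPERTY is available on your build and reports 2 for 100-level collations; the column list is only illustrative.

```sql
-- Sketch: list 100-level collations by version number rather than by name pattern.
-- Assumption: COLLATIONPROPERTY(name, 'Version') returns 2 for the "_100_" collations.
SELECT name, description
FROM fn_helpcollations()
WHERE COLLATIONPROPERTY(name, 'Version') = 2
  AND name LIKE '%[_]CI[_]AS';   -- keep only the _CI_AS variant of each collation
```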

These new collations are aligned with the linguistic sorting of Windows Server 2008 (they use a very similar algorithm).  In terms of functionality, the following new features were added:

  • Support for sorting characters defined in Unicode 5.0.  The new collations add sorting weights for roughly 6,000 characters newly defined in Unicode 5.0, so that these characters sort in a linguistically correct order.
  • Updated upper/lower casing.  The UPPER and LOWER functions were updated for these new collations: around 200 upper/lower case mappings newly defined in Unicode 5.0 are included.
  • Linguistic sorting for surrogate pairs.  Previously, in the 90-level collations (such as Chinese_PRC_90_CI_AS), SQL Server assigned a weight to each surrogate code point, but those weights were not linguistically correct.  In the new collations, the weights are reassigned according to the characters' linguistic meaning, so that surrogate pairs sort linguistically.
  • The new collations take advantage of two new comparison flags of the Windows CompareString function, LINGUISTIC_IGNOREDIACRITIC and LINGUISTIC_IGNORECASE, to provide better linguistic case-insensitive and diacritic-insensitive comparison.
  • We added several new collations for the locales introduced in Windows XP SP2 and Windows Vista. These collations did not exist in previous versions of SQL Server, and they provide better support for these new locales.
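
To try some of these behaviors, the T-SQL sketch below applies 100-level collations with an explicit COLLATE clause. The sample strings and the table variable @t are illustrative and not taken from SQL Server documentation. It assumes SQL Server 2008 or later and a database whose default collation is not a supplementary-character (_SC) collation. The sort order of the last two queries is deliberately not stated here; run them to compare the 90-level and 100-level results.

```sql
-- Case-insensitive comparison under a 100-level collation: returns 'equal'.
SELECT CASE WHEN N'katmai' = N'KATMAI' COLLATE Latin1_General_100_CI_AS
            THEN 'equal' ELSE 'different' END AS ci_as_result;

-- Accent-insensitive comparison (_AI): returns 'equal'.
SELECT CASE WHEN N'resume' = N'résumé' COLLATE Latin1_General_100_CI_AI
            THEN 'equal' ELSE 'different' END AS ci_ai_result;

-- Surrogate pairs: build supplementary characters from their UTF-16 halves
-- and sort the same rows under a 90-level and a 100-level collation.
DECLARE @t TABLE (id int, s nvarchar(10));
INSERT INTO @t (id, s) VALUES
    (1, NCHAR(55360) + NCHAR(56320)),  -- U+20000 = 0xD840 0xDC00
    (2, NCHAR(55360) + NCHAR(56321)),  -- U+20001 = 0xD840 0xDC01
    (3, N'一');                         -- U+4E00, a BMP character for comparison

SELECT id, s FROM @t ORDER BY s COLLATE Chinese_PRC_90_CI_AS;
SELECT id, s FROM @t ORDER BY s COLLATE Chinese_PRC_100_CI_AS;
```

Putting COLLATE in the query, rather than changing the column or database collation, makes it easy to compare two collations side by side on the same data.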