Locales in SQL Server


Introduction

Writing locale-safe external code in SQL Server has always been important due to the global nature of many businesses running on SQL Server. Doing it correctly requires a good knowledge of the different mechanisms used to store that locale information, and understanding which is the relevant locale in each context.

In order to explain the interaction between different mechanisms for storing the locale of the system or program, I will briefly describe each of them.


CRT Locale:

The CRT (C runtime) is a general purpose library that Microsoft ships with multiple products, including Windows (msvcrt.dll) and Visual Studio (msvcrXX.dll). It holds the implementation for various useful C/C++ things such as exception handling, function libraries, memory management, threads and process utilities, etc. Most C/C++ programs end up consuming at least some of these features.

Since the CRT library needs to handle data that might come from different cultures, the functions provided in it need to take into account the current culture so it can, for example, know what character is used as a decimal separator, what is the uppercase/lowercase version of a given character, how to format dates, etc. In the CRT, the name for this culture information is a locale, and the setlocale function is used to set the current locale for that particular CRT library. If your process happens to have different versions of the CRT loaded, then each CRT has its own locale setting. A list of these locale-sensitive functions can be found here: https://whidbey.msdn.microsoft.com/library/?url=/library/en-us/dv_vccrt/html/442f8112-9288-44d7-be3c-15d22652093a.asp?frame=true


DBMS Locale (Collation)

In SQL Server terminology, the culture is called a collation. The collation can be set per table column, although it is usually only set at the database level, using the COLLATE clause at either creation time or through an ALTER statement. Operations such as conversions and string sorting will take that locale into account. Collations in SQL Server are a complex topic for which further info can be found there https://msdn.microsoft.com/library/default.asp?url=/library/en-us/instsql/in_collation_72pg.asp .


CLR Locale (Culture)

The CLR locale is maintained in a CultureInfo structure which you pass in to certain operations. If you do not explicitly pass the CultureInfo structure in, then the operation requiring it will get the default CultureInfo, which is initially set according to the Windows locale setting in the control panel. With the right permissions (in this case, ControlThread) you can change the current thread’s default culture using the Thread.CurrentCulture property. Double.ToString, DateTime.Parse, String.Compare and String.ToUpper are prime examples of methods that depend on the current culture. See https://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnnetdep/html/dngrfCultSafeCode.asp for a more in-depth discussion about Culture in managed code.

 


Things to be aware of when writing locale-safe code in SQL Server

Even though SQL Server is a C++ program that uses the CRT, it really explicitly maintains its own locale as a DBMS, which has a much finer granularity than the CRT locale. As such, it does not want to use any specific CRT locale and instead uses the "C" (default) CRT locale.

A problem particular to SQL Server is that we run user-written code in our process, in the form of extended stored procedures (XPs) or CLR procedures written in managed code. That user code may itself call setlocale, which may have a destabilizing effect on the entire process, including user code running in different threads, and the database engine itself.

In order to mitigate this issue, we used to call setlocale ourselves when we got back from user code so that we would reset the locale to "C". This is not a perfect solution, since between the time the user calls setlocale and the time we reset it back, there is a window of possible incorrectness. That window can be arbitrarily big depending on what kind of processing the user code does after setting the locale. Furthermore, it so happens that the call to setlocale is very expensive in terms of performance and scalability.

For that reason, we used new features of the VS2005 CRT that allows the engine code to have no ties to the CRT locale anymore, negating the need to call setlocale when we come back from the user code in the first place. This gave us a very interesting performance boost for CLR procedures; however, we still call it when we come back from XPs for backward compatibility reasons.

Writing an XP is unfortunately very tricky, and, in addition to the other worries you should have (which are already well covered on MSDN and Books Online), you should keep in mind that locale-sensitive CRT functions you may use will be using the CRT locale, which has nothing to do with the current database collation. Furthermore, this also means you are at the mercy of other users in the server who might have modified the CRT locale to something you would not expect. For that reason it is best that you keep your code locale-agnostic by using the new _l functions (such as printf_l, atoi_l, strncmp_l, etc.) available starting from the VS2005 CRT. These functions require you to pass in an explicit locale structure which you need to create/manage yourself. It may seem like more work, but it is a very small price to pay in order to get proper isolation from other activity in the server.


The good news is that writing managed code to run in the server is much easier!

The current culture, Thread.CurrentCulture, cannot be affected by other users running on the server. However, you should not rely on the current culture, other than it being set to default (according to the control panel, not current database collation!), unless you explicitly set it yourself whenever your function is run. That is because in the case of SQL Server, the current culture, like any other static data in your assembly, is not guaranteed to be stable across multiple calls to your CLR procedure. Furthermore, if your assembly happens to P/Invoke into some unmanaged code that relies on the CRT, or maybe even the CRT itself, keep in mind that you may face the same issues mentioned above for XPs, so it should best be avoided.

 


Recap

Here is a quick review:

-          The CRT, CLR and Database locales are distinct values, independently modified and queried using different mechanisms.

-          Whenever you write code that manipulates values that comes from the server, be aware of the different cultures your data might use.

When writing an XP

-

       The current CRT locale is not stable and may change while your XP executes, either because either another user or the server changed it.

-

       Whenever possible, prefer to pass in an explicit locale (locale_t structure) to CRT functions that accept it.

-       The CLR makes this much easier!

When writing a CLR routine

-

       The content of Thread.CurrentCulture is set for you to the default value specified in the control panel each time your code is invoked. The value is stable for the duration of your routine. Modifications you make to that property, or to any other static object, are not durable and will need to be redone each time your code runs.

-

       Whenever possible, prefer to pass in an explicit locale (CultureInfo) to functions that allow it.

 

- Mat Henaire
Microsoft SQL Server