Avoiding the Turkish i issue

There are 4 i's in the Turkish language: the small dotted i (which is the same as English), the small undotted ı, the capital dotted İ and the capital undotted I (which is the same as English). Capitalization works this way: ı.ToUpper() yields I and i.ToUpper() yields İ. Of course, these letters also need a prescribed sort order, and that order is ı<I<i<İ. Note that I<i in the Turkish culture, whereas i<I in English. Hmm - do you smell bugs here? In fact, the "Turkish i" issue has become notorious enough that Turkish setups get dedicated testing with these characters.
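To make the casing rules above concrete, here is a minimal sketch in Java rather than .NET. Java is an assumption made so the snippet is self-contained. `String.toUpperCase(Locale)` and `String.toLowerCase(Locale)` apply the same Turkish special-casing that a culture-sensitive `ToUpper()`/`ToLower()` applies under tr-TR. The class and method names are hypothetical. Unicode escapes stand in for the Turkish letters: `\u0131` is ı and `\u0130` is İ.

```java
import java.util.Locale;

// Hypothetical demo class: casing of the four Turkish i's under tr-TR vs English.
class TurkishCasing {
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        // U+0131 = small dotless i, U+0130 = capital dotted I
        String[] letters = {"\u0131", "i", "I", "\u0130"};
        for (String c : letters) {
            System.out.printf("%s: tr upper=%s, tr lower=%s, en upper=%s%n",
                    c, upper(c, TURKISH), lower(c, TURKISH), upper(c, Locale.ENGLISH));
        }
    }
}
```

Under the Turkish locale, i and İ are case partners, and so are ı and I. English pairs i with I.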

Why do these bugs arise? Comparison based on default methods is a common example. String.Compare("mine".ToUpper(), "MINE") will not return zero under a Turkish culture, because the small dotted i becomes İ, not I, when capitalized. Using ToUpper() or ToLower() to normalize strings for comparison therefore gives different, and often undesirable, results on different cultures.
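The comparison bug can be reproduced with the same Java stand-in. This is again an assumption: the language is Java instead of .NET, and the helper names are hypothetical.

- `naiveEquals` normalizes both sides with culture-sensitive upper-casing, which is the pattern warned about above.
- `identifierEquals` uses `String.equalsIgnoreCase`, which compares character by character without consulting any locale. That is roughly the role `StringComparison.OrdinalIgnoreCase` plays in .NET.

```java
import java.util.Locale;

// Hypothetical helpers contrasting culture-sensitive normalization with a
// culture-independent, case-insensitive comparison.
class CaseInsensitiveCompare {
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Buggy pattern: upper-case both sides with a given culture, then compare.
    // Under tr-TR, "mine" upper-cases to M, U+0130 (dotted capital I), N, E,
    // while "MINE" stays "MINE", so the two no longer match.
    static boolean naiveEquals(String a, String b, Locale culture) {
        return a.toUpperCase(culture).equals(b.toUpperCase(culture));
    }

    // Locale-independent comparison, suitable for identifiers and keywords.
    static boolean identifierEquals(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        System.out.println(naiveEquals("mine", "MINE", Locale.ENGLISH)); // true
        System.out.println(naiveEquals("mine", "MINE", TURKISH));        // false
        System.out.println(identifierEquals("mine", "MINE"));            // true
    }
}
```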

We are currently testing Team Build on a Turkish setup, and we are trying out a lot of cases around the Turkish i, especially sorting. The Turkish i bugs I have seen in the past while testing other products were invariably due to incorrect usage of string APIs. Here is a really cool paper on handling string APIs that gives a specific set of dos and don'ts. I bet many people will be surprised to learn that using InvariantCulture can actually cause a couple of issues, even in places that seem tailor-made for it! Of course, the new StringComparer is super cool and makes it as clear as possible which comparison to use where. Checking these rules during static analysis will certainly save a lot of i18n pain.
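.NET's StringComparer exposes named comparers such as StringComparer.Ordinal, StringComparer.OrdinalIgnoreCase, StringComparer.CurrentCulture and StringComparer.InvariantCulture. This lets code state explicitly which kind of comparison it wants. Below is a rough Java counterpart of that "choose the comparer by purpose" idea. It is an illustration under our own assumptions, not the rules from the paper.

- `String.CASE_INSENSITIVE_ORDER`, which is documented as not taking locale into account, keys an identifier lookup table.
- `java.text.Collator` provides culture-aware ordering for lists shown to users.

The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helpers: pick the comparer by what the strings are used for.
class ComparerByPurpose {
    // Identifiers (keywords, column names, file extensions): compare
    // case-insensitively without consulting any locale, so lookups behave
    // the same on a Turkish machine as on an English one.
    static <V> Map<String, V> newIdentifierMap() {
        return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    }

    // Text shown to users: sort with the linguistic rules of the user's locale.
    static List<String> sortForDisplay(List<String> items, Locale userLocale) {
        List<String> sorted = new ArrayList<>(items);
        sorted.sort(Collator.getInstance(userLocale));
        return sorted;
    }

    public static void main(String[] args) {
        Locale.setDefault(Locale.forLanguageTag("tr-TR")); // simulate a Turkish setup
        Map<String, String> columns = newIdentifierMap();
        columns.put("ItemID", "int");
        System.out.println(columns.get("itemid")); // int
        System.out.println(sortForDisplay(List.of("b", "a", "C"), Locale.ENGLISH)); // [a, b, C]
    }
}
```

Consider a Turkish default locale and a lookup that normalized keys with the no-argument toUpperCase(). It would turn "itemid" and "ItemID" into different keys. The comparator-based map avoids that because it never consults the locale.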

Comments (4)

  1. Sorry for the trouble :)

    What you wrote about is part of the everyday problems of a Turkish developer. Capitalization is one issue; non-standard characters such as ç and ğ are another.

    Did you know that on SQL Server, SELECT itemID FROM MyTable fails if the field is defined as ItemID and the DB collation is Turkish? The time and money spent on such bugs are enormous.

    Anyway, thanks for the paper. This will make my life a little easier.

  2. Excellent observation, Gokhan. 🙂 I recently saw a similar bug where the stored procedures were written in T-SQL with the table name capitalized, and they refused to work on the DB. I sure wish I had guessed the cause earlier.

  3. me says:

    But it's not a bug! itemID and ItemID are different when using a Turkish collation.
