Dev Info: Which StringComparison to use?

 

This post continues the Dev Info series from the previous post.

Confusion over which string comparison to use is a close second to the #1 coding confusion (mixing up CultureInfo.InvariantCulture and CultureInfo.CurrentCulture) mentioned earlier.  MSDN has a good, detailed article on this; below is just my summary of it to increase awareness.

What is StringComparison?

StringComparison is an enumeration that specifies the culture, case, and sort rules to be used by certain overloads of the String.Compare and String.Equals methods.  For each StringComparison value there is a corresponding StringComparer, which can be used where an API expects a comparer object rather than an enumeration value.

For example:

// Here the result will always be true.
bool ordinalResult = string.Equals("string", "STRING", StringComparison.OrdinalIgnoreCase);

// Here the result will be true for most cultures, but false for a culture
// like Turkish, which has four i's (dotted and dotless, each in both cases).
bool cultureResult = string.Equals("string", "STRING", StringComparison.CurrentCultureIgnoreCase);

English vs. Turkish Case Mappings

Language   Letter       Lowercase Map   Uppercase Map
--------   ----------   -------------   -------------
English    i            i               I
Turkish    dotted i     i               İ
Turkish    dotless ı    ı               I
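
To see the mappings in the table above in action, here is a minimal sketch rather than a definitive test. The class and method names (TurkishCaseDemo, UpperIn, LowerIn, EqualsIgnoreCaseIn) are made up for illustration. It assumes the runtime has Turkish (tr-TR) culture data available; in globalization-invariant mode every culture behaves like the invariant culture and the Turkish lines would not show the difference.

```csharp
using System;
using System.Globalization;

public static class TurkishCaseDemo
{
    // Upper-cases text using the casing rules of the named culture.
    public static string UpperIn(string cultureName, string text) =>
        text.ToUpper(CultureInfo.GetCultureInfo(cultureName));

    // Lower-cases text using the casing rules of the named culture.
    public static string LowerIn(string cultureName, string text) =>
        text.ToLower(CultureInfo.GetCultureInfo(cultureName));

    // Case-insensitive equality under the named culture's rules.
    public static bool EqualsIgnoreCaseIn(string cultureName, string a, string b) =>
        string.Compare(a, b, CultureInfo.GetCultureInfo(cultureName),
                       CompareOptions.IgnoreCase) == 0;

    public static void Main()
    {
        Console.WriteLine(UpperIn("en-US", "i"));  // I
        Console.WriteLine(UpperIn("tr-TR", "i"));  // İ (U+0130), given Turkish culture data
        Console.WriteLine(LowerIn("tr-TR", "I"));  // ı (U+0131), given Turkish culture data

        // English rules treat i and I as the same letter in different cases.
        Console.WriteLine(EqualsIgnoreCaseIn("en-US", "string", "STRING"));  // True

        // Under Turkish rules i and I are different letters (see the table),
        // so this comparison is not expected to succeed.
        Console.WriteLine(EqualsIgnoreCaseIn("tr-TR", "string", "STRING"));
    }
}
```

Passing the culture explicitly, instead of changing the thread's current culture, keeps the sketch independent of the machine's regional settings.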

What are the various StringComparison values?

StringComparison member: Ordinal
Description: Performs an ordinal comparison.
When to use:
  • Case-sensitive identifiers in standards such as XML and HTTP.
  • Case-sensitive security-related settings.
  • Other case-sensitive system\OS resources.

StringComparison member: OrdinalIgnoreCase
Description: Performs a case-insensitive ordinal comparison.
When to use:
  • Case-insensitive identifiers in standards such as XML and HTTP.
  • Case-insensitive security-related settings.
  • File paths.
  • Registry keys and values.
  • Environment variables.
  • Resource identifiers (for example, display names).
  • Command line arguments.
  • Other case-insensitive system\OS resources.

StringComparison member: CurrentCulture
Description: Performs a case-sensitive comparison using the current culture.

StringComparison member: CurrentCultureIgnoreCase
Description: Performs a case-insensitive comparison using the current culture.

When to use (CurrentCulture and CurrentCultureIgnoreCase):
  • When working with most user data (and not system\OS data).
  • When displaying or sorting the system\OS data mentioned above for the user (but not when comparing it for equality).

StringComparison member: InvariantCulture
Description: Performs a case-sensitive comparison using the invariant culture.

StringComparison member: InvariantCultureIgnoreCase
Description: Performs a case-insensitive comparison using the invariant culture.

When to use (InvariantCulture and InvariantCultureIgnoreCase):
  • Valid only in very rare cases; it is best to forget these exist.
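
As a rough illustration of how these members differ, the sketch below compares the same two strings under three of them. ComparisonOrderDemo and Sign are hypothetical names introduced only for this example, and the culture-sensitive line assumes normal culture data is available (not globalization-invariant mode).

```csharp
using System;

public static class ComparisonOrderDemo
{
    // Returns -1, 0 or 1 depending on how a orders relative to b.
    public static int Sign(string a, string b, StringComparison comparison) =>
        Math.Sign(string.Compare(a, b, comparison));

    public static void Main()
    {
        // Ordinal compares UTF-16 code units: 'a' (0x61) comes after 'B' (0x42).
        Console.WriteLine(Sign("a", "B", StringComparison.Ordinal));            // 1

        // OrdinalIgnoreCase compares as if both sides were upper-cased: 'A' before 'B'.
        Console.WriteLine(Sign("a", "B", StringComparison.OrdinalIgnoreCase));  // -1

        // A culture-sensitive comparison follows linguistic (dictionary-style)
        // order, which puts "a" before "B" regardless of case.
        Console.WriteLine(Sign("a", "B", StringComparison.InvariantCulture));
    }
}
```

The ordinal result is the one that tends to surprise people when a sorted list is shown to a user, which is why the culture-sensitive members are the ones suggested for display.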

Some Examples!

Use case: Check a file name, URL, registry value, etc. for equality
StringComparison to use: OrdinalIgnoreCase
Reason: These system resources are case-insensitive (at least in the .NET\Win32 context).

Use case: Sort file names (or any other system resource) to show to the user in the UI
StringComparison to use: CurrentCultureIgnoreCase
Reason: Users will want the sorting to follow their own culture, the way they understand it.

Use case: Compare PropertyName in WPF’s PropertyChanged handler with a const string (data that does not vary in case)
StringComparison to use: Ordinal
Reason: The only reason for using Ordinal (over OrdinalIgnoreCase) is that you know the data will not vary in case, and Ordinal is faster.
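
The three use cases above could look roughly like this in code. This is only a sketch: ComparisonUseCases and its method names are hypothetical helpers invented for the example, not part of any framework API.

```csharp
using System;
using System.Collections.Generic;

public static class ComparisonUseCases
{
    // Equality check on a system resource name: OrdinalIgnoreCase.
    public static bool IsSameFileName(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    // Sorting for display: use the current culture so the order matches
    // what the user expects.
    public static List<string> SortForDisplay(IEnumerable<string> names)
    {
        var sorted = new List<string>(names);
        sorted.Sort(StringComparer.CurrentCultureIgnoreCase);
        return sorted;
    }

    // Property name known not to vary in case: plain Ordinal.
    public static bool IsProperty(string changedPropertyName, string expectedName) =>
        string.Equals(changedPropertyName, expectedName, StringComparison.Ordinal);

    public static void Main()
    {
        Console.WriteLine(IsSameFileName("Report.TXT", "report.txt"));  // True
        Console.WriteLine(IsProperty("Name", "Name"));                  // True
        Console.WriteLine(IsProperty("Name", "name"));                  // False

        var shown = SortForDisplay(new[] { "banana", "Apple", "cherry" });
        Console.WriteLine(string.Join(", ", shown));
    }
}
```

Note that the sort uses a StringComparer while the equality checks use a StringComparison value; they are two ways of naming the same comparison rules.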

 

Other notes

  • Most APIs whose overloads do not take a StringComparison use StringComparison.CurrentCulture internally; String.Equals and String.Contains are the main exceptions.  String.Equals uses StringComparison.Ordinal, and String.Contains is ordinal as well.  Again, even if your intention matches the default, it is better to pass the appropriate StringComparison explicitly to make things clear.  FxCop will also warn you if you don’t pass a StringComparison.
  • In the .NET Framework, String.Contains does not have any overload that takes a StringComparison (newer .NET versions add one), so it is best to avoid it there.  Use String.IndexOf() != -1 instead, with the appropriate comparison.
  • You need to be careful with the string.ToLower\ToUpper functions.  Because of cases like the “Turkish i” mentioned above, ToLower() on a char might give a completely different result in a different culture.  If you need to make something lower case in a culture-invariant manner, use ToLowerInvariant().
  • In C#, you can use strings in a switch\case statement.  Again, this compares with ordinal (exact, case-sensitive) semantics, as String.Equals does, which may not be what you want.  The safest option is to avoid this construct.
  • When you create objects such as a Dictionary with string keys, there is an implied comparison internally.  In such cases, you should pass the comparison\comparer when creating the object:

   Dictionary<string, int> stringDictionary = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
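
A few of the notes above, put together in one small sketch. OtherNotesDemo and ContainsIgnoreCase are hypothetical names used only for this example; everything else is standard String, StringComparer and Dictionary API.

```csharp
using System;
using System.Collections.Generic;

public static class OtherNotesDemo
{
    // Stand-in for String.Contains that states the comparison explicitly.
    public static bool ContainsIgnoreCase(string text, string value) =>
        text.IndexOf(value, StringComparison.OrdinalIgnoreCase) != -1;

    public static void Main()
    {
        Console.WriteLine(ContainsIgnoreCase(@"C:\Windows\System32", "windows"));  // True

        // Culture-independent casing, safe for identifiers and keys.
        Console.WriteLine("TITLE".ToLowerInvariant());  // title

        // The comparer passed at construction decides which keys count as equal.
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        counts["Path"] = 1;
        counts["PATH"] = 2;                     // same key as "Path", overwrites the value
        Console.WriteLine(counts.Count);        // 1
        Console.WriteLine(counts["path"]);      // 2
    }
}
```

With the default comparer the dictionary would instead hold two separate entries for "Path" and "PATH".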