C#: Fast string comparison optimization


Sometime back I had blogged about using reference comparison to speed up ordinal string comparisons. Essentially this means that if you know for sure that your string is interned and you want to do an ordinal string comparison you can use the following optimization

if (command == “START”)

Console.WriteLine(“Starting Build…”);

else if (command == “STOP”)

Console.WriteLine(“Stopping Build…”);

 

// Optimization

if (Object.ReferenceEquals(command,“START”))

Console.WriteLine(“Starting Build…”);

else if (Object.ReferenceEquals(command, “STOP”))

Console.WriteLine(“Stopping Build…”);


This works only if the string command is an interned string. If its not (e.g. its accepted from command lines or is constructed) you can intern it using something like string internedCmd = string.IsInterned(command) ?? string.Intern(command); and then use the interned version of the same string.


However an email to the internal CSharp alias made me look again. The email stated that the static method String.Equals(a,b) is more efficient than instance method String.Equals(a). I looked in into the code and it appears that for static Equals we already do the reference comparison first, while for instance Equals we jump right to string comparison.

// Static Equals called from overloaded operator ==
public static bool Equals(string a, string b)
{
if ((object)a == (object)b)
{
return true;
}
if ((a != null) && (b != null))
{
return string.EqualsHelper(a, b);
}
return false;
}

This might get fixed in future .NET version when both Equals will do the reference comparison first. However, the optimization I suggested does not hold because its already been done in the framework right now.


So currently in the following snippet the second version is faster than the first

string a = …;
string b = …;




a.Equals(b);
string.Equals(a, b); // faster

Comments (23)

  1. Honestly, why do you care for this?

    I run a billion string comparision in 1.5 seconds.

    This performance difference is so small that I don’t even think about it until I start doing profiling, and that optimization is way down the list.

  2. Frank says:

    what about the switch statement? or has that just the same performance as the else if?

    Ayende has a point, but well, its always nice to know 🙂

  3. Frank take a look into http://blogs.msdn.com/abhinaba/archive/2005/10/28/486173.aspx. In this blog I had posted about common string comparisons and how they work including switch. in Switch the overloded == operator is called so the final comparison occurs in String.Equal(string, string)

    As you said its nice to know. In case your application does lot of string comparison the info might be helpful… Anyways for framework development its always nice to know where time is being taken in your API… But as usual optimize only based on profiling and only when the data shows you are not meeting your perf criteria. For all other purpose elegance and maintainability wins over optimization

  4. Keith Farmer says:

    Does the framework’s shortcut still require the strings be interned, then?

  5. The reference match succeeds only for interned string. The point is I had said in the earlier post that to take the benefit of interned string explicitly do a Object reference match. With the new info I got you are equaly good if you do a string.Equals(a,b) kind of comparison. However, both strings have to be interned….

  6. Magnus Hiie says:

    I think the ReferenceEquals method is still faster if the strings are interned because when the strings are not equal, the == operator and string.Equals still perform string comparision, however ReferenceEquals can just return false.

  7. Garry Trinder says:

    BTW, Why interning requere you to change reference value in your code ?

    Why Runtime unable to change reference for you automaticaly ? It’s allowed to move pointers all over heap as long as they are not pined.

    I.e. my suggestion is to make string.Intern() returning void.

    This can be smart move – once string.Equals operator (or ==) find two exact strings but at different locations – then reference to one of them will be changed to another; second unreferenced and become eligible for garbage collection (even more – all others references to second string will be noticed during GC and changed).

  8. Noah Coad says:

    I recently had an app that I used the VSTS Profiler on to find that a significant portion of time was

  9. Elz says:

    Well, i made some tests for 2 strings:

    string a = "ASDFGHJKL";

    string b = "ASDFGHJ";

    In my measurements a.Equals(b)(average: 864 ms) performed a lot faster than string.Equals(a,b) (average: 2395).

    I used Framework 2.0.

    So if you could enlighten me i would appreciate.

  10. Do the same in a loop for say 100,000 times and then take the average….

  11. Elz says:

    Hi!

    Here is my code:

    using System;

    using System.Diagnostics;

    namespace ConsoleApplication3

    {

       class Program

       {

           static void Main(string[] args)

           {

               string a = "ASDFGHJKL";

               string b = "ASDFGHJ";

               Stopwatch stopper = new Stopwatch();

               stopper.Start();

               for (long i = 0; i < 200000000; i++)            

               {

                   //string.Equals(a, b);

                   a.Equals(b);

               }

               stopper.Stop();

               Console.WriteLine("Time: {0}", stopper.ElapsedMilliseconds);

           }              

       }    

    }

    a.Equals(b) – takes around 1650 ms.

    string.Equals(a,b) – takes around 4900 ms.

    So what is the catch?

  12. shareef says:

    Elz is correct on my Dual Xeon procesor,

    a.Equals(b) – 1409 ms

    string.Equals(a,b) – 1610 ms

    a == b – 1610 ms

  13. shareef says:

    Also, c# reference says string == operator checks for values not reference.

    http://msdn2.microsoft.com/en-us/library/53k8ybth.aspx

  14. itsmarik says:

    Here is what I found…

    If a=b, string.Equals(a, b) is fastest.  If a <> b or a.Length <> b.Length, a.Equals(b) is fastest.  

    The example Elz gave, a and b have different lengths.

  15. tomas says:

    When looking on this results think that it is not better to use static Equals() – is fastest only if strings are same – or is good only if we just testing and expecting that string equals, but always – it plays role only in case of huge usage of string comparisons.

    This are meassures from testing code:

    * m1 = first method a.Equals(b)

    * m2 = second method string.Equals(a, b)

    (using == operator is almost same then m2, but always very little slower)

    CNT = 20.000.000

    1. different length : m1=199  m2=216

    2. same length but different: m1=473  m2=499

    3. same strings : m1=485  m2=118

    CNT = 2.000.000.000

    1. different length : m1=18708  m2=21120

    2. same length but different: m1=47607 m2=49944

    3. same strings : m1=48626  m2=11262

  16. MrY says:

    I think the string comparison optimizatioin is not so useful.

    I wound rather to use the "a.Equals(b)".

    I think the optimization of string contains() or indexof() is more useful.

  17. Jan says:

    The reason is, that both String.Equals(Object) and String.Equals(String) are virtual, thus cannot be inlined, whilst static String.Equals(String,String) can.

  18. woohoo says:

    why not use:

    string.Compare(…) ?

  19. daveblack says:

    I haven't seen anyone mention that you need to run the tests with BOTH values that are equal and values that are not.  Some methods are faster when the strings are equal but slower than others when the strings are different.

    My metrics using Vance Morrison's MeasureIt app (by extending it), shows that string.Compare() is approx. 50x SLOWER than using either static string.Equals(a,b) or a.Equals(b) if the values are the the SAME.

    When the values are DIFFERENT, the string.Compare() is approx. 50x SLOWER than using either static string.Equals(a,b) or a.Equals(b) is approx. 12x SLOWER.

    You should use either the == operator or the Equals method for testing string value equality – string.Compare wasn't really meant for equality.

  20. daveblack says:

    Sorry,  copy/paste error in my post…

    I meant to say if the values are different, string.Compare() is approximate 12x slower.