Performance Quiz #1 (of a series?)


Recently there was a discussion on one of our internal email aliases in which this problem came up. I though it was an interesting problem so I posed this Quiz to assorted people I work with to see what kinds of things they would say.


Considering these three options:


Option 1:



sw.WriteLine(subject + “: ” + message);


Option 2:



sw.WriteLine(“{0}: {1}”, subject, message);


Option 3:



sw.Write(subject);
sw.Write(“: “);
sw.WriteLine(message);


Answer these questions:


Q1. Which of these choices offers the best performance?
Q2: Can you characterize the memory usage of each of these?
Q3: Which would the performance team generally recommend, and why?
Q4: What special factors might alter this recommendation?
Q5: What did you have do assume about “sw” to answer the question?


I encourage you to think about this for a few minutes (more is good too) before reading past the line. SPOILERS/ANSWERS follow.



For everyone that took the time to think about this at all, thank you very much.

In the course of my job I’m often asked to comment of the probable performance of an assortment of solutions to give guidance, or at least suggest what measurements should be made. So I approached this quiz the same way and made my best guesses/recomendations as I would if I could not do the measurements. Then I went back and did the actual measurements.

Here are my own answers:


Q1. Which of these choices offers the best performance?



  • Only thing I can say for sure is that #2 will lose to #1 in all cases
  • #3 is going to be best if the output is buffered
  • #1 is going to be best if the output is unbuffered
  • In any case, a typical program’s overall performance will be unaffected by the choice

Q2: Can you characterize the memory usage of each of these?



(These answers all proved to be “close but no cigar” due to the unusual WriteLine behavior discussed in the real analysis, see below)

All three probably have allocations associated with buffering the stream, ignoring those as invariant, the allocations unique to each choice are:

#1 single concat operation, one temporary string
#2 assorted allocations, including string builder, underlying string buffer, vararg array (I was close)
#3 no allocations


Q3: Which would the performance team generally recommend, and why?



Even though it’s the worst performing, and we knew that much in advance, both of your CLR Performance Architects concur that #2 should be the default choice. In the highly unlikely event that it becomes a perf problem the issue is readily addressable with only modest local changes. Normally you’re just cashing-in on some nice maintainability. We do not alter our choice given the data below.


Q4: What special factors might alter this recommendation?



Specific measurements indicating that the code path had become a hotspot.


Q5: What did you have do assume about “sw” to answer the question?



Only that the stream did not have exotic behavior (such as weird cryptographic features that make the cost model very complex) and that it was buffered. In the event of an unbuffered stream of one type or another there are signficant semantic differences between (1 or 2) and (3) and potentially huge perf differences too.


OK, time to see how we did.

To do the analysis below I used the following benchmark program and CLR Profiler, which is one way to look at this data.





























































































namespace Test
{
    using System;
    using System.IO;
 
    class Test
    {
        static private String s1 = “Hello”;
        static private String s2 = “Good bye”;
        static private int iterations = 5;
        static private volatile int foo = 0;
 
        static private MemoryStream ms = new MemoryStream(100000);
        static private StreamWriter sw = new StreamWriter(ms);
 
        public static void Main(string[] args)
        {
            int i;
 
            for (i=0;i<iterations;i++) Test1();
            for (i=0;i<iterations;i++) Test2();
            for (i=0;i<iterations;i++) Test3();
        }
 
        public static void Test1()
        {
            sw.WriteLine(s1+”: “+s2);
            foo++;
        }
 
        public static void Test2()
        {
            sw.WriteLine(“{0}: {1}”, s1,s2);
            foo++;
        }
 
        public static void Test3()
        {
            sw.Write(s1);
            sw.Write(“: “);
            sw.WriteLine(s2);
            foo++;
        }
    }
}

The results below are in the form of an execution trace showing functions and allocations for each of the three options as reported by CLRProfiler.















































































































































































































































































































































































































































Results for Option 1      
              Calls Bytes Allocs
static void Test.Test::Test1() 4 94 2
  static String System.String::Concat(String,String,String)   48 1
    System.String(48 bytes)   48  
  void System.IO.TextWrite::WriteLine(String) 2 46 1
    System.Char[] (46 bytes)   46  
    void System.String.CopyTo(int32,wchar[], int32, int32)      
    void System.IO.StreamWriter::Write(wchar[], int32, int32)      

So option 1 gives 4 calls, 2 allocations, for 94 bytes.

Things to note:


  • There was only one concat operation, so only one extra string allocation — many expected two separate concatenations
  • We didn’t go down the formatting path, even though we used WriteLine — when you use WriteLine with one arg, the version you get does no formatting — many didn’t know this
  • When we eventually call WriteLine we converted the string into a char array and then appended linefeed characters before calling the underlying Write, nobody expected/predicted this
  • The char array that holds the string plus linefeed is smaller than the string was because it doesn’t have the string object overhead and (I think) lacks null termination

Results for Option 2

             

Calls


Bytes


Allocs

static void Test.Test::Test2() 30 184 5
  void System.IO.TextWriter::WriteLine(String, Object, Object) 29 184 5
    System.Object[] (24 bytes)   24  
    static String System.String::Format(System.IFormatProvider, String, Object[]) 25 114 3
      System.Text.StringBuilder (20 bytes)   20  
      void System.Text.StringBuilder::.ctor(String, int32) 3 66 1
        void System.Text.StringBuilder::.ctor(String, int32, int32) 2 66 1
          static String System.String::GetStringForStringBuilder(String, int32, int32, int32) 1 66 1
            System.String (66 bytes)   66  
            void System.String::NullTerminate()      
      System.Text.StringBuilder System.Text.StringBuilder::AppendFormat(System.IFormatProvider, String, Object[]) 16 28 1
        wchar[] System.String::ToCharArray(int32, int32)   28 1
          System.Char[] (28 bytes)   28  
        String System.String::ToString()      
        System.Text.StringBuilder System.Text.StringBuilder::Append(String) 3    
          static bool System.IntPtr::op_Inequality(int,int)      
          void System.String::AppendInPlace(String,int32) 1    
            void System.String::NullTerminate()      
        System.Text.StringBuilder System.Text.StringBuilder::Append(wchar[], int32, int32) 4    
          String System.Text.StringBuilder::GetThreadSafeString(int&) 1    
            static bool System.IntPtr::op_Equality(int,int)      
          void System.String::AppendInPlace(wchar[], int32, int32, int32) 1    
            void System.String::NullTerminate()      
      String System.Text.StringBuilder::ToString() 3    
        static bool System.IntPtr::op_InEquality(int,int)      
        static bool System.IntPtr::op_InEquality(int,int)      
        void System.String::ClearPostNullChar()      
    void System.IO.TextWriter::WriterLine(String) 2 46 1
      System.Char[] (46 bytes)   46  
      void System.String.CopyTo(int32,wchar[], int32, int32)      
      void System.IO.StreamWriter::Write(wchar[], int32, int32)      

Many people expected Option 2 to be the best for perf. Ahem. It isn’t :)

30 function calls. 5 allocations. 184 bytes allocated.

Things to Note:


  • First allocation is to take the two fixed arguments and put them in an object array so that we can use the same path for all flavours of WriteLine
  • Second allocation is to create the Stringbuilder that we will use to accumulate the formatted string, noting that only StringBuilder supports formatting, everyone else borrows it from StringBuilder
  • Third allocation is the initial buffer for the StringBuilder
  • Fourth allocation is where we convert the format string argument of WriteLine into a char array
  • Fifth allocation is where we append a newline to the string parameter of TextWriter.WriteLine as above

Results for Option 3

     
             

Calls


Bytes


Allocs

static void Test.Test::Test3() 7 32 1
  void System.IO.StreamWriter::Write(String) 1    
    void System.String::CopyTo(int32, wchar[], int32, int32)      
  void System.IO.StreamWriter::Write(String) 1    
    void System.String::CopyTo(int32, wchar[], int32, int32)      
  void System.IO.StreamWriter::WriteLine(String) 2 32 1
    System.Char[] (32 bytes)   32  
    void System.String::CopyTo(int32, wchar[], int32, int32)      
    void System.IO.StreamWriter::Write(wchar[], int32, int32)      

7 function calls. 1 allocation. 32 bytes allocated.

Things to Note:


  • As with Option 1, formatting was avoided
  • As with Option 1, a string was created for the WriteLine call
     

This is already too long but I thought I’d end by sharing a quick summary of the nature of the responses I got internally.



  • Some thought the problem was very easy. Many of these people didn’t do very well :)
  • Everyone (including me) was suprised by the allocation in #3.
  • Some people thought all the options would go through the formatting logic, looking for { and } and so forth, that’s not the case if there is only one argument to Write/WriteLine.
  • Some people feared the three function calls they could see in Option 3, but didn’t think about the internal work assocated with formatting in #2 that they could not see. There was a great tendancy to assume WriteLine was magic or rocketscience — generally a bad idea. The three virtual function calls are really the least of your problems with #3.
  • Some people assumed no buffering (fair enough), which majorly colors the answer.
  • Some had supreme faith that we had done magic in #2 to make it perfect… alas no.
  • And lastly, Some were afraid to guess at all, must be people that read my blog :)

Comments (23)

  1. What a great post! I love this kind of information. I also find it interesting that even at Microsoft developers make the same kind of assumptions the rest of us make. You should defently make a series out of this. This is also the type of column/information that msdn.microsoft.com should carry. Thanks!

  2. I would also be curious to see the numbers if yu had used a StringBuilder to construct the string and then write it out.

  3. Rico Mariani says:

    Actually if you look at the trace for the WriteLine with formatting carefully (sorry about that small font) you’ll see that what it did was precisely what you suggest — it internally used a StringBuilder.

    In fact I’m pretty sure that the only thing that actually has general purpose formatting services in its implementation is StringBuilder so everything else that offers general purpose formatting is in fact using a StringBuilder to get the job done.

  4. JuanC says:

    An added bonus for #2, it is a lot easier to localize since #1 and #3 assume an specific string ordering.

    Great post. Even though I expected #3 to be the fastest, I would not have predicted the difference in function-calls/bytes between #1 and #2.

  5. Anonymous says:

    Tom’s log :: .NET string performance

  6. ed says:

    good article. more like this please!

  7. AndrewSeven says:

    Oopsy.

    These were my quick answers before reading the spoilers.

    Q1 : Best Perf #1 But # 3 is more correct.

    Q2 : Memory , #2 uses more

    Q3 : #3, it will remain similar as you add more and more writes.

    Q4 : The size of the strings, the size of the string builder and how much buffer it has waiting.

    Q5 : I am assuming it is a StringBuilder.

  8. [.NET]Performance Quiz #1 (of a series?)

  9. Rick Byers says:

    Can you really safely draw conclusions about "best performance" without measuring CPU usage? I know calls and bytes allocated is a good indicator of performance, but I would have expected something like "how long does it take to do it ten million times" to be a more accurate test. How do you decide when a test is accurate enough for its results to be actionable?

  10. Rico Mariani says:

    That’s an excellent question, but rather than just spout a few quick words here. I think I’d like to write a seperate blog about making and interpreting benchmarks. I’ll see if I can give you something substantial soonish.

  11. Nonsense says:

    Excellent post. I would love to see more of these.

    As an aside, aren’t the empty posts contibuting to google bombing?

  12. James Curran says:

    Interesting, but it seems to raise more questions….

    Your test uses concatenation of three strings — which is apparently treated as a special case by both the C# compiler and the CLR. What happens when one tries concatenating 4 5 or 6 strings?

    On the other hand, what would appear to be a special case optimazation with String.Format (using direct parameters rather than an array) turns out to be merely syntatic sugar covering a DE-optimization (building an array behind the scenes). So, why are you optimizing the method you tell us not to use (adding strings), instead of the method you tell us TO use (StringBuilder.Format)? Is there a point (number of parameters) where String.Format beats Concats? And, if the non-array versions of String.Format are just wrappers around the array version, why limit it to just three parameters? Why not 4, 5, 6 etc parameter versions?

  13. Sanjay says:

    What happens when you concatenate strings that are constant? Is the compiler smart enough to automatically combine into a single string immediately?

    I.e. the common case of long SQL command strings which for readability are broken up in VB using line continuation characters and ampersands.

    "SELECT …" & _

    " FROM …" & _

    " WHERE …"

  14. Rico Mariani says:

    I wasn’t there for most of this so I’m giving you my best guess answers here which could be wrong. I could say nothing but I think my answers are useful if imperfect.

    James Curran asks: "What happens when one tries concatenating 4 5 or 6 strings?"

    I believe the String.Concat optimization engages for up to 4 string components. I imagine that number was chosen because it handles the vast majority of cases we see in the field.

    James: "Using direct parameters rather than an array turns out to be merely syntatic sugar covering a DE-optimization"

    Guessing here but but I think this is a case of leaving the door open for the future. If you use varargs for all the cases you can’t do anything about it in future versions of the runtime because all the IL requires a call to the vararg version. If you have the versions there you can do something about it later if it turns out to be a performance issue.

    James: "Is there a point (number of parameters) where String.Format beats Concats?"

    I don’t think that is possible without major magic inside of String.Format and even then dubious. Ultimately the strings have to be copied at minimum and that’s what Concat is in the business of doing.

    James: "If the non-array versions of String.Format are just wrappers around the array version, why limit [them] to just three parameters?"

    There is a cost to having all that metadata around and maintenance for assorted flavors. Again I suspect that common practice was examined and some reasonable choice was made as to how many arguments to put in for future optimization needs.

    James: "Why are you optimizing the method you tell us not to use (adding strings)"

    There are many cases where the string concat optimization is useful other than the one in this example. The optimization has a good value/cost ratio.

    The question that isn’t directly asked but which is perhaps most relevant and certainly implied is, "Why recommend the slowest method?"

    The answer is basically that at best WriteLine is rarely the source of anyone’s performance problems, and even if it is, very minor and local changes can readily correct those problems. On the other hand, using the given formatting design pattern helps create the most maintainable code and facilitates later localization. In some cases, like this one, taking the simplest path really is the best choice. Even if performance is foremost in your mind, and localization/maintenance isn’t an issue, going for the simple solution here will give you more time to think about performance somewhere that your time is better leveraged.

    Sanjay asks: "What happens when you concatenate strings that are constant? Is the compiler smart enough to automatically combine into a single string immediately?"

    The answer is yes. There is no cost to splitting strings like that at compile time, the compiler puts the unified string into the IL directly. A quick test application will show you that there is only one string token and it is for the unified token.

    Rick Byers writes: "Can you really safely draw conclusions about "best performance" without measuring CPU usage?"

    Generally not, but in this case the space argument nicely predicts the time results and the blog was already too long :)

    So, ahem, for space reasons, I trimmed out the time. :)

  15. sumtec says:

    Good.

    But I think writeline("{0}:{1}", str1, str2) should be used when you might need to change the format of the output. For example, maybe one user would like the output {0}:{1}, but some others may like the output to be [#{1}]:{0}.

    In a word, writeline(str1 + ":" + str2) is so called a hardcoding, which maybe fast in coding but expansive in maintaining.

  16. Steve James says:

    Question:

    I carried out the same profiling for the same code with broadly the same results. However, the first iteration of Test1 allocated 586 bytes, the following 4 allocated 94. The first iteration of Test2 allocated 428 bytes, the following 4 allocated 184. Why does the first iteration allocate more memory? Is the CLR being clever and reusing objects allocated on the first iteration?

  17. Rico Mariani says:

    Nothing exotic, I believe what’s going on there is that there are certain string literals that get allocated the first time through the code and can subsequently be re-used.

    There may be other static class members that need initialization the first time through in the same way.