ANSWER: POP QUIZ: What’s wrong with this code – part 3


This issue is an interesting one in that there are more then one problem here that will cause high memory and even what looks to be a problem that will affect the performance of this snippet as well.

So the two main problems are that we are using XSLT.Load inside of a loop which means that we are going to create a new dynamic assembly for each loop iteration.  The other problem is string concatenation.  Instead of using the + to build our string in each loop, we should use StringBuilder to save on memory.

These two issues are documented here:

As for the other issue in this code, we declare our loop iterator outside of the for loop.  This isn’t a problem and doesn’t cause any slowness, but if we were to have used an Array and looped through the objects of the array, we could have run into the “Range Check Elimination” optimization.  Take a look here for more information on that.  Basically, this code:

ArrayList myArray1 = new ArrayList();
PopulateArray(myArray1);

for (int i = 0 ; i < myArray1.Count ; i++)
{
  // Do Stuff
}

Is faster then this is:

ArrayList myArray2 = new ArrayList();
PopulateArray(myArray2);

int iCount = myArray2.Count;
for (int i = 0 ; i < iCount ; i++)
{
  // Do Stuff
}

Because in the first case, we don’t have to check that we haven’t gone past the bounds of the array.


Comments (9)

  1. bdill says:

    StringBuilder is an optimization?  Really?  I would only consider the StringBuilder for a string that is being "built".

    Consider the referenced MSDN article.

    "sDest += sSource"

    This type of "string building" is great for StringBuilders, they will crush straight concatenation when the string is "growing".  The same goes for using string.Format, after all, it just calls StringBuilder.AppendFormat internally…

    In the pop quiz, however, a new string is created in each iteration that is not a derivative of the previous iteration.

    The loop is building, foo1.xml, foo2.xml, foo3.xml.

    A StringBuilder here is only going to hurt performance.  Why?  You’re paying the StringBuilder allocation cost with each iteration.  To compound things the StringBuilder will start with an initial capacity of 16 if you don’t specify otherwise.  What does this mean?

    Not only are you paying to allocate the StringBuilder 1000 times, it has to increase the capacity of 990 of the StringBuilders after they are instantiated.  Why’s that?  The connection string text is fifteen characters without the text from the loop.  The first ten loop iterations 0-9 add one digit to the length, but from there the StringBuilders capacity has to grow to accomodate the 17+ length of the data.

    Are we talking about crippling time and/or memory consumption, no.  But if we’re talking about optimizations StringBuilder/string.Format is probably more expensive.  I’m going to post this and then whip up some demo code.  I’ll post the code and results ASAP.

    In summary, my issue here is that the overhead of the StringBuilder negates the typical benefit of a StringBuilder since the operation is simple concatenation.

    Thanks!

  2. csdream says:

    Hi,

    I really enjoy your blog and greatly appreciate all the info I get here.

    However, I disagree with your opinion on using StringBuilder for string concatenation inside of the loop in your (previous) example.  

    As far as I know, multiple string concat in a single c# statement, such as “a” + “b” + “c” + “d”, will perform most efficiently in most cases as C# compiler team made a nice optimization.  C# compiler will produce IL code something equivalent to String.Concat(new string[] {“a”, “b”, “c”, “d”}), which will allocate exact amount of memory and perform concatenation.   In fact, using StringBuilder class inside of a loop to concatenate a fixed number of strings may cause performance degradation instead of improvement.  When a StringBuilder object is created with an insufficient buffer length (default is implementation-dependent), StringBuilder should allocate memory again to accommodate new data if necessary.  This can be happening multiple times within one iteration of the loop if the resulting string is too long.  In this case, the performance will be degraded as it keeps allocates memory.  On the other hand, if a StringBuilder object allocates too much memory initially, additional memory allocation will not happen, yet, memory footprint will be larger than it should be.  If a number of loop iterations is sufficient enough, this will increase overall application’s memory footprint and may result in poor performance.  

  3. bdill says:

    Ok, here are the results of my testing and the code I used to perform the tests.

    Averaging ten runs of each test (highly scientific, not!), release build from the command line…

    StringBuilderTest:

    Avg Memory Bytes Used – 439500

    Avg Elapsed Time – 00:00:00.0008906

    StringConcatenateTest:

    Avg Memory Bytes Used – 253952

    Avg Elapsed Time – 00:00:00.0004850

    My tests confirm my thoughts on the matter, so this probably means I’m doing something wrong and missing something obvious.  If not, string concatenation is marginally faster…

    Code

    Pasted here: http://www.nomorepasting.com/getpaste.php?pasteid=20568

    ———————-

    using System;

    namespace ConsoleApplication1

    {

    class Program

    {

    const string FILENAME_FORMAT = “NewFileName{0}.xml”;

    const string FILENAME_PREFIX = “NewFileName”;

    const string FILENAME_SUFFIX = “.xml”;

    static void Main(string[] args)

    {

    // Run each test independently

    switch (args[0])

    {

    case “1”:

    Console.WriteLine(“StringBuilderCapacityDemo”);

    StringBuilderCapacityDemo();

    break;

    case “2”:

    Console.WriteLine(“StringBuilderTest”);

    StringBuilderTest();

    break;

    case “3”:

    Console.WriteLine(“StringConcatenateTest”);

    StringConcatenateTest();

    break;

    default:

    Console.WriteLine(“No test selected.”);

    break;

    }

    }

    private static long BytesOfMemoryUsed

    {

    get

    {

    return System.Diagnostics.Process.GetCurrentProcess().WorkingSet64;

    }

    }

    private static void StringConcatenateTest()

    {

    System.Diagnostics.Stopwatch SW = new System.Diagnostics.Stopwatch();

    long StartingMemory = BytesOfMemoryUsed;

    SW.Start();

    for (int i = 0; i < 1000; i++)

    {

    string FileName = FILENAME_PREFIX + i + FILENAME_SUFFIX;

    }

    SW.Stop();

    Console.WriteLine(“Bytes Used: ” + (BytesOfMemoryUsed – StartingMemory));

    Console.WriteLine(“Elapsed Time: ” + SW.Elapsed);

    }

    private static void StringBuilderTest()

    {

    System.Diagnostics.Stopwatch SW = new System.Diagnostics.Stopwatch();

    long StartingMemory = BytesOfMemoryUsed;

    SW.Start();

    for (int i = 0; i < 1000; i++)

    {

    System.Text.StringBuilder SB = new System.Text.StringBuilder();

    SB.AppendFormat(FILENAME_FORMAT, i);

    string FileName = SB.ToString();

    }

    SW.Stop();

    Console.WriteLine(“Bytes Used: ” + (BytesOfMemoryUsed – StartingMemory));

    Console.WriteLine(“Elapsed Time: ” + SW.Elapsed);

    }

    private static void StringBuilderCapacityDemo()

    {

    System.Text.StringBuilder SB = new System.Text.StringBuilder();

    Console.WriteLine(“Initial Capacity:” + SB.Capacity);

    SB.AppendFormat(FILENAME_FORMAT, 1);

    Console.WriteLine(SB.ToString() + “:” + SB.Capacity);

    SB = new System.Text.StringBuilder();

    Console.WriteLine(“Initial Capacity:” + SB.Capacity);

    SB.AppendFormat(FILENAME_FORMAT, 10);

    Console.WriteLine(SB.ToString() + “:” + SB.Capacity);

    }

    }

    }

  4. tomchris says:

    Without looking into how you are getting the amount of memory, it is hard for me to tell exactly what is happening there.  But if you read those articles, you will see the problem.  I’ll check, but I’m pretty sure the compiler doesn’t change string concatenation calls into a single string.concat.  Also, you would run into the problem just as easily if you put each + on a new line:

    a += b;

    a += c;

    a += d;

    My guess with the code is that the + sample actually has the GC kick in and clean up some of the strings.

  5. csdream says:

    Tom,

    I just wrote the very simple program as shown below, compiled it as release build, and examined it in .NET reflector.  Please see the followings:

    1. Sample code

    using System;

    using System.Collections.Generic;

    using System.Linq;

    using System.Text;

    namespace StringConcatDemo

    {

       class Program

       {

           static string a = “This is a test”;

           static string b = “This issue is an interesting one in that there are more then one problem here that will cause high memory and even what looks to be a problem that will affect the performance of this snippet as well.”;

           static string c = “So the two main problems are that we are using XSLT.Load inside of a loop which means that we are going to create a new dynamic assembly for each loop iteration.”;

           static string d = “The other problem is string concatenation.”;

           static string e = “Instead of using the + to build our string in each loop, we should use StringBuilder to save on memory.”;

           static void Main(string[] args)

           {

               string result = a + b + c + d + e;

               string result2 = String.Concat(new string []{a, b, c, d, e});

           }

       }

    }

    2. IL for Main() shown in .NET reflector

    .method private hidebysig static void Main(string[] args) cil managed

    {

       .entrypoint

       .maxstack 3

       .locals init (

           [0] string[] CS$0$0000,

           [1] string[] CS$0$0001)

       L_0000: ldc.i4.5

       L_0001: newarr string

       L_0006: stloc.0

       L_0007: ldloc.0

       L_0008: ldc.i4.0

       L_0009: ldsfld string StringConcatDemo.Program::a

       L_000e: stelem.ref

       L_000f: ldloc.0

       L_0010: ldc.i4.1

       L_0011: ldsfld string StringConcatDemo.Program::b

       L_0016: stelem.ref

       L_0017: ldloc.0

       L_0018: ldc.i4.2

       L_0019: ldsfld string StringConcatDemo.Program::c

       L_001e: stelem.ref

       L_001f: ldloc.0

       L_0020: ldc.i4.3

       L_0021: ldsfld string StringConcatDemo.Program::d

       L_0026: stelem.ref

       L_0027: ldloc.0

       L_0028: ldc.i4.4

       L_0029: ldsfld string StringConcatDemo.Program::e

       L_002e: stelem.ref

       L_002f: ldloc.0

       L_0030: call string [mscorlib]System.String::Concat(string[])

       L_0035: pop

       L_0036: ldc.i4.5

       L_0037: newarr string

       L_003c: stloc.1

       L_003d: ldloc.1

       L_003e: ldc.i4.0

       L_003f: ldsfld string StringConcatDemo.Program::a

       L_0044: stelem.ref

       L_0045: ldloc.1

       L_0046: ldc.i4.1

       L_0047: ldsfld string StringConcatDemo.Program::b

       L_004c: stelem.ref

       L_004d: ldloc.1

       L_004e: ldc.i4.2

       L_004f: ldsfld string StringConcatDemo.Program::c

       L_0054: stelem.ref

       L_0055: ldloc.1

       L_0056: ldc.i4.3

       L_0057: ldsfld string StringConcatDemo.Program::d

       L_005c: stelem.ref

       L_005d: ldloc.1

       L_005e: ldc.i4.4

       L_005f: ldsfld string StringConcatDemo.Program::e

       L_0064: stelem.ref

       L_0065: ldloc.1

       L_0066: call string [mscorlib]System.String::Concat(string[])

       L_006b: pop

       L_006c: ret

    }

    3. C# code for Main() shown in .NET reflector

    private static void Main(string[] args)

    {

       string text1 = a + b + c + d + e;

       string text2 = a + b + c + d + e;

    }

  6. tomchris says:

    That is an interesting point.  But the main thing to keep in mind is that StringBuilder is the best way to go (even better then String.Format) for the following reason.

    If you allocate a string (using concat, format, etc) on each iteration of the loop, that is one string each time.  While a StringBuilder will just have one object for the entire loop.  So if the loop is 1000 iterations, you will create, at best, 1000 strings… maybe more.  Where the StringBuilder will create one.  If you imagine instead that each iteration created the string for a row in a table, you can see how quickly that memory would grow.

    Hope that helps clarify things.

  7. csdream says:

    Tom,

    Nice point you brought up.  However, you may have to think the following:

    In your application, you may reuse one StringBuilder object for every iteration, yet, you do need one string per iteration unless XPathDocumentation Ctor will take StringBuilder object directly (I am afraid it may not).  Therefore, at best with StringBuilder object in your previous example,  1000 strings + 1 string builder object may be created (strings are immutable in .NET once they are created with StringBuilder.ToStirng() method).  

    I already mentioned that if the StringBuilder object was initialized with small Capacity, it may require to allocate additional memory, copy the content from the original buffer to a new one.  And you may have to pay the cost of reseting/clearing StringBuilder object every iterations, which may not be negligible when the buffer size is large and a number of iteration is significant enough.  Also, the only method you may use to clear StringBuilder object is Remove() method, as StringBuilder class seems not to provide Clear() method.

    Therefore, I would not recommend to change the string concatenation line presented in your previous example to use StringBuilder object.  

    Less lines of code, better readability, and better performance.

  8. Tom, AFAIK the range check elimination optimization applies to arrays and arrays only.  When you call the ArrayList’s indexer (or List<T>’s indexer, for that matter) you don’t enjoy the optimization anyway unless the indexer is inlined and then the optimization is applied.  It’s worth checking again, but the last time I checked it didn’t happen.