Using StringBuilder to improve performance of your apps


TestSuites.cs file generated by my model, and parse it out into multiple .cs files, each representing one testcase.  This sounded great to me, because my TestSuites.cs file was 113 MB, and the tool would split it into around 300 testcase files.
 
However, when the tool had run for about 3 hours without completing, I began to wonder what was going on.  While debugging it with a friend of mine, David Owens, we discovered that the main reason it took so long was that the original tool author was using normal string concatenation during the parsing.  Here is some sample code which illustrates the problem.

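using System;
using System.IO;

class Program {
    static void Main(string[] args) {
        string contents = "";
        DateTime dtStart = DateTime.Now;

        // The using block closes the file when the loop is done.
        using (StreamReader sr = new StreamReader(@"C:\TestSuites.cs")) {
            while (!sr.EndOfStream) {
                contents += sr.ReadLine(); // Don't do this =)
            }
        }
        DateTime dtEnd = DateTime.Now;
        TimeSpan ts = dtEnd.Subtract(dtStart);
        // TotalSeconds is the whole elapsed time; Seconds is only the 0-59 component.
        Console.WriteLine(contents + "\r\n" + ts.TotalSeconds.ToString());
    }
}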



If you’ve written code like this before, and you’re familiar with static analysis tools like FxCop, you’ve probably been informed that this is bad.  The offending line is “contents += sr.ReadLine();”.  The root of the bug is that strings in .NET are “immutable.”  Basically, what that means is that a string can never be changed in place: every concatenation allocates new memory for a brand-new string and copies both the old contents and the new line into it, and the old string is left behind for the garbage collector.  In this case, the code file is about 1.64 million lines, and each memory allocation is bigger than the last.  Based on an average line length, it would take approximately 87.3 terabytes of memory to store all the strings allocated in this example.  Now you see why my laptop was thrashing for over 3 hours trying to parse this file.  Time for Microsoft to upgrade my laptop, IMO =).
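For the curious, here is a rough sketch of the arithmetic behind that estimate. It is not the calculation from the original debugging session, just a reconstruction under stated assumptions: 113 MB is taken to mean 113 * 2^20 bytes, every line is treated as having the average length, and each character is counted as one byte. The class and method names are made up for illustration. Iteration k of the loop allocates a string of roughly k average-length lines, so the total is the average length times n(n + 1)/2.

```csharp
using System;

static class ConcatenationCost
{
    // Characters allocated when a string is built one line at a time with +=.
    // Iteration k creates a new string of about k * avgLineLength characters,
    // so the total is avgLineLength * (1 + 2 + ... + n) = avgLineLength * n * (n + 1) / 2.
    public static double TotalCharsAllocated(long lineCount, double avgLineLength)
    {
        return avgLineLength * lineCount * (lineCount + 1) / 2.0;
    }

    // Converts a byte count to terabytes, using 1 TB = 1024^4 bytes.
    public static double ToTerabytes(double bytes)
    {
        return bytes / Math.Pow(1024, 4);
    }

    static void Main()
    {
        const long lineCount = 1640000;                // "about 1.64 million lines"
        const double fileBytes = 113.0 * 1024 * 1024;  // assumption: 113 MB = 113 * 2^20 bytes
        double avgLineLength = fileBytes / lineCount;  // assumption: one byte per character

        double totalChars = TotalCharsAllocated(lineCount, avgLineLength);

        Console.WriteLine("Average line length: {0:F1} characters", avgLineLength);
        Console.WriteLine("Allocated at 1 byte per char:  {0:F1} TB", ToTerabytes(totalChars));
        Console.WriteLine("Allocated at 2 bytes per char: {0:F1} TB", ToTerabytes(2 * totalChars));
    }
}
```

With these inputs the one-byte-per-character result lands close to the figure quoted above. Since .NET strings store UTF-16 characters at two bytes each, the real number is likely about double that. None of this memory is live at once, of course; it is the running total that the allocator and garbage collector have to churn through.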
 
Enter the System.Text.StringBuilder class.  The idea behind this class is to allocate a chunk of memory up front to build the string in, and only allocate a bigger buffer if the string you're building becomes too long for the current one.  Once you're ready to use the result, calling ToString() allocates an immutable string object of the correct size.  In this example, the fix is to use the Append() method to build up my uber string, like so:

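using System;
using System.IO;
using System.Text;

class Program {
    static void Main(string[] args) {
        StringBuilder contents = new StringBuilder();
        DateTime dtStart = DateTime.Now;

        // The using block closes the file when the loop is done.
        using (StreamReader sr = new StreamReader(@"C:\TestSuites.cs")) {
            while (!sr.EndOfStream) {
                contents.Append(sr.ReadLine()); // appends to the builder's buffer
            }
        }

        DateTime dtEnd = DateTime.Now;
        TimeSpan ts = dtEnd.Subtract(dtStart);
        // TotalSeconds is the whole elapsed time; Seconds is only the 0-59 component.
        Console.WriteLine(contents.ToString() + "\r\n" + ts.TotalSeconds.ToString());
    }
}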


Now, each time I read a new line from the file, rather than allocating a new immutable string to store the entire contents so far, the StringBuilder class just copies that line's characters onto the end of its internal buffer.  No string holding the full contents is allocated until the last line of my Main() function, where it outputs the string to the console window.  It takes quite a bit longer to output the 113 MB of my file to the console than it does to execute the while loop.  But the perf improvement is amazing.  My while loop now executes in 24 seconds instead of several hours!!
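If you want to see the difference on your own machine without a 113 MB file handy, here is a minimal sketch that times both approaches on synthetic data. The class name, the helper methods, and the 10,000 lines of 70 characters are all assumptions picked for illustration, not anything from the real tool, and the timings will vary from machine to machine.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class BuildStringDemo
{
    // Builds one string with +=, allocating a new, longer string on every iteration.
    public static string WithConcatenation(string[] lines)
    {
        string result = "";
        foreach (string line in lines)
        {
            result += line;
        }
        return result;
    }

    // Builds the same string with StringBuilder, which appends into a growable buffer.
    public static string WithStringBuilder(string[] lines)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string line in lines)
        {
            sb.Append(line);
        }
        return sb.ToString(); // the only full-size immutable string that gets created
    }

    static void Main()
    {
        // Assumption: synthetic lines stand in for the real file.
        string[] lines = new string[10000];
        for (int i = 0; i < lines.Length; i++)
        {
            lines[i] = new string('x', 70);
        }

        Stopwatch sw = Stopwatch.StartNew();
        string a = WithConcatenation(lines);
        sw.Stop();
        Console.WriteLine("+=            : {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        string b = WithStringBuilder(lines);
        sw.Stop();
        Console.WriteLine("StringBuilder : {0} ms", sw.ElapsedMilliseconds);

        Console.WriteLine("Same result: {0}", a == b);
    }
}
```

The sketch uses Stopwatch rather than subtracting DateTime.Now values because Stopwatch is designed for measuring elapsed time. Try doubling the line count: the += time should grow much faster than the StringBuilder time.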
 
Fun stuff,
Bri


Comments (2)
  1. Anonymous says:

    What about StreamReader.ReadToEnd?

  2. brianmcm says:

    Absolutely, that would make my sample code faster than either of the two implementations.  However, I’ve greatly simplified the code of this tool to illustrate the bug.  As I mentioned in the blog post, the actual tool is taking this 113 MB file and splitting it up into ~300 testcases, each around 5500 lines.  Therefore, the ReadToEnd method is not feasible.  If my sample code better emulated what the tool is doing, the math on total memory allocated would come out smaller, but I’d still need a better laptop =)

    -Bri
