Optimizing C# String Performance


Strings in C# are highly optimized but also potentially very wasteful. They give programmers a safe, fast way to handle character data. However, there are a few tricks you need to know about strings and memory if you want to write efficient code. Without this information, you could easily write code that squanders both memory and computer clock cycles.

Sharing Memory

To understand C# strings, you need to understand the answer to one fairly simple question. Suppose you have two string variables called MyString1 and MyString2. How can you get them both to point at the same place in memory? The goal here is not just to have two strings that contain the same value, but to have two string variables that reference a single block of memory that contains a string.

It turns out that the answer to this question is very simple and intuitive. The reasons behind the answer, however, are less obvious. Understanding those reasons will give you the power to write code that is fast and efficient.

This post emerged from a thread on the C# forum. As often happens, I learned something in the course of the discussion. I’ve attempted to repackage that information and present it here in this post. The post begins with a look at Strings and StringBuilders, but the focus quickly switches to an exploration of how the String class handles memory.

Strings vs. StringBuilders

C# Strings are immutable. This means you can’t modify an existing string. If you try to change it with the concatenation operator, or with the Replace, Insert, PadLeft, PadRight, or Substring methods, then you end up with an entirely new string and the original is left untouched. As a result, the operations you perform on a String frequently cause a new allocation of memory.
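Here is a minimal sketch of what immutability looks like in practice. The class and method names (ImmutabilityDemo, SwapFoo) are made up for this illustration; only String.Replace comes from the framework.

```csharp
using System;

static class ImmutabilityDemo
{
    // Hypothetical helper: returns a copy of the input with "foo" replaced by "bar".
    // Replace never edits the input; it hands back a different string.
    public static string SwapFoo(string input)
    {
        return input.Replace("foo", "bar");
    }

    static void Main()
    {
        string original = "foo data";
        string changed = SwapFoo(original);

        Console.WriteLine(original);  // foo data (the original is untouched)
        Console.WriteLine(changed);   // bar data (a second, separate string)
        Console.WriteLine(object.ReferenceEquals(original, changed)); // False
    }
}
```

Every call like this leaves you with two strings in memory where you started with one, which is where the allocation cost comes from.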

Allocations of memory are costly in terms of both memory and performance. As a result, there are times when you don’t want to use the String class.

Developers who want to work with a single string and take it through an arbitrary number of changes in a loop can use the StringBuilder class. The StringBuilder class has many of the same methods as the String class. You can, however, change the contents of a StringBuilder without allocating a new object for every change; it only needs more memory when its internal buffer has to grow. This means that in certain situations the StringBuilder class will be much faster than the String class. In other situations, however, the opposite will be true.
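To make the contrast concrete, here is a small sketch that builds the same comma-separated text both ways. The names BuilderDemo, JoinWithString, and JoinWithBuilder are invented for this example, and it makes no claim about how large the speed difference will be in your own code.

```csharp
using System;
using System.Text;

static class BuilderDemo
{
    // Builds "0,1,2,..." with the concatenation operator.
    // Each += produces a brand new string and abandons the old one.
    public static string JoinWithString(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            if (i > 0) result += ",";
            result += i;
        }
        return result;
    }

    // Builds the same text with a StringBuilder.
    // Append writes into the builder's buffer; one string is created by ToString.
    public static string JoinWithBuilder(int count)
    {
        var builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0) builder.Append(',');
            builder.Append(i);
        }
        return builder.ToString();
    }

    static void Main()
    {
        Console.WriteLine(JoinWithString(5));   // 0,1,2,3,4
        Console.WriteLine(JoinWithBuilder(5));  // 0,1,2,3,4
    }
}
```

Both methods return the same text. The difference is only in how many temporary strings are created along the way, and that matters more as the loop count grows.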

What’s a developer to do? The String class is highly optimized and very efficient in most cases. However, if you need to modify a string then the String class tends to be a bit wasteful of resources. How concerned should developers be about this problem? How often should they abandon the String class and use StringBuilder? The answer, as it turns out, is “not very often.”

You should only use the StringBuilder class if you need to modify a single string many times in a loop, or in a relatively small section of code. To fully understand why this is the case, you need to understand just how smart the String class can be when it comes to handling memory in typical programming scenarios.

What Makes a C# String Sharp?

The big win for Strings is the tricks they perform to limit unnecessary memory allocations. Look at this code:


   1:  using System;
   2:  using System.Collections.Generic;
   3:  using System.Text;
   4:   
   5:  namespace CSharpConsoleApplication3
   6:  {
   7:      class Program
   8:      {
   9:          static void Main(string[] args)
  10:          {
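  11:              String foo = "foo data";
  12:              String bar = foo;                                      // bar now refers to the same string as foo
  13:              Console.WriteLine(ReferenceEquals(foo, bar));          // True: one string, two variables
  14:              Console.WriteLine(foo.Equals(bar));                    // True: same value
  15:              foo = "a";                                             // foo now refers to a different string
  16:              Console.WriteLine(foo.Equals(bar));                    // False: "a" vs. "foo data"
  17:              Console.WriteLine(ReferenceEquals(foo, bar));          // False: two different strings
  18:              String goober1 = "foo";
  19:              String goober2 = "foo";
  20:              Console.WriteLine(ReferenceEquals(goober1, goober2));  // True: identical literals are shared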
  21:          }
  22:      }
  23:  }

The goal of getting two string variables to reference the same memory is achieved in lines 11 and 12. In this case, both foo and bar point at the same place in memory. To check, call the ReferenceEquals method. (The == operator will not do here: for strings it compares values, not references.) In this code, the call to ReferenceEquals prints True in line 13. We can also call the Equals method of the String class (line 14) to see that the two strings are equal in that they both have the same value. That is, they both point at the eight characters that spell “foo data”.
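The difference between having the same value and being the same object is easy to see with a string that is assembled at run time. The sketch below is only an illustration; EqualityDemo and BuildAtRunTime are invented names. It also shows that, for strings, the == operator compares values rather than references.

```csharp
using System;

static class EqualityDemo
{
    // Hypothetical helper: builds "foo data" from characters at run time,
    // so the result is a separate object from any "foo data" literal.
    public static string BuildAtRunTime()
    {
        return new string(new[] { 'f', 'o', 'o', ' ', 'd', 'a', 't', 'a' });
    }

    static void Main()
    {
        string literal = "foo data";
        string built = BuildAtRunTime();

        Console.WriteLine(literal == built);                        // True: same characters
        Console.WriteLine(literal.Equals(built));                   // True: same characters
        Console.WriteLine(object.ReferenceEquals(literal, built));  // False: two objects
    }
}
```

So when the question is "do these two variables share memory?", ReferenceEquals is the call to reach for.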

Now change the value of foo, as we do in line 15. A C/C++ programmer might then expect that both foo and bar would still reference the same memory, and hence both have the value “a”. This is not the case. Lines 16 and 17 both print False. The assignment of “a” to foo did not alter the original string; it pointed foo at a different string and so broke the connection between the two variables. Intuitively, this is what we would expect. It’s only our “deeper understanding” of computer languages that makes us see this as odd.
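For contrast, here is a rough sketch of the behavior a C/C++ programmer might have been expecting, using a mutable StringBuilder in place of a String. The names AliasDemo, MutateThroughAlias, and ReassignString are made up for this example.

```csharp
using System;
using System.Text;

static class AliasDemo
{
    // Two variables refer to one mutable StringBuilder.
    // A change made through one variable is visible through the other.
    public static string MutateThroughAlias()
    {
        var first = new StringBuilder("foo data");
        StringBuilder second = first;
        first.Clear().Append("a");   // edits the shared object in place
        return second.ToString();
    }

    // Two variables refer to one immutable string.
    // Assigning to foo points foo elsewhere; bar keeps the original.
    public static string ReassignString()
    {
        string foo = "foo data";
        string bar = foo;
        foo = "a";
        return bar;
    }

    static void Main()
    {
        Console.WriteLine(MutateThroughAlias());  // a
        Console.WriteLine(ReassignString());      // foo data
    }
}
```

With a String there is simply no operation that edits the shared object, so the second variable can never be surprised by a change made through the first.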

The final twist in this saga is that line 20 also prints True. Here we have assigned two separate but identical string literals to two different variables. Our expectation is that these two variables should not point at the same place in memory. But line 20 shows that they do reference the same block of memory.

The .NET runtime maintains something called an “intern table.” This is a table of the strings that have been interned, which includes the string literals in your program. When the same literal appears more than once, as in lines 18 and 19, the intern table is checked. If your string is already in there, then both variables will point at the same block of memory maintained by the intern table. The string is not duplicated. Strings that are built while the program runs are not added to the table automatically; for those you have to call the String.Intern method yourself. Again, this is intuitively what we want, but our understanding of computers makes us think that this is not what will happen. C# tries to conform to what we would intuitively expect to happen, not to what we think a computer is likely to do.
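The following sketch shows the intern table at work on strings that are built at run time, which are not shared unless you ask for it with String.Intern. InternDemo and BuildFoo are invented names; String.Intern is the real framework method.

```csharp
using System;

static class InternDemo
{
    // Hypothetical helper: builds "foo" from characters at run time,
    // so every call returns a new, separate string object.
    public static string BuildFoo()
    {
        return new string(new[] { 'f', 'o', 'o' });
    }

    static void Main()
    {
        string first = BuildFoo();
        string second = BuildFoo();

        // Same value, but two separate blocks of memory.
        Console.WriteLine(first == second);                        // True
        Console.WriteLine(object.ReferenceEquals(first, second));  // False

        // Intern returns the one shared copy held by the intern table.
        string a = string.Intern(first);
        string b = string.Intern(second);
        Console.WriteLine(object.ReferenceEquals(a, b));           // True
    }
}
```

Interning trades a table lookup for the chance to share memory, so it tends to be worth considering only when the same values really do recur many times.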

Some of the details of the intern table are discussed in the documentation for the String.Intern method.

Summary

This post explains a little bit about how C# handles memory allocations for the String class. Knowing this information is helpful if you want to write optimized code. It is also interesting information that intrigues us in part because it explains one small corner of the great wonder that is the C# language.

How important is it that one understands this information? That depends. For some people, it will be information they use every day. For others, it is just background noise. Writing safe, error-free code is my most important task. Once that is accomplished, then I like to find time to work on optimization issues like those outlined here.

Comments (12)

  1. Nitin says:

    Interesting new information!

    Thanks.

  4. Helpware says:

    One of the most important lessons to learn in C#. Actually there are parallels here with Delphi aren’t there? C# String management =similar= Delphi Huge String management. C# StringBuilder management =similar= Delphi strings[256]. ? Rob

  5. ccalvert says:

    Rob,

    Though we can’t always be sure who made what decisions when, nevertheless Anders’s fingerprints are present in both Delphi and C#, and so it is no surprise that we keep running across interesting parallels between the two languages. Delphi made strings so easy to use that it was difficult to improve on the system, but I think C# keeps that simplicity while adding a few twists that significantly improve usability. In particular, I find it nice that C# strings are a real class, while in Delphi they are just a glorified primitive type. But this is not really a put down of Delphi. Everything in Delphi for Win32 is elegant and highly performant.

  6. Helpware says:

    Agreed. Thanks for the reply.

  7. Anon says:

    Sorry your explanations are not really that good. All you’ve pointed out is that the compiler is really good at optimizing the code you’ve written. Well you did mention interning strings ….

    You don’t touch on the fact that strings are immutable nor what the performance gotchas are, various scenarios, and the canonical way to deal with them.

    So where’s the optimisation?

    Just because the compiler is smarter than you doesn’t mean when you’re writing code you should be lazy – it’s better to do it the correct way.

    e.g. When the hapless coder writes:

    string result = "value=" + x + ", and other= " + y + ".";

    the compiler turns it into a single String.Concat call, which does much the same job as

    string result = new StringBuilder().Append("value=").Append(x).Append(", and other= ").Append(y).Append(".").ToString();

    So it tries to save you from 5 separate allocations in that line, but it can’t optimise it completely, especially if there are more concatenations from one line to the next.

  8. David Cumps says:

    Some additional information about strings and memory usage which you might find interesting: http://blog.cumps.be/string-concatenation-vs-memory-allocation/

    It seems strings are quite a complex topic after all 😉

  9. ccalvert says:

    Anon,

    Thanks for showing an example of using the StringBuilder class. Though I did mention both it and the immutability of strings, my post probably would have been stronger had I included an example of how to use StringBuilder.

    David,

    Thank you for your reference to your excellent post on String Allocation vs Memory Allocation. For people who want to dig further beneath the surface this is a great reference.

    I also should have pointed out that performance problems in applications rarely stem from code that performs string concatenation or allocation. As a rule, it is best to look elsewhere if you find that your application is performing poorly. This is not to say that we cannot optimize string performance, only that it is usually not an important bottleneck.

    – Charlie

  10. Washier says:

    Charlie,

    I thought your explanations were that good. I think Anon failed to see that your article intentionally left readers to explore the subject a bit further and apply it in their own little world – we all write very different software.

    The "hapless" coder writing string = string + string type code is most likely working on software not requiring super-performance. This poor coder is also working on a tight schedule, maybe stressed out. This coder doesn’t have time, and should not be spending time on optimizing code that in most cases turns out to be a very small part of the whole.

    In my case performance is very important, on the back-end that is. And I will go and check my code again after reading Charlie’s work.

    This is what makes the language great. It relies on the user’s intuition and talent rather than knowledge (which comes with time anyway).

  11. Ricky Shrestha says:

    Thanks for the post. This was just the information I needed and with the help of this post (by changing string to StringBuilder) I got a more than 120-fold performance increase. All I was doing was going round a loop concatenating some string. Now I use StringBuilder. Thanks for the post.