Performance Analysis Reveals Char[] Array is Better than StringBuilder

Anil Chintala here...

In my previous blog post I described the AntiXSS output encoding methodology and why I think it is better than the .NET Framework's encoding methods at preventing XSS vulnerabilities. Although AntiXSS is superior in restricting XSS vulnerabilities, one of the main concerns we hear from developers is that AntiXSS's HtmlEncode() method performs worse than the HttpUtility.HtmlEncode() method. Optimizing the AntiXSS library functions is high on our agenda for the next version, and we are committed to making the library perform at an optimal level.

When I started looking at optimization techniques for string manipulation, the first thing that came to mind was the StringBuilder class, which is more efficient than String for repeated modifications because it maintains a mutable string buffer. In some contexts, however, a char[] array is a better match.

StringBuilder Class

Firstly, let's look at the remarks from the MSDN documentation about StringBuilder:

The StringBuilder class represents a mutable string of characters. StringBuilder is said to be mutable because it can be modified once it has been created by appending, removing, replacing, or inserting characters. A StringBuilder object maintains a buffer to accommodate the concatenation of new data. New data is appended to the end of the buffer if room is available; otherwise, a new, larger buffer is allocated, data from the original buffer is copied to the new buffer, then the new data is appended to the new buffer.
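To see the buffer behavior described in these remarks, here is a minimal sketch that deliberately starts with a small capacity and reports each time the capacity changes. The class and method names (BuilderGrowthDemo, Fill) are made up for illustration, and the exact growth steps are an implementation detail that may differ between runtime versions, so no specific output is claimed.

```csharp
using System;
using System.Text;

public static class BuilderGrowthDemo
{
    // Appends `count` characters to a builder created with `initialCapacity`
    // and returns the builder so the caller can inspect Length and Capacity.
    public static StringBuilder Fill(int initialCapacity, int count)
    {
        StringBuilder builder = new StringBuilder(initialCapacity);
        int lastCapacity = builder.Capacity;

        for (int i = 0; i < count; i++)
        {
            builder.Append('x');
            if (builder.Capacity != lastCapacity)
            {
                // The buffer ran out of room, so the builder made more space.
                Console.WriteLine("Length {0}: capacity {1} -> {2}",
                    builder.Length, lastCapacity, builder.Capacity);
                lastCapacity = builder.Capacity;
            }
        }
        return builder;
    }

    public static void Main()
    {
        StringBuilder builder = Fill(4, 100);
        Console.WriteLine("Final length {0}, capacity {1}",
            builder.Length, builder.Capacity);
    }
}
```

Passing the expected maximum length as the initial capacity, as the filtering code later in this post does, should avoid these growth steps.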

Char [] Array

Although StringBuilder is fast and has many features, in some contexts a char[] array is far more efficient, provided the maximum size of the string is known and the string manipulation requirements are simple. The following are the remarks from the MSDN documentation about the Char structure:

The .NET Framework uses the Char structure to represent a Unicode character. The Unicode Standard identifies each Unicode character with a unique 21-bit scalar number called a code point, and defines the UTF-16 encoding form that specifies how a code point is encoded into a sequence of one or more 16-bit values. Each 16-bit value ranges from hexadecimal 0x0000 through 0xFFFF and is stored in a Char structure. The value of a Char object is its 16-bit numeric (ordinal) value.
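A short sketch of what these remarks mean in practice: a Char converts to its 16-bit ordinal value (which is what the range checks in the filtering code below compare against), and a code point above 0xFFFF takes two Char values. CharDemo and Utf16Length are names invented for this illustration.

```csharp
using System;

public static class CharDemo
{
    // Returns how many 16-bit Char values are needed to store one code point.
    public static int Utf16Length(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint).Length;
    }

    public static void Main()
    {
        char a = 'a';
        Console.WriteLine((int)a);                // prints 97, the ordinal value of 'a'
        Console.WriteLine(Utf16Length(0x0061));   // prints 1: 'a' fits in one Char
        Console.WriteLine(Utf16Length(0x1D11E));  // prints 2: needs a surrogate pair

        string clef = char.ConvertFromUtf32(0x1D11E);
        Console.WriteLine(char.IsSurrogatePair(clef[0], clef[1])); // prints True
    }
}
```

For a char[] buffer this means the array length counts 16-bit values, not code points, which is fine for the ASCII-only filtering done in this post.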

Performance Analysis

When it comes to performance, I generally don't believe a claim unless it is supported by data. So, I’ve created a performance test bed and conducted a number of simple tests to analyze the performance of both StringBuilder and a char array. Here, take a look at my test bed sample code.

I passed the string "abcdefghijklmnopqrstuvwxyz1234567890!#$%^&*()_-=+[]{};':,<>? " through filtering logic that removes non-alphanumeric characters, first using StringBuilder and then using a char array.

Code using StringBuilder:

    /// <summary>
    /// Example method of how we can copy and build up a string from char values
    /// with a StringBuilder. This is the slow version.
    /// </summary>
    public static string SampleStringBuilder(string text)
    {
        // Declares new string builder and filters the input to allow alphanumeric string
        StringBuilder builder = new StringBuilder("", text.Length);

        foreach (char c in text)
        {
            if ((((uint)c > 96) && ((uint)c < 123)) ||    // "a-z"
                (((uint)c > 64) && ((uint)c < 91)) ||     // "A-Z"
                (((uint)c > 47) && ((uint)c < 58))        // "0-9"
                )
            {
                // add alphanumeric character to builder
                builder.Append(c);
            }
        }
        return builder.ToString();
    }

Now the code using a char array:

    /// <summary>
    /// Method showing how we can copy to an array and then return
    /// a new string with its constructor. This runs faster!!
    /// </summary>
    public static string SampleCharArray(string text)
    {
        // Use a new char array sized for the worst case (every character kept).
        char[] buffer = new char[text.Length];
        int lenIndex = 0;

        foreach (char c in text)
        {
            // Check for alphanumeric character.
            if ((((uint)c > 96) && ((uint)c < 123)) ||    // "a-z"
                (((uint)c > 64) && ((uint)c < 91)) ||     // "A-Z"
                (((uint)c > 47) && ((uint)c < 58))        // "0-9"
                )
            {
                // add alphanumeric character to char array
                buffer[lenIndex++] = c;
            }
        }
        // Convert only the filled part of the buffer; new string(buffer) would
        // also include the unused '\0' slots at the end.
        return new string(buffer, 0, lenIndex);
    }
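One detail worth keeping in mind with the char array approach: when the buffer is sized for the maximum length but only partly filled, the string(char[]) constructor converts the whole array, unused '\0' slots included, while string(char[], int, int) converts just the requested range. The sketch below illustrates the difference; DigitsOnly is a hypothetical helper written for this example and is not part of the test bed.

```csharp
using System;

public static class CharArrayToStringDemo
{
    // Copies only the digits of `text` into a worst-case-sized buffer and
    // converts just the filled portion to a string.
    public static string DigitsOnly(string text)
    {
        char[] buffer = new char[text.Length];
        int count = 0;
        foreach (char c in text)
        {
            if (c >= '0' && c <= '9')
            {
                buffer[count++] = c;
            }
        }
        return new string(buffer, 0, count);
    }

    public static void Main()
    {
        char[] buffer = new char[5];
        buffer[0] = 'o';
        buffer[1] = 'k';

        string whole = new string(buffer);        // includes three '\0' characters
        string filled = new string(buffer, 0, 2); // only the written characters

        Console.WriteLine(whole.Length);          // prints 5
        Console.WriteLine(filled.Length);         // prints 2
        Console.WriteLine(DigitsOnly("a1b2c3"));  // prints 123
    }
}
```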

I called each function 10,000 times to measure the execution times for both scenarios. Have a look at the results.

[Image: perf (chart of the measured execution times)]

The tests above clearly show that the code using the char array is more than two times faster in this case. As I said earlier, char arrays prove to be much faster when you are working with simple string manipulation requirements and you know the maximum size of the string.
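If you want to repeat this kind of measurement on your own code, a harness along the following lines may be a reasonable starting point. It is only a sketch and not the original test bed: TimingSketch, Measure, and the stand-in KeepLetters filter are names invented here, and it assumes System.Diagnostics.Stopwatch is an acceptable timer for a rough comparison.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class TimingSketch
{
    // Calls `filter` on `input` `iterations` times and returns the elapsed
    // time in milliseconds.
    public static long Measure(Func<string, string> filter, string input, int iterations)
    {
        // One untimed call first, so JIT compilation is not counted.
        filter(input);

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            filter(input);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    // Stand-in filter (hypothetical) so the sketch compiles on its own;
    // substitute the method you actually want to time.
    public static string KeepLetters(string text)
    {
        StringBuilder builder = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (char.IsLetter(c))
            {
                builder.Append(c);
            }
        }
        return builder.ToString();
    }

    public static void Main()
    {
        long ms = Measure(KeepLetters, "abc123!?", 10000);
        Console.WriteLine("10000 calls took {0} ms", ms);
    }
}
```

Timings from a loop this short vary from run to run, so it is worth repeating the measurement several times and comparing the two versions on the same machine and build configuration.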

That's it for now. I hope this helps make your applications perform better... Happy coding!