Comparing RegEx.Replace, String.Replace and StringBuilder.Replace – Which has better performance?


 


A few days ago I was with Frank Taglianetti (no link here, he doesn’t have a blog yet), a PFE from my team whom I met for the first time that day while running a Lab for one of our customers. By Lab I mean stress testing and troubleshooting a customer’s application in our laboratory.


 


At some point we were reviewing a snippet of C# code that was the culprit for the slow performance. After reviewing it, we started asking ourselves which would be the better approach: String.Replace(), RegEx.Replace() or StringBuilder.Replace().


We didn’t care about case-sensitivity because the application had to replace special characters.


 


Then the fun began…


At that point, without running any tests, each of us guessed which Replace() call would be the best and which would be the worst.


 


Frank was able to come up with a theory to justify his guess for the worst performer, and it proved to be right!


 


Whenever I mention a co-worker, I like to add a few words about him or her. During the Lab I was impressed with Frank (we usually call him Tag, based on his last name); he is a great developer and a great debugger, proving something I often say: these two skills go together. Besides, the guy has a lot of experience! It’s always a pleasure to work with people like him because I always learn something new!


 


Back to the Lab… To find the fastest way to replace one character with another in a large string, I decided to use PowerShell to test the three approaches.


 


Below I present the scripts and the results. They are by no means full stress tests; however, they are useful as a baseline when processing large text files. You may be surprised by the results. We were!


 


Note: The tests don’t consider the regular expression syntax that is part of the PowerShell language, since it cannot be reused from VB.NET or C#.

If you are curious about it, just create another function that uses -match and -imatch.
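For readers who want to try the language-level operators anyway, here is a minimal sketch. It uses the replacement counterparts of those operators: -replace (case-insensitive by default), -ireplace and -creplace. The sample string and variable names are made up for illustration and are not part of the tests described in this post.

```powershell
# Hypothetical sample text; not taken from the Lab.
$text = "Line One`nline two`n"

# -replace treats the left argument as a regex pattern; here it removes newlines.
$noNewlines  = $text -replace "`n", ""

# -ireplace is explicitly case-insensitive, -creplace is case-sensitive.
$insensitive = $text -ireplace "LINE", "Row"
$sensitive   = $text -creplace "line", "row"

$noNewlines
$insensitive
$sensitive
```

These operators go through the .NET regular expression engine, so they would belong in the same bucket as RegEx.Replace rather than String.Replace.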


 


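RegEx.Replace()

[Results screenshot not preserved.]

StringBuilder.Replace()

[Results screenshot not preserved.]

String.Replace()

[Results screenshot not preserved.]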


Source code for RegEx.Replace:


 


 


#########################################################
## This is a sample test to measure the performance of StringBuilder.Replace against
## RegEx.Replace
##
## RegEx.Replace is case-sensitive unless RegexOptions.IgnoreCase is specified.
#########################################################
param(
      [string] $fileName = $(throw "Error! You must provide the file name.")
     )

Set-PSDebug -Strict

$ErrorActionPreference = "stop"

trap {"Error message: $_"}

Write-Host "Starting RegEx.Replace…" -foreground Green -background Black

# Attention! If you use [string] $text, the variable is not going to be a generic Object, but System.String.
# Doing that improves the performance a lot, although not enough to beat String.Replace.

# Get file content.
$str = get-Content $fileName

# For testing purposes, let's repeat the operation "n" times. ((?<value>(\n)))
for($i = 0; $i -le 200; $i++)
{
    [regex]::Replace($str, "`n", "");
}

Write-Host "End!" -foreground Green -background Black
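The note in the script about [string] is easier to follow with a small experiment. The sketch below is not part of the original test; the temporary file and the variable names are assumptions made for illustration. It shows that Get-Content returns one string per line with the line terminators stripped, and that casting the result to [string] joins the lines with $OFS (a single space unless $OFS has been changed). As far as I can tell, the same join happens implicitly whenever the array is passed to a parameter typed as string, such as the first argument of [regex]::Replace.

```powershell
# Hypothetical scratch file; any multi-line text file would do.
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value "first line", "second line", "third line"

# Get-Content yields an array with one string per line (no line terminators).
$lines = Get-Content $path

# Casting the array to [string] joins the elements with $OFS (space by default).
[string] $text = $lines

# ReadAllText returns the file contents verbatim, newlines included.
$raw = [System.IO.File]::ReadAllText($path)

$textHasNewline = $text.Contains("`n")
$rawHasNewline  = $raw.Contains("`n")

"Lines read by Get-Content : $($lines.Count)"
"Joined text               : $text"
"Joined text has newline   : $textHasNewline"
"Raw text has newline      : $rawHasNewline"

Remove-Item $path
```

If the goal is to measure newline replacement specifically, reading the file with [System.IO.File]::ReadAllText (or Get-Content -Raw on PowerShell 3.0 and later) keeps the newlines in the string being tested.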


 


 


 


Source code for StringBuilder.Replace:


 


#########################################################
## This is a sample test to measure the performance of StringBuilder.Replace against
## RegEx.Replace
##
## According to MSDN: The strings to replace are checked on an ordinal basis; that is,
## the replacement is not culture-aware. If newValue is a null reference
## (Nothing in Visual Basic), all occurrences of oldValue are removed. This method is case-sensitive.
#########################################################
param(
      [string] $fileName = $(throw "Error! You must provide the file name.")
     )

Set-PSDebug -Strict

$ErrorActionPreference = "stop"

trap {"Error message: $_"}

Write-Host "Starting StringBuilder.Replace…" -foreground Green -background Black

$builder       = New-Object System.Text.StringBuilder
$fileContent   = New-Object System.Text.StringBuilder
$value         = ""

# Assign the content to our local variable.
[System.String] $str = get-Content $fileName

# [void] discards the StringBuilder instance that Append returns.
[void] $fileContent.Append($str)

# For testing purposes, let's repeat the operation "n" times.
for($i = 0; $i -le 200; $i++)
{
    $builder = $fileContent.Replace("`n", "")
}

Write-Host "End!" -foreground Green -background Black
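One detail worth keeping in mind when reading the loop above: StringBuilder.Replace modifies the builder in place and returns that same instance, so after the first iteration there is nothing left to replace and later iterations only search. A minimal sketch with a made-up three-line string (the variable names are illustrative only):

```powershell
$sb = New-Object System.Text.StringBuilder
[void] $sb.Append("one`ntwo`nthree")

# Replace mutates $sb in place and returns that same instance.
$returned = $sb.Replace("`n", "")

$sameInstance   = [object]::ReferenceEquals($sb, $returned)
$afterFirstPass = $sb.ToString()

# A second pass finds nothing to replace: it only searches.
[void] $sb.Replace("`n", "")
$afterSecondPass = $sb.ToString()

"Same instance  : $sameInstance"
"After 1st pass : $afterFirstPass"
"After 2nd pass : $afterSecondPass"
```

To make every iteration do the same amount of work, the builder would have to be refilled with the original text before each Replace call.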


 


 


 


Source code for String.Replace:


 


#########################################################
## This is a sample test to measure the performance of StringBuilder.Replace against
## RegEx.Replace and String.Replace
##
## According to MSDN: The strings to replace are checked on an ordinal basis; that is,
## the replacement is not culture-aware. If newValue is a null reference
## (Nothing in Visual Basic), all occurrences of oldValue are removed. This method is case-sensitive.
#########################################################
param(
      [string] $fileName = $(throw "Error! You must provide the file name.")
     )

Set-PSDebug -Strict

$ErrorActionPreference = "stop"

trap {"Error message: $_"}

Write-Host "Starting String.Replace…" -foreground Green -background Black

[System.String] $builder = ""
[System.String] $fileContent = ""
$value = ""

# Assign the content to our local variable.
$fileContent = get-Content $fileName

# For testing purposes, let's repeat the operation "n" times.
for($i = 0; $i -le 200; $i++)
{
    $builder = $fileContent.Replace("`n", "")
}

Write-Host "End!" -foreground Green -background Black


 


 


 


 


From MSDN we have:


 


http://msdn2.microsoft.com/en-us/library/aa289509.aspx


 


Here’s another article that comes to the same conclusion:


 


http://www.codeproject.com/KB/cs/StringBuilder_vs_String.aspx?fid=326464&df=90&mpp=25&noise=3&sort=Position&view=Quick&fr=26


 


Based on this simple test, RegEx.Replace() has the worst performance and the award goes to…drum roll, please… String.Replace()!
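To repeat the comparison on your own data, something along these lines could serve as a starting point. It is only a sketch under stated assumptions: the in-memory sample text, the $runs count and all variable names are hypothetical, and the StringBuilder timing includes rebuilding the builder on every run so that each pass actually has newlines to replace. The numbers it prints will depend on the machine, the runtime and the input.

```powershell
# Hypothetical in-memory sample; substitute the contents of a real file.
$text = "This is a line of sample text.`n" * 20000
$runs = 20   # assumed repetition count, chosen only to keep the sketch quick

$regexTime = Measure-Command {
    for ($i = 0; $i -lt $runs; $i++) {
        $r1 = [regex]::Replace($text, "`n", "")
    }
}

$builderTime = Measure-Command {
    for ($i = 0; $i -lt $runs; $i++) {
        # Rebuild the StringBuilder each run; Replace mutates it in place.
        $sb = New-Object System.Text.StringBuilder -ArgumentList $text
        $r2 = $sb.Replace("`n", "").ToString()
    }
}

$stringTime = Measure-Command {
    for ($i = 0; $i -lt $runs; $i++) {
        $r3 = $text.Replace("`n", "")
    }
}

"RegEx.Replace         : {0:N1} ms" -f $regexTime.TotalMilliseconds
"StringBuilder.Replace : {0:N1} ms" -f $builderTime.TotalMilliseconds
"String.Replace        : {0:N1} ms" -f $stringTime.TotalMilliseconds
```

Running the three measurements in a different order, and with inputs of different sizes, is a cheap way to check whether the ranking holds for a particular workload.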


 

Comments (23)

  1. What was the size of your data set?  Did you try it over different size files?  SB should be much better for larger data sets, and I certainly wouldn’t want to use a String if the size is over 85k (large object heap!)

    What about the memory footprint of each one?  String.Replace() is certainly going to leave a bunch of junk around that will need to be gc’d later.  

    Basically, this is a worthless statistical blip that fits *just* your dataset, and shouldn’t be applied as a one-size-fits-all analysis.

  2. BlindWanderer says:

    Have you tried running it with the RegexOptions.Compiled flag? You should get better performance.

  3. Cristopher, we used files from 2 MB to over 6.5 MB. During the tests we inverted the sequence of execution. We didn’t notice anything abnormal with the CPU. Like you, I was expecting to see SB surpass the other two approaches. If you decide to try it and get different results, let me know, please.

    BlindWanderer, I haven’t tried it, but it’s definitely a good idea. Have you tried it? I’m curious! :)

  4. Sorry guys, but the test is wrong …

    The line $fileContent.Replace("\n", "") replaces *nothing* because \n is just a two-character string in PowerShell.  What you MEANT to test is: $fileContent.Replace("`n", "")

    On top of that, the string "\n" is a regular expression PATTERN, which means it requires pattern matching, something you CANNOT DO with either of the other two methods.  You should try with "`n" in all three cases; that way you’re just replacing a single character.

    As a side note, *removing* single characters isn’t really that great of a test anyway — you should try *replacing* a whole word with a word of a different length — maybe try replacing "Visual Basic" in that MSDN article you linked to with "VB" or something…

  5. Thanks for the correction! After testing with other special chars I made this mistake when rolling back to newline again.

    Anyway, after fixing it the proportion didn’t change. As you can see below String.Replace is the fastest approach.

    We just replaced special characters, not words or regular characters, because that was the requirement for a particular function we were investigating. I totally agree with you that it’s not a full stress test and I mentioned this in the article.

    Here are the results:

    RegEx

    PS C:\development\My Tools> measure-command { .\RegExReplacePerformance.ps1 test.txt }

    Starting RegEx.Replace…

    End!

    Days              : 0

    Hours             : 0

    Minutes           : 3

    Seconds           : 59  <– Here!

    Milliseconds      : 907

    Ticks             : 2399070288

    TotalDays         : 0.00277670172222222

    TotalHours        : 0.0666408413333333

    TotalMinutes      : 3.99845048

    TotalSeconds      : 239.9070288

    TotalMilliseconds : 239907.0288

    StringBuilder:

    PS C:\development\My Tools> measure-command { .\StringBuilderReplacePerformance.ps1 test.txt }

    Starting StringBuilder.Replace…

    End!

    Days              : 0

    Hours             : 0

    Minutes           : 0

    Seconds           : 34  <– Here!

    Milliseconds      : 570

    Ticks             : 345702214

    TotalDays         : 0.000400118303240741

    TotalHours        : 0.00960283927777778

    TotalMinutes      : 0.576170356666667

    TotalSeconds      : 34.5702214

    TotalMilliseconds : 34570.2214

    String:

    PS C:\development\My Tools> measure-command { .\StringReplacePerformance.ps1 test.txt }

    Starting String.Replace…

    End!

    Days              : 0

    Hours             : 0

    Minutes           : 0

    Seconds           : 20  <– Here!

    Milliseconds      : 583

    Ticks             : 205835077

    TotalDays         : 0.000238235042824074

    TotalHours        : 0.00571764102777778

    TotalMinutes      : 0.343058461666667

    TotalSeconds      : 20.5835077

    TotalMilliseconds : 20583.5077

    Thanks

  8. It’s me again 😉 there’s still something wrong with that, because there’s just no way the regex takes 8 times as long…

    If the code you’re using is exactly what you pasted, the problem is that you’re passing an array of strings to [regex]::replace instead of a string. (to be fair, you should also be assigning the output to a variable, although it won’t make much difference if you use Measure-Command)

    try this:

    $lines = gc $fileName

    [string]$text = gc $fileName

    and then run your [regex]::replace on $lines and on $text … the array takes MUCH longer (depending mostly on what you replace)

    My results (using the Get-PerformanceHistory script from PowerShellCentral.com/scripts):

    Duration  Average  Command
    --------  -------  -------
    13.01577  0.13016  1..100 | ForEach { $out1 = [regex]::replace($lines,"`n","") }
     3.21073  0.03211  1..100 | ForEach { $out2 = [regex]::replace($text,"`n","") }
     2.81232  0.02812  1..100 | ForEach { $out3 = $text.replace("`n","") }

  9. Incidentally … I think it’s clear by now that my corrections aren’t meant to disprove the basic premise — that string.replace is fastest — it obviously is.

    The only reason I bother with the correction is that the difference is very minor, not a multiple.

    My point is just to make sure it’s clear that under normal circumstances, you should be able to just use whatever format your string is already in — because the time (and memory) it takes to convert from one to the other (and back?) is too big of an offset 😉

  10. Hi Joel,

    I’m using a text file that is, in fact, a License Agreement:

    “License Agreement

    This License Agreement describes the rights and responsibilities of anyone using XXX.

    THIS IS AN AGREEMENT BETWEEN YOU AND…”

    I copied and pasted it several times to create an even bigger file, not for the real tests, but for the blog tests.

    I didn’t use [string] $var, to avoid the implicit conversion to System.String. If I do use it, the time improves a lot, but it still doesn’t beat String.Replace. Maybe I should’ve done it, as you said, to get closer results. (I’m putting a comment in the source code.)

    Anyway I’m glad to know our results are the same! :) By the way, nice blog! I see you are a PowerShell Master!

  11. Lox says:

    StringBuilder.Replace returns ‘this’, not a new instance (like String.Replace). So in this place

    $fileContent.Append($str)

    for($i = 0; $i -le 200; $i++)

    {  

       $builder = $fileContent.Replace("`n", "")        

    }

    for $i > 0, $fileContent does not contain "`n", so there is nothing to replace, only searching. IMHO, this fact can affect the results.

    But.

    I’ve made the same things in C# (.NET 2.0), here is the code:

    static void Main(string[] args)
    {
        const int Runs = 200;
        string fileData = null;
        string result = null;

        using (StreamReader reader = new StreamReader("data.txt", Encoding.GetEncoding(1251)))
        {
            fileData = reader.ReadToEnd();
        }

        Stopwatch timer = new Stopwatch();

        /* RegEx Replace */
        for (int run = 0; run < Runs; run++)
        {
            timer.Start();
            result = Regex.Replace(fileData, "\n", " ");
            timer.Stop();
        }
        Console.WriteLine("Regex.Replace – {0} ms", timer.ElapsedMilliseconds);
        timer.Reset();

        /* StringBuilder.Replace */
        StringBuilder builder = new StringBuilder();
        for (int run = 0; run < Runs; run++)
        {
            // Refill the builder outside the timed region so every run has data to replace.
            builder.Append(fileData);
            timer.Start();
            builder.Replace("\n", "");
            timer.Stop();
            builder.Remove(0, builder.Length);
        }
        Console.WriteLine("StringBuilder.Replace – {0} ms", timer.ElapsedMilliseconds);
        timer.Reset();

        /* String.Replace */
        for (int run = 0; run < Runs; run++)
        {
            timer.Start();
            result = fileData.Replace("\n", "");
            timer.Stop();
        }
        Console.WriteLine("String.Replace – {0} ms", timer.ElapsedMilliseconds);
        timer.Reset();

        Console.ReadKey();
    }

    I’ve got the following results:

    Regex.Replace – 3984 ms

    StringBuilder.Replace – 1691 ms

    String.Replace – 2108 ms

    Something’s wrong?

  12. Interesting… your results are different from mine, Joel’s and those at http://www.codeproject.com/KB/cs/StringBuilder_vs_String.aspx?fid=326464&df=90&mpp=25&noise=3&sort=Position&view=Quick&fr=26, which uses C#.  (I haven’t tested your code on my machine.)

    I wonder whether anyone else has tested the three approaches in PowerShell or C#/VB.NET. I’m curious to see which approach is the fastest and whether there is a consistent winner.  :)

  13. Niki says:

    Did you try replacing longer strings? IIRC Regex uses Boyer-Moore for matching, which should be more efficient for longer patterns. Of course, this only affects the time for searching, not the time needed for replacing the string, so it might also depend on how often the pattern has to be replaced. And since you’re using the default locale, the results might be completely different (think factor 10!) on a machine running with a different locale.

  14. I didn’t try to replace longer strings. I haven’t investigated the locale, but thanks for pointing it out. I’m wondering whether the locale would affect not just the raw numbers from the tests but also which approach is the fastest. My guess (and this is just a wild guess :) ) is that String.Replace would continue to be the fastest approach, because all times would be equally impacted after changing the locale.

  15. Mark Bussey says:

    Have you run the CLR profiler on these three methods?  String objects seem to hang around, clogging up the works, much longer than StringBuilder objects.

  16. No, I haven’t. Did you get different results?

  17. As much as I *love* regex…the situation presented doesn’t speak to why one would even think of using regex for that work. Replacing a character is strictly the realm of string/stringbuilder replace and optimized accordingly.

    Regex is meant for complicated patterns and has overhead to that end. But the community slams regex due to that overhead. My point is that this article, as read by the blogosphere, implies that regex is slow for everything. Factually, in this test it’s slower, but there are issues not being considered; hence regex gets painted with a broad brush.

    If one presented a scenario where the replacement need was really complex, I would bet that regex could easily handle it and make a good showing, because the *added overhead* of the supporting code or the multiple calls needed with string/stringbuilder replace would begin to slow those down and show regex pulling ahead.

    Long story short; use the right tool for the right situation. Regex is a truck and used to haul large quantities and string/stringbuilder are sport cars quick but not meant to carry large loads.

    Thanks for the article though…it is interesting.

  18. talav says:

    Regex.Replace – 1148 ms

    StringBuilder.Replace – 248 ms

    String.Replace – 263 ms

    Same results as Lox

  19. Mark Fuini says:

    Did you initialize the StringBuilder with length  * 2?

  20. Remith R says:

    I totally agree with rafarah that the String.Replace() has the best performance among the String.Replace(), StringBuilder.Replace() and Regex.Replace().

    String is very LIGHT weight compared to the HEAVY weight classes StringBuilder and Regex. Much of the time in the heavy weight classes is spent instantiating the new object and then building the resultant string after the replacement operations.

    Below is the Proof of Concept in C# to support the above fact.

    using System;

    using System.Text;

    using System.Text.RegularExpressions;

    namespace ReplacePOC

    {

       public class Utils

       {

           public static string ReplaceSpecialCharactersWithString(string stringWithSpecialCharacters)

           {

               return string.IsNullOrEmpty(stringWithSpecialCharacters)

                           ? string.Empty

                           : (((stringWithSpecialCharacters.Replace(Environment.NewLine, Program.SingleSpace))

                           .Replace(Program.LineFeed, Program.SingleSpace))

                           .Replace(Program.CarriageReturn, Program.SingleSpace))

                           .Replace(Program.TabCharacter, Program.SingleSpace);

           }

           public static string ReplaceSpecialCharactersWithStringBuilder(string stringWithSpecialCharacters)

           {

               if (string.IsNullOrEmpty(stringWithSpecialCharacters))

               {

                   return string.Empty;

               }

               StringBuilder replaceBuilder = new StringBuilder(stringWithSpecialCharacters, stringWithSpecialCharacters.Length);

               replaceBuilder.Replace(Environment.NewLine, Program.SingleSpace);

               replaceBuilder.Replace(Program.LineFeed, Program.SingleSpace);

               replaceBuilder.Replace(Program.CarriageReturn, Program.SingleSpace);

               replaceBuilder.Replace(Program.TabCharacter, Program.SingleSpace);

               return replaceBuilder.ToString();

           }

           public static string ReplaceSpecialCharactersWithRegEx(string stringWithSpecialCharacters)

           {

               if (string.IsNullOrEmpty(stringWithSpecialCharacters))

               {

                   return string.Empty;

               }

               return Regex.Replace(

                          Regex.Replace(

                               Regex.Replace(Regex.Replace(stringWithSpecialCharacters, Environment.NewLine, Program.SingleSpace),

                               Program.LineFeed, Program.SingleSpace),

                          Program.CarriageReturn, Program.SingleSpace),

                      Program.TabCharacter, Program.SingleSpace);

           }

       }

    }

    Finally the results and String.Replace() WINS!!!!!!!!!!!!!!!

    C:WorkPOCReplacePOC>ReplacePOC.exe C:WorkPOCReplacePOCTestFile.txt

    Test File: C:WorkPOCReplacePOCTestFile.txt.

    Total items to test: 65536.

    Total TimeTaken for ReplaceSpecialCharactersWithString in Mills: 194.8743.

    Total TimeTaken for ReplaceSpecialCharactersWithStringBuilder in Mills: 301.6635.

    Total TimeTaken for ReplaceSpecialCharactersWithRegEx in Mills: 1009.9984.

    Press Enter to exit.

  21. Remith R says:

    Here is the TEST code for the above results.

    using System;

    using System.Text;

    using System.IO;

    using System.Diagnostics;

    namespace ReplacePOC

    {

       class Program

       {

           public const string CarriageReturn = "\r";

           public const string LineFeed = "\n";

           public const string TabCharacter = "\t";

           public const string SingleSpace = " ";

           static void Main(string[] args)

           {

               string testFile = string.Empty;

               if (args.Length <= 0)

               {

                   Console.WriteLine("Enter a valid test file as argument.");

                   Console.ReadLine();

                   return;

               }

               testFile = args[0];

               if(!File.Exists(testFile))

               {

                   Console.WriteLine("Invalid test file.");

                   Console.ReadLine();

                   return;

               }

               Console.WriteLine(string.Format("Test File: {0}.", testFile));

               var fullFileLines = File.ReadAllLines(testFile);//File.ReadAllText(testFile).Split(new char[] { '|' });

               Console.WriteLine(string.Format("Total items to test: {0}.", fullFileLines.Length));

               ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

               var stopWatchString = Stopwatch.StartNew();

               foreach (var fileLine in fullFileLines)

               {

                   string replacedFileLine = Utils.ReplaceSpecialCharactersWithString(fileLine);

               }

               stopWatchString.Stop();

               Console.WriteLine(string.Format("Total TimeTaken for ReplaceSpecialCharactersWithString in Mills: {0}.", stopWatchString.Elapsed.TotalMilliseconds));

               ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

               var stopWatchStringBuilder = Stopwatch.StartNew();

               foreach (var fileLine in fullFileLines)

               {

                   string replacedFileLine = Utils.ReplaceSpecialCharactersWithStringBuilder(fileLine);

               }

               stopWatchStringBuilder.Stop();

               Console.WriteLine(string.Format("Total TimeTaken for ReplaceSpecialCharactersWithStringBuilder in Mills: {0}.", stopWatchStringBuilder.Elapsed.TotalMilliseconds));

               ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

               var stopWatchRegEx = Stopwatch.StartNew();

               foreach (var fileLine in fullFileLines)

               {

                   string replacedFileLine = Utils.ReplaceSpecialCharactersWithRegEx(fileLine);

               }

               stopWatchRegEx.Stop();

               Console.WriteLine(string.Format("Total TimeTaken for ReplaceSpecialCharactersWithRegEx in Mills: {0}.", stopWatchRegEx.Elapsed.TotalMilliseconds));

               Console.WriteLine("Press Enter to exit.");

               Console.ReadLine();

           }

       }

    }

  22. Anon says:

    I may be late to this conversation, but this issue may have to do with how the CLR handles Regex in 64-bit.

    MS has itself admitted that XSLT & Regex use 4x more memory than they did in 32-bit.

    connect.microsoft.com/…/508748

    It’s been an issue for 4+ years and is still not resolved in the 4.5 dev preview.