Regex perf bug

01/05/2004

While trying to get a WordML -> HTML conversion that would work reasonably well with .TEXT, I came across an interesting performance bug in the Regex class of .NET.

Run the following code and see what happens. It doesn't produce any output; just step over each line in the debugger and note how long the "slow" search takes:

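using System.Text.RegularExpressions;

class Class1
{
    static void Main()
    {
        // "fred" followed by 5000 spaces
        string s = "fred" + new string(' ', 5000);

        // Unanchored: "anything followed by fred"
        Regex slow = new Regex(".*fred", RegexOptions.None);

        // Anchored: "anything from the start of the string followed by fred"
        Regex fast = new Regex("^.*fred", RegexOptions.None);

        fast.Replace(s, "");   // returns quickly
        slow.Replace(s, "");   // step over this one and note how long it takes
    }
}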
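Stepping through in the debugger works, but printing the timings makes the difference easier to see. Here is a minimal sketch that does that. It assumes .NET 2.0 or later, because System.Diagnostics.Stopwatch did not exist in the 1.1 framework that was current when this was written. The class name RegexTiming and the helper TimeReplace are hypothetical. A single call gives only a coarse measurement, and the numbers will vary by machine and runtime version. Newer .NET releases may not reproduce the gap at all.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class RegexTiming
{
    // Hypothetical helper (not from the original post): runs Replace once,
    // prints the elapsed wall-clock time, and returns the replaced string.
    public static string TimeReplace(string label, Regex regex, string input)
    {
        Stopwatch sw = Stopwatch.StartNew();
        string result = regex.Replace(input, "");
        sw.Stop();
        Console.WriteLine("{0}: {1} ms", label, sw.ElapsedMilliseconds);
        return result;
    }

    static void Main()
    {
        // Same input as the original snippet: "fred" followed by 5000 spaces.
        string s = "fred" + new string(' ', 5000);

        Regex slow = new Regex(".*fred", RegexOptions.None);
        Regex fast = new Regex("^.*fred", RegexOptions.None);

        string fastResult = TimeReplace("fast (^.*fred)", fast, s);
        string slowResult = TimeReplace("slow (.*fred)", slow, s);

        // Both patterns strip the leading "fred", leaving only the spaces.
        Console.WriteLine("Same result: {0}", fastResult == slowResult);
    }
}
```

The fast call runs first to mirror the original snippet. Whichever call runs first may also pay some one-time warm-up cost, so swapping the order is a quick sanity check.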

In theory the two expressions should be the same. On a single line of text, "anything followed by fred" matches exactly what "anything from the start of the string followed by fred" matches. (Because . does not match a newline, this equivalence holds only for single-line input like this one.) Yet for some reason they perform very differently. I'll check tomorrow whether this is a known bug; I'm having trouble accessing the database from home.
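Here is one possible explanation. It is a rough cost model, not a confirmed diagnosis of the .NET engine, and it assumes a plain backtracking matcher with no special handling of a leading .*. Without the ^ anchor, the matcher finds "fred" at position 0 and then retries the pattern at every later start position p. At each one, the greedy .* runs to the end of the input (n + 4 - p characters, where n is the number of padding spaces). It then gives back one character at a time, trying to match an "f" at n + 5 - p positions. The total is:

```latex
\[
\sum_{p=4}^{n+4} (n + 5 - p) \;=\; \sum_{k=1}^{n+1} k \;=\; \frac{(n+1)(n+2)}{2} \;\approx\; \frac{n^2}{2} \;\approx\; 1.25 \times 10^{7} \quad (n = 5000)
\]
```

With ^, every start position after the first fails immediately, so the total stays around 2n, or roughly 10^4 steps. If this model is right, doubling the padding should roughly quadruple the time of the unanchored call.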
