Regex perf bug

While trying to get a WordML -> HTML conversion that would work reasonably well with .TEXT, I came across an interesting performance bug in the Regex class of .NET.

Run the following code and see what happens. It doesn't produce any output; just step over each line in the debugger and note how long the "slow" search takes:

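using System.Text.RegularExpressions;

class Class1
{
    static void Main()
    {
        // "fred" followed by 5000 spaces, so the only possible match is at the start.
        string s = "fred" + new string(' ', 5000);

        // The two patterns differ only in the leading ^ anchor.
        Regex slow = new Regex(".*fred", RegexOptions.None);
        Regex fast = new Regex("^.*fred", RegexOptions.None);

        fast.Replace(s, "");
        slow.Replace(s, "");  // note how long this call takes
    }
}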

In theory the two expressions should be equivalent for this input ("anything followed by fred" matches the same text as "anything from the start of the string followed by fred"), but for some reason they perform very differently. I'll check tomorrow whether it's a known bug (I'm having trouble accessing the database from home).
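
If stepping through in the debugger is inconvenient, the difference can also be measured directly. The following is a minimal sketch, not part of the original repro: the `RegexTiming` class and its `TimeReplace` helper are hypothetical names introduced here, and the tuple syntax assumes a reasonably recent C# compiler. It runs each pattern once against the same input and prints the elapsed time. Absolute numbers will depend on the machine and runtime version, and newer runtimes may narrow the gap.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class RegexTiming
{
    // Hypothetical helper: runs Regex.Replace once with an empty replacement
    // and returns the resulting string together with the elapsed time.
    public static (string Result, TimeSpan Elapsed) TimeReplace(string pattern, string input)
    {
        var regex = new Regex(pattern, RegexOptions.None);
        var watch = Stopwatch.StartNew();
        string result = regex.Replace(input, "");
        watch.Stop();
        return (result, watch.Elapsed);
    }

    static void Main()
    {
        // Same input as the repro above: "fred" followed by 5000 spaces.
        string s = "fred" + new string(' ', 5000);

        foreach (string pattern in new[] { "^.*fred", ".*fred" })
        {
            var (result, elapsed) = TimeReplace(pattern, s);
            Console.WriteLine($"{pattern,-8} {elapsed.TotalMilliseconds,10:F2} ms, result length {result.Length}");
        }
    }
}
```

Both calls should return the same string (the 5000 spaces left after "fred" is removed), so any difference is purely one of speed. A likely explanation, offered here as an assumption rather than something confirmed above, is that a backtracking engine retries the unanchored pattern at every starting position, and each attempt scans to the end of the string before backtracking, while the anchored pattern can only be attempted once at the start.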