Regex perf bug
While trying to get a WordML -> HTML conversion that would work reasonably well with .TEXT, I came across an interesting performance bug in the Regex class of .NET.
Run the following code and see what happens. It doesn't produce any output; just step over each line in the debugger and note how long the "slow" Replace call takes compared with the "fast" one:
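using System.Text.RegularExpressions;

class Class1
{
    static void Main()
    {
        // "fred" followed by 5000 spaces: the only possible match is at index 0.
        string s = "fred" + new string(' ', 5000);

        // The two patterns are identical except for the leading ^ anchor.
        Regex slow = new Regex(".*fred", RegexOptions.None);
        Regex fast = new Regex("^.*fred", RegexOptions.None);

        fast.Replace(s, ""); // returns almost immediately
        slow.Replace(s, ""); // takes noticeably longer
    }
}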
In theory the two expressions should give the same result on this input ("anything followed by fred" is semantically the same as "anything from the start of the string followed by fred" when the only "fred" is at the start of a single-line string), but for some reason their running times are very different. I'll see if it's a known bug tomorrow (having trouble accessing the database from home).
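To put numbers on the difference without a debugger, here is a minimal sketch that times both Replace calls. It assumes .NET 2.0 or later for System.Diagnostics.Stopwatch; the class name RegexTiming and the helper TimeReplace are made up for this example. One plausible explanation, which I haven't confirmed, is that without the ^ anchor the engine retries the pattern at every start position, and each attempt lets .* run to the end of the string before backtracking, so the work grows roughly with the square of the input length. Newer runtimes may optimize this case, so the gap might not show up everywhere.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public class RegexTiming
{
    // Hypothetical helper: times a single Replace call, returns elapsed milliseconds.
    public static long TimeReplace(Regex regex, string input)
    {
        Stopwatch sw = Stopwatch.StartNew();
        regex.Replace(input, "");
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // Same input and patterns as in the listing above.
        string s = "fred" + new string(' ', 5000);
        Regex slow = new Regex(".*fred", RegexOptions.None);
        Regex fast = new Regex("^.*fred", RegexOptions.None);

        // Both calls should return the same string; only the timing differs.
        Console.WriteLine("same result: {0}", fast.Replace(s, "") == slow.Replace(s, ""));
        Console.WriteLine("anchored:    {0} ms", TimeReplace(fast, s));
        Console.WriteLine("unanchored:  {0} ms", TimeReplace(slow, s));
    }
}
```

Doubling the 5000 in the test string and re-running is a quick way to check the guess: if the unanchored time roughly quadruples while the anchored time barely moves, that points to the per-position retry.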