Knowing when NOT to use RegEx to match Strings (System.Text.RegularExpressions) [Kit George]

We recently got asked this question by a customer: "In C#, how do I ensure that a string entered into a text box is of the format: letter,number,letter,number,letter,number ?"

The first answer seems pretty straightforward: use RegEx! Regular expressions are a powerful mechanism for matching strings, and seem the obvious choice. However, you've always got to remember that RegEx, while powerful, is also a pretty hefty mechanism for string matching. When you're looking for complex strings it's often a good choice (since writing the code yourself can be unbelievably tricky), but when what you're looking for is pretty simple (as in this case), doing your own matching shouldn't be too tough, and is going to perform a lot better.

Here's a test I wrote to demonstrate this clearly. I've included two forms of RegEx matching. The first (the single-line call, looking for whitespace and the specified pattern) shows how robust RegEx can get, but check the numbers below: the operation is correspondingly expensive. The second helps the RegEx out a little by doing some of the more exhaustive searching work for it (in this case, a simple trim, followed by a length check), so the RegEx doesn't need to do the space matching itself. The final form simply does the string match by hand: I use Char.IsLetter and Char.IsDigit to step through the string and ensure it is in the right format.

The results bear out that, when you can write a simple test for the pattern yourself, it's worth doing. Even with the optimization, RegEx is not as performant as a simple hand-written check. When the pattern checking is more complex, RegEx can be far more usable.

Duration for StringMatch test 1798092 Ticks
Duration for RegExMatch1 test (no optimization) 31166928 Ticks (about 17x slower!)
Duration for RegExMatch2 test (optimization) 18380496 Ticks (10x slower)

using System;
using System.Text.RegularExpressions;

class Test
{
    static void Main(string[] args)
    {
        // include your own test cases if needed
        string[] testCases = {"l2j1a9", " l2j1a9 ", "l2j1a9 ",
                              "a l2j1a9 ", " l2j1a9 a", "lwj1a9", " l2 j1a9 "};
        long duration = 0;

        Console.WriteLine("Standard Test:");
        foreach (string test in testCases) {
            duration += RunTest(test, false);
        }
        Console.WriteLine("Duration of tests = {0}\r\n", duration);

        duration = 0;
        Console.WriteLine("\r\nRegex Test:");
        foreach (string test in testCases) {
            duration += RunTest(test, true);
        }
        Console.WriteLine("Duration of tests = {0}", duration);
    }

    static long RunTest(string s, bool useRegex) {
        bool result = false;

        // this is just a timing run, for interest
        long startTime = DateTime.Now.Ticks;
        if (useRegex) {
            for (int i = 0; i < 20000; i++) {
                RegexMatch(s);
            }
        } else {
            for (int i = 0; i < 20000; i++) {
                StringMatch(s);
            }
        }
        long duration = DateTime.Now.Ticks - startTime;

        result = (useRegex ? RegexMatch(s) : StringMatch(s));

        if (result) {
            Console.WriteLine("String '{0}' matches the requirement", s);
        } else {
            Console.WriteLine("String '{0}' does NOT match the requirement", s);
        }

        return duration;
    }

    static bool RegexMatch(string s) {
        // Not optimized version. Read as:
        // any (*) amount of space (\s) at the beginning (^)
        // any word character (\w), any digit (\d), word, digit, word, digit
        // any (*) amount of space (\s) at the end ($)
        // Note: \w also accepts digits and '_'; \p{L} is the letters-only class.
        Regex r = new Regex(@"(^\s*)\w\d\w\d\w\d(\s*$)");

        // Optimized version. We take out ONLY the checks for the whitespace.
        // Try this instead of the line above, and see what the perf difference is
        /*
        Regex r = new Regex(@"\w\d\w\d\w\d");

        s = s.Trim();
        if (s.Trim().Length != 6)
            return false;
        */
        return r.IsMatch(s);
    }

    static bool StringMatch(string s) {
        if (s.Trim().Length != 6)
            return false;

        char[] chars = s.Trim().ToCharArray();

        return (Char.IsLetter(chars[0]) && Char.IsDigit(chars[1]) &&
                Char.IsLetter(chars[2]) && Char.IsDigit(chars[3]) &&
                Char.IsLetter(chars[4]) && Char.IsDigit(chars[5]));
    }
}
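
One further variation that may be worth trying is sketched below. It is not part of the measured test above, and no timings are claimed for it. The RegexMatch method in the listing constructs a new Regex object on every call, so part of its cost is pattern parsing rather than matching. The sketch builds the Regex once in a static field and reuses it. It also swaps \w for \p{L}, which accepts letters only, the same Unicode categories that Char.IsLetter checks, so a string such as "121212" is rejected. The class name CachedRegexSketch, the method name CachedRegexMatch, and the use of RegexOptions.Compiled are illustrative assumptions; whether compilation helps depends on the runtime and on how many calls are made.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original test: same check, Regex built once.
static class CachedRegexSketch
{
    // \p{L} = any Unicode letter, \d = any Unicode decimal digit.
    // ^\s* and \s*$ allow surrounding whitespace, as in the non-optimized pattern.
    private static readonly Regex Pattern =
        new Regex(@"^\s*\p{L}\d\p{L}\d\p{L}\d\s*$", RegexOptions.Compiled);

    public static bool CachedRegexMatch(string s)
    {
        // No per-call construction; only the match itself runs here.
        return Pattern.IsMatch(s);
    }

    static void Main()
    {
        string[] samples = { "l2j1a9", " l2j1a9 ", "a l2j1a9 ", "lwj1a9", "121212" };
        foreach (string sample in samples)
        {
            Console.WriteLine("'{0}' -> {1}", sample, CachedRegexMatch(sample));
        }
    }
}
```

To compare it against the other two forms, call CachedRegexMatch from the timing loop in RunTest in place of RegexMatch and rerun the test on your own machine.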