Regex hangs with my expression [David Gutierrez]

Well, actually, it doesn’t hang.  It just takes a really really long time, and you haven’t waited long enough for it to finish.  One of the pitfalls with regular expression is that you can write expressions which don’t perform very well.  In particular, you can end up with expressions whose search time grows exponentially with the length of the search string.  I get bugs reporting that Regex hangs about once a month, and it always turns out to be an exponentially slow expression.  Here’s a simplified example of one of them:


There are two things interesting about this expression.  First, notice that it has two quantifiers nested within each other.  The inner one is the + quantifier for the character class, and the outer one is the *.  Second, it has a character (the equals char) that must be matched at the end of the result.  In English terms, this expression can be explained as

  1. match any character a-z, one or more times
  2. match step #1 zero or more times
  3. match an equals

What will happen is that Regex will breeze through step 1 and 2 only to find that it can’t match in step 3.  That forces it to backtrack and try to match the first two steps differently.  The trouble is that there are a lot of different ways that steps 1 and 2 can match, and Regex needs to try every single one before it can determine that the expression does not match the string.  Here’s some sample code to demonstrate it (if you’re not using Whidbey, you’ll need to change from using Stopwatch to DateTime.Now):

using System;
using System.Text;
using System.Text.RegularExpressions;

namespace Repro_RegExMatchHang {
    class Program {
        static void Main(string[] args) {
            Regex slowRegex = new Regex("([a-z]+)*=");
            string searchString = @"klhsaflkasdasdasdasdasdadasdasdasdasdasdadad";
            for (int i=1; i<searchString.Length; i++) {
                System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();

                Match m = slowRegex.Match(searchString.Substring(0, i));

                Console.WriteLine(i + "\t" + sw.Elapsed);

This code loops over the search search string and passes successively longer strings to Regex.Match.  With each iteration, the match time takes twice as long.   Here’s some of the output for a string of length 10 through 14:

10      00:00:00.0010704
11      00:00:00.0021082
12      00:00:00.0040549
13      00:00:00.0082593
14      00:00:00.0161723


Comments (9)

  1. Anonymous says:

    Seems like a lack of optimization to me. If the regexp engine knows that it needs to find an unqualified = it shouldn’t have to loop at all when it doesn’t find one.

  2. Anonymous says:

    What exactly were you trying to match with this?

  3. Anonymous says:


    What makes you think it is looking for an unqualified =? It is looking for a= or aa= or aaa= …

  4. Anonymous says:

    Writing regular expressions is like writing any code. You can write fast code or slow code, just like you can write fast regexes and slow regexes. If you’re trying to sort a million random integers with a bubble sort, it’s not the fault of the compiler’s optimizer that it didn’t change it to something more reasonable.

    Sure, in this case it’s probably not that hard to figure out that [a-z]+= is just as good, but this is just a (rather contrived, I guess) example.

  5. Anonymous says:

    Add the Compiled Switch to see if that helps

  6. Anonymous says:

    Actually it should be [a-z]*=

    Notice that [a-z]+ is surrounded by a parens then * meaning that the whole of [a-z]+ can appear 0 or more times.

  7. Sometimes you hear and accept advice but you don’t really know the details behind it. That has always