.NET Regular Expressions: how to use RegexOptions.IgnorePatternWhitespace [Ryan Byington]

The IgnorePatternWhitespace option tells the Regex parser to ignore any unescaped whitespace (spaces, tabs, and newlines) in your expression unless it appears inside a character class (i.e. [ ]). At first this may not seem all that useful, but it can really increase the readability of a regular expression. It also lets you add comments to your expression: everything from an unescaped # to the end of the line is ignored.
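Here is a minimal sketch of that behavior. The class name IgnoreWhitespaceDemo and the helper Matches are made up for illustration; only Regex.IsMatch and RegexOptions.IgnorePatternWhitespace come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IgnoreWhitespaceDemo
{
    // Hypothetical helper: match with IgnorePatternWhitespace switched on.
    public static bool Matches(string input, string pattern) =>
        Regex.IsMatch(input, pattern, RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        // Spaces in the pattern are dropped, so this behaves like "^abc$".
        Console.WriteLine(Matches("abc", @"^ a b c $"));      // True
        Console.WriteLine(Matches("a b c", @"^ a b c $"));    // False

        // A space inside a character class is kept and matches a literal space.
        Console.WriteLine(Matches("a b", @"^ a [ ] b $"));    // True

        // Everything from # to the end of the line is a comment.
        Console.WriteLine(Matches("42", @"^ \d+   # one or more digits
                                          $"));                // True
    }
}
```

Without the option, the first pattern would require the literal spaces to be present in the input.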

For example, a customer complained that the following regular expression, which is supposed to match email addresses, was taking too long:

"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$"

Maybe I am just not very good at visualizing regular expressions, but I have trouble seeing what this regular expression does when it is in this format. The first thing I do is convert it to a format that I can read, like the following:

@"
^
(
    [0-9a-zA-Z]        # Verify the email address starts with a valid character
    (
        [-.\w]*
        [0-9a-zA-Z]
    )*
    @
    (
        [0-9a-zA-Z]
        [-\w]*
        [0-9a-zA-Z]
        \.
    )+
    [a-zA-Z]{2,9}      # Match to com, org, uk, etc
)
$"

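This is equivalent to the first expression as long as the RegexOptions.IgnorePatternWhitespace option is used. The only trick here is that to match a space you must use either [ ] or \x20, and a literal # must be escaped as \# so that it does not start a comment. If you are curious what was wrong with this expression, the likely culprit is the nested quantifier in ([-.\w]*[0-9a-zA-Z])*: \w also matches letters and digits, so the engine can split a run of characters between the two parts in many ways and backtracks heavily when the input does not match. Escaping the ‘.’ in [-.\w]* would not change anything, because a ‘.’ inside a character class is already literal.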
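The equivalence can be checked with a small program. This is only a sketch: the class name EmailPatternDemo, the helper names, and the sample inputs are my own, and the expanded pattern is laid out a little more compactly than the one above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailPatternDemo
{
    // The original single-line pattern.
    const string Compact =
        @"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$";

    // The same pattern spread over several lines with comments.
    const string Expanded = @"
        ^
        (
            [0-9a-zA-Z]                              # first character
            ( [-.\w]* [0-9a-zA-Z] )*                 # rest of the local part
            @
            ( [0-9a-zA-Z] [-\w]* [0-9a-zA-Z] \. )+   # domain labels
            [a-zA-Z]{2,9}                            # com, org, uk, etc.
        )
        $";

    public static bool MatchesCompact(string input) =>
        Regex.IsMatch(input, Compact);

    public static bool MatchesExpanded(string input) =>
        Regex.IsMatch(input, Expanded, RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        // Sample inputs chosen for illustration; both columns should agree.
        foreach (var input in new[] { "someone@example.com", "a.b-c@mail.example.org",
                                      "not an email", "@example.com" })
        {
            Console.WriteLine($"{input}: {MatchesCompact(input)} / {MatchesExpanded(input)}");
        }

        // \x20 still matches a literal space when pattern whitespace is ignored.
        Console.WriteLine(Regex.IsMatch("a b", @"^ a \x20 b $",
                                        RegexOptions.IgnorePatternWhitespace)); // True
    }
}
```

Keeping both forms side by side like this is a cheap way to confirm that reformatting a pattern did not change what it matches.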

It is possible to write an entire C# program on a single line with no indentation, but no one in their right mind would do this because it would be completely unreadable. So why do this with your regular expressions?