Regular Expression performance [David Gutierrez]

I often get questions about Regex and what the RegexOptions.Compiled flag actually does. There are in fact three different modes that Regex can work in: interpreted (without the Compiled flag), compiled on the fly (with the Compiled flag), and precompiled. Each of these modes has its own performance trade-offs. I'm mainly talking about two costs here: startup performance, which is the initial cost of creating your Regex, and runtime performance, which is the cost of running matches.

Interpreted

This one is what you get by default when you don't pass in RegexOptions.Compiled as an option.  Here are some interpreted usages of Regex:

 Regex r = new Regex("abc*");
 Match m = Regex.Match("1234bar", @"(\d*)bar");

We parse your expression into a set of custom opcodes, and then use an interpreter to run the expression later.  The cost of creating the Regex is low, but this mode also has the lowest runtime performance of the three.
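
As a small illustration of the default mode, here is a sketch that runs both patterns through the interpreter and prints what they match. The class name InterpretedDemo, the method names, and the input "xxabccc" are made up for the example; nothing here is specific to how the interpreter works internally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InterpretedDemo
{
    // Instance usage: no RegexOptions.Compiled is passed, so the pattern
    // is parsed into opcodes and run by the interpreter.
    public static string MatchInstance(string input)
    {
        Regex r = new Regex("abc*");
        return r.Match(input).Value;
    }

    // Static usage: also interpreted. The verbatim string (@"...") keeps
    // the backslash in \d from being read as a C# escape sequence.
    public static string LeadingDigits(string input)
    {
        Match m = Regex.Match(input, @"(\d*)bar");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(MatchInstance("xxabccc"));   // abccc
        Console.WriteLine(LeadingDigits("1234bar"));   // 1234
    }
}
```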

Compiled on the fly

In this case you've passed in RegexOptions.Compiled:

 Regex r = new Regex("abc*", RegexOptions.Compiled);
 Match m = Regex.Match("1234bar", @"(\d*)bar", RegexOptions.Compiled);

Here, we first do the work to parse into opcodes.  Then we also do more work to turn those opcodes into actual IL using Reflection.Emit. As you can imagine, this mode trades increased startup time for quicker runtime: in practice, a compiled Regex takes about an order of magnitude longer to start up, but yields 30% better runtime performance.  There are even more costs for compilation that should be mentioned, however.  Emitting IL with Reflection.Emit loads a lot of code and uses a lot of memory, and that's not memory that you'll ever get back.  In addition, in v1.0 and v1.1, we couldn't ever free the IL we generated, meaning you leaked memory by using this mode.  We've fixed that problem in Whidbey.  But the bottom line is that you should only use this mode for a finite set of expressions which you know will be used repeatedly.
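
One way to follow that advice is to construct each compiled Regex once, keep it in a static field, and reuse it for every match. The sketch below does that, and also includes a small Stopwatch helper so you can compare construction and matching costs for yourself. The class name CompiledDemo, the pattern, and the iteration count are illustrative only, and timings vary by machine and runtime version, so no numbers are shown.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class CompiledDemo
{
    // Built once, when the type is initialized, so the IL generation
    // cost is paid a single time and then amortized over every match.
    private static readonly Regex DigitsBeforeBar =
        new Regex(@"(\d*)bar", RegexOptions.Compiled);

    public static string LeadingDigits(string input)
    {
        Match m = DigitsBeforeBar.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Times construction and then `iterations` matches for the given
    // options. Returns the number of successful matches.
    public static int Measure(RegexOptions options, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        Regex r = new Regex(@"(\d*)bar", options);
        long createTicks = sw.ElapsedTicks;

        sw.Restart();
        int hits = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (r.IsMatch("1234bar")) hits++;
        }
        long matchTicks = sw.ElapsedTicks;

        Console.WriteLine("{0}: create={1} ticks, {2} matches={3} ticks",
            options, createTicks, hits, matchTicks);
        return hits;
    }

    public static void Main()
    {
        Measure(RegexOptions.None, 100000);      // interpreted
        Measure(RegexOptions.Compiled, 100000);  // compiled on the fly
        Console.WriteLine(LeadingDigits("1234bar"));
    }
}
```

What you want to avoid is the opposite pattern: building compiled Regex objects from an open-ended stream of different expressions, where each one pays the startup cost and may only be matched a handful of times.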

Precompiled

Precompilation solves many of the problems associated with compiling on the fly.  The idea is that you do all of the work of parsing and generating IL when you compile your app, ending up with a custom class derived from Regex.  The big trade-off here is that you need to write a small app which will do the compilation for you (i.e., an app which calls Regex.CompileToAssembly(...) with the right parameters), and thus you need to know your important regexes in advance.  In general this isn't such a problem, since if you're writing a parser, you probably don't need to change your expressions at runtime. Your startup time reduces to loading and JITing your class, which should be comparable to the startup cost of interpreted mode.  Runtime performance will be identical to the compiled on the fly case.  It's the best of both worlds!
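
The small compilation app might look roughly like the sketch below. It assumes you are running on the .NET Framework, where Regex.CompileToAssembly is implemented; on .NET Core and later the method throws PlatformNotSupportedException, which the sketch catches and reports. The type name DigitsBeforeBar, the namespace MyApp.Regexes, and the assembly name are placeholders I picked for the example.

```csharp
using System;
using System.Reflection;
using System.Text.RegularExpressions;

public static class PrecompileTool
{
    // Returns true if the assembly was generated, false if this runtime
    // does not support Regex.CompileToAssembly.
    public static bool Build()
    {
        // One entry per regex you want baked into the assembly.
        // Arguments: pattern, options, type name, namespace, isPublic.
        // All names here are hypothetical placeholders.
        RegexCompilationInfo[] infos =
        {
            new RegexCompilationInfo(
                @"(\d*)bar",
                RegexOptions.None,
                "DigitsBeforeBar",
                "MyApp.Regexes",
                true)
        };

        AssemblyName name = new AssemblyName("MyApp.Regexes");

        try
        {
            // On the .NET Framework this writes MyApp.Regexes.dll
            // to the current directory.
            Regex.CompileToAssembly(infos, name);
            return true;
        }
        catch (PlatformNotSupportedException)
        {
            Console.WriteLine("Regex.CompileToAssembly is not supported on this runtime.");
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Build() ? "Assembly generated." : "Nothing generated.");
    }
}
```

Your real app would then reference the generated assembly and use the class like any other Regex, for example new MyApp.Regexes.DigitsBeforeBar().Match("1234bar"), without passing a pattern or RegexOptions.Compiled at all.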