Regex Class Caching Changes between .NET Framework 1.1 and .NET Framework 2.0 [Josh Free]

The .NET Framework System.Text.RegularExpressions.Regex class maintains a cache of parsed regular expressions.  The cache improves the performance of methods that create regular expressions, as the Regex class is able to avoid the cost of re-parsing and re-compiling existing regular expressions.  The cache does not affect the performance of match operations on the same input string, as match results are not cached.

As part of the .NET Framework 2.0 Redistributable, the BCL team changed the Regex class to improve the correctness of its caching.  If you use the Regex class in your code, please read on, as these changes may affect the performance of your application on the 2.0 runtime.

.NET Framework 1.1 Regex cache behavior

Under 1.1, the Regex class has an unbounded cache size.  Every regular expression exists in the Regex cache.  Each new regular expression either creates a new entry in the cache or uses an existing cache entry.  Any time an existing cached entry is reused, Regex does not need to interpret or compile the regular expression string – which improves performance.

The 1.1 Regex cache entries maintain a reference count (similar to COM-style AddRef and Release) to keep track of how many objects are using them.  The reference count of a cache entry is decremented when a Regex object that uses it is finalized.  When the reference count on any cached entry reaches zero (0), the cache entry is deleted.

Overall, this design allows fast creation of regular expressions when the same expression already exists.  However, the cache behavior in 1.1 is flawed.  The use of finalizers as part of the static cache design goes against the Reliability Best Practices in the .NET Framework Developer’s Guide.  To quote one part of the best practices guide, “Finalizers must be free of synchronization problems. Do not use a static mutable state in a finalizer.”  Additionally, the use of heavyweight finalizers hurts the performance of the garbage collector.

.NET Framework 2.0 Regex cache behavior

There are two important cache behavior changes in .NET Framework 2.0 from .NET Framework 1.1:

  1. The 2.0 Regex class no longer has an unbounded cache size.  The cache has a fixed size, with a default value of fifteen (15).  Programs can override the default cache size by setting the Regex.CacheSize property.

  2. The 2.0 Regex class no longer caches parsed regular expressions created by Regex instance methods.  It only caches regular expressions created by Regex static methods.

    Take for example the following two calls that use regular expressions.  The first example creates a regular expression instance, which is not cached, whereas the second example uses a static method, which does cache the parsed regular expression.

    Creates a Regex instance ‘r’ containing the regular expression “a*” and checks for a match on the ‘inputString’
    // "a*" is not added to the cache
    Regex r = new Regex("a*");
    r.Match(inputString);

    Calls the static method Match to check ‘inputString’ for a match on the regular expression “a*”
    // "a*" is added to the cache
    // (the static Match method takes the input string first, then the pattern)
    Regex.Match(inputString, "a*");

    Regular expressions created by instance methods are not cached in 2.0 as it makes much more sense for the application developer to manage the lifetime of their Regex object on their own.

    Regular expressions created by static methods are cached in 2.0 as users of the static methods do not have any way of managing the lifetime of their regular expressions.  Developers who want full control over the lifetime of their regular expressions should use Regex instances instead of Regex static methods.

What happens when the 2.0 cache is full

The 2.0 Regex uses the Least-Recently Used (LRU) cache replacement rule.  This means that when the cache is full, the cache items that are the least recently used are the ones discarded to make room for new items.
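To make the replacement rule concrete, here is a minimal sketch of a fixed-size LRU cache.  It is an illustration only and not the Regex class's actual implementation; the LruCache type and its members are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of an LRU cache; NOT the real Regex cache.
public class LruCache<TKey, TValue>
{
    private readonly int capacity;

    // Most recently used entry sits at the front, least recently used at the back.
    private readonly LinkedList<KeyValuePair<TKey, TValue>> order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    // Maps each key to its node in the list for constant-time lookup.
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();

    public LruCache(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        this.capacity = capacity;
    }

    public int Count { get { return map.Count; } }

    // A successful lookup counts as a use, so the entry moves to the front.
    public bool TryGet(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (map.TryGetValue(key, out node))
        {
            order.Remove(node);
            order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }

    // Adds or replaces an entry; when the cache is full, the entry at the
    // back of the list (the least recently used one) is discarded first.
    public void Add(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (map.TryGetValue(key, out node))
        {
            order.Remove(node);
        }
        else if (map.Count >= capacity)
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> last = order.Last;
            order.RemoveLast();
            map.Remove(last.Value.Key);
        }
        node = new LinkedListNode<KeyValuePair<TKey, TValue>>(
            new KeyValuePair<TKey, TValue>(key, value));
        order.AddFirst(node);
        map[key] = node;
    }
}
```

With a capacity of two, adding "a" and "b", reading "a", and then adding "c" discards "b", because "b" is the entry that has gone longest without being used.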

What the 2.0 cache changes mean for your application

  1. Review the use of existing Regex instances in your application.  Since regular expressions created with instance methods are not cached, make sure that you are not unnecessarily creating the same Regex instance over and over again, for example by creating it inside a tight loop:

    Bad Code - creates ‘r’ one hundred (100) times
    for (int i = 0; i < 100; i++) {
        Regex r = new Regex("a*");
        if (r.IsMatch(myArray[i])) {
            …
            …
        }
    }

    Correct Code – creates ‘r’ one (1) time
    Regex r = new Regex("a*");
    for (int i = 0; i < 100; i++) {
        if (r.IsMatch(myArray[i])) {
            …
            …
        }
    }

  2. Consider managing the lifetime of regular expressions in your application yourself instead of relying on the underlying library.  Do this by replacing Regex static method calls with Regex instance method calls.

  3. If you prefer to only use Regex static methods in your application, consider setting the Regex.CacheSize property to a value that makes better sense than the default of fifteen (15) for your application.
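Suggestions 2 and 3 might look something like the sketch below.  The RegexUsage class, its method names, and the "^a+$" pattern are hypothetical and chosen only for illustration; Regex.CacheSize, the Regex constructor, and IsMatch are the real APIs discussed above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class RegexUsage
{
    // Suggestion 2: hold one long-lived Regex instance instead of calling
    // a static Regex method with the same pattern on every use.
    private static readonly Regex RunOfAs = new Regex("^a+$");

    public static int CountMatches(string[] inputs)
    {
        int count = 0;
        foreach (string s in inputs)
        {
            if (RunOfAs.IsMatch(s))
            {
                count++;
            }
        }
        return count;
    }

    // Suggestion 3: if the application sticks with the static methods, grow
    // the cache to cover the number of distinct patterns it is assumed to use.
    public static void ConfigureStaticCache(int distinctPatterns)
    {
        if (distinctPatterns > Regex.CacheSize)
        {
            Regex.CacheSize = distinctPatterns;
        }
    }
}
```

The right cache size depends on how many distinct patterns the application passes to the static methods; measuring before and after a change is a safer guide than any fixed number.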