Problems with System.OutOfMemoryException At System.String.GetStringForStringBuilder in 32-Bit Managed Solutions

Symptom

Managed code solutions that use classes from the System.Data or System.Xml namespaces may encounter a System.OutOfMemoryException when working with large DataSets, large XML files, or repeated operations that involve hundreds of XML object serialization calls from standardized components. The exception may cause the program to stop working or to report an unexpected error to the user.

When the exception occurs, the top of the call stack typically looks similar to the following:

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.String.GetStringForStringBuilder(String value, Int32 startIndex, Int32 length, Int32 capacity)
at System.Text.StringBuilder.GetNewString(String currentString, Int32 requiredLength)
at System.Text.StringBuilder.Append(String value)
at System.IO.StringWriter.Write(String value)
at System.Xml.XmlTextEncoder.Write(String text)

The exact call stack may differ, and the problem can appear in many different products and types of solutions, on both the client and the server.

More Information

The problem is typically a compound memory problem that results from limitations in the implementation of the StringBuilder class in .NET. Even if you do not call the StringBuilder class yourself, many managed objects use it indirectly for serialization and, as shown above, the System.Xml classes use it for constructing and reading XML data (a common task in .NET).

The problem occurs if the process runs low on contiguous virtual memory space. This condition can occur through the normal running of managed code under stress conditions, or if you use very large DataSets, XML files, or other large string blocks in managed code. When a StringBuilder object holds a large string, its memory buffer is allocated directly from virtual memory and is managed by the CLR on the Large Object Heap (LOH). By design, the CLR does not compact the LOH during garbage collection because of the negative performance impact that compaction would incur. In addition, the LOH is typically not collected unless a full (generation 2) collection occurs, which is much less frequent than ordinary collections. As a result, extended use of the LOH can both deplete and fragment the process virtual memory space over time. Because StringBuilder uses the LOH for large strings, this problem extends to any class that uses StringBuilder for serialization, XML parsing, or data remoting.
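
The following C# fragment is a minimal sketch of how to observe where a large string is placed. It relies on two details that are not stated in this article: that the CLR places objects of roughly 85,000 bytes or more on the LOH, and that GC.GetGeneration reports LOH objects as belonging to the highest generation. The class and method names are illustrative only.

```csharp
using System;

static class LohDemo
{
    // Allocates a string of the given length and returns the GC generation
    // that the runtime reports for it immediately after allocation.
    // Assumption: strings of about 85,000 bytes or more (two bytes per
    // character) are placed on the Large Object Heap.
    public static int GenerationOfNewString(int charCount)
    {
        string s = new string('x', charCount);
        return GC.GetGeneration(s);
    }

    static void Main()
    {
        // 100 characters: an ordinary small allocation.
        Console.WriteLine("Small string generation: " + GenerationOfNewString(100));

        // 50,000 characters (about 100,000 bytes): expected to land on the LOH.
        Console.WriteLine("Large string generation: " + GenerationOfNewString(50000));

        // The highest generation number that the runtime uses.
        Console.WriteLine("GC.MaxGeneration: " + GC.MaxGeneration);
    }
}
```

If the large string reports the same generation as GC.MaxGeneration, it is reclaimed only by a full collection, which is the behavior described above.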

In addition to the fragmentation issue, the StringBuilder class implements a simple algorithm for allocating more memory to expand a string buffer. This algorithm is designed for performance when dealing with small strings, but it can result in poor memory use for very large strings (such as a serialized DataSet) and can tie up more memory than needed or expected. In addition, each time the buffer is expanded, another block of virtual memory is required, and the original block may remain on the LOH because generation 2 collections are infrequent. These factors can compound the problem, depending on the exact situation.
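
To make the cost of repeated expansion concrete, the following C# fragment models a buffer that grows by doubling. The doubling rule, the starting capacity of 16 characters, and all names are assumptions made for illustration; this article does not specify the exact algorithm, and the code does not call StringBuilder itself.

```csharp
using System;

static class GrowthSketch
{
    // Hypothetical model: the capacity doubles until it can hold
    // requiredChars. Returns the number of buffers allocated along the way;
    // totalChars receives the combined size of all those buffers.
    public static int SimulateDoubling(long initialCapacity, long requiredChars,
                                       out long totalChars)
    {
        long capacity = initialCapacity;
        int allocations = 1;           // the initial buffer
        totalChars = capacity;

        while (capacity < requiredChars)
        {
            capacity *= 2;             // a new, larger buffer is allocated
            allocations++;
            totalChars += capacity;    // the old buffer may linger until a full GC
        }
        return allocations;
    }

    static void Main()
    {
        long total;
        int count = SimulateDoubling(16, 50000000, out total);
        Console.WriteLine("Buffers allocated: " + count);
        Console.WriteLine("Characters allocated in total: " + total);
    }
}
```

Under this model, the final buffer can be almost twice the required size, and the buffers discarded along the way add up to roughly the size of the final buffer again. Each of the larger discarded buffers also needs its own contiguous block of virtual memory until it is collected.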

Naturally, the problem is unlikely to occur in 64-bit applications, because the virtual address range of a 64-bit Windows process is far greater than that of a 32-bit process. For comparison, a 32-bit Windows process has an address range of up to 4 GB, with typically 2 GB accessible for allocations by the application. A 64-bit Windows process, on the other hand, has an address range of up to 16 TB, with 8 TB (7 TB on Itanium-based systems) accessible for allocations.
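
When diagnosing this exception, it can help to confirm whether the affected process is actually running as 32-bit. The following C# fragment is one possible check based on the size of a native pointer; the class and method names are illustrative only.

```csharp
using System;

static class BitnessCheck
{
    // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
    public static int PointerBits()
    {
        return IntPtr.Size * 8;
    }

    static void Main()
    {
        Console.WriteLine("This process is running as " + PointerBits() + "-bit.");
    }
}
```

A managed application may run as a 32-bit process even on 64-bit Windows, depending on how it was built and hosted, so the check should be made inside the process that throws the exception.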

Unfortunately, there is no single answer to problems involving this error. Some solutions exist for certain cases. For example, DataSets can be remoted using a binary representation instead of XML, and code that uses StringBuilder directly can initialize the buffer with a capacity large enough to hold the data being manipulated, which cuts down on the duplication caused by "growing" strings. However, the underlying problem is a limitation in the design of both the StringBuilder class and the LOH itself. Microsoft is investigating these problems to find solutions that manage the overall problem while preserving the fundamental performance of the original design.
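
As a sketch of the StringBuilder mitigation mentioned above, the following C# fragment passes an estimated capacity to the StringBuilder constructor so that the buffer does not need to grow while the data is appended. The method name, the sample fragment, and the way the estimate is computed are hypothetical; real code would derive the estimate from its own data.

```csharp
using System;
using System.Text;

static class PreallocateSketch
{
    // Builds a string from 'repeat' copies of 'fragment' in a buffer that is
    // sized once, up front. The estimate here is exact; in practice it only
    // needs to be at least as large as the final length.
    public static StringBuilder BuildWithCapacity(string fragment, int repeat)
    {
        int expectedChars = fragment.Length * repeat;
        StringBuilder sb = new StringBuilder(expectedChars);

        for (int i = 0; i < repeat; i++)
        {
            sb.Append(fragment);   // fits in the preallocated buffer
        }
        return sb;
    }

    static void Main()
    {
        StringBuilder sb = BuildWithCapacity("<row/>", 1000);
        Console.WriteLine("Length: " + sb.Length + ", Capacity: " + sb.Capacity);
    }
}
```

Preallocating does not avoid the LOH for large strings, but it replaces a series of progressively larger buffers with a single allocation.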