Another interesting new feature of CLR 4 comes from the garbage collection team. In this version, they are adding some performance enhancements to the memory allocation process. The feature is commonly called “Background GC”. But what does it actually mean?
As applications start to consume more memory, and some of them move to the wider address space of 64-bit processes, we have started to see latency issues when allocating memory while a full GC is running. As you may remember, the workstation version of the CLR uses a concurrent GC. This means that the GC thread runs in parallel with the application without blocking its execution (well, we try to minimize the blocking time). This thread scans generation 2 in order to mark dead objects. The operation can take some time if a lot of memory has been allocated, and while it is running it prevents ephemeral collections. OK, now you may be asking what that means, so let me explain it step by step.
Let’s analyze how the current concurrent GC works and the scenario that we are improving:
Now, our application needs to perform a full GC. For this, it will scan generation 2 and try to mark the dead objects as free objects. This is executed by the GC thread, and the simplified steps it takes are the following:
1) It starts marking the objects, checking the stacks and the GC roots. This operation allows further allocations, which means your application may create a new object, and it will be allocated in generation 0.
2) Because there have been further allocations, the GC now needs to suspend the EE (execution engine), which stops all the threads in your application. At this stage no allocation is allowed and your application may suffer some latency.
3) The EE is resumed so that the GC can continue working on the heap and the other bits and pieces it needs to handle; at this stage allocation is allowed again. But what happens if our ephemeral segment fills up while this collection is in progress?
4) At this stage the ephemeral collection cannot swap segments, so the allocation is delayed, adding latency to your application.
As you can see, the problem is that a single GC thread cannot cope with those two operations at the same time. The current ephemeral segment is 16 MB (note that this may change in the future, so don’t rely on this value!). This means that you can only allocate up to 16 MB, or whatever is available at the time of allocation, and as you have seen in the example, this space may run out before the GC finishes! I hope you now understand why we don’t recommend calling GC.Collect() without a good reason :).
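If you want to see ephemeral collections happening on their own, without ever calling GC.Collect(), here is a minimal sketch. It only uses GC.CollectionCount and GC.MaxGeneration; the class and method names (GcCounts, Snapshot, Churn) and the buffer sizes are made up for this example, and the numbers you get will depend on your machine and runtime.

```csharp
using System;

static class GcCounts
{
    // Returns how many collections have been counted for each generation so far.
    public static int[] Snapshot()
    {
        int[] counts = new int[GC.MaxGeneration + 1];
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
        {
            counts[gen] = GC.CollectionCount(gen);
        }
        return counts;
    }

    // Allocates many short-lived buffers so the runtime decides by itself
    // when to collect; there is no GC.Collect() call anywhere.
    public static void Churn(int iterations, int bufferSize)
    {
        for (int i = 0; i < iterations; i++)
        {
            byte[] buffer = new byte[bufferSize];
            buffer[0] = 1;          // touch the buffer
            GC.KeepAlive(buffer);   // keep the allocation observable
        }
    }

    static void Main()
    {
        int[] before = Snapshot();
        Churn(200000, 1024);        // roughly 200 MB of garbage (illustrative figure)
        int[] after = Snapshot();

        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
        {
            Console.WriteLine("Gen {0}: {1} collections during the run",
                              gen, after[gen] - before[gen]);
        }
    }
}
```

On a typical run you should expect most of the activity in generation 0, which is the point: the runtime already collects the ephemeral generations when it needs to.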
OK, now let me introduce you to the background GC. This model has been optimized to reduce the latency introduced by the scenario described above. The solution came from the idea of having a background GC that works as described above, plus a foreground GC that is only triggered when the ephemeral segment needs to be collected while a generation 2 collection is in progress.
Now, if we repeat the scenario above and try to allocate memory on the ephemeral segment while the background GC is marking, the foreground GC will execute an ephemeral collection:
The ephemeral foreground thread marks the dead objects and swaps segments, as this is more efficient than copying the objects to generation 2. The ephemeral segment, with its free and allocated objects, becomes a generation 2 segment.
As you can see, the allocation is now allowed, and your application does not need to wait for the full GC to finish before it can allocate.
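One rough way to look at this kind of latency from the application side is to time each allocation and keep the worst case. The sketch below is only a probe, not a benchmark: AllocationPauseProbe and LongestAllocationMs are names invented for this example, and the measured time also includes anything else that can delay a thread (scheduling, page faults), not just the GC.

```csharp
using System;
using System.Diagnostics;

static class AllocationPauseProbe
{
    // Allocates `count` buffers and returns the longest time, in milliseconds,
    // that any single allocation took.
    public static double LongestAllocationMs(int count, int bufferSize)
    {
        long worstTicks = 0;
        for (int i = 0; i < count; i++)
        {
            long start = Stopwatch.GetTimestamp();
            byte[] buffer = new byte[bufferSize];
            long elapsed = Stopwatch.GetTimestamp() - start;

            GC.KeepAlive(buffer);
            if (elapsed > worstTicks)
            {
                worstTicks = elapsed;
            }
        }
        // Convert Stopwatch ticks to milliseconds.
        return worstTicks * 1000.0 / Stopwatch.Frequency;
    }

    static void Main()
    {
        double worst = LongestAllocationMs(200000, 1024);
        Console.WriteLine("Longest single allocation: {0:F3} ms", worst);
    }
}
```

Running the same probe while another thread keeps a large amount of long-lived data alive would be one way to compare runtimes, but treat any result as indicative only.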
Note that this enhancement is only available on the workstation GC, as the server version is a blocking GC per core and we didn’t have enough time to port these enhancements to it, but it is definitely in our plans for the near future. We have tested this solution on 64 cores, but our future objective is to hit the 128-core mark, as the SQL team has already done :).
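Since the behaviour depends on which GC flavor your process is using, it can be handy to check it at runtime. Here is a small sketch using GCSettings.IsServerGC and GCSettings.LatencyMode from System.Runtime; the GcModeReport class and its Describe method are just illustrative names. As far as I know, the concurrent setting is controlled through the gcConcurrent element in the application configuration file, and GCLatencyMode.Interactive is what you see when concurrency is enabled.

```csharp
using System;
using System.Runtime;

static class GcModeReport
{
    // Builds a one-line description of the GC flavor and latency mode
    // the current process is running with.
    public static string Describe()
    {
        string flavor = GCSettings.IsServerGC ? "server" : "workstation";
        return string.Format("GC flavor: {0}, latency mode: {1}",
                             flavor, GCSettings.LatencyMode);
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```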
I hope this blog post makes it a bit clearer how the GC works and what kind of enhancements we are including in .NET 4.