Explanatory notes on Rotor's Garbage Collector

I'm posting a document written by Patrick Dussud (who's an architect on the CLR team) about Rotor's GC. The first release of Rotor had a simplified GC that was polled by the FJIT (usually done at the point of a call), which is massively different to the way that the product release of the CLR does its thing. One specific architectural difference is that the product release has the ability to run in the background (and on multiple processors if you're running the server GC), while the Rotor GC does not do that. Anyways, on with the show:

Oversimplified Model

For explanatory purposes, first consider the following oversimplified model of the heap and garbage collection. This is not what is actually implemented.

All garbage-collectable objects are allocated from a heap which is stored in a single contiguous range of address space. The oldest objects are at the lowest addresses, while new objects are created at increasing addresses. The allocation pointer for new objects marks the boundary between the used and unused areas of the heap.

Objects are partitioned into generations. Objects within a generation are all roughly the same age. A generation is a contiguous area of the heap, so an object's generation can be determined simply by comparing its address to the addresses of the generation boundaries. A full GC condemns the entire heap and performs a non compacting Mark and Sweep collection. An ephemeral GC compacts a subset of the generations lying above a certain address, and does not disturb the older objects at lower addresses.

The Real Heap Model

All garbage-collectable objects smaller than certain sizes (called small objects) are allocated from a heap which is stored in one or more heap segments. A heap segment is a single contiguous range of address space, typically 16 megabytes in size, Each heap segment has a reserved size and a committed size, which may be smaller than the reserved size. The committed portion of a heap segment consists of a header followed by alternating used and unused areas. The uncommitted portion consists of virtual memory pages at the high-address end of a heap segment which are reserved but not committed.

Heap segments have an order, from oldest to youngest, which may not be the same as the order of their addresses. The oldest objects are in the oldest heap segment, while new objects are created in the youngest heap segment. Within a heap segment, the oldest objects are usually at the lowest addresses, but the order can sometimes be scrambled because of the use of mark-sweep garbage collection and because of pinning.

The heap can expand by adding a heap segment, which becomes the youngest. The heap can contract by removing one or more of the younger heap segments.

Large objects are allocated using a malloc-type memory allocator and are freed by the garbage collector when they become unreachable. There are two doubly-linked lists of large objects, one for objects containing pointers and the other for objects that contain no pointers. The list of pointer-bearing large objects is kept sorted by address. The threshold size for large objects is 20K bytes. Periodically the heap is garbage-collected by identifying dead objects and turning consecutive runs of dead objects into unused areas which can then be reused for object allocation. Sometimes the managed to unmanaged interface temporarily prevents some objects to move so unmanaged code can access it normally. If the object is pinned while GC happens, we refer to the object as a pinned object. Pinned objects can prevent compaction from consolidating all unused areas in a heap segment into a single unused area. Since pinning is relatively infrequent, this should not significantly hurt performance.

Generations

Objects are partitioned into generations. Objects within a generation are all roughly the same age. The eldest generation consists of older objects, pointers to which are not specially tracked. The ephemeral generations are all generations other than the eldest generation. Pointers to the small fraction of objects that reside in the ephemeral generations are specially tracked. The younger a generation the more frequently it is garbage collected. When an ephemeral object survives garbage collection, it is promoted into the next older generation. There are only two generations.

A full GC condemns all objects. An ephemeral GC only condemns one or more ephemeral generations; older generations are not condemned and thus not subject to this garbage collection. Note that if a generation is condemned all younger generations are also condemned. The ephemeral generations are contained entirely in the youngest heap segment. Small objects are created in the youngest ephemeral generation, but large objects always belong to the eldest generation. An object's generation can be determined simply by comparing its address to the addresses of the ephemeral generation boundaries. If the object is outside any ephemeral generation, it must be in the eldest generation. Keeping large objects in the eldest generation preserves this easy generation computation.

Garbage Collection Algorithms

Allocator:

The allocator data structure consists of the allocation_context which contains a pointer to the next free space to allocate from and a pointer to the end of the free space. The allocator increments the pointer and compares it to the limit. If the pointer reaches or exceeds the limits, the pointer retracts to its former position and the allocators calls into GC to obtain more space. If there is room, GC will clear some memory – typically 8KB -, and gives to the allocation context. When there isn’t room for more allocation, a collection is triggered.

The allocation can be contented so it is protected by a global lock. On SMP machines, this lock would quickly becomes hot, so each thread keeps a local allocation context in its Thread Local Storage.

Ephemeral Garbage collection consists of one phase

Stop and Copy: The purpose of the mark phase is to copy live objects within the condemned generations from their allocation space to the destination space, which is contiguous with the rest of the higher generation. The reachable object graph is explored in a breadth first fashion: The roots are copied first and then the copied objects are scanned for references and the references are copied, until all copied objects have been scanned.

Full Garbage collection consists of two principal phases:

Mark. The purpose of the mark phase is to distinguish live from dead objects within the condemned generations. At the conclusion of this phase all live, condemned, small objects have their mark bit set, and have their pin bit set as well if they are pinned. Finalization has been dealt with and weak pointers to dead objects have been nulled. Dead large objects have been unthreaded from the linked list and their memory has been freed.

Sweep. The purpose of the sweep phase is to turn dead objects into free lists ready to be used for allocation of lower generation live objects during subsequent collections.

Object Layout

The object layout can be diagrammed as follows:

            Object MethodTable
+---------------------------+ +---------------------------+
| vtable address | P M | --> | -> EEClass |
+---------------------------+ +---------------------------+
| length if array | | size |
+---------------------------+ +-------------+------+------+
| slots / components | | flags | type |cmpsiz|
| | +-------------+------+------+
| ... | | ... |
| | | |
| | | method pointers |
+---------------------------+ +---------------------------+

The first longword in an object is the header, an instance of class CJavaObject. The two low-order bits of the MethodTable address are overlaid with the Pinned (P) and Marked (M) flags. These flags are always zero except during garbage collection, so they do not need to be masked when accessing the MethodTable, except by the GC.

If the object is not an array, the remaining longwords are slots. If the object has no slots, a dummy slot is inserted to ensure that all objects are at least 8 bytes. The garbage collector requires a minimum object size of 8 bytes for the Relocation Tree.

The second and third longwords of a MethodTable are new. They contain the following fields:

flags – 16 bits - contains-pointers, large-object, some other stuff not relevant here

type - 8 bits -

component_size - 8 bits - bytes per array component, 0 if not an array

size - 32 bits - bytes per instance, aside from array components

GC Descs

Object instances contain references to other objects. Inside of the object, the references are grouped contiguously as much as possible; this contiguous group of references is a reference site.

GCDesc is a structure that describe the references contained in objects. It is allocated as a part of the method table (before the EE class pointer). It can be diagrammed as follows:

+---------------------------+
| n entries |
+---------------------------+
| ref offset |
+---------------------------+
| number of ref – obj size |
+---------------------------+
| |
| ... |
| |
| |
+---------------------------+

n entries is the number of reference sites described in the gcdesc. Each reference site descriptor contains:

ref offset – offset from the beginning of the object of the first reference within the reference site.

number of ref – number of contiguous references contained in the reference site.

Handles

Besides static variables, stack references and references contained inside of objects, we support external references to objects called handles. They are used by unmanaged code to hold on to an object. Handles are non moveable and are explicitly allocated and released from unmanaged code. They are provide the following reference semantics:

Strong – like normal object reference, keep the object alive unconditionally.

Pinned – the referenced object becomes pinned for the life of the handle.

Weak short – are non null as long as the object is alive because of another strong reference. They are null otherwise

Weak long- are non null as long as the object hasn’t been reclaimed by the garbage collector.

The distinction between weak short and weak long comes from the finalization semantics, a weak short reference to freachable object is null but a weak long reference still refers to the freachable object

Object Finalization

When objects hold on to a non managed resource that needs to be explicitly disposed of, it is good practice to make sure that the resource is disposed of before the object gets collected. For this reason, languages like Java and VB support finalization. Finalizable objects are placed on a special weak reference list when created. GC monitors this list and when all strong references to a finalizable object are gone, GC will take the reference out of the weak list and place it on a finalization list which keeps the object alive; the object is said to be freachable . After GC completes, a finalization thread will go though the list of freachable objects and call the finalization method on each object. If an object does not become reachable again, it will be collected when its generation gets condemned. If it does become reachable again it won’t go through a freachable state when all of the strong references to it disappear. In the future, we will implement a finalization registration API to place the object back on the weak list.

Interior References

For convenience, we allow the stack to reference an object by the address of one of its members instead of its beginning. It makes ByRef implementation convenient. This complicates the mark phase because GC can’t set the bit of the DWORD pointed to by the reference anymore. An extra argument passed by the code manager to GC tells GC that the reference may be interior and GC will search through its data structure to locate the beginning of the object. Since this is a costly process, only stack locations are allowed to contain interior references.

This posting is provided “AS IS” with no warranties, and confers no rights.