1. The garbage collector in X++ and the CLR

Article
12/15/2008

I often hear from X++ developers that the .Net garbage collector (GC) does not work correctly, or that it isn’t as good as the one in Dynamics Ax, because you cannot know when objects are collected. This is of course ~~nonsense~~ not true, and this posting explains why. My intention is that, with this basic knowledge about the .Net garbage collector, you’ll be able to avoid many problems and improve the quality of your applications. Please keep in mind that this is a much simplified description of the GC. For further information, consult Jeffrey Richter's "CLR via C#".

1.1 The GC in Dynamics Ax

The garbage collector in Dynamics Ax is very simple: it collects all unreferenced objects every 3 seconds (OK, this is a little bit simplified, but it’s pretty much what the GC does). Consequently, you know when an object is collected: within about 3 seconds. If this doesn’t happen, you can use the form SysHeapCheck to create a dump of the current heap, so you know how many references are currently held to that object.

Another point that distinguishes X++ from C# is the existence of a destructor. In X++ this destructor is called "finalize()". As described on Msdn:

X++ objects are destructed automatically when there are no more references to them. You can destruct them explicitly in the following ways:

Use the finalize method.

Set the object handle to null.

In X++ it's the finalize method that contains all code that is used to clean up the instance (releases all objects that are held by this instance, ...). In C# this is done by the Dispose() method, but I'll describe this later. An important point is mentioned in the Msdn documentation:

Use finalize carefully. It will destruct an object even if there are references to it.

1.2 The GC of the CLR

In .Net you never really know in advance when an object is collected, since the algorithm is much more complex and depends on many more variables than the one in Dynamics Ax. For example, it depends on the currently available memory, the CPU usage and how old an object is. But not knowing when objects are finally collected isn’t a problem at all, because in most cases this stays completely transparent for the developer. It might sound paradoxical, but this is what gives you the best performance and an optimal use of resources: collecting objects consumes resources itself, so the GC should collect economically, when the system has enough resources to handle it. If the system doesn’t have enough resources available, the GC should wait until they are free again.

1.2.1 Generations

In Dynamics Ax all objects without a reference are collected at once, because the GC in Ax does not make any difference between objects: they are all equal. The CLR, in contrast, differentiates between objects.

Why does the GC differentiate between objects? The idea is that collecting objects in batches is more efficient than collecting all objects at the same time. The CLR therefore creates ‘batches’ that are collected when it is opportune for the system. ‘Batches’ is not the real term, but it describes what happens in practice.

But how do you differentiate objects that should be collected now from those that can be collected later? To do this, the CLR relies on the assumption that younger objects statistically have a shorter lifetime than older objects. So the differentiation is based on age: objects belong to generations. This is why the .Net GC is called a generational garbage collector. The CLR works with 3 generations. The first generation (called generation 0) is the one that is collected frequently because, as I already mentioned, younger objects statistically have a shorter lifetime. The second one is collected by the GC under certain circumstances, and the third generation is collected when memory is low. I know this is not precise, but I will come back to this later. As I wrote, I’m trying to keep this article very simple, but give you as much information as you need to understand the GC of the CLR.

But what exactly is a ‘generation’ in the CLR? A generation is first of all an attribute that shows how many collection cycles an object has survived. You can find out which generation an object (for example obj) belongs to with the static method GetGeneration of GC:

 System.GC.GetGeneration(obj);
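To make the idea of an object surviving collection cycles more tangible, here is a minimal sketch built around the GetGeneration call. The class name GenerationDemo is made up for this illustration, and the forced collections are only there to make the promotion visible; the exact numbers printed can vary with the runtime version and build settings.

```csharp
using System;

class GenerationDemo
{
    static void Main()
    {
        // A freshly allocated small object normally starts in generation 0.
        object obj = new object();
        Console.WriteLine("After allocation:  gen " + GC.GetGeneration(obj));

        // Every collection the object survives can promote it by one generation.
        GC.Collect();
        Console.WriteLine("After 1st collect: gen " + GC.GetGeneration(obj));

        GC.Collect();
        Console.WriteLine("After 2nd collect: gen " + GC.GetGeneration(obj));

        // Keep the reference alive up to this point, so that the object
        // cannot be treated as garbage before the last measurement.
        GC.KeepAlive(obj);
    }
}
```

On a typical run the generation climbs from 0 towards 2 and then stays there, since gen2 is the oldest generation.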

1.2.1.1 Gen0

The first-generation objects (called generation 0 or gen0) are objects that were recently created. About 256KB of objects in the heap can belong to gen0. This limit is called the budget. If the total size of the objects that belong to gen0 exceeds this budget, the GC automatically starts collecting. 256KB is an approximate value, because it depends on the size of the L2 cache of your CPU. (If you’d like to dig very deep, read the article from Jan Gray.)

In the following diagram we have a heap in three different states:

The first state shows 4 objects (A1,..,A4) that have been created recently. The arrows pointing to these objects indicate that references to them exist. Together, these four objects are smaller than the budget for gen0, so there is no need for the GC to do anything.

1.2.1.2 Gen1

The second state shows the heap with five objects. In the meantime some of the initial objects have lost their references, and with the creation of the fifth object (A5) the objects together exceed the budget for gen0. So the creation of the new object A5 triggers the first collection, with the consequence that A2 and A4 (which are no longer referenced) won’t survive gen0. All the others now belong to gen1 (shown in the third state of the heap).

This scenario is repeated many times and consequently increases the number of objects that belong to gen1, as you can see in the following diagram:

Gen0 has a budget of 256KB and gen1 a budget of 2MB (this is an approximate value). In contrast to gen0, however, exceeding the budget of gen1 does not automatically trigger a collection. Instead, gen1 is collected the next time the budget of gen0 is exceeded.

1.2.1.3 Gen2

This is what happens in the next diagram. The first state of the heap shows a gen1 that exceeds its 2MB budget, but gen1 is only collected (as you can see in the second state) once gen0 exceeds its 256KB. All surviving objects of gen1 now belong to gen2.

The budget size for gen2 is 10MB and the behavior is the same as for gen1.

You can easily observe the described behavior with perfmon. Maoni blogged about the GC performance counters some years ago, and her posting might be very helpful if you’d like to simulate the described behavior of the GC. You'll find a really good article by Michael McIntyre here, too.

1.2.2 Large objects

For the CLR, large objects are all objects that are bigger than 85000 bytes. This value might change in future versions of the CLR. Maoni Stephens from the CLR team wrote a really great article on msdn about this subject with a lot of details, maybe too many details for now. To make it short: large objects belong to generation 2 from their instantiation, and they are held in a special heap called the “large object heap” (LOH), which does not compact its objects. For a start, it is important for us to know that:

- large objects are treated differently

- they belong to gen2 directly after their creation

- large objects are not compacted (risk of fragmentation)

The last point isn’t directly related to the GC, but it is worth mentioning that large objects are often a source of fragmented managed memory, and under certain circumstances this can result in a System.OutOfMemoryException when there is not enough contiguous memory available.
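The special treatment of large objects can be checked with a few lines of code. The following is only a sketch: the class name LargeObjectDemo and the array sizes are chosen for this illustration, and the threshold is the 85000 bytes mentioned above.

```csharp
using System;

class LargeObjectDemo
{
    static void Main()
    {
        byte[] small = new byte[1000];    // far below the large-object threshold
        byte[] large = new byte[100000];  // above 85000 bytes, so allocated on the LOH

        // The small array normally reports generation 0, while the large one
        // is reported as generation 2 right after its creation.
        Console.WriteLine("small array: gen " + GC.GetGeneration(small));
        Console.WriteLine("large array: gen " + GC.GetGeneration(large));

        GC.KeepAlive(small);
        GC.KeepAlive(large);
    }
}
```

No collection has to run for the large array to show up in gen2, which is exactly the "directly after their creation" behavior from the list.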

1.3 Disposing managed objects

There is nothing easier for a programmer than getting rid of managed objects: just wait until they go out of scope and the GC will do the rest for you in an extremely efficient way. You’ll never have problems with memory leaks. Some kind of programmer’s heaven!

But even if this might be the programmer’s heaven, you should know what the GC does, because this will influence how you will program and it is very useful for troubleshooting your application.

The following diagram shows a heap before the collection and after the compact phase. The first state contains roots (static fields, method parameters, local variables and CPU registers) and unreachable objects:

In a first step, the mark phase, the GC starts at the roots and marks every object that is reachable from them. All unreachable objects (A1, A6 and A7) are not marked and are considered garbage. In a second step, the compact phase, the GC walks through all objects, identifies contiguous blocks of unmarked memory and shifts the surviving objects down in memory to close those gaps. That avoids fragmentation and allows the CLR to know exactly where the next object is to be allocated: on top of the last object (here it would be A6).

The important point here is that managed resources are in good hands and that you don’t have to worry about major problems like memory leaks. Your application will be reliable and secure with managed code.
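The difference between a reachable and an unreachable object can be made visible with weak references, which let you observe an object without keeping it alive. This is a sketch under a few assumptions: the names ReachabilityDemo and CreateUnreachable are invented for the example, and the collection is forced only for demonstration.

```csharp
using System;
using System.Runtime.CompilerServices;

class ReachabilityDemo
{
    // Creates an object and hands back only a weak reference to it.
    // Once this method returns, no root points to the object any more.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static WeakReference CreateUnreachable()
    {
        return new WeakReference(new object());
    }

    static void Main()
    {
        // This object stays reachable through the local variable 'rooted'.
        object rooted = new object();
        WeakReference weakRooted = new WeakReference(rooted);

        WeakReference weakGarbage = CreateUnreachable();

        GC.Collect();

        Console.WriteLine("rooted object alive:      " + weakRooted.IsAlive);
        Console.WriteLine("unreachable object alive: " + weakGarbage.IsAlive);

        // Makes sure 'rooted' counts as a root until after the collection.
        GC.KeepAlive(rooted);
    }
}
```

The rooted object survives the collection. The other one is normally reported as no longer alive, although this can depend on the build configuration and the runtime.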

1.4 Disposing unmanaged objects

As we all know, there’s no programmer’s heaven, and more than once we all were convinced of the existence of a programmer’s hell… I can assure you that there’s no such hell ;-)

Sometimes you will be challenged because you need to communicate with resources outside of the managed world (CLR), like the file system, Win32 (P/Invoke), COM, unmanaged or unsafe code and so on. In all these cases resources are initialized that are beyond the control of the GC, so it’s up to you to release these resources in order to prevent memory leaks or other messes. .Net gives you some tools to manage this, but in the end it’s up to you to use them and write robust and secure code.

There are many more ways to ensure the release of external resources, but that would go far beyond what I intended with this article.

1.4.1 Unmanaged and unsafe code

It might be helpful for you as an X++ developer to know the difference between unmanaged and unsafe code, since you will work with both when you’re using the BC.Net. On programmers-heaven you’ll find the following definition:

Un-managed code runs outside the Common Language Runtime (CLR) control while the unsafe code runs inside the CLR’s control. Both un-safe and un-managed codes may use pointers and direct memory addresses.

The Dynamics Ax resources are unmanaged and parts of the BC.Net are unsafe.

1.4.2 The IDisposable interface

A paradox for an inexperienced managed-code developer is the fact that you know perfectly well when the constructor is called (you do this explicitly), but you don't know when the destructor is called. At least, that is what the Finalize method (~ClassName) appears to be. One of the reasons why it isn't called a destructor is what I mentioned before: it's the GC that determines when this method is executed, not you. Since it's up to you to manage unmanaged resources, it wouldn't be a great idea to implement the deallocation of those resources in a method of which you never know when it is called, and which you can't call explicitly since it isn't a public method.

Instead of implementing the deallocation of unmanaged and unsafe resources in the Finalize method you implement that code in a public method called Dispose() and implement the interface IDisposable, so you can call the Dispose() method whenever you need to release the resources:

    ExternalResources er = new ExternalResources();

    try
    {
        // do something with it
    }
    finally
    {
        er.Dispose();
    }

By calling Dispose() in the finally block you can be sure that the resources of the instance "er" will be released in any case. The finally block is always executed at the end of the try scope, with or without an exception. C# offers the using statement, which simplifies the code above:

    using (ExternalResources er = new ExternalResources())
    {
        // do something
    }

The using statement can be used if the class implements the IDisposable interface. I would even say that you must use it (or at least the first method) if a class implements the IDisposable interface!

1.4.3 The Disposable pattern

But what if for any reason the Dispose hasn't been executed? Without any additional code the resources will not be released!

An object should always be able to release all external resources that it has allocated itself at the end of its lifetime. For that reason Microsoft defined the IDisposable pattern:

    public class ExternalResources : IDisposable
    {
        ~ExternalResources()
        {
            Dispose(false);
        }

        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }

        protected virtual void Dispose(bool disposing)
        {
            if (disposing)
            {
                // Clean up all managed resources
            }

            // Clean up all external (unmanaged and unsafe) resources
        }
    }

Calling Dispose() releases the external resources, but even if Dispose() is never called explicitly, the resources are still released when the GC calls the Finalize method (~ExternalResources). Since finalization by the GC consumes resources, we want to avoid that last call when the Dispose() method has already been called. That's why we call SuppressFinalize at the end of the Dispose() method. This instruction tells the GC that it doesn't need to call the Finalize method when it collects this object, which matters because finalizable objects are tracked in a separate queue and need an additional cycle to be cleaned up. You remember that in X++ the method "finalize" was the destructor of the class? Now you know that there is a Finalize in C#, too, but it has nothing in common with a destructor and its behavior is completely different.
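To see the pattern working with a real unmanaged resource, here is a small self-contained sketch. The class UnmanagedBuffer, its IsDisposed property and the buffer size are hypothetical and exist only for this example. The unmanaged memory comes from Marshal.AllocHGlobal, and a flag guards against releasing it twice.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around a block of unmanaged memory.
public class UnmanagedBuffer : IDisposable
{
    private IntPtr handle;
    private bool disposed;

    public UnmanagedBuffer(int size)
    {
        // Memory allocated here is invisible to the GC.
        handle = Marshal.AllocHGlobal(size);
    }

    public bool IsDisposed
    {
        get { return disposed; }
    }

    // Safety net: runs only if Dispose() was never called.
    ~UnmanagedBuffer()
    {
        Dispose(false);
    }

    public void Dispose()
    {
        Dispose(true);
        // The resources are already released, so finalization is not needed.
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed)
        {
            return; // calling Dispose more than once must be harmless
        }

        if (disposing)
        {
            // Managed resources would be released here; this sketch has none.
        }

        // Unmanaged resources are released on both paths.
        if (handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(handle);
            handle = IntPtr.Zero;
        }

        disposed = true;
    }
}

class Program
{
    static void Main()
    {
        UnmanagedBuffer buffer = new UnmanagedBuffer(1024);
        using (buffer)
        {
            Console.WriteLine("disposed inside using: " + buffer.IsDisposed);
        }
        Console.WriteLine("disposed after using:  " + buffer.IsDisposed);

        buffer.Dispose(); // second call does nothing, thanks to the flag
    }
}
```

The program prints False inside the using block and True after it, which shows that leaving the block called Dispose() without any help from the GC.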

Shawn Farkas wrote a great article about all this!

1.5 Finalizing objects

Earlier in this posting I made the assertion:

In .Net you never really know in advance when an object is collected

Well, this is true since you don’t know the exact time at which the GC ‘decides’ to collect objects, but you do know the 5 conditions under which the GC will start collecting:

- Generation 0 is full: The GC collects automatically when the budget of gen0 has been exceeded (see 1.2.1)

- System has low memory: When the system signals "low memory", the GC will collect.

- AppDomain is unloading: When unloading an AppDomain the GC collects all objects first.

- CLR shuts down: The CLR tries to collect all objects in a friendly manner, but after reaching a timeout (currently 40 seconds in CLR 2) it shuts down nevertheless. Unreleased resources might in this case be left open. In order to be able to debug this particular situation, read the blog I linked.

- GC.Collect: It is absolutely not recommended to ask the GC explicitly to collect, since the GC is self-tuning and already optimized. Calling GC.Collect explicitly results most of the time in slower applications and systems. So please do this with care and only if you know precisely what you are doing. Anyway, you can trigger a collection by calling:

 GC.Collect();
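The first condition in the list, a full generation 0, can be observed without calling GC.Collect at all. The sketch below allocates a lot of short-lived objects and reads the collection counters afterwards. The names CollectionCountDemo and sink, and the number and size of the allocations, are arbitrary choices for this illustration.

```csharp
using System;

class CollectionCountDemo
{
    // Storing each array in a static field makes sure
    // the allocation really happens on the managed heap.
    static byte[] sink;

    static void Main()
    {
        int gen0Before = GC.CollectionCount(0);
        int gen1Before = GC.CollectionCount(1);
        int gen2Before = GC.CollectionCount(2);

        // Roughly 1 GB of garbage in 1 KB pieces: every new array replaces
        // the previous one, which thereby becomes unreachable.
        for (int i = 0; i < 1000000; i++)
        {
            sink = new byte[1024];
        }

        Console.WriteLine("gen0 collections: " + (GC.CollectionCount(0) - gen0Before));
        Console.WriteLine("gen1 collections: " + (GC.CollectionCount(1) - gen1Before));
        Console.WriteLine("gen2 collections: " + (GC.CollectionCount(2) - gen2Before));
    }
}
```

The absolute numbers depend on the machine and the runtime, but gen0 should be collected far more often than the older generations.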

1.6 The two GCs and concurrent GC

1.6.1 The workstation GC

This GC is used by console and WinForms (incl. WPF) applications and is optimized for low latency. Without going into more detail, low latency means that in most cases the workstation GC doesn’t pause the application. That’s what concurrent GC means: the GC can do its work while the application stays responsive for the user. You can get more information from Mark here and from Chris Lyon here.

1.6.2 The server GC

ASP.Net applications use the server GC, since it is optimized for multicore/multiprocessor machines and for throughput. In contrast to the workstation GC, the server GC pauses the application while collecting. In common Windows Forms application scenarios, this might not be an option.

But the server GC should be the fastest option on machines with more than two processors. This might be important for you to know, since more and more workstations have more than 2 cores, and in that case it might be interesting to activate the server GC, or at least to test whether it improves the performance of your application, even if the server GC doesn’t support concurrent GC (until now). You'll find more information about this on Msdn.
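Before testing the server GC it helps to check which flavor your process is actually running with. The server GC is activated in the application configuration file with the gcServer element (enabled="true") inside the runtime section. The following sketch, with the made-up class name GcModeDemo, only reads the current settings and changes nothing.

```csharp
using System;
using System.Runtime;

class GcModeDemo
{
    static void Main()
    {
        // True only if the process was started with the server GC.
        Console.WriteLine("Server GC enabled: " + GCSettings.IsServerGC);

        // Shows how intrusive the GC is currently allowed to be.
        Console.WriteLine("Latency mode:      " + GCSettings.LatencyMode);

        // The server GC only pays off on machines with several processors.
        Console.WriteLine("Processor count:   " + Environment.ProcessorCount);
    }
}
```

Running it once with and once without the configuration entry is a simple way to verify that the setting was picked up before you start measuring.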

1.7 The conclusion

The GC of the CLR is much more complex than the one from Dynamics Ax, but much more efficient and appropriate for server applications. Programming managed code is very easy and .Net offers you a number of tools to manage external resources.