Taming the CLR: How to Write Real-Time Managed Code


I've actually been meaning to write about real time applications for ages so when I was asked to give a talk at MS Gamefest (http://microsoftgamefest.com) I jumped at the opportunity to give myself a hard reason to do the homework.  Last Tuesday I gave that talk  and below are the slide contents plus my speaker notes.  The actual audio was recorded an will probably be available soonish, when that happens I'll post a link.


I hope you find this somewhat interesting 🙂


Managed Code for Real Time? 



  • It may seem like a crazy notion to some people but gamers probably aren’t nearly as surprised
  • Managed Runtimes in the game world are common

    • My first AI code was written in LPC for LPMud

  • You can get great results out of a managed runtime
  • Why?

Good for What Ails You



  • Garbage Collected memory model is the centerpiece

    • Amortized cost of allocations in this model can be excellent!

  • Long term fragmentation is less of a problem

    • Who hasn’t played a game that degrades over 2 hours of play?

  • Locality can be good due to compaction – we need all the cache hits we can get
  • Easy to use model, immune to classic “leaks” and wild pointers

Garbage collected memory models tend not to have long term degradations.  Many of the most deadly problems simply can’t happen in this kind of model.  When people talk about a “GC” leak what they mean is that they are holding on to a pointer that they should have nulled out.  That is much easier to track down than memory that was “lost” in the classic leak model – nothing points to it and yet it was never freed.  Wild pointers are totally impossible.
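
For instance (a hypothetical sketch – the class and method names are invented), a cache that nothing ever prunes keeps every object it has ever seen reachable; that’s a “leak” in the GC sense even though the memory is never actually lost:

    using System.Collections.Generic;

    class TextureCache
    {
        // Hypothetical: a static cache that is never pruned -- every Texture ever
        // loaded stays reachable, so the GC can never reclaim any of them.
        static Dictionary<string, Texture> textures = new Dictionary<string, Texture>();

        static Texture Load(string name)
        {
            Texture t;
            if (!textures.TryGetValue(name, out t))
            {
                t = LoadFromDisk(name);   // hypothetical loader
                textures.Add(name, t);    // the "leak": nothing ever removes entries
            }
            return t;
        }

        // The fix is simply to drop references you no longer need, e.g. call
        // textures.Clear() when the level that used them is unloaded.
    }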


All of this is great news for game developers.


Great, I’ll take a dozen!



  • Wait, not so fast, you have to read the fine print…
  • There are important practices to get these benefits
  • The GC is like a BFG9000

    • Don’t shoot yourself in the foot with it!

People learned best practices for classic memory allocators.  In short – they have awful performance, so you have to wrap them.  And that’s exactly what everyone does.  You get assorted custom allocators for different purposes: ones that allocate and free one big chunk, or carve out lots of little objects, or meet some other specialized requirement.  The main thing is you are careful to get to know your allocator(s) and use them accordingly.
Likewise, you have to get to know the GC.  It’s usable directly – without wrapping – in a variety of cases, a great step forward.  But you might shoot yourself in the foot if you do unwise things.


Things you need to know



  • In .NET CF the GC is a compacting mark and sweep collector

    • Contrast with the regular .NET Collector which is generational

  • What does this mean?

    • You probably should understand both models because you can find yourself on the PC platform too

(You can find the picture that I used in this article here; this discussion is taken directly from that article.)


Collectors come in many flavors and today we’ll be talking about a couple of different collectors that you might run into.  The one in .NET CF is quite a bit different from the one on the desktop.  There are different rules for each – although if you follow the .NET CF rules you should get excellent performance out of the desktop collector as well.


“Compacting” refers to the fact that our collectors will squeeze out the free memory, kind of like a disk defrag, when they think it is wise to do so.


“Generational” refers to the fact that the desktop collector can collect just some of the objects rather than all of them – notably it can collect “just the new stuff”.


Simplified GC Model



  • This shows generations which are not always present
  • Some of the details in the following discussion are not exactly correct

    • The idea is to have a mental model that helps you understand costs, you need this more than you need the exact details

In the simplified model (only):



  • All garbage-collectable objects are allocated from one contiguous range of address space
  • The heap can be divided into generations (more on this later)
  • Objects within a generation are all roughly the same age.
  • The oldest objects are at the lowest addresses, while new objects are created at increasing addresses. (In the drawing, addresses are increasing going down)
  • The allocation pointer for new objects marks the boundary between the used (allocated) and unused (free) areas of memory.
  • Periodically the heap is compacted by removing dead objects and sliding the live objects up toward the low-address end of the heap.
  • The order of objects in memory remains the order in which they were created, for good locality. 
  • There are never any gaps between objects in the heap.
  • Write Barriers are used to note changes in pointers from old objects to new objects allowing the newer objects to be collected separately.
  • See http://blogs.msdn.com/ricom and http://blogs.msdn.com/maoni for more articles, and a longer discussion of this topic at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndotnet/html/dotnetgcbasics.asp
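
To make the allocation-pointer part of that model concrete, here is a toy sketch of bump-pointer allocation.  This is an illustration of the mental model only – it is not how the CLR is actually implemented:

    // Toy model: the heap is one contiguous range and allocation just bumps a
    // pointer; a real collector would compact the live objects and reset the
    // pointer instead of simply failing.
    class ToyHeap
    {
        byte[] memory = new byte[1 << 20];  // the "address space"
        int allocPointer = 0;               // boundary between used and free

        public int Allocate(int size)
        {
            if (allocPointer + size > memory.Length)
                throw new OutOfMemoryException();   // a real GC collects/compacts here
            int address = allocPointer;             // new objects go at increasing addresses
            allocPointer += size;
            return address;
        }
    }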

Why do you care?



  • Really there are just a few things that you need to keep top of mind
  • Allocations are very cheap, basically you increment a pointer
  • Objects that are allocated close together in time tend to be close together in space
  • Full collections can be very expensive because the heap could be arbitrarily big

You need to know these things so that you get good locality of reference in your data structures.  Better locality means fewer cache misses which means fewer clocks per instruction and therefore more frames per second.
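
As a concrete (and hypothetical – the type names are invented) illustration, allocating related objects together in one burst, rather than interleaving them with unrelated allocations, tends to leave them adjacent in the heap:

    // Hypothetical: all the particles for one emitter allocated back to back,
    // so "close together in time" becomes "close together in space".
    Particle[] particles = new Particle[1024];
    for (int i = 0; i < particles.Length; i++)
        particles[i] = new Particle();

    // By contrast, interleaving different kinds of allocations in one loop
    // (new Particle(); new SoundCue(); new PathNode(); ...) scatters each kind
    // across the heap and costs you cache hits when you traverse them later.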


You need to know how to prevent very expensive collection costs – why do these costs arise, and what can you do to limit them?


Some good things to keep in mind



  • The garbage collector concerns itself with the living – dead objects cost nearly nothing.  Even if we don’t compact and choose to thread a free list through the dead objects we still don’t have to visit every dead object to do that.  Costs of collection tend to be driven by live objects.
  • The overall amortized cost of memory allocation in a garbage collected world tends to be linear in the number of bytes allocated.  i.e. allocation volume drives the total cost, which includes zeroing memory, actually doing the allocations (the cheapest part) and the eventual collections.  It’s not always the same linear factor because that depends on how much is living and how rich in pointers everything is but, generally, allocation cost is a “per byte” thing for any given application.

Stay in Control



  • You can get great performance in this kind of world if you stay in control
  • In the regular CLR, for real-time or high throughput applications, I advise people to keep their eye on mid-life objects, make sure they have few of them

    • Mid life objects cause you to have to do full collections, and a lot more memory is implicated

  • But what if we don’t even have generations?

The classic thing you have to worry about in the desktop CLR is if you start leaking a lot of objects into generation 2 – the oldest generation.  So basically object-lifetime patterns drive performance – the allocation rate and the death rate in each generation.


This is true because in the desktop world the GC heap could be very large (e.g. 1GB+) and full collects could therefore be very costly.  So the trick in this world is to make sure you’re only doing partial collections.


There are many real-time applications in the desktop/server world that are successful, e.g. assorted financial companies have stock streaming services that do things with quotes and then facilitate order processing.  These are all done on a deadline.  How do they do it?  Strict control of allocation volume and promotions to the elder generations.
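
One common way to keep allocation volume and promotion under control is to reuse objects rather than allocating fresh ones every frame.  A minimal pooling sketch (the Bullet type and its Reset method are hypothetical):

    using System.Collections.Generic;

    class BulletPool
    {
        // Recycled bullets: per-frame allocation volume stays near zero and
        // nothing new leaks into the older generations.
        readonly Stack<Bullet> free = new Stack<Bullet>();

        public Bullet Acquire()
        {
            return free.Count > 0 ? free.Pop() : new Bullet();
        }

        public void Release(Bullet b)
        {
            b.Reset();        // hypothetical: clear per-shot state for reuse
            free.Push(b);
        }
    }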


.NET CF Considerations



  • A good way to think about .NET CF is that it’s like you only have generation 0
  • Your total heap size should be roughly what your generation 0 size would have been – that means comparable to the CPU cache
  • Collection times in that world will be excellent
  • If your heap gets too big collect times will kill you

If you want to get the best performance your total heap sizes in .NET CF should be roughly what your generation 0 size would have been in a generational collector.


If you keep accumulating old objects thereby letting your heap grow, the fixed cost of marking those objects during collections will start to hurt you. 


If you have large numbers of objects that you need to pre-create and that are going to survive, you can minimize the cost of managing these objects by keeping them devoid of pointers.  If no tracing needs to happen, huge swaths of memory can be marked as in-use (e.g. arrays) without even having to look at them.


So a good tactic is to keep your very long lived data low in pointers (e.g. use handles) and things you churn rich in pointers so that they are easy to manage.  Controlling this blend is up to you.


Remember that when you start using handles you’re back to managing your own object lifetimes and free lists and so forth but that doesn’t matter at all if you’re talking about objects that basically stay around for a whole level.  Use this wisely.
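
A minimal sketch of what that might look like – integer handles into a pre-created array instead of object references, so the long-lived structure contains nothing the GC has to trace (all names here are hypothetical):

    // Hypothetical: rooms refer to meshes by integer handle.  The int[] table
    // contains no pointers, so the GC can mark it live without tracing into it.
    struct MeshHandle
    {
        public int Index;       // index into the level's mesh table
    }

    class Level
    {
        Mesh[] meshes;          // the one place real references live
        int[] roomToMesh;       // pointer-free handle table: room -> mesh index

        public Mesh GetMesh(MeshHandle h)
        {
            return meshes[h.Index];   // one extra indirection, managed by hand
        }
    }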


The total heap should be about the size of the CPU cache or a small multiple of it so that you are going to get lots of hits when collecting and processing generally.


Also, .NET CF needs no write barriers as it’s not generational, so that’s one less cost right there.


Allocation Rate



  • In addition to typical heap size you also need to keep your eye on the allocation rate
  • Freshly allocated memory has to be zeroed, and then “constructed” (by you)

    • Basically that’s a lot of memory traffic

  • At 20-40 MB/s you are going to be pleased with the memory management overhead

    • At 60 frames per second that’s around 500k per frame, or say 20k objects of modest size

Volume can kill you as well.  Zeroing out all the memory can be expensive – and of course volume drives collections.


Collections are typically triggered when a certain number of bytes has been allocated – at that point the GC deems it wise to do a collection because there is likely to be enough junk that it’s worth bothering with.


The GC gets its efficiency by being lazy – collect too aggressively and all those savings go away.  On the flip side, collect too rarely and memory usage skyrockets and locality is destroyed.  The GC keeps these two competing things in balance with allocation budgets.
At the suggested volume of allocations and heap size you should be able to achieve garbage collection overheads in the low to mid single digits of percent.  That’s a great result for general memory management.
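
If you want a rough feel for your allocation rate while the game runs, something like the sketch below can help.  It assumes GC.GetTotalMemory(false) is available and cheap enough to call once per frame on your target – verify that before putting it in an inner loop:

    class AllocationMeter
    {
        // Rough per-frame allocation tracking (sketch only).
        long lastTotal = GC.GetTotalMemory(false);   // bytes currently thought allocated

        public void EndOfFrame()
        {
            long total = GC.GetTotalMemory(false);
            long allocatedThisFrame = total - lastTotal;   // goes negative right after a collect
            lastTotal = total;
            // Log or graph allocatedThisFrame and compare it against the
            // ~500k-per-frame budget discussed above.
        }
    }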


What else?



  • Jitted code optimization is not as good as a full optimizing compiler

    • So this is no place to do intense floating point math, that’s what Shaders are for

  • BUT! You’ll write simpler code

    • The fastest code is the code that isn’t there!
    • Simplifications can easily dwarf code-gen taxes, and give you coding time elsewhere
    • Cheap memory model can be ideal for AI logic and physics

Jitting isn’t going to be your death… it’s like a fixed overhead.


The idea is that you can more than win this back with simplified logic.  Lack of destructors.  No cleanup code to run (and who wants to write code to, say, visit partly created trees and release them?).


Less code to write means more time to focus on algorithmic gains in your code – that’s where the real money is.  But, full disclosure, you should know you are starting at a modest penalty.


Use the coding simplifications to your advantage, build great algorithms that would have been too hard or too expensive in man-hours to code otherwise and win.


Don’t try to write a Phong Shader in managed code for your game.  That’s what the GPU is for.



Interop with Native


Just a few words here; again, you can shoot yourself in the foot.



  • Keep your interop under control 
  • Use primitive types in the API whenever possible

    • Marshalling costs are what will kill you

  • Simple calls can be very fast 

    • Desktop CLR can get many millions of calls per second on simple signatures

  • Native Interop is not possible on the Xbox 360

Not much to say here other than, if you must, on the desktop, then keep it as simple as possible.
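
For instance, a call whose signature is nothing but primitive types gives the marshaller almost nothing to do.  A sketch of the kind of interop that stays cheap on the desktop (GetTickCount is just a convenient example of such a signature):

    using System.Runtime.InteropServices;

    class NativeMethods
    {
        // Primitive-only signature: no strings, arrays, or structs to marshal.
        [DllImport("kernel32.dll")]
        public static extern uint GetTickCount();
    }

    // By contrast, signatures full of strings, arrays, or non-blittable structs
    // pay real marshalling costs on every call.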


On 360 it isn’t even an option, therefore you can’t get it wrong 🙂

Comments (37)

  1. Awesome article! thanks for posting it 🙂

    At some point, I’d love to hear  some further thoughts on when to use pointers (reference types?), and how you might improve something by using Handles.

  2. Chris Nahr says:

    That was a fascinating article. One question: what about the effects of automatic GCs on frame rate and responsiveness on the 360? Would you recommend calling the GC manually in this environment, to ensure that the display won’t stutter and user input won’t be delayed? Or does the 360 version of the .NET Framework automatically avoid these issues somehow?

  3. Chris Nahr says:

    Whoops, another question. The Compact Framework with the GC characteristics you describe is the one being used on the Xbox 360, right? I’m assuming this is the case but your article doesn’t actually mention it.

  4. Ben Hollis says:

    Could you toss out a link or two on handles vs pointers? I’m not sure exactly what you mean in the .NET world. I kinda always thought everything was "references" which were basically pointers.

  5. Matt says:

    Ben:

    Handles is likely being used here in the generic (small G) sense of the word.

    I.E. a number, which when passed to the relevant manager gives you back the thing you are interested in.

    E.g. imagine you know there will only ever be X entities in your world and most of the time they will all exist.

    You might create a structure like a tree for depth ordering.

    You could either have a pointer based tree or an array based one.

    Obviously the array based tree is better from a GC point of view (less pointers, more locality potential when traversing).

    However if you use the array structure but, within your tree nodes you keep a reference to each entity object then you have additional GC cost. The GC must traverse each node to follow all these references.

    If instead your tree nodes contain some integer-based lookup to your entities then the nodes contain only integral values, thus the entire tree can be thought of by the GC as nothing but a block of used memory.

    If you have lots of data structures holding information about the arrangement of entities this gain can apply to all of them.

    Obviously this has a cost in access time (you must do the lookup by hand which will involve a few additional operations *).

    But the GC cost is much lower for this block of memory.

    Balancing this cost is where the skill of the coder comes in.

    * (The handle should be made a well-defined struct so as to hide the index mechanism in case you need to change it, but in effect it is likely to be an index into your array of all entities.)
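
    Something along these lines, perhaps (a hypothetical sketch – the entity manager API is invented):

        // Hypothetical: a handle struct that hides the fact that it is really
        // just an index into the world's entity array.
        struct EntityHandle
        {
            private readonly int index;

            public EntityHandle(int index) { this.index = index; }

            public Entity Resolve(EntityManager manager)
            {
                return manager.GetByIndex(index);   // hypothetical manager API
            }
        }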

  6. JoelMartinez says:

    Thanks for the clarification Matt … I wonder if I might be able to get some feedback as to whether this is along the lines of what you’re talking about:

    http://pastebin.ca/146362

  7. Frank Hileman says:

    Joel, I think Rico was referring to integers instead of pointers. If your handle is an object containing an integer (as in the example), that defeats the purpose. Here are a couple examples using integers instead of pointers.

    /// <summary>
    /// Creates a binary tree from a linked list of pooled free nodes, using array
    /// indices as node references. Nodes can be removed, but the method was not written.
    /// </summary>
    class Tree5
    {
        private const int startSize = 16;
        private static Node[] pool = CreateNodes();
        private static int nextFree = 0;
        private static int root = -1;
        private static int count;

        private static Node[] CreateNodes()
        {
            Node[] nodes = new Node[startSize];
            InitializeNodes(nodes, 0);
            return nodes;
        }

        private static void InitializeNodes(Node[] nodes, int start)
        {
            // initialize all but last
            InitializeNodes(nodes, start, nodes.Length - 1);
            nodes[nodes.Length - 1] = new Node(-1, -1, -1); // marks end of pool
        }

        private static void InitializeNodes(Node[] nodes, int start, int count)
        {
            // right node is next free; -1 marks empty reference
            for (int i = start; i < count; ++i)
                nodes[i] = new Node(-1, -1, i + 1);
        }

        static public void Insert(int key)
        {
            if (root == -1)
            {
                root = NewNode(key);
                return;
            }
            int t = root;
            for (;;)
            {
                Node n = pool[t];
                if (n.Key == key)
                    return;
                if (key < n.Key)
                {
                    if (n.Left == -1)
                    {
                        // allocate first: NewNode may grow (and replace) the pool array
                        int child = NewNode(key);
                        pool[t].Left = child;
                        return;
                    }
                    t = n.Left;
                }
                else
                {
                    if (n.Right == -1)
                    {
                        int child = NewNode(key);
                        pool[t].Right = child;
                        return;
                    }
                    t = n.Right;
                }
            }
        }

        private static int NewNode(int key)
        {
            if (nextFree == -1)
                IncreaseCapacity();
            int t = nextFree;
            nextFree = pool[t].Right;
            pool[t] = new Node(key, -1, -1);
            count++;
            return t;
        }

        private static void IncreaseCapacity()
        {
            int size = pool.Length;
            Node[] nodes = new Node[size * 2];
            Array.Copy(pool, nodes, size);
            InitializeNodes(nodes, size);
            pool = nodes;
            nextFree = size;
        }

        public static void Reset()
        {
            InitializeNodes(pool, 0, count);
            count = 0;
            nextFree = 0;
            root = -1;
        }

        private struct Node
        {
            public int Key;
            public int Left;
            public int Right;

            public Node(int key, int left, int right)
            {
                Key = key;
                Left = left;
                Right = right;
            }
        }
    }

    /// <summary>
    /// Creates a binary tree from an array of pooled free nodes, using array
    /// indices as node references. Nodes cannot be removed; the tree can only be reset
    /// as a unit. Capacity can be initially set.
    /// </summary>
    class Tree6
    {
        private const int startSize = 16;
        private static Node[] pool = new Node[startSize];
        private static int count = 0;

        // NOTE: in this tree, 0 signals a null node reference

        public static void SetCapacity(int capacity)
        {
            pool = new Node[capacity];
            count = 0;
        }

        static public void Insert(int key)
        {
            if (count == 0)
            {
                NewNode(key);
                return;
            }
            int t = 0; // root node always 0
            for (;;)
            {
                Node n = pool[t];
                if (n.Key == key)
                    return;
                if (key < n.Key)
                {
                    if (n.Left == 0)
                    {
                        // allocate first: NewNode may grow (and replace) the pool array
                        int child = NewNode(key);
                        pool[t].Left = child;
                        return;
                    }
                    t = n.Left;
                }
                else
                {
                    if (n.Right == 0)
                    {
                        int child = NewNode(key);
                        pool[t].Right = child;
                        return;
                    }
                    t = n.Right;
                }
            }
        }

        private static int NewNode(int key)
        {
            if (count == pool.Length)
                IncreaseCapacity();
            int t = count++;
            pool[t] = new Node(key);
            return t;
        }

        private static void IncreaseCapacity()
        {
            int size = pool.Length;
            Node[] nodes = new Node[size * 2];
            Array.Copy(pool, nodes, size);
            pool = nodes;
        }

        public static void Reset()
        {
            count = 0;
        }

        private struct Node
        {
            public int Key;
            public int Left;
            public int Right;

            public Node(int key)
            {
                Key = key;
                Left = 0;
                Right = 0;
            }
        }
    }

  8. JoelMartinez says:

    But what of Matt’s comment:

    "The handle should be made a well defined struct so as to hide the index mechanism in case you need to change it, but in effect it is likely to be an index into your array of all entities)"

    Wouldn’t a structure of value types be able to partake in the benefits just as if it was a simple integer (as long as it doesn’t have any references of course)?

    In my example, the handle struct did have a property which returned the referenced Sprite, but it wasn’t an actual reference, it simply delegated the call to the SpriteManager.

    I have to admit I’m having somewhat of a tough time understanding the why, where, and how to apply this … would love for others to chime in and add to the discussion 🙂

  9. Frank Hileman says:

    You are correct, sorry I did not notice it was a struct and not a class. Matt is correct, if you are to expose the handle to the public it should be wrapped in a struct.

    I find integer index pointer replacement most useful for internal data structures (such as the posted tree classes) where the end user has the same experience, regardless of the internal choice of a pointer or integer to reference something.

    For public APIs, handles are cumbersome. Nearly always you can find an alternative API that does not need to expose them.

  10. JoelMartinez says:

    So would you say that the effort to implement something like this is only worth it if said data structures are relatively short lived?

    I mean, what if you have a List<> of objects that are long lived … would it be worthwhile to do this?  Or does the fact that the list is referenced/traversed somewhat often make the shorter GC time not worth it?

    I suppose that’s what Matt was talking about when he said, "Balancing this cost is where the skill of the coder comes in."  🙂

  11. ricom says:

    Frank and Joel seem to have the right idea when it comes to what I meant about handles.  I wouldn’t jump there too quickly though — in a game context it can be excellent if you have some fairly static structures that you set up for say a game level and then just park for a considerable amount of time.  Making those cheap to collect is a good deal.  Other contexts might also benefit for similar reasons but if you intend to mutate the data very much you’ve basically replaced managed pointers with self-managed pointers disguised as integers.   Not sure that’s really wise.

    Some hybrid is likely the best choice.

    Now on the desktop CLR this is much less interesting because you can just let your objects age into gen2 and they won’t need to be walked very much anyway.

    On .NET CF you might be more excited about this because everything has to be traced every collect.

    Again… do this wisely with data to back you up, not because big-mouth Rico said it might be interesting 🙂

    Wrapping the handles in a struct would be ideal but make sure you aren’t killing your code-gen by doing so.  If you lost some important inlining you might just have to bite the bullet and do it the hard way.

    If .NET CF is in the picture you’re talking about a great deal of portability (awesome) but also several JITs to check out (ouch).  Simpler might be better.

    Lastly, remember I’m not a .NET CF expert, I only proxy for one at game conferences 🙂

  12. Andrew says:

    Great summary of how the GC works! I have a few comments though.

    Rico wrote:

    "People learned best practices for classic memory allocators … you have to wrap them.  And that’s exactly what everyone does."

    If you are wrapping a classic memory manager then it’s trivial to keep track of which line of code made a particular allocation, which is usually all you need to fix a leak. A wrapper also allows you to keep statistics which can detect leaks automatically in debug builds.

    There are also techniques, such as buddy memory allocation, that eliminate external fragmentation without the need for compaction.

    Rico wrote:

    "Garbage Collected memory model is the centerpiece. Amortized cost of allocations in this model can be excellent!"

    What makes realtime software so fun is that it absolutely has to meet its realtime deadlines 100% of the time. Worst case, not amortized, is the cost of interest.

    I’ve been writing realtime software for 10 years now and I’d rather fix 100 memory leaks than a single missed realtime deadline problem caused by a non-deterministic thread like a GC!

  13. ricom says:

    All good thoughts from Andrew above.

    Controlling heap size and allocation volume are great ways to control maximum collection costs.

    A theme of the advice is that collections are induced by allocations, so controlling allocations lets you control frequency and cost of collections while keeping a simpler model.

    It isn’t as easy as falling off a log, but it can (and does) work with some care.

    Life is never easy in the real(time) world.

  14. Matt says:

    Rico,

    Another useful article – thanks for the tips.

    We have a large desktop application under development here that’s exhibiting an ever-growing Gen 2 heap (occasional GCs of small portions of the heap are occurring, but the trend is otherwise inexorably skywards).

    Some people here are arguing that this is not a problem, as the CLR would collect more of the heap if it needed it. The machines running the app are typically very high-spec, with 2GB RAM.

    What say you?

  15. Game development is one of those dark arts where the usual laws of scalability don’t always apply. …

  16. ricom says:

    >>We have a large desktop application under development here that’s exhibiting an ever-growing Gen 2 heap (occasional GCs of small portions of the heap are occurring, but the trend is otherwise inexorably skywards).

    The long and the short of it is that the GC will tend to collect when it seems worthwhile to do so.  So if your death rate in generation 2 is fairly low the CLR will just let it sit because on a percentage basis the amount of waste seems low.  Generally we’re pretty good about getting this right.

    So when are we likely to get this wrong you ask?  Look at http://blogs.msdn.com/ricom/archive/2004/11/29/271829.aspx for information on that.

    Many applications tend to accumulate state.  If that’s a normal thing for your algorithms, then it’s not a problem.  If that’s not normal then maybe you are holding on to objects you could release (leaks) or maybe you have episodic deaths (see link above).

    A common source of leaks is where some data structures get linked to durable objects like event sources and are not unlinked when they die.  The event source lives forever and therefore so do the things that registered to receive the events.
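
    A quick sketch of that pattern and its fix (the names are hypothetical):

        // Hypothetical: a HUD element hooks an event on a long-lived service.
        scoreService.ScoreChanged += hud.OnScoreChanged;   // hud is now reachable from scoreService

        // If the HUD is discarded without unhooking, the event source keeps it
        // (and everything it references) alive forever.  Unlink it when it dies:
        scoreService.ScoreChanged -= hud.OnScoreChanged;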

  17. lb says:

    you’re like one of those crazy people you see on the bus who are muttering crazy things to themselves. But in a really good and positive way 😉

    i like it.

  18. ricom says:

    Err.  Thanks.  I think 🙂

  19. It almost seems like it was planned, but at Gamefest Rico Mariani did a talk titled “Taming the CLR:…

  20. Rico posted an article Taming The CLR: How to Write Real-Time Managed Code with the contents…

  21. XNA Diaries says:

    While the slides and recordings from Gamefest are not yet available on the conference website, Rico…

  22. This article has some great info regarding real-time and managed code.

  23. The C# team is happy to welcome you to the new look of the C# Developer Center. This is the home page…

  28. Introduction Now that XNA Game Studio Express 1.0 is out, it’s time to start writing managed code for

  29. In my last quiz I asked a few questions about a few hypothetical classes that might appear in a value-rich
