Mid-life crisis


This particular problem (I call it mid-life-crisis) seems to come up fairly often, so I thought I’d write up some general advice on it.  The symptoms go something like this:  there is a server process (usually a web server) and that process spends a high percentage of its time in the garbage collector, say 30% or even more.  Simple enough, but why would the time be so high?  Why isn’t it more like the 1% or so that we’d like it to be?



Often (not always) the answer is mid-life-crisis.  By this I mean that something is causing objects of middle lifetime length (which we would very much like to die in generation 1) to live longer and end up dying in generation 2.


This is very bad.


Generation 2 garbage collections are the full ones.  That means every object on the heap must be visited and the process is largely stopped while this is going on.  If you are getting a lot of objects promoted all the way from generation 0 to generation 2 and then having them die shortly thereafter you are paying a huge price to clean up those objects. 


Why does this happen?


Well in the server case there’s one very common reason.  Let’s say it’s a web server, a request comes in, a bunch of setup work is done to get the result for that request, and at some point the code then accesses a database or a web-service to get the necessary data.  At that point the thread accessing the data is blocked, but all the objects that were pending for that request are still live.  Meanwhile, other threads on the server are still running, still doing allocations, and those might end up requiring a garbage collection.  When that collection happens, all the temporary objects on the blocked threads are still live, probably in local variables or objects that represent the transaction in flight.  They survive the collection and are promoted. 


Now since transactions are often longish things and collections are going to happen at some point, it’s normal for some objects associated with the transaction in flight to survive the generation 0 collections that are hopefully happening every second or so.  Those objects are going to get promoted to generation 1 just like they should; in fact, the main purpose of generation 1 is to let transaction-related objects stick around long enough and then die cheaply.


But here’s where things go wrong.  If there are fairly long delays waiting for say database results, and a fairly large number of objects representing the state of the transaction in flight, there will be enough buildup in generation 1 that it will become appropriate to try to collect those objects.  At that point the survivors will be promoted to generation 2.  If there are a lot of survivors we are now in trouble because in order to clean them up we will have to do a full collection.  If those are happening regularly, the percent time spent in the collector will shoot up from a healthy 1% to something very bad, like 30%, 50%, even more sometimes.
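The promotion path described above can be watched directly. What follows is only a minimal sketch: `PromotionDemo` and `requestState` are made-up names, and the explicit `GC.Collect` calls stand in for collections that, on a real server, would be triggered by other threads' allocations while this request is blocked.

```csharp
using System;

public static class PromotionDemo
{
    // Returns the generation of a still-reachable object right after allocation,
    // after a generation 0 collection, and after a generation 1 collection.
    public static int[] Run()
    {
        // Hypothetical stand-in for per-request state held across a blocking call.
        var requestState = new byte[1024];
        var generations = new int[3];

        generations[0] = GC.GetGeneration(requestState);

        GC.Collect(0); // simulated: a gen 0 collection caused by other threads
        generations[1] = GC.GetGeneration(requestState);

        GC.Collect(1); // simulated: generation 1 has built up and is collected
        generations[2] = GC.GetGeneration(requestState);

        // Keep the object reachable up to here, as a blocked request would.
        GC.KeepAlive(requestState);
        return generations;
    }
}
```

Printing the result of `PromotionDemo.Run()` from a console program's `Main` will typically show the object climbing one generation per collection it survives, which is exactly the aging that turns into a problem once the last step lands in generation 2.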


So what to do about this?


Well, the good news is there’s a fairly straightforward line of defense.  The trick is that you must clean up (i.e. set references to null) as much of your state as possible before you block on something like a database, or really before you block on anything that might take long.  It’s often the case that a lot of the temporary data won’t be needed after the database results come back, or could be cheaply recreated.  Before you call your web-service or database backend, get rid of as much as you can so that the objects that will survive collections while you are blocked are minimized.  This will let more things die in generation 0, minimize additions to generation 1, and avoid the crisis your mid-life generation 1 objects will cause should they start surviving into generation 2.
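Here is one way the advice might look in code, as a sketch under stated assumptions rather than a recipe. `RequestHandler`, `_parsedInput`, `_query` and the `onBlocked` callback are all hypothetical; the callback stands in for the database round trip, and the `WeakReference` is only there so the effect can be observed.

```csharp
using System;
using System.Runtime.CompilerServices;

public sealed class RequestHandler
{
    private byte[] _parsedInput; // hypothetical scratch state, not needed after the query is built
    private string _query;       // the only state needed once the results come back

    // Builds the query and returns a weak reference to the scratch state
    // so a caller can check later whether it was collected.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public WeakReference Prepare()
    {
        _parsedInput = new byte[16 * 1024];
        _query = "SELECT 1"; // placeholder query text
        return new WeakReference(_parsedInput);
    }

    public string Execute(Action onBlocked)
    {
        _parsedInput = null; // release scratch state BEFORE blocking
        onBlocked();         // stand-in for the long database or web-service call
        return "result for: " + _query;
    }
}

public static class ReleaseDemo
{
    // Returns true if the scratch state was still alive after a collection
    // that happened while the handler was "blocked".
    public static bool ScratchStateSurvivesBlocking()
    {
        var handler = new RequestHandler();
        WeakReference scratch = handler.Prepare();
        handler.Execute(() => GC.Collect());
        return scratch.IsAlive;
    }
}
```

With the field nulled, `ScratchStateSurvivesBlocking()` would normally be expected to return `false`; comment out the `_parsedInput = null;` line and the scratch buffer stays reachable through the handler and gets promoted instead.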


Remember, the “age” of objects is a relative thing. Collections cause things to age, and allocations are what cause collections, so reducing the total number of allocations causes things to age more slowly.  Having your objects die as quickly as possible again reduces the pressure to grow the generations and hence keeps things younger.
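To make the link between allocations and collections concrete, here is a small sketch that counts how many generation 0 collections happen while a loop allocates short-lived buffers. The class and method names are made up for illustration, and the counts depend entirely on the runtime, GC mode and machine.

```csharp
using System;

public static class AllocationPressureDemo
{
    // Written to on every iteration so the allocation is treated as escaping.
    private static byte[] _sink;

    // Allocates `count` short-lived buffers of `bufferSize` bytes and returns
    // the number of generation 0 collections observed during the loop.
    public static int Gen0CollectionsWhileAllocating(int count, int bufferSize)
    {
        int before = GC.CollectionCount(0);
        for (int i = 0; i < count; i++)
        {
            _sink = new byte[bufferSize];
        }
        return GC.CollectionCount(0) - before;
    }
}
```

Comparing, say, `Gen0CollectionsWhileAllocating(1000, 1024)` with `Gen0CollectionsWhileAllocating(1000000, 1024)` gives a rough feel for how allocation volume drives the collection rate, and therefore how fast everything else on the heap ages.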


To see if this sort of thing is happening to you, you can look at the Relocated Types view in CLR Profiler to see what’s getting moved around (remember that things are normally moved when they are promoted, so moved objects are a good proxy for promoted objects).  To get overall promotion rates, use the GC performance counters; there are counters that will tell you how much stuff is getting promoted into generation 2.  You want that number to be as small as possible: zero is ideal and even achievable in steady state, but as long as the rate of generation 2 collections stays low, you’ll be fine.
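If the profiler and the performance counters are not handy, a rough in-process check is possible with `GC.CollectionCount`. This is a different and much coarser signal than the promotion counters mentioned above: it shows how often each generation is collected, not how many bytes are promoted. `GcSnapshot` is a hypothetical helper name.

```csharp
using System;

public static class GcSnapshot
{
    // Summarizes how many collections each generation has seen so far
    // and how large a share of them were full (generation 2) collections.
    public static string Describe()
    {
        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);

        // Guard against division by zero early in the process lifetime.
        double gen2Share = gen0 == 0 ? 0.0 : (double)gen2 / gen0;

        return string.Format(
            "gen0={0} gen1={1} gen2={2} gen2/gen0={3:F3}",
            gen0, gen1, gen2, gen2Share);
    }
}
```

Logging `GcSnapshot.Describe()` periodically under load and watching whether the generation 2 count keeps pace with the generation 0 count is a cheap first hint that objects are dying in the wrong generation.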


Summary:  Don’t have a mid-life-crisis.  When there are many threads, be sure to release as many of your objects as possible before you block any thread.

Comments (35)

  1. Ian Griffiths says:

    I’m surprised to learn that setting variables to null helps so much. My experiments seem to suggest that the garbage collector is doing liveness analysis (or is taking advantage of the liveness analysis performed by the JIT). So with something like this:

    MyObj o = new MyObj();
    o.DoSomethingWithO();
    BlockForAges();
    return;

    the GC usually seems to work out that ‘o’ isn’t used after that second line, and will happily collect the object (or at least it’ll run its Finalize if I put a finalizer in there as an experiment to see what the GC is doing) if a GC occurs during BlockForAges. Setting o to null doesn’t seem to make any difference, since the liveness analysis has already worked out that o is effectively out of scope.

    Are there situations in which this liveness analysis is defeated? Or is this not the scenario you’re thinking of when you say to set things to null?

  2. Rico Mariani says:

    You’re absolutely right, if the JIT can statically determine that the variable is dead when the code is generated then there’s no need to null things out. However, what often happens in servers is that there are certain transaction state variables that are, for instance, on the "this" pointer which are still reachable. Those are the ones to null if you can.

    For instance, suppose you got your input in XML format and you had a series of functions to extract what you need to do the query out of the XML, building up a SQL query string as you go. Just before you make the SQL query, it would be good to release all the stuff related to the XML that you can so that it can be collected. Objects like that are often fields accessible via your "this" pointer rather than local variables, so they are reachable until the object holding them goes away.

    Other times there are helper collection classes that assist in the parsing and validation of the inputs. These objects also need to survive across several function calls (they are often accumulating results as the process goes along), so again they would seem live to the collector, but perhaps they can be nulled or emptied.

  3. Rico Mariani says:

    Let me alter your example just a tad.  This is a stupid example using intermediate strings just to illustrate:

    class MyObj
    {
        private String s1;
        private String s2;
        private String s3;

        public void DoSomethingFromInputs(String s)
        {
            s1 = AnElegantOperation(s);
        }

        public void ComputeInterimResults(String options)
        {
            s2 = SomethingEvenMoreElegant(s1, options);
        }

        public void ComputeFinalQuery(String database)
        {
            s3 = SQLFormatting(s2, database);
        }

        public String GetResults()
        {
            s1 = null; // this is what I’m talking about
            s2 = null; // this is what I’m talking about

            // this blocks a long time
            String r = GetDataFromDatabase(s3);

            return r;
        }
    }

    MyObj o = new MyObj();
    o.DoSomethingFromInputs(s);
    o.ComputeInterimResults(options);
    o.ComputeFinalQuery(databasename);
    return o.GetResults();

    (please forgive my syntax, I hope you can get the gist)

  4. Rico, thanks for clearing that up – I also shared Ken’s concerns when reading your original post.

  5. People often ask me for tips/tricks on how to find out which objects are not being properly disposed.  But…

  6. Game development is one of those dark arts where the usual laws of scalability don’t always apply. …

  7. It was really exciting to see that so many people answered the .NET GC PopQuiz , especially seeing that

  8. roy ashbrook says:

    Ah. Garbage Collection… how I love and hate thee. =P I think one sad thing about programming in .net

  11. After going this week to the Microsoft performance open house , here are few things to consider: Create

  12. C# Nuggets says:

    Back in 2000 when the CLR was first shown its generational garbage collector was fairly cutting