I wish there were an easy answer to this question, but there isn’t. Well, maybe there is but I don’t know it :) So once again I’m writing a blog full of advice you have to take with a grain of salt. I’m starting to feel like the Charlie Brown of performance advice. Well, wishy-washy or not, I’ll give what advice I can.
With the exception of certain built-in classes (such as database connections and threads), I think it’s safe to say that most people begin their design on the assumption that they won’t need to recycle their objects. So they plan to just let things fall out of scope naturally, or null out references to their objects as needed to force the issue. I think that’s actually the right starting point. But sometimes things don’t go so well from there.
It’s when the garbage collector just doesn’t seem to be doing the job that people start reaching for other solutions, like recycling. And you can tell things aren’t going well just by looking at the key garbage collector performance counters. The big one is the percent time spent in the GC. I like to see that number in the low to mid single digits. If you’re seeing a number bigger than that, and especially crazy high numbers like, say, 60%, you can be pretty darn sure that you’ve got Mid Life Crisis (see http://weblogs.asp.net/ricom/archive/2003/12/04/41281.aspx).
I can say that with some confidence because the partial collections, Generation 0 and Generation 1, are comparatively cheap, and if those are all that’s happening it’s very hard to get the percent time in the collector very high. So if the number is high it’s pretty much a sure thing that the collector is working overtime doing generation 2 collects. Again the performance counters will help you here: you should be seeing a lot of memory being promoted from generation 1 to generation 2. If there wasn’t, well, the generation 2 collects shouldn’t be happening. (Note: if there isn’t a lot of memory being promoted, you might look for a silly person that is calling GC.Collect(2) and ruining your life.)
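The counters above are .NET-specific, but the “silly person” failure mode is not. As a sketch only, here is what it tends to look like in Java, where System.gc() plays roughly the role of GC.Collect(2) (the class and method names here are made up for illustration):

```java
// Anti-pattern sketch: forcing full collections by hand.
// In .NET this is GC.Collect(2); the nearest Java analogue is System.gc().
// Either way you make the collector do its most expensive work on your
// schedule instead of its own, which can drive percent-time-in-GC way up
// even when very little memory is actually being promoted.
final class OverzealousCleanup {
    static void afterEachRequest() {
        // Looks tidy; actually ruins your GC time. Deleting this line
        // and letting the collector pick its own moments is the fix.
        System.gc();
    }
}
```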
Mid-life crisis means that your middle-lived objects live long enough to get into Generation 2, but then die there fairly soon, and then probably do it all over again. That’s a steady stream of attention-needing junk getting all the way into the expensive generation.
The first thing you have to do is find out (using the CLR Profiler) which objects are getting into generation 2. Look at the histogram of relocated objects for a strong hint: when objects are promoted they almost always move, and so announce their presence.
Then you have to stop them from getting into generation 2 and dying right away, and there are only two ways to do that.
Option One: Have your objects die faster
This is really the preferred way if you can make it happen. That’s what I talked about in the mid-life crisis blog entry a while ago (see link above) so I won’t go into it too much. Today we’re going to talk about…
Option Two: Have your objects live longer
Remember, the problem was that the objects were getting into generation 2 and then dying, over and over and over. If they didn’t die they wouldn’t hurt so much. It’s those mid-life ones that are annoying. This is where the recycling part comes in.
If you understand the usage patterns of your program well (say it’s a web server) and you can readily predict that the objects you are about to release are going to be needed again in about 2 milliseconds (or something) when the next request comes in, there isn’t a whole lot of point in freeing them. You might just as well hang on to those objects, maybe put them in a pool somewhere, and when that request does come along, you can get them out of the pool.
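The post is about the CLR, but the pool-and-reuse idea itself is language-neutral; here is a minimal sketch of it in Java (the RequestBuffer and BufferPool names, the 8 KB buffer, and the cap on pooled objects are all invented for illustration, not anything from the original):

```java
import java.util.ArrayDeque;

// Hypothetical pooled object: a request-scoped scratch buffer.
final class RequestBuffer {
    byte[] data = new byte[8 * 1024];  // assumed initial capacity
}

// A minimal pool: acquire() hands back a parked object if one exists,
// release() parks the object for the next request instead of letting
// it become garbage. Pooled objects stay reachable, so they stop
// churning through the expensive generation.
final class BufferPool {
    private final ArrayDeque<RequestBuffer> pool = new ArrayDeque<>();
    private final int maxPooled;

    BufferPool(int maxPooled) {
        this.maxPooled = maxPooled;
    }

    synchronized RequestBuffer acquire() {
        RequestBuffer b = pool.pollFirst();
        return (b != null) ? b : new RequestBuffer();
    }

    synchronized void release(RequestBuffer b) {
        if (pool.size() < maxPooled) {
            pool.addFirst(b);  // keep it alive for the next request
        }                      // else drop it and let the GC have it
    }

    synchronized int size() {
        return pool.size();
    }
}
```

Usage is the shape described above: acquire a buffer when the request arrives, use it, release it when the request is done, and two milliseconds later the next request gets the same object back.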
Sounds great… but there’s a catch. There’s always a catch. Or two.
First, once you decide to have a pool (well, let’s call it a cache, because that’s sort of what it is, only more in the literal sense of the word) you have to manage it. Remember, caching implies policy (see http://weblogs.asp.net/ricom/archive/2004/01/19/60280.aspx). So what will the policy on the pool be? More specifically, when do you trim the contents of the pool? Never? Oops, that sounds bad: you could, after all, go idle for a long period of time, and you’d want those resources to be reclaimed. A variety of policies are possible; an easy one is “toss the whole pool every so many seconds regardless of how big it is or isn’t,” and that one has certain advantages. Whatever your choice, you have some thinking on your hands, or else you might end up with a nasty memory leak in your pool.
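The easy policy mentioned above, toss the whole pool every so many seconds, can be sketched like this in Java (again, every name and number here is made up; this is one plausible shape, not a prescription):

```java
import java.util.ArrayDeque;

// A pool with the simplest trim policy from the text: empty the whole
// pool once more than trimIntervalMillis has passed since the last
// trim, regardless of how big it is or isn't.
final class TrimmingPool<T> {
    private final ArrayDeque<T> pool = new ArrayDeque<>();
    private final long trimIntervalMillis;
    private long lastTrim = System.currentTimeMillis();

    TrimmingPool(long trimIntervalMillis) {
        this.trimIntervalMillis = trimIntervalMillis;
    }

    synchronized void release(T item) {
        maybeTrim();
        pool.addFirst(item);
    }

    synchronized T acquire() {
        maybeTrim();
        return pool.pollFirst();  // null means: create a fresh object
    }

    // Checked on every pool operation. Note the tradeoff: a pool that
    // goes completely idle keeps its contents until someone touches it
    // again; reclaiming during true idle would need a timer thread.
    private void maybeTrim() {
        long now = System.currentTimeMillis();
        if (now - lastTrim >= trimIntervalMillis) {
            pool.clear();  // the GC reclaims everything we were holding
            lastTrim = now;
        }
    }

    synchronized int size() {
        return pool.size();
    }
}
```

The appeal of this policy is that it is nearly impossible to get wrong: whatever leaks into the pool lives at most one interval.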
Second, your pooled objects might be like your brother’s hand-me-down shoes. Sure, they’ll do the job, but they aren’t always ideal. Depending on the nature of the objects being pooled, you might find that they are either too big or too small for the job. Suppose you’re pooling objects with memory buffers in them. The buffer as initially created might be a lot bigger than is usually necessary, and now you have this big buffer lying around that really could be a lot smaller if you just threw it away. Or the buffer might be too small for the current request, and so it needs to be grown to be used. You might very well end up with all your buffers being bigger than they need to be, because the pooled objects converge to the maximum required size after a few minutes of running. You might need another sort of policy that goes and tunes up the pool from time to time. But beware: if you create a lot of garbage objects in doing the cleanup, you will be right back to your original mid-life crisis.
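One cheap way to keep pooled buffers from converging to the maximum size is to refuse to pool buffers above a threshold, so the occasional giant request doesn’t permanently inflate the pool. A sketch, assuming invented sizes (the 8 KB default and 64 KB cap are illustrative, not recommendations):

```java
import java.util.ArrayDeque;

// Sketch of the "hand-me-down shoes" fix: grow buffers on demand for
// big requests, but decline to pool the oversized ones, so the pool
// doesn't slowly fill up with worst-case-sized buffers.
final class SizePolicedPool {
    static final int DEFAULT_SIZE = 8 * 1024;      // assumed typical size
    static final int MAX_POOLED_SIZE = 64 * 1024;  // assumed cap

    private final ArrayDeque<byte[]> pool = new ArrayDeque<>();

    synchronized byte[] acquire(int needed) {
        byte[] b = pool.pollFirst();
        if (b == null || b.length < needed) {
            // Grow (or create) for this request.
            b = new byte[Math.max(needed, DEFAULT_SIZE)];
        }
        return b;
    }

    synchronized void release(byte[] b) {
        if (b.length <= MAX_POOLED_SIZE) {
            pool.addFirst(b);
        }
        // Oversized buffers go to the GC. Note this does create some
        // garbage; if big requests are common rather than rare, this
        // policy puts you right back in mid-life-crisis territory.
    }

    synchronized int size() {
        return pool.size();
    }
}
```

The same idea extends to a periodic tune-up pass that shrinks or discards outliers, with the caveat from the text: make the cleanup itself cheap, or it becomes the new garbage problem.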
What about raw speed? Well, if you were working the GC overtime before and you stopped that, chances are you’ll be way ahead. But you might do better still if you could find a way to go with Option One. I really like Option One.
So, recycling can work out. But it requires a lot more thought than the most basic approach. Do not neglect your policy.