How to evaluate info you read on garbage collectors


Just a word before I actually start this blog entry: I apologize for approving some of the comments so late. It appears that our blogs’ policy has changed and it now marks some comments as pending for reasons that aren’t obvious to me.

Also, as one of the ways to support the community, I was thinking I could set aside a specific time on a regular cadence (could be an hour every month to start with) when I would answer questions on the .NET GC. I'm thinking perhaps Gitter is the best forum but I am not sure, since I don't know how popular it is with .NET developers who care about memory performance. If you have any suggestions please post them in the comments.

From time to time someone tells me “Hey Maoni, I just read this GC article that says their GC is so great! I think it’s better than your GC!” Then I read the article and find that it says nothing that actually indicates their GC is better than ours. In fact, very often it doesn’t say nearly enough to assess that GC’s performance, much less to compare it with another GC. So I thought I’d write something that helps explain how to evaluate information you read on GCs.

First of all, comparing GC performance is definitely nontrivial work. Even a micro benchmark can easily exercise a lot of the non-GC parts of a runtime, so you are not actually comparing just the GCs unless you isolate the test down to the parts that are related to GC perf. I remember one time someone showed us a benchmark comparison between our runtime and another runtime in which ours was spending much more time in GC. It turned out that our codegen had made a decision that caused a lot more objects to survive in our runtime, so the GC had to work a lot harder.

When you compare something that’s more than a micro benchmark, e.g., a web server macro benchmark, you are really just comparing frameworks at that point, which is a long way from drawing any conclusion about GC performance alone. One framework could have a completely different allocation and survival pattern from the other, which means it can impose a very different amount of work on the GC. For example, because of its programming patterns one framework could allocate far fewer objects, or have far fewer of them survive, so it simply doesn’t need a stellar GC to manage memory.

Even when you have isolated things to the point that you are mostly comparing GC performance, anyone who’s even slightly familiar with GCs knows there are tons of tradeoffs to make in a GC. Without actually understanding the architecture, the mechanisms and the policies of a GC, running a few tests is far from enough to determine what that GC’s performance is. For example, one GC could have made a choice to optimize for generational behavior while another may not have (could be because the framework simply does not exhibit generational behavior).

When you understand the architecture of a GC you know its limits, so you can make a statement about what it’s capable of in theory. Of course, two GCs with the same architecture can still have dramatically different performance on the exact same test (let’s say you’ve isolated enough to make such a test) because of how optimized each implementation is: on the hot path, one could be using a more clever way to implement something while the other falls short.

At this point I hope you realize how careful you need to be when you are not doing experiments yourself and simply reading some information written on a GC.

If a GC article explains how it does something (like an architectural doc, or a blog entry that explains how the GC handles a specific scenario), it’s totally fine to read the description. But without some knowledge of how effective, innovative or sophisticated that design allows a GC to be, it’s impossible to say whether you should believe the author’s claim about how good the GC is, especially when you want to do a comparison. I’ve seen GC articles that made the most mundane stuff sound like something that sets their GC apart from other GCs, when in reality it’s something that most other GCs of the same genre do, or did at some point and have already improved upon.

If a GC article makes a claim about performance, this is where you really need to be careful when deciding whether to believe the claim or not. A good article would tell you the workload, the machine config and the GC config that were used to measure the performance, along with the relevant performance data. If the workload is simple enough (e.g., it doesn’t use layers of libraries) or you understand the workload well enough to know how much work the GC is doing, you can get an idea about GC perf in that framework. And if you port this workload to a different framework and isolate it enough to compare just the GC performance, you can then make a statement comparing the GCs’ performance.

On the other hand, an article that simply makes claims like "our GC’s latency is less than 20 milliseconds" without describing what was run to get that number is meaningless. It could simply be a false statement (as in, you could write a test that makes that GC have longer pauses); or it could be that the framework the GC lives in simply never (or in 99% of cases) needs to handle situations that would make it pause for more than 20ms. Without knowing which case it is you cannot make any decision about this GC, which in turn means you can’t compare it with another GC.

And even when you are reading numbers printed out by some GCs you need to be careful how you interpret them. Some GCs indicate which numbers are for STW (Stop The World). Obviously during STW, all your threads are stopped. But STW is not the only way your threads get interrupted by a GC. Some GCs choose to aggressively do mutator assist (meaning the user threads take on some GC work before they actually get to allocate), so while a thread is doing the mutator assist work it’s effectively the same as being stopped. I’ve seen mutator assist pauses in some GCs that are much longer than their STW pauses. Obviously in that case, if you are counting on that thread to finish a request, that request is going to take a long time.

In order to accurately assess latency it’s best to measure it yourself. In our GC micro benchmarks we would measure around a request (which usually does one or more allocations and some assignments) for latency. Since it’s a GC benchmark, usually the latency is due to GC pauses but sometimes it may also be due to other factors like the thread isn’t getting scheduled.
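
As a rough illustration of measuring around a request, here is a minimal sketch. It is written in Python only as a convenient stand-in and is not how our micro benchmarks are implemented; `handle_request`, `measure_latencies`, `percentile` and `summarize` are hypothetical names made up for this example. It times each request individually and reports the median, the 99th percentile and the maximum, since a single average hides exactly the long pauses you care about.

```python
import math
import time


def handle_request(n=1000):
    # Hypothetical request: some allocations and assignments.
    data = [{"id": i, "payload": [i] * 8} for i in range(n)]
    return len(data)


def measure_latencies(request, iterations=200):
    """Time each call to `request`; return per-request latencies in ms."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        request()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(latencies):
    """Report the median, the tail and the worst case, not just an average."""
    ordered = sorted(latencies)
    return {
        "p50": percentile(ordered, 50),
        "p99": percentile(ordered, 99),
        "max": ordered[-1],
    }


if __name__ == "__main__":
    print(summarize(measure_latencies(handle_request)))
```

Note that the timer wraps the whole request, so a long latency here can come from a GC pause or from something else entirely, such as the thread not getting scheduled.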

Some of our first party customers implement a diagnostics pipeline that measures the latency of each request and for long requests they can attribute the latency to specific tasks that happened during that request – network IO, disk IO, GC or something else. Depending on what OS/framework you work with, you might have the option to implement this too.
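
A per-request attribution step in such a pipeline could look roughly like the sketch below. This is an assumption-laden illustration, not what our first party customers actually run: it is in Python, it uses CPython's `gc.callbacks` hook (which only reports CPython's cyclic collector), it assumes a single thread, and `GcTimer` and `timed_request` are hypothetical names. The idea is simply to record how much collector time accumulated while a request was in flight and split the request's latency into GC and everything else.

```python
import gc
import time


class GcTimer:
    """Accumulates time spent in CPython's cyclic collector via gc.callbacks."""

    def __init__(self):
        self.total = 0.0        # seconds spent in collections so far
        self.collections = 0    # number of completed collections
        self._start = None

    def __call__(self, phase, info):
        now = time.perf_counter()
        if phase == "start":
            self._start = now
        elif phase == "stop" and self._start is not None:
            self.total += now - self._start
            self.collections += 1
            self._start = None


def timed_request(request, gc_timer):
    """Run `request` and attribute its latency to GC vs. everything else."""
    gc_before = gc_timer.total
    start = time.perf_counter()
    request()
    elapsed = time.perf_counter() - start
    gc_time = gc_timer.total - gc_before
    return {
        "total_ms": elapsed * 1000.0,
        "gc_ms": gc_time * 1000.0,
        "other_ms": (elapsed - gc_time) * 1000.0,
    }


if __name__ == "__main__":
    timer = GcTimer()
    gc.callbacks.append(timer)
    try:
        # Hypothetical request that allocates many container objects.
        result = timed_request(lambda: [[] for _ in range(100000)], timer)
        print(result)
    finally:
        gc.callbacks.remove(timer)
```

A real pipeline would break "other" down further (network IO, disk IO and so on) using whatever events the OS or framework exposes.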


Comments (10)

  1. Alois Kraus says:

    Hi Maoni, that would certainly be interesting. Gitter I think would be ok. For high perf code which creates (nearly) no garbage objects it would be great to turn GC off, e.g. to deserialize a large file without GC interference from trying to clean up temp objects which are not there and moving everything to Gen2 anyway. GC.TryStartNoGCRegion seems to be the right API, but when testing in such scenarios it was not giving me the perf boost I was after.

    1. Thank you Alois. Please elaborate on how no GC region isn’t working for you right now so I can comment on how/whether it can be improved. Are you getting an error, are you still seeing GCs happening, or is it something else?

      1. Alois Kraus says:

        I did try to deserialize a ca. 200 MB JSON file several times in a loop. When I did it once it was faster, but when the region ended I needed to force a GC, because when I then deserialized the next file GC.TryStartNoGCRegion would either throw, telling me that not enough free memory is available, or deadlock (that is intended behavior I guess for allocating too much). That does not really fit well with an application where other threads can also allocate memory and you need to be aware of the allocation history to take advantage of GC.TryStartNoGCRegion. Ideally I want to tell the GC that all objects allocated on that thread will stay, and if it needs more memory it should not try to do a full blocking GC but instead allocate a new GC segment.

        1. Matt Warren says:

          @Alois with regards to the deadlocks you are seeing with using TryStartNoGCRegion(), it might be related to this issue https://github.com/dotnet/coreclr/issues/6809

          1. apparently Matt Warren replied before me but his comments went to “pending” so I just saw them (and approved).

        2. What you described sounds like a bug. This is not how it’s intended. Are you using Server GC? Perhaps you are hitting this issue: https://github.com/dotnet/coreclr/issues/6809. If you have a small repro I can certainly take a look.

  2. Ryan E says:

    I love the idea of a monthly Q&A! Instead of Gitter, I’d suggest using the .NET Slack team!  A bunch of people from the community are active there, and there’s already a #gc channel.

    1. Hi Ryan, wow, there’s actually already a #gc channel… cool, thanks!

  3. onurg says:

    I would be delighted if we could get more direct help from you as you suggested. 10 years ago we could have lived by ignoring the existence of the GC, but these days the data keeps getting bigger. Thus knowing how the GC works and tuning it is more important than ever. I also wish that in the future there will be more fine tuning options and APIs for us to use.

    By the way there’s an official CLR forum on MSDN: https://social.msdn.microsoft.com/Forums/vstudio/en-US/home?forum=clr
    Perhaps you can request a GC forum as well.

  4. Matt Warren says:

    Thanks for this post, I try to blog accurately about the .NET GC (e.g. http://mattwarren.org/2017/01/13/Analysing-Pause-times-in-the-.NET-GC/), but I’m sure that I’ve fallen prey to many of the points you raise!!

    > Also as one of the ways to support the community I was thinking I could have a specific time on a regular cadence (could be an hour every month to start with) when I would answer questions on the .NET GC

    I’d be interested to know your thoughts on how the .NET GC compares to the Go (golang) GC and in particular the trade-offs it’s making, i.e. ‘Prioritizing low latency and simplicity’ from https://blog.golang.org/go15gc and a critique ‘Modern garbage collection’ from https://blog.plan99.net/modern-garbage-collection-911ef4f8bd8e.

    I don’t mean from a performance point-of-view (I’ve read your post and I understand how hard this is!), I mean from the point-of-view of the design goals and trade-offs you considered in its design and how they compare to other GCs.
