Using GC Efficiently – Part 1


So the goal here is to explain the cost of things so you can make good decisions in your managed memory usage. It’s not to explain GC itself; it’s to explain how to use it. I assume most of you are more interested in using a garbage collector than in implementing one yourself. This post assumes a basic understanding of GC. Jeff Richter wrote 2 excellent MSDN articles on GC that I would recommend if you need some background: 1 and 2.


 


First I’ll focus on Wks GC (Workstation GC), so all the numbers are for Wks GC. Then I’ll talk about what’s different for Svr GC (Server GC) and when you should use which. Sometimes you don’t necessarily have the choice, and I’ll explain why.


 


Generations


 


The reason for having 3 generations is that we expect most objects in a well-tuned app to die in Gen0. For example, in a server app, the allocations associated with each request should die once the request is finished. Allocations belonging to requests that are still in flight when a GC happens will make it into Gen1 and die there. Essentially, Gen1 acts as a buffer between the young object areas and the long-lived object areas. When you look at the number of collections in perfmon, you want to see a low ratio of Gen2 collections to Gen0 collections. The number of Gen1 collections is relatively unimportant, because collecting Gen1 is not much more expensive than collecting Gen0.
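You don’t need perfmon to see the Gen2/Gen0 ratio. GC.CollectionCount reports similar per-generation counts from inside the process. The sketch below is a C# 9 top-level program. Its "request" workload, 1 KB buffers and iteration count are made up for illustration. When nearly everything dies young, you would expect many Gen0 collections and few or no Gen2 collections.

```csharp
using System;

// Snapshot the counters so we only look at collections caused by the workload below.
int gen0Before = GC.CollectionCount(0);
int gen2Before = GC.CollectionCount(2);

// Hypothetical workload: each "request" allocates a 1 KB scratch buffer that dies young.
// A tiny ring keeps only the last 16 buffers reachable, so the allocations really hit the heap.
var ring = new byte[16][];
for (int request = 0; request < 1_000_000; request++)
    ring[request % ring.Length] = new byte[1024];

int gen0 = GC.CollectionCount(0) - gen0Before;
int gen2 = GC.CollectionCount(2) - gen2Before;

// CollectionCount(0) also counts Gen1 and Gen2 collections, because those collect Gen0 too.
double gen2PerGen0 = gen0 == 0 ? 0.0 : (double)gen2 / gen0;
Console.WriteLine($"Gen0 GCs: {gen0}, Gen2 GCs: {gen2}, Gen2/Gen0 ratio: {gen2PerGen0:F3}");
GC.KeepAlive(ring);
```

The exact counts depend on the runtime version, the GC flavor and the machine, so treat the ratio as a trend to watch rather than a fixed target.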


 


GC segments


 


First let’s see how GC gets memory from the OS. GC reserves memory in segments. Each segment is 16 MB. When the EE (Execution Engine) starts, we reserve the initial GC segments – one for the small object heap and the other for the LOH (large object heap).


 


The memory is committed and decommitted as needed. When the existing segments run out of space, we reserve a new one. During each full collection, any segment that is no longer in use is deleted.


 


The LOH always lives in its own segments – large objects are treated differently from small objects thus they don’t share segments with small objects.


 


Allocation


 


When you allocate on the GC heap, exactly what does it cost? If we don’t need to do a GC, allocation is 1) moving a pointer forward and 2) clearing the memory for the new object. For finalizable objects there’s an extra step of entering them into a list that GC needs to watch.
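Here is a toy model of the fast path just described: move a pointer forward, then clear the memory. It runs over a plain byte array, and every name in it (`allocArea`, `allocPtr`, `Allocate`) is hypothetical. It illustrates the idea only and is not how the CLR is implemented. In particular, the "GC" here simply throws everything away instead of compacting the survivors.

```csharp
using System;

// Toy model only: "allocation = move a pointer forward + clear the memory".
byte[] allocArea = new byte[64 * 1024];   // hypothetical Gen0 allocation area
int allocPtr = 0;                         // offset of the next free byte
int gcCount = 0;                          // times the area filled up (a real GC would run then)

int Allocate(int size)
{
    size = (size + 7) & ~7;               // round up to 8-byte alignment, nothing more
    if (size > allocArea.Length)
        throw new ArgumentOutOfRangeException(nameof(size), "too large for this toy Gen0");

    if (allocPtr + size > allocArea.Length)
    {
        // Budget exhausted: a real runtime would collect here and compact the survivors.
        // The toy pretends everything died and starts over from the beginning.
        gcCount++;
        allocPtr = 0;
    }

    int obj = allocPtr;
    allocPtr += size;                     // 1) move the pointer forward
    Array.Clear(allocArea, obj, size);    // 2) clear the memory for the new object
    return obj;
}

int first = Allocate(15);    // 15 bytes, rounded up to 16 for alignment
int second = Allocate(24);   // placed right after the first object
Console.WriteLine($"first={first} second={second} next={allocPtr} gcs={gcCount}");
// prints "first=0 second=16 next=40 gcs=0"
```

Because consecutive allocations are simply adjacent, this also shows why objects allocated together end up next to each other.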


 


Notice I said “if we don’t need to do a GC”. This means that, overall, the cost of allocation is proportional to the allocation volume: the less you allocate, the less work the GC needs to do. If you need 15 bytes, ask for 15 bytes. Don’t round it up to 32 bytes or some other bigger chunk the way you used to with malloc. There’s a threshold which, when exceeded, triggers a GC, and you want to trigger it as infrequently as you can.
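To see what “ask for 15 bytes” buys you, you can measure allocation volume directly. The sketch below assumes .NET Core 3.0 or later, where GC.GetAllocatedBytesForCurrentThread is available, and C# 9 top-level statements. The array counts are arbitrary. The heap size of each array includes the per-object overhead, so the exact figures vary between 32-bit and 64-bit processes. Either way, rounded-up requests cost more.

```csharp
using System;

const int Count = 100_000;

// Bytes allocated on this thread while creating `count` byte arrays of `size` bytes each.
long BytesFor(int size, int count)
{
    var keep = new byte[count][];               // holder allocated before we start measuring
    long before = GC.GetAllocatedBytesForCurrentThread();
    for (int i = 0; i < count; i++)
        keep[i] = new byte[size];               // stored in `keep`, so each array really lives on the GC heap
    long after = GC.GetAllocatedBytesForCurrentThread();
    GC.KeepAlive(keep);
    return after - before;
}

long exact = BytesFor(15, Count);    // ask for exactly what you need...
long rounded = BytesFor(32, Count);  // ...versus rounding up "malloc style"

Console.WriteLine($"15-byte arrays: {exact / Count} bytes each on the heap");
Console.WriteLine($"32-byte arrays: {rounded / Count} bytes each on the heap");
```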


 


Another property that distinguishes the GC heap from the NT heap is that objects allocated together stay together on the GC heap, which preserves their locality.


 


Each object allocated on the GC heap has an 8-byte overhead (sync block + method table pointer).
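The 8-byte figure is for a 32-bit process. In a 64-bit process, the sync block index and the method table pointer each take 8 bytes. One rough way to observe the overhead is to allocate objects that have no fields at all. The sketch below assumes .NET Core 3.0+ for GC.GetAllocatedBytesForCurrentThread. Note that the number it prints also includes the runtime’s minimum object size, so it can be somewhat larger than the header alone.

```csharp
using System;

const int Count = 100_000;
var keep = new object[Count];            // holder allocated before we start measuring

long before = GC.GetAllocatedBytesForCurrentThread();
for (int i = 0; i < Count; i++)
    keep[i] = new object();              // an object with no fields at all
long after = GC.GetAllocatedBytesForCurrentThread();
GC.KeepAlive(keep);

long perObject = (after - before) / Count;
Console.WriteLine($"Bytes per field-less object: {perObject} ({IntPtr.Size * 8}-bit process)");
```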


 


As I mentioned, large objects are treated differently, so you generally want to use a different allocation pattern for them. I will talk about this in the large object section.


 


Collection


 


First things first: when exactly does a collection happen (in other words, when is a GC triggered)? A GC occurs if one of the following 3 conditions happens:


 


1)      Allocation exceeds the Gen0 threshold;


2)      System.GC.Collect is called;


3)      The system is in a low memory situation;


1) is your typical case. When you allocate enough, you will trigger a GC. Allocations only happen in Gen0. After each GC, Gen0 is empty. New allocations will fill up Gen0 and the next GC will happen, and so on.


 


You can avoid 2) by not calling GC.Collect yourself. If you are writing an app, you should usually never call it. The BCL is basically the only place that should call it, and only in very limited places. The problem with calling it in your app is that it may get called more often than you predicted, which can easily happen. When it does, performance goes down the drain, because GCs are triggered ahead of their schedule, and that schedule is tuned for best performance.
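Here is how quickly a well-meant GC.Collect can add up. `HandleRequest` below is a hypothetical helper that calls GC.Collect() after each unit of work. The parameterless GC.Collect() forces a blocking full collection, so every call adds a Gen2 collection that the GC would not otherwise have scheduled. The sketch uses C# 9 top-level statements.

```csharp
using System;

// Hypothetical helper where someone added GC.Collect() "to keep memory low".
void HandleRequest()
{
    var scratch = new byte[4096];
    scratch[0] = 1;
    GC.Collect();      // forces a blocking full (Gen2) collection every single time
}

int gen2Before = GC.CollectionCount(2);
for (int request = 0; request < 50; request++)
    HandleRequest();
int forcedGen2 = GC.CollectionCount(2) - gen2Before;

Console.WriteLine($"Gen2 collections forced by 50 requests: {forcedGen2}");
```

Without the GC.Collect() call, those 50 small buffers would normally have died cheaply in Gen0.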


 


3) is affected by other processes on the system, so you can’t exactly control it, beyond doing your part to be a good citizen in your own processes/components.


 


Let’s talk about what this all means to you. First of all, the GC heap is part of your working set, and it consumes private pages. In the ideal situation, objects always die in Gen0: they almost always get collected by a Gen0 collection, and a full collection never happens. Your GC heap would then never grow beyond the Gen0 size. In reality, of course, that’s almost never the case, so you really want to keep your GC heap size under control.
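To keep an eye on the heap size from code, GC.GetTotalMemory is a simple option. Passing `true` makes it wait for a full collection first, so the number reflects what is still reachable rather than garbage that hasn’t been collected yet. Because of that forced collection, use it for diagnostics only and not on a hot path. The 10 MB "cache" below is hypothetical.

```csharp
using System;
using System.Collections.Generic;

long baseline = GC.GetTotalMemory(forceFullCollection: true);

// Hypothetical long-lived cache: 10 x 1 MB buffers that stay reachable.
var cache = new List<byte[]>();
for (int i = 0; i < 10; i++)
    cache.Add(new byte[1024 * 1024]);

long withCache = GC.GetTotalMemory(forceFullCollection: true);
Console.WriteLine($"GC heap grew by about {(withCache - baseline) / (1024 * 1024)} MB");
GC.KeepAlive(cache);
```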


 


Secondly, you want to keep the time you spend in GC under control. This means 1) fewer GCs and 2) fewer high-generation GCs. Collecting a higher generation is more expensive than collecting a lower one, because it includes collecting the objects in that generation and in all the lower generations. Ideally, you allocate either very temporary objects that die really quickly (mostly in Gen0, and Gen0 collections are cheap) or really long-lived objects that stay in Gen2. For the latter case, the usual scenario is objects you allocate up front when the program starts. For example, an ordering system might allocate memory for the whole catalog, and that memory only dies when the app is terminated.
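The "catalog" case can be seen with GC.GetGeneration. An object allocated at startup and kept alive typically moves up one generation each time it survives a collection, until it settles in Gen2. The `catalog` dictionary below is a hypothetical stand-in. The explicit GC.Collect() calls are only there to make the promotion visible in a demo, which is exactly the kind of call you shouldn’t make in an app.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical "catalog" allocated when the program starts and kept until it exits.
var catalog = new Dictionary<int, string>();
for (int id = 0; id < 1000; id++)
    catalog[id] = "item " + id;

int genAtStart = GC.GetGeneration(catalog);   // freshly allocated objects normally start in Gen0

// Demo only: each collection the catalog survives typically promotes it one generation.
GC.Collect();
int genAfterOne = GC.GetGeneration(catalog);
GC.Collect();
int genAfterTwo = GC.GetGeneration(catalog);

Console.WriteLine($"start: gen{genAtStart}, after 1 GC: gen{genAfterOne}, after 2 GCs: gen{genAfterTwo}");
GC.KeepAlive(catalog);
```

Once the catalog is in Gen2, the cheap Gen0 and Gen1 collections no longer have to collect it. Only full collections do.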


 


CLRProfiler is an awesome tool for looking at your GC heap to see what’s in there and what’s holding objects alive.


 


How to organize your data


 


1) Value type vs. Reference type


 


As you know, value types are allocated on the stack (or inline in the object that contains them), unlike reference types, which are allocated on the GC heap. So people ask how to decide when to use value types and when to use reference types. Well, with performance the answer is usually “It depends”, and this one is no different (did you actually expect something else? :)). Value types don’t trigger GCs. However, if your value type is boxed often, the boxing operations cost more than creating an instance of a reference type would have in the first place. Value types also need to be copied when they are passed as parameters. On the other hand, if you have a small member, making it a reference type incurs a pointer-size overhead, plus the per-object overhead of the reference type itself. We’ve seen some internal code where making such a member inline (i.e., a value type) improved perf because it decreased the working set. So it really depends on your types’ usage pattern.
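Both sides of the trade-off can be measured. The sketch below assumes .NET Core 3.0+ (for GC.GetAllocatedBytesForCurrentThread) and C# 9 top-level statements, and `Measure` is a hypothetical helper. It compares boxing versus storing values unboxed, and then a small pair stored inline as a `(int X, int Y)` value tuple versus the same pair behind a `Tuple<int, int>` reference.

```csharp
using System;

const int N = 10_000;

// Returns how many bytes `work` allocated on the GC heap from the current thread.
long Measure(Action work)
{
    long before = GC.GetAllocatedBytesForCurrentThread();
    work();
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

// Boxing: every int stored into an object[] becomes its own heap object.
var boxes = new object[N];
var ints = new int[N];
long boxed = Measure(() => { for (int i = 0; i < N; i++) boxes[i] = i; });
long unboxed = Measure(() => { for (int i = 0; i < N; i++) ints[i] = i; });

// Small member stored inline (value type) vs. behind a reference (one extra object per element).
long inlineBytes = Measure(() =>
{
    var pairs = new (int X, int Y)[N];        // the pairs live inside the array itself
    GC.KeepAlive(pairs);
});
long referencedBytes = Measure(() =>
{
    var pairs = new Tuple<int, int>[N];       // an array of pointers...
    for (int i = 0; i < N; i++)
        pairs[i] = Tuple.Create(i, i);        // ...plus one heap object per element
    GC.KeepAlive(pairs);
});

Console.WriteLine($"boxed: {boxed} B, unboxed: {unboxed} B");
Console.WriteLine($"inline pairs: {inlineBytes} B, referenced pairs: {referencedBytes} B");
```

This only shows allocation volume. Copying costs for large value types passed as parameters are a separate consideration, so measure with your own types’ usage pattern.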


 


2) Reference rich objects


 


If an object is reference rich, it puts pressure on both allocation and collection. Each embedded object incurs its own 8-byte overhead, and since allocation cost is proportional to allocation volume, the allocation cost is now higher. When collecting, it also takes more time to trace through the object graph.


 


As far as this goes, I would recommend that you normally just organize your classes according to their logical design. You don’t want to hold other objects alive when you don’t have to. For example, you don’t want to store references to young objects in old objects if you can avoid it.


 


3) Finalizable objects


 


I will cover finalization in more detail in its own section. For now, one of the most important things to keep in mind is that when a finalizable object gets finalized, all the objects it holds on to need to be kept alive, and this drives the cost of GC higher. So you want to isolate your finalizable objects from other objects as much as you can.


 


4) Object locality


 


When you allocate the children of an object, if the children need to have a lifetime similar to their parent’s, they should be allocated at the same time so that they stay together on the GC heap.


 


Large Objects


 


When you ask for an object that’s 85000 bytes or more it will be allocated on the LOH. LOH segments are never compacted – only swept (using a free list). But this is an implementation detail that you should NOT depend on – if you allocate a large object that you expect to not move, you should make sure to pin it.
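Two things in this paragraph can be shown in code. First, the 85000-byte threshold applies to the whole object, so a `byte[85_000]` (payload plus header) lands on the LOH while a `byte[80_000]` does not. GC.GetGeneration should report the large array as the oldest generation, which matches "LOH objects are treated as Gen2", but treat the exact number as runtime-dependent. Second, if native code needs the buffer to stay put, pin it explicitly with a GCHandle and release the handle as soon as possible. The sketch uses C# 9 top-level statements.

```csharp
using System;
using System.Runtime.InteropServices;

var small = new byte[80_000];    // whole object stays under 85,000 bytes: small object heap
var large = new byte[85_000];    // payload alone is 85,000 bytes: large object heap

int smallGen = GC.GetGeneration(small);
int largeGen = GC.GetGeneration(large);   // LOH objects are reported as the oldest generation
Console.WriteLine($"small: gen{smallGen}, large: gen{largeGen}");

// Don't rely on the LOH never being compacted: pin explicitly when the address must not change.
GCHandle handle = GCHandle.Alloc(large, GCHandleType.Pinned);
try
{
    IntPtr address = handle.AddrOfPinnedObject();
    // ... hand `address` to native code here; the array cannot move while the handle is held ...
    Console.WriteLine($"pinned at 0x{address.ToInt64():X}");
}
finally
{
    handle.Free();   // unpin as soon as possible so the GC is free to move things again
}
```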


 


Large objects are only collected during full collections, so they are expensive to collect. Sometimes you’ll see that the Gen2 heap size doesn’t change much after a full collection. That could mean the collection was triggered for the LOH, which you can check by looking for a decrease in the LOH size reported by perfmon.


 


A good practice with large objects is to allocate one and keep reusing it, so you don’t incur more full GCs. If, say, you want a large object that can hold either 100k or 120k, allocate one that’s 120k and reuse it. Allocating many very temporary large objects is a very bad idea, because you’ll be doing full collections all the time.
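Here is the 100k/120k example from the paragraph above in code. The sketch assumes .NET Core 3.0+ (for GC.GetAllocatedBytesForCurrentThread and Array.Fill) and C# 9 top-level statements. `Process` is a hypothetical stand-in for whatever work fills the buffer. The first loop allocates a fresh large buffer per request; the second allocates one 120 KB buffer up front and reuses it.

```csharp
using System;

const int Requests = 1_000;
const int BufferSize = 120 * 1024;   // big enough for both the 100 KB and the 120 KB case

// Hypothetical stand-in for real work that fills `length` bytes of the buffer.
int Process(byte[] buffer, int length)
{
    Array.Fill(buffer, (byte)1, 0, length);
    return length;
}

// Anti-pattern: a fresh, very temporary large (LOH) buffer for every request.
long before = GC.GetAllocatedBytesForCurrentThread();
int gen2Before = GC.CollectionCount(2);
for (int i = 0; i < Requests; i++)
{
    int payload = i % 2 == 0 ? 100 * 1024 : 120 * 1024;
    Process(new byte[payload], payload);
}
long freshBytes = GC.GetAllocatedBytesForCurrentThread() - before;
int freshGen2 = GC.CollectionCount(2) - gen2Before;

// Better: allocate one 120 KB buffer once and reuse it for every request.
var shared = new byte[BufferSize];
before = GC.GetAllocatedBytesForCurrentThread();
gen2Before = GC.CollectionCount(2);
for (int i = 0; i < Requests; i++)
{
    int payload = i % 2 == 0 ? 100 * 1024 : 120 * 1024;
    Process(shared, payload);
}
long reusedBytes = GC.GetAllocatedBytesForCurrentThread() - before;
int reusedGen2 = GC.CollectionCount(2) - gen2Before;

Console.WriteLine($"fresh buffers:  {freshBytes / 1024} KB allocated, {freshGen2} Gen2 GCs");
Console.WriteLine($"reused buffer:  {reusedBytes / 1024} KB allocated, {reusedGen2} Gen2 GCs");
```

A single shared buffer like this is only safe if one request uses it at a time. With concurrent requests, you would need one buffer per worker or some form of pooling.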



 


 


That’s all for Part 1. In future entries I’ll cover things like pinning, finalization, GCHandles, Svr GC, etc. If you have questions about the topics I covered in this entry, or would like more info on them, feel free to post them.

Comments (69)

  1. Wayne says:

    Good stuff. Keep it coming, please.

  2. SubZero says:

    Wow – I used to get paid (approx. $500 per article) for writing things like this.

  3. Microsoft developer Maoni has some more interesting GC-related information, this time on Using GC Efficiently. While you don’t want to get bogged down in micro-optimization too early, it’s important to bear stuff like this in mind when designing high…

  4. Vishal Joshi says:

    A very good article… I recommend you send it to the MSDN Knowledge Base…

  5. avnrao says:

    cool stuff.. please continue this blog.

    pls answer these few questions..

    1. How do I find out how many objects exist in Gen 0, 1 and 2 separately?

    2. In the collection section of your blog, you mentioned that a GC fires when any of the three conditions is met. What happens to the objects if none of the conditions is met?

    Say I call a method, create some object X and finish the method. Say my Gen 0 doesn’t reach the threshold (is there any default??); when will the GC run? Until a GC starts, will the object be alive even if my application is stopped?

    pls do answer.

    thank you,

    Av.

  6. Maoni says:

    Hi Avnrao, to answer your questions:

    1) You can get the generation sizes from perfmon counters. Please see http://blogs.msdn.com/maoni/archive/2004/06/03/148029.aspx. We don’t expose the number of objects. It’s more useful to know how much space they occupy than how many of them there are. Use CLRProfiler if you need to know what the objects on the heap are.

    2) If no GC happens, your object will just stay alive in the heap. I am not sure what you mean by "my application is stopped". Does that mean your application’s process is idle, terminated or something else?

  7. Luke S. says:

    That’s an interesting read thanks for posting.

    Though now I can understand more why .NET GC is slow and inferior to native allocations:

    1. you say you are allocating (reserving) 16MB per segment. What does that do to my PTEs? My simple calc says that you would need 4000 PTEs per your GC heap, thus 12000 for all three of them, 48kb of non paged memory? How does that compare to native HelloWorld where all my code might fit in few kbs?

    2. strategy for allocation: if you do not collect in G0 until 16MB isn’t filled, what does that do to my L2 cache? I don’t know about you guys at MS, but my best CPU has 2MB. Does that mean that instead of being local and fitting nice in L2 I am bouncing other data out? I guess that doesn’t help perf there?

    3. What objects go to the GC heap? If I have for (…){String s1;…}, does s1 go to the GC heap or do you keep it on the stack? From Rotor I understand it will be put on the GC heap, even though the compiler has perfect control over, and info about, the lifetime of s1. Why not put it on the stack, or at least move your GC pointer back and forth?

    I wonder, is there any theoretical case where managed code with GC would be anywhere close to well-crafted native allocation? Doubt it…

  8. Maoni says:

    Luke, you got this completely wrong. Let me explain:

    1) PTEs are not allocated for reserved memory. And we don’t allocate a segment for each generation.

    2) See 1), we don’t allocate a whole segment for Gen0. Gen0 is a lot smaller than the whole segment.

    3) In your code example, the compiler could clearly figure out that the scope and the lifetime are the same, so the object could safely be allocated on the stack without any checks. In the normal case, however, where the object is passed to a method, you couldn’t allocate it on the stack even with checks. So it could be done with a smart compiler and some changes in the EE, but it isn’t clear how much it buys you in practice, because whenever the lifetime isn’t clear the compiler has to heap-allocate.

    Finally, do you have evidence that GC is slower than native allocations? Because that’s the opposite of our experience.

  9. Akshay Kumar says:

    Hi Maoni,

    This is excellent stuff . But I have to say this .

    I do not find a relation between this theory and ASP.NET technology or WinForms technology.

    e.g. What impact does session management have on GC when done in a certain fashion? After all, it’s an MS developer who wrote the session management or cache management engine.

    So how are objects in session disposed and GC’ed?

    There has to be a relationship established for an overall performance increase.

    This comment is not for you but for people writing scalability guides and MS product documentation (as I have no access to them); maybe you can send my gripe to them.

    Once again good post.

    Regards,

    Akshay

  10. Maoni says:

    Hi Akshay, thanks for your feedback. I’ve asked our ASP.NET and WinForms perf people whether they have any documents of such. I’ll post the responses once I have them.

  11. Maarten says:

    Naomi, could you explain what is in the 8-byte sync block you mention? What is it used for?

    Thank you

  12. Maoni says:

    Maarten, take a look at Jeff Richter’s MSDN article http://msdn.microsoft.com/msdnmag/issues/03/01/NET/

    It explains one use of the sync block. The sync block is also used for various other things in the runtime, but what Jeff describes in his article is one of its most important uses. Basically, think of it as some data associated with the object and allocated on demand.

  13. I previously blogged about a set of must-read garbage collection articles and issues around directly…

  14. Using GC Efficiently – Part 1

    Using GC Efficiently – Part 2

    Using GC Efficiently – Part 3

    Using…

  15. In Using GC Efficiently – Part 2 I talked about different flavors of GC that exist in the CLR and how…

  16. Spending lots of time on C++ means I haven’t been paying as much attention to managed code as I did in…

  17. Using GC Efficiently Part 1: Maoni explains the cost of things so you can make good decisions in…

  18. In Using GC Efficiently – Part 2 I talked about different flavors of GC that exist in the CLR and how

  19. Certainly that’s one of the most frequently asked questions I get (at the PDC too!). So since PDC already

  20. ramachandran.d says:

    Hi Maoni,

    I read about method calls and JIT in Jeff’s CLR via C# book. There i read the following concepts.

    When a method is about to be called, it finds all the types referenced/used in the method, creates type objects for those types, loads the MSIL of the methods/types and associates it with the type object.

    When a method is called, it loads the MSIL, JITs it and updates the type object to use the JIT-compiled code instead of the MSIL code.

    When a non-virtual method is called, it **finds the type object and finds the address** of the JIT-compiled code or MSIL code and does the above things.

    How does it find the exact address of the type object?

    How does this happen if it is a value type?

    regards

    RAM

  21. Raj says:

    Hi Maoni,

    nice Article on GC.

    I have some doubts in it.

    You said that when the EE starts it allocates one segment for small objects and one segment for the LOH. In which generations are they created?

    2) Some time back I read an article which stated that when an application starts it creates 2 threads, one for main and one for GC. In that case, how is the memory reclaimed in the following scenario?

    I created a reference type object in a method, so the memory is allocated in Gen0, and Gen0 is not full when suddenly my application terminates. What happens to the memory that was allocated?

  22. maoni says:

    Raj,

    Re 1) LOH objects are always treated as gen2 objects, meaning they are only collected with gen2 collections. For the SOH (Small Object Heap), the initial objects are gen0, then the survivors get promoted to gen1, and then you start allocating in gen0 again. The segment often looks like this:

    from lowest address to highest address

    gen2 | gen1 | gen0 | end space of this segment

    Take a look at my slide deck from the last PDC at

    http://216.55.183.13/pdc2005/slides/FUN421_Stephens.ppt

    (note that last time I tried I couldn’t open this in IE but I could save it)

    re 2) regardless of what GC does, all private memory associated with the process will be freed by the OS.

  23. Bill says:

    Maoni,

    Can you recommend a good resource on how to troubleshoot an ASP.NET application that has too many gen2 collections? I am having a hard time determining why the gen 2 collections are happening. Our research indicates that we have too many segments being allocated or that the system is in a low memory situation (possibly both).

    Thanks,

    Bill

  24. Bill says:

    Thanks for the troubleshooting article 🙂

    Can you explain in more detail the situation where garbage collection is triggered by:

    System is in low memory

    Do you mean the entire machine is low on memory or just the .NET process? What thresholds must be crossed for this situation to happen?

    Bill

  25. maoni says:

    Bill, we measure the machine memory load, not per process. On different OSs the values are different but you can consider a GC is likely to be triggered if the machine mem load is in the high 80’s. However we don’t do full GCs all the time when the memory load is high – we tend to do a full GC only every few GCs.

  26. Bill says:

    Maoni,

    You mentioned "if you allocate a large object that you expect to not move, you should make sure to pin it".

    In several of my company’s applications, developers have loaded a DataSet with static data during ASP.NET application start. In your opinion, can we help the GC by pinning the DataSet so the GC doesn’t have to "look" at the static data during collections?

    Would this code look like this??

    DataSet ds = new DataSet();

    … code to load ds …

    GCHandle pinHandle = GCHandle.Alloc(ds,GCHandleType.Pinned);

    Thanks,

    Bill

  27. maoni says:

    When I said "if you allocate a large object that you expect to not move, you should make sure to pin it", my intention was to tell you that not compacting the LOH is an implementation detail that you should not rely on. So to make sure your large object doesn’t move, you should pin it.

    You shouldn’t pin those objects. Since they are allocated when the process starts up, they will likely be moved into a gen2 segment and stay there anyway. Pinning them would only introduce possible fragmentation and won’t gain you anything (and makes GC’s job harder).

  28. pmackay@hotmail.com says:

    Maoni,

    I checked your FUN421_Stephens.ppt. In slide 9, after 500 GCs, what could happen to a segment if the 501st collection frees many objects that lived in gen 2?

    Is that segment going to look like this?

    222222222222222222222222222222222222222FFFFFFFFFF

    22FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

    222222221111111110000000000FFFFFFFFFFFFFFFFFFFFFF

    or like this, due to a compact

    222222222222222222222222222222222222222FFFFFFFFFF

    22222222221111111110000000000FFFFFFFFFFFFFFFFFFFF

    Also, is it possible to have something like this?

    222222222222222222222222222222222222222222222FFFF

    2222222222222222222222222222222222111111111111111

    111111111111100000000000FFFFFFFFFFFFFFFFFFFFFFFFF

    Thanks,

    patrick.

  29. maoni says:

    The 1st and the 2nd case you described are possible. The 3rd one is not ’cause there can’t be more than one ephemeral segment for each heap.

  30. As you may know, there are different GC modes to choose from depending on the type of application you’re

  31. shalen520 says:

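    The purpose of this article is to discuss the cost of GC operations, so that you can make the right decisions when using managed memory.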

  32. roy ashbrook says:

    Ah. Garbage Collection… how I love and hate thee. =P I think one sad thing about programming in .net


  34. Andrew Z says:

    Objects moving around in memory? Remind me to stick with native C++!

  35. APatel says:

    Maoni,

    We are running performance tests on a .NET web application and we are running into a few issues. If you can shed some light on some of the GC-related issues, that would be of great help:

    About the Application:

    1. It is a web application with a specialized engine to handle inbound requests (a C# class) with a very simple ASPX page which simply passes the ASP.NET Request object to it.

    2. For every request, the Application generates about ~1,000,000 objects (we used Windbg)

    3. W3wp.exe memory usage is around 300 MB

    4. During each request process, we create many XML and execute at least 4 XSLT transformations (we use managed code – compiled XSLT and not MSXML).

    5. Environment: .NET 2.0 with SP1, IIS 6 on Windows Server 2003 R2. The server has single dual-core Xeon and 4 GB of RAM

    Observations:

    1. With multi-user load, we see frequent GC Gen 2 collections (around 6 or 7 spikes in % Time spent in GC for Gen 2 every 100 seconds).

    2. The LOH size keeps seesawing (it goes down when Gen 2 GC runs)

    3. CPU usage is also very high (90% +) throughout the load test.

    Questions:

    1. I would like to find out the logic behind the GC collection trigger points (i.e. number of objects, memory allocation etc.). From your blog entry, I understand that a GC gets invoked when allocation exceeds the Gen 0 threshold. What is this “Gen 0 Allocation Threshold”? Are there other such trigger points? We are not invoking GC from the app code.

    2. We have nothing else running on the server, and w3wp.exe is taking around 300-400MB of RAM (observed through Task Manager and Perfmon). Why is it not using the remaining available RAM on the system? Are there memory settings for the CLR that we can tweak?

    3. Are there any tools that you would suggest for memory profiling? We are using a couple of tools, including VSTS 2008, but they take a long time to collect information (a few hours at least). We want to pinpoint the “culprits” that cause allocations in the LOH through the call stack.

    Please let me know if you need more information.

    Thanks

    APatel

  36. Martin Kulov says:

    Rico,

    do you have any idea why the two MSDN Magazine articles that you refer to in the first paragraph are no longer available?

    I also found them a pretty useful resource, but now they cannot be reached online.

    Why were they deleted?

  37. I am looking for any available information on how the "Budgets" are calculated. I am seeing some anomalies in client code where I believe the budget is getting lower and lower, but I cannot resolve the potential cause….

    Any guidance greatly appreciated…

  38. tmfc says:

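    Translated from the Maoni’s WebLog article Using GC Efficiently – Part 1. Maoni is a member of Microsoft’s CLR Performance team. The goal of this article is to explain the cost of things so that you can make better use of managed memory, not to explain GC itself, only how to use it. I assume most people are interested in using garbage collection rather than implementing one themselves. This article assumes the reader has a basic understanding of GC; if you need some background on GC, Jeff Richter wrote two very good MSDN articles, 1 and 2. First I will focus on workstation-type GC (Wks GC), then I will explain server-type GC (Svr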

  39. csjasnoch says:

    Hi Maoni,

    I have been digging around and I am trying to find some info on G2 collection. Is it hindered even if the G2 objects are not dead?

    We have an app that has around 35mil objects. This is a very OO app that has been built to do image processing and is modifiable on the fly. So reducing these amounts is not likely.

    However, the app was set up to use a buffer and recycle the objects. The app crashes when GC begins, though. I am wondering: is GC working so hard because of other G2 objects that are dying, or is it more likely the 35mil references (not dying) that are doing it?

  40. maoni says:

    csjasnoch, observing a crash while a GC is in progress most likely points to managed heap corruption. I would suggest contacting our PSS folks if you need help debugging this.

  41. 飞天舞者 says:

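    Recommended: "A complete introduction to the CLR", an article explaining the CLR’s memory reclamation mechanism and common debugging techniques.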

  42. Note : This entry was originally posted on 9/14/2008 5:16:11 PM. I present at a lot of the local Florida

  43. There are a lot of improvements that can be done within the application’s configuration file; the only catch

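  44. Two kinds of GC: which one does web development need? I often see ASP.NET projects that don’t choose the appropriate GC for their production environment. I have even seen with my own eyes an ASP.NET deployment with gcConcurency set to true!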

  45. Some common causes of OutOfMemoryExceptions in ASP.NET applications and information on how to resolve these exceptions.

  46. SHREK says:

    Maoni,

    Thanks a lot for your blog! I’ve reread it and I want to say: THANKS! I understand GC much better thanks to you!

  47. paul says:

    The 2nd msdn link is no longer valid. I think this is the correct one though: msdn.microsoft.com/…/bb985011.aspx