Gen2 free list changes in CLR 4.6 GC


I wanted to mention this because I started seeing posts about it. In 4.6 we improved the way we use the gen2 free list when promoting gen1 survivors into gen2. Unfortunately there was a perf bug that I didn’t notice till fairly late, so the fix wasn’t approved to be checked into 4.6 at the time. Now that I am seeing more people hitting it, I have checked the fix into the next hotfix rollup for 4.6, which you will be able to download when it’s released in the near future (I will post the exact location when it’s available).

The symptom of this bug is that GCs take a lot longer than they did in 4.5. If you collect ETW events, they will show that gen1 GCs in particular are taking a lot longer.
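One way to check for this symptom is to compare pause times per generation. The following is only a minimal sketch: it assumes the GC events from an ETW trace have already been exported into `(generation, pause_ms)` pairs by some other tool, and the function name and sample numbers are hypothetical, not taken from a real trace.

```python
from collections import defaultdict
from statistics import mean


def summarize_pauses(events):
    """Group (generation, pause_ms) pairs and return per-generation stats."""
    by_gen = defaultdict(list)
    for generation, pause_ms in events:
        by_gen[generation].append(pause_ms)
    return {
        gen: {"count": len(pauses), "mean_ms": mean(pauses), "max_ms": max(pauses)}
        for gen, pauses in sorted(by_gen.items())
    }


# Hypothetical sample data, not measurements from a real trace.
sample_events = [(0, 2.0), (1, 40.0), (0, 3.0), (1, 60.0), (2, 15.0)]
summary = summarize_pauses(sample_events)
for gen, stats in summary.items():
    print(f"gen{gen}: {stats}")
```

Running the same summary on traces taken under 4.5 and 4.6 and comparing the gen1 rows should make a regression of this kind easy to spot.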

Note that this is NOT due to any deficiency in background GC (I’ve seen some posts saying that background GC is at fault; this is not the case). It’s simply because of the bug in the free list management. You may see that the bug manifests itself a lot less when you disable Concurrent GC, but that’s only because background GC does not compact, by design, so it uses the free list a lot more. Some people also observed that turning on Server GC makes the problem “go away”, but again, that’s just the way it manifests itself in Server GC: by turning on Server GC you just made GC throughput a lot higher, so even though there’s more work to do, the total GC pause is lower because the work is now done by multiple GC threads.

What makes this bug show up is when the objects that die are small ones that are scattered around the heap. This is not very easy to debug, so unless you want to put in a bunch of effort (to debug and potentially to change your code) I would recommend waiting for the hotfix rollup.
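To get a feel for why that allocation pattern matters, here is a toy model. It is not the CLR's actual free list and it does not reproduce the bug; it only assumes a heap of equally sized slots (a hypothetical simplification) and counts how many separate free spaces are left behind when the same number of objects die in one cluster versus scattered between survivors.

```python
def free_runs(heap):
    """Return the lengths of contiguous free runs.

    `heap` is a list of booleans, one per slot: True means the slot holds
    a live object, False means the object in that slot has died.
    """
    runs, current = [], 0
    for live in heap:
        if live:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs


# 16 slots, 8 of which die in each scenario.
clustered = [True] * 8 + [False] * 8   # all dead objects are adjacent
scattered = [True, False] * 8          # dead objects alternate with survivors

print(len(free_runs(clustered)), "free space(s) when deaths are clustered")
print(len(free_runs(scattered)), "free space(s) when deaths are scattered")
```

In this model the same amount of dead memory becomes one large free space in the clustered case and eight tiny ones in the scattered case, so a free list has many more (and much smaller) entries to manage.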


Comments (40)

  1. Kevin Watkins says:

    Thanks for this post (and for all your work to make .NET such a great environment). We'll look forward to the hotfix.

  2. David says:

    Thanks for the information! We hit this bug in a WinForms app when adding ~100,000 TreeNodes to a TreeView: the TreeView::CustomDraw() call allocated tons of little objects.

  3. Onur says:

    Hey Maoni, I have a few questions about GC: it doesn't seem to free all memory in Server mode even when the system is under memory pressure. Once Gen0 has grown to something like 1 GB, the CLR is reluctant to give it back and prefers to keep it empty. I have some sample code and windbg analysis I can send to you. Is this the expected behavior?

  4. adante says:

    Thanks for the information. Would you mind sharing links to the other posts about it? We were quite badly burned by this bug and I am quite annoyed at ourselves regarding how long it took for us to diagnose. I am interested in hearing how other people managed to investigate this issue.

  5. Michael Covelli says:

    Production impact was severe for me also. Performance of some scheduled jobs decreased by a factor of 120x (1 minute to 2 hours). Here's a link to my scenario for it from Stack Overflow: stackoverflow.com/.../garbage-collection-and-parallel-foreach-issue-after-vs2015-upgrade. I mistakenly thought it was related to Parallel.ForEach because that's the only time I was hitting it. Here's a link to someone else who hit it in a different context: connect.microsoft.com/.../1594775. I think the most surprising part was that we didn't even upgrade to .NET 4.6 but had this issue anyway.

  6. Kevin Watkins says:

    We had a 20x slowdown on some applications (this was in testing, fortunately we were quite cautious about installing 4.6 on any production machines). It was just trial and error figuring out that the GC was somehow involved (because the GC settings made a strong difference).

  7. adante says:

    Am I correct in thinking this problem was potentially present since the release of .NET 4.6? If so it answers a great deal of questions.

    Also, our current priority is resolving this for users. We are doing a code review to attempt to limit the interleaving of long/short life allocations (but the code is quite complex) and eagerly await a hotfix. At the moment we are considering disabling gcConcurrent because, while this is a kludge-fix, 'it works' (at least for our test environment). I know that the proper answer to the nebulous question I'm about to ask is 'it depends', but can disabling gcConcurrent potentially burn us in other ways?

  8. I was told the hotfix will be released soon (unfortunately by design I cannot tell you the exact date, but as soon as it's released I will post the link here). Please check back in a few days - thank you all for being patient.

    Now to answer questions above:

    @Onur, there are 2 aspects to this:

    1) if most of the memory is taken up in gen0 - we give memory back, but we will keep as much committed as gen0 needs before the next GC is triggered (it doesn't make sense to decommit a memory region that will be committed again right away). If pinning is involved, GC will try very hard to keep pinning under control, but we cannot move pins, so if they "stretch the heap out" we can only retract the heap back to the last pin. So checking for pinning is useful.

    2) if most of the memory is taken up in gen2 - are you doing any gen2 GCs? You should be able to see this by taking an ETW trace with perfview and looking at what GCs you are doing.

    @adante, yes this problem was new in 4.6 (since the gen2 free list change I described was added in 4.6). If you are still ok with the gen2 latency you are getting with concurrent disabled, you can safely leave it disabled - that's what concurrent is supposed to be for (reducing blocking gen2 GC's latency). As far as how to diagnose - Michael Covelli (who's also on this commenting thread) was the one who posted the stack overflow post. I dunno how he did his diagnosis (you can certainly ask him :-)) but my understanding is he just noticed that GC was performing poorly (and the only other post on that stack overflow had the wrong diagnosis). I don't expect anyone to get to the root cause (that it's the gen2 free list problem) since this is an internal implementation detail. As I mentioned in this blog entry, a good way to confirm that you are hitting the bug I am talking about is to look at gen1 collection time.

    @Michael Covelli, the VS version you upgraded to uses .net 4.6.

  9. Onur says:

    @Maoni: There is no explicit pinning in my code. I am just filling some System.Collections.Generic.List<byte> gradually.

    The memory is retained in Gen0. I would understand GC keeping the memory for future use. Especially in server mode. However, even if another process fills entire remaining memory, CLR seems a bit stingy. That's not so good behavior IMHO. Considering in service mode you have many heaps, the end result is all your physical memory is occupied by some processes and a significant portion of it is sitting idle in Gen0. Please see my colleague's post here: stackoverflow.com/.../net-memory-is-not-reclaimed

  10. @Onur, based on that post there is clearly pinning:

    Fragmented blocks larger than 0.5 MB:

               Addr     Size      Followed by

    0000005b1c3c8800  425.6MB 0000005b36d56c50 System.Func`2[[System.IAsyncResult, mscorlib],[System.Net.HttpListenerContext, System]]

    0000005c1c3fcf00   28.2MB 0000005c1e02be58 System.Reflection.Emit.MethodBuilder

    0000005c1e02e5b0  268.0MB 0000005c2ec313d8 System.Threading.OverlappedData

    0000005c2ec31678  142.3MB 0000005c37a830a0 System.Func`2[[System.IAsyncResult, mscorlib],[System.Net.HttpListenerContext, System]]

    0000005d1c541aa0  404.8MB 0000005d35a1a958 System.Func`2[[System.IAsyncResult, mscorlib],[System.Net.HttpListenerContext, System]]

    0000005d35a1e5a8    3.8MB 0000005d35df0c40 System.Func`2[[System.IAsyncResult, mscorlib],[System.Net.HttpListenerContext, System]]

    0000005e1cfe7128  444.1MB 0000005e38c0c3f0 System.Byte[]

    These all look like they are pinned by IO. Since they are pinned, we cannot retract the heap back. So you see these huge free objects.

  11. koruyucu says:

    @Maoni: And what happens if an array was pinned (in gen0), then a GC executed, and then the array was 'freed'? On the next GC, is that memory still considered unmovable?

  12. @koruyucu, I am not completely sure what you meant to ask, so here are the possible scenarios:

    pinning is considered a strong root, so if an array is pinned till next GC happens, it means it will not be freed.

    if something is pinned but unpinned before the next GC happens (on the generation this object is in), GC wouldn't know about it and if there's no other strong roots keeping it live, it will be freed.

    if GC sees something pinned, it will not move it.

  13. Onur says:

    Thanks for the comments Maoni. Also your videos on channel 9 made pinning concept more clear for us.

  14. David Marcus says:

    I'm appalled and disappointed that the fix to this critical bug has not been released (in a hot-fix).

    What is Microsoft waiting? ... for its good will with us developers to fully erode away?

    -David

  15. Ryan Rastedt says:

    Echoing the concerns mentioned previously, we can't even continue evaluating 4.6 for production use until this is addressed (performance is terribly degraded).  Is there any official channel tracking this issue?  It's great that you brought this problem to light, but it is disappointing that it had to be motivated by "people are hitting a known, critical issue that Microsoft has not advertised as a risk."  

    Even with the earlier RyuJIT issues Microsoft insisted 4.6 was production ready...  this seems extremely dishonest to me and has eroded my confidence significantly.  What other critical issues are being sat on?  Do we just need to find someone with more pull in the community to raise the alarm for this problem before it will be formally addressed?

  16. Nick Lowe says:

    Any update on when a hotfix to address this will be available?

    This ought to be distributed via Windows Update in short order to fix deployments of .NET 4.6.

    It is a critical issue for many so this should really have got more attention than it has. Due to it being known about prior to release, it honestly should have been considered a release blocking issue. That is disappointing from a QA perspective.

    I would have reasonably expected to have seen resolution by now but this has not been the case. That's also disappointing. It just seems, prima facie, dysfunctional.

  17. Lee Coward says:

    Hi folks,

    Thanks for all the input on this issue. We certainly understand the frustration and take high impacting situations like these very seriously.

    Maoni’s fix is making its way through testing and the hotfix rollup is expected to be available around 9/22. We had hoped to have the fix out this week but unexpected infrastructure issues led to really unfortunate delays. I’ll post an update here when the patches are published or if the schedule is impacted.

  18. Dave Coates says:

    Thanks for the update!

    I look forward to receiving the fix..

    Dave

  19. Dave Coates says:

    How will this update be published?  I am eagerly waiting for the update to appease my upset customers. This is affecting their production and all the research and testing points to this particular issue.

    Dave

  20. Jake says:

    We're also eagerly waiting for this update, some of our gen1 gc's are taking anywhere from 10-50 seconds to run.  Lots of upset customers expecting this hotfix.  Please keep us updated on when we can expect this to be released.

  21. Does the Hotfix rollup 3088957 include the fix for the issue?

  22. Michael Covelli says:

    Looks like it.  Applying support.microsoft.com/.../3088957 totally fixes all of my issues.  They talk about GCs during Parallel.ForEach in the write-up and I assume that they're referring to this (though that was just one way to demonstrate the issue).

  23. David Pecora says:

    The hotfix request page for KB3088957 states that the hotfix applies to the ".NET Framework 4.6 Preview" and does not apply to newer operating systems (anything after Win7 or Server 2008 R2).  I was dubious about the preview label, but decided to give installing the hotfix a try anyway.  However, I was not able to install it on my Win10 box because it says the OS is not supported.

    While as a software developer myself I understand that there can be unforeseen complexities, it's frankly embarrassing that the fix for a critical problem like this - a fix which was committed a month and a half ago - is still not available.  I'd also like to echo the previous comment that 4.6 should not have been released before resolving known showstoppers like this bug and the RyuJIT bugs.

  24. JohnW says:

    Here's the hotfix for Windows 8.1/Windows Server 2012 R2: support.microsoft.com/.../3088956

    That said, I can't find the corresponding hotfix for .NET 4.6 on Win 10.

  25. David Pecora says:

    Thanks!  Unfortunately this still didn't take.

    However, for those of you who are running Windows 8/Windows Server 2012: support.microsoft.com/.../3088955

    I searched for 20 numbers in both directions in the KB and found no other 4.6 hotfixes.

    To summarize, it looks like we have:

    - For Windows Vista, Windows 7, Windows Server 2008, and Windows Server 2008 R2:  3088957

    - For Windows 8 and Windows Server 2012:  3088955

    - For Windows 8.1 and Windows Server 2012 R2:  3088956

    - For Windows 10:  No hotfix available

  26. Ryan Rastedt says:

    I can confirm that 3088957 on Win 7 corrected the severe performance degradation we had observed.  Fortunately, I have the luxury in my current capacity to only write proprietary software with full control over the hardware and operating systems it runs on.  I feel greatly for everyone with commercial applications that have no control over which runtime customers are using.

    Is there any timeline on getting an update pushed over windows update?

  27. Bill Menees says:

    The "Cumulative Update for Windows 10 for x64-based Systems (KB3093266)" seems to include this hotfix.  The documentation (support.microsoft.com/.../3093266) doesn't explicitly state that, but it updates mscorlib.dll to 4.6.106.0 from 22-Sep-15 1:33 (UTC).  Installing that update fixed the GC slowdown on my Win10 machine.

  28. Patrick Smacchia says:

    Just downloaded the hotfix for Win8.1, it worked for me!

    A lot of stress relieved. For a few weeks this GC freeze issue had been high on our list!

  29. NCrunch says:

    @Maoni: Is it possible to find out which versions of mscorlib.dll are affected by this problem?

    I find myself in the same situation as many here, with potentially thousands of my users impacted/frustrated by this problem.  I need to find a reliable way to detect it so that I can inform the user to update their system.

    From the comments above, it looks as though trying to determine the existence of the fix by checking for KB updates is not a reliable way to do this.  If we know which versions of mscorlib.dll are affected, we have a way to confront this and give our users a better experience.

  30. Lee Coward says:

    Hi folks,

    Wrapping this up into a single location for reference. Below is a list of the September Hotfix KBs which contain the GC fix. Because of the way Win10 updates are delivered, a direct download is not available. All of the other KBs include a download request link.

    support.microsoft.com/.../3093266 - Windows 10

    support.microsoft.com/.../3088956 - Windows Server 2012 R2 and Windows 8.1

    support.microsoft.com/.../3088955 - Windows Server 2012 and Windows 8

    support.microsoft.com/.../3088957 - Windows 7 SP1, Windows Server 2008 SP2, Windows Server 2008 R2 SP1, and Windows Vista SP2

    After installing the hotfix, the mscorlib.dll version should be 4.6.106.0.
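For tools that need to detect whether a machine still needs the hotfix, a version comparison along these lines might work. This is a sketch only: the function names are hypothetical, reading the file version of mscorlib.dll is left to the caller, and it assumes that only 4.6-era builds below 4.6.106.0 are affected and that any later build includes the fix.

```python
FIXED_VERSION = (4, 6, 106, 0)   # mscorlib.dll version after the hotfix
FIRST_AFFECTED = (4, 6, 0, 0)    # assumption: builds before 4.6 lack the bug


def parse_version(text):
    """Turn a dotted version string into a 4-part tuple of ints."""
    parts = tuple(int(part) for part in text.strip().split("."))
    return parts + (0,) * (4 - len(parts))


def needs_hotfix(mscorlib_version):
    """Return True if the version falls in the assumed affected range."""
    version = parse_version(mscorlib_version)
    return FIRST_AFFECTED <= version < FIXED_VERSION


# "4.6.81.0" is a made-up pre-hotfix version used purely for illustration.
for candidate in ["4.6.81.0", "4.6.106.0", "4.6.1028"]:
    print(candidate, "needs hotfix:", needs_hotfix(candidate))
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as treating "4.6.1028" as older than "4.6.106.0".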

  31. Alexandre Mutel says:

    Finally! This hotfix seems to fix the Slow VS 2015 issue.

  32. Lee Coward says:

    Thanks for the confirmation @Alexandre!

  33. SilverFox says:

    Windows 10 (10565): found a new version of mscorlib.dll (4.6.1028), signed at 2015-09-14 11:57

  34. Jeff Arnett says:

    Great. I believe I was seeing this issue while using large solutions in Visual Studio 2015. When devenv.exe gets to ~2.2 GB in memory usage everything just freezes. Sometimes comes back in 5 mins, sometimes 30 minutes, sometimes never.

    Since at times I could see, for example, the "save" icon animating, it doesn't look like the process crashes; it's just "busy". Looking forward to trying it out.

  35. TBag says:

    blogs.msdn.com/.../announcing-net-framework-4-6-1-rc.aspx

    A strong candidate for an immediate fix for users on Windows 10/ other 4.6 installs. This solved what seems to be all of our woes regarding GC and stability/performance.

  36. Roman Brandstetter says:

    Will this soon be available for Windows Server? catalog.update.microsoft.com shows the fix only for Win 10

  37. onurg says:

    @Roman Brandstetter, it's already fixed in .NET 4.6.1

  38. adante says:

    Maoni and Lee (and any others I missed, probably a few unsung heroes out there) - thanks for staying in contact with us over the life of this issue. I know you guys have had some heat over this so just wanted to say your efforts in getting this resolved are wholeheartedly appreciated.

    1. Thank you for your kind words! We definitely appreciate it!
