Excessive cached read I/O is a growing problem. For over one year we have been working on this problem with several companies. You can read more about it in the original blog post:
On 32 bit systems, the kernel could address at most 2GB of virtual memory. This address range is shared and divided up for the many resources that the system needs; one of which is the System File Cache’s working set. On 32 bit systems the theoretical limit is almost 1GB for the cache’s working set; however, when a page is removed from the working set it will end up on the standby page list. Therefore the system can cache more than the 1 GB limit if there is available memory. The working set; however, is just limited to what can be allocated within the Kernel’s 2GB virtual address range. Since most modern systems have more than 1 GB of physical RAM, the System File Cache’s working set’s size on a 32 bit system typically isn’t a problem.
With 64 bit systems, the kernel virtual address space is very large and is typically larger than physical RAM on most systems. On these systems the System File Cache’s working set can be very large and is typically about equal to the size of physical RAM. If applications or file sharing performs a lot of sustained cached read I/O, the System File Cache’s working set can grow to take over all of physical RAM. If this happens, then process working sets are paged out and everyone starts fighting for physical pages and performance suffers.
The only way to mitigate this problem is to use the provided APIs of GetSystemFileCacheSize() and SetSystemFileCacheSize(). The previous blog post "Too Much Cache" contains sample code and a compiled utility that can be used to manually set the System File Cache’s working set size.
The provided APIs, while offering one mitigation strategy, has a couple of limitations:
1) There is no conflict resolution between multiple applications. If you have two applications trying to set the System File Cache’s working set size, the last one to call SetSystemFileCacheSize() will win. There is no centralized control of the System File Cache’s working set size.
2) There is no guidance on what to set the System File Cache’s working set size to. There is no one size fits all solution. A high cache working set size is good for file servers, but bad for large memory application and a low working set size could hurt everyone’s I/O performance. It is essentially up to 3rd party developers or IT administrators to determine what is best for their server and often times, the limits are determined by a best guesstimate backed by some testing.
We fully understand that while we provide one way to mitigate this problem, the solution is not ideal. We spent a considerable amount of time reviewing and testing other options. The problem is that there are so many varied scenarios on how users and applications rely on the System File Cache. Some strategies worked well for the majority of usage scenarios, but ended up negatively impacting others. We could not release any code change that would knowingly hurt several applications.
We also investigated changing some memory manager architecture and algorithms to address these issues with a more elegant solution; however the necessary code changes are too extensive. We are experimenting with these changes in Windows 7 and there is no way that we could back port them to the current operating systems. If we did, we would be changing the underlying infrastructure that everyone has been accustomed to. Such a change would require stress tests of all applications that run on Windows. The test matrix and the chance of regression are far too large.
So that brings us back to the only provided solution – use the provided APIs. While this isn’t an ideal solution, it does work, but with the limitations mentioned above. In order to help address these limitations, I’ve updated the SetCache utility to the Microsoft Windows Dynamic Cache Service. While this service does not completely address the limitations above, it does provide some additional relief.
The Microsoft Windows Dynamic Cache Service uses the provided APIs and centralizes the management of the System File Cache’s working set size. With this service, you can define a list of processes that you want to prioritize over the System File Cache by monitoring the working set sizes of your defined processes and back off the System File Cache’s working set size accordingly. It is always running in the background monitoring and dynamically adjusting the System File Cache’s working set size. The service provides you with many options such as adding additional slack space for each process’ working set or to back off during a low memory event.
Please note that this service is experimental and includes sample source code and a compiled binary. Anyone is free to re-use this code in their own solution. Please note that you may experience some performance side effects while using this service as it cannot possibly address all usage scenarios. There may be some edge usage scenarios that are negatively impacted. The service only attempts to improve the situation given the current limitations. Please report any bugs or observations here to this blog post. While we may not be able to fix every usage problem, we will try to offer a best effort support.
Side Effects may include:
Cache page churn – If the System File Cache’s working set is too low and there is sustained cached read I/O, the memory manager may not be able to properly age pages. When forced to remove some pages in order to make room for new cache pages, the memory manager may inadvertently remove the wrong pages. This could result in cached page churn and decreased disk performance for all applications.
Version 1.0.0 – Initial Release
NOTE: The memory management algorithms in Windows 7 and Windows Server 2008 R2 operating systems were updated to address many file caching problems found in previous versions of Windows. There are only certain unique situations when you need to implement the Dynamic Cache service on computers that are running Windows 7 or Windows Server 2008 R2. For more information on how to determine if you are experiencing this issue and how to resolve it, please see the More Information section of Microsoft Knowledge Base article 976618 – You experience performance issues in applications and services when the system file cache consumes most of the physical RAM.