Work continues around the working set issues. You can find the latest and greatest Information: http://support.microsoft.com/kb/918483
I spent several days last week investigating reported case trends and I will attempt to summarize the findings for you.
The following message was added to SQL Server 2005 SP2 to indicate that the working set (RAM resident portion of SQL Server) was paged out. This is a common indication of performance problems due to the paging.
A significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: 0 seconds. Working set (KB): 24484, committed (KB): 48036, memory utilization: 50%
There are several ways trimming can occur.
1. Signaled Trim – The low physical memory indicator event is set by the operating system (ref: QueryMemoryResourceNotification) when physical memory becomes low. The SQL Server Resource Monitor, Lazy Writer and other processing decommit portions of the BPool memory. This returns physical memory to the system and in turn lowers the Working Set of the SQL Server. When the event is signaled the sys.dm_os_ring_buffer, RING_BUFFER_RESOURCE_MONITOR records are logged. This is the normal working behavior of the SQL Server as it attempts to remain RAM resident. This activity does not trigger the error log message.
2. Self Trim – The operating system is handling a page fault for a newly allocated page and determines physical memory is low so the current working set should be trimmed (exchange a new page in RAM and move an older page to the page file). For example, if SQL Server is allocating a page for a stack, linked server, or other allocation and physical memory is in a low state the operating system may directly trim the working set of the SQL Server. On current operating system builds (Windows 2003 SP2, Windows XP SP2, …) the amount to trim can be ~1/4 the current working set size. On a 64GB system the SQL Server working set could page out 16GB in less than a second. Such a large trim does not occur on Windows Vista and we are looking at fixes for Windows 2003 and XP to avoid such dramatic trimming behavior.
3. Hard Trim – When the system gets into very low memory conditions a hard trim can occur. This activity can page out the entire working set for SQL Server and other processes on the box.
The signaled trim allows SQL Server to properly remove the oldest references from the BPool (LRU) and maintain optimal performance for the overall server load. The Self Trim and Hard Trim activities can trigger the SQL Server error log message indicating that a significant portion of the SQL Server memory has been placed in the page file and generally leads to performance problems.
There have been various blogs written about backing off the SQL Server max server memory setting or using locked pages. In testing I found that it was better to understand the root cause of the trimming issue than simply setting the options. For example, locked pages are only maintained for the BPool allocations. Thread stacks, linked server, XPROC and other non-BPool activity is still subject to trimming. A Hard Trim can page out stacks and other critical structures. The SQL Server working set trim will only be a few 100MB but all the critical performance structures have to be paged. At the same time the Hard Trim usually trims the working set of all other processes on the system so overall performance is affected.
I know of several bugs that can cause the behavior so here is the best set of troubleshooting steps at this time.
1. Make sure no other process are quickly eating up the memory. A key to this is watching the Memory:_Total Working Set Performance Counter. If you start an application that quickly uses memory all working sets can be trimmed quickly but the overall _Total remains steady. If the _Total drops significantly it is a good indication that it is not a single application quickly eating up memory but something like MiEmptyWorkingSet or MmAllocateContigiousMemory has taken place and the operating system has decided to engage in a hard trim. Our platforms support team can assist you in determining why the hard trim is being triggered.
2. Make sure the SQL Server process ID remains constant across the problem time. I found that if you stop the SQL Server service and restart it some of the SQL counters remain constant and can be misleading.
3. Make sure your system has the latest drivers. We have specifically encountered several drivers on X64 that trigger hard trim behavior.
4. Make sure the OS Terminal Service bug is not causing the problem. http://support.microsoft.com/default.aspx?scid=kb;EN-US;905865 When an admin TS’s into the server console, and later logs off, winlogon process might initiate trimming the processes running in Session 0. The post Windows 2003 Sp1 hot fix resolves this issue.
On the server you can match this to a winstations, SECURITY, event log entry for the System – Event ID 682 – Example of one shown below.
Session reconnected to winstation:
User Name: USERNAME
Logon ID: (0x0,0x12A8229)
Session Name: RDP-Tcp#3
Client Name: MYCLIENT
Client Address: 615.513.165.351
5. Capture a full set of performance counters as well as the sys.dm_os_ring_buffers from SQL Server. For example when an application on the computer starts using memory quickly the ring buffer entries clearly show the low physical memory indicator. When a self or hard trim occurs the low physical memory indicator is not set or only set for a short period and when compared to the performance counters the behavior pattern is clarified. The self and hard trim can require Microsoft Platforms support to help determine root cause.
6. Apply the OS fix. http://support.microsoft.com/default.aspx?scid=kb;EN-US;920739 The dirty page threshold is set at half the amount of physical memory. This has caused severe problems on systems with large amount of memory, when applications that use Buffered I/O, leading to trimming, and large amount of data being flushed to disk when the threshold value is hit. The registry key must be set as recommended by the article along with the hot fix.
7. Apply the OS fix. http://support.microsoft.com/default.aspx?scid=kb;EN-US;931308 You may experience increased paging to the hard disk when you run a program on a Windows Server 2003-based computer
8. Apply SQL Server 2005 SP2. SP2 has a few changes allowing it to react to the ‘Low Physical memory’ faster as well as increased sys.dm_os_ring_buffer information.
Bob Dorr – SQL Server Senior Escalation Engineer