An in-depth look at SQL Server Memory–Part 3

In part 1 and part 2 of the series, we talked about the memory architecture and the Procedure Cache respectively. In this third and final installment of the SQL Server Memory series, I will focus on troubleshooting SQL Server memory pressure issues.

 

Before we start on the troubleshooting part, though, we need to determine the type of memory pressure that we're seeing. I've tried to list the types here:

1. External Physical Memory pressure - Overall RAM pressure on the server. We need to find the largest consumers of memory (which might be SQL Server) and try to reduce their consumption. It might also be that the server simply has inadequate RAM for the workload it's running.

2. Internal Physical Memory pressure - Memory Pressure on specific components of SQL Server. Can be a result of External Physical Memory pressure, or of one of the components hogging too much memory.

3. Internal Virtual Memory pressure - VAS pressure on SQL Server. Mostly seen only on 32-bit (x86) systems these days (x64 has 8 TB of user-mode VAS, whereas x86 had only 4 GB in total, of which just 2-3 GB was available to user mode. Refer to Part 1 for details).

4. External Virtual Memory pressure - Page file pressure on the OS. SQL Server does not recognize or respond to this kind of pressure.

 

Troubleshooting

Now to get our hands dirty. When you suspect memory pressure on a server, I would recommend checking the following things, in order:

 

1. Log in to the server, and take a look at the performance tab of the Task Manager. Do you see the overall memory usage on the server getting perilously close to the total RAM installed on the box? If so, it's probable that we're seeing External Physical Memory pressure.

2. Next, look at the Processes tab, and see which process is using the most RAM. Again, for SQL Server, the true usage might not be reflected in the working set if LPIM is enabled (i.e. SQL Server is using the AWE APIs to allocate memory). To check SQL Server's total memory consumption, you can run the following query from inside SQL (valid from SQL Server 2008 onwards):

select physical_memory_in_use_kb/1024 as sql_physical_mem_in_use_mb,
       locked_page_allocations_kb/1024 as awe_memory_mb,
       total_virtual_address_space_kb/1024 as max_vas_mb,
       virtual_address_space_committed_kb/1024 as sql_committed_mb,
       memory_utilization_percentage as working_set_percentage,
       virtual_address_space_available_kb/1024 as vas_available_mb,
       process_physical_memory_low as is_there_external_pressure,
       process_virtual_memory_low as is_there_vas_pressure
from sys.dm_os_process_memory
go

For SQL Server installations prior to 2008 (the command works on 2008 and 2008 R2 as well), you can run DBCC MEMORYSTATUS, and take the total of VM Committed and AWE Allocated from the Memory Manager section to get a rough idea of the amount of memory being used by SQL Server.
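The command itself takes no arguments; the figures mentioned above appear in the Memory Manager section at the top of its output:

```sql
-- Output goes directly to the client. In the Memory Manager section at the
-- top, VM Committed + AWE Allocated (labelled Locked Pages Allocated on
-- later builds) gives a rough total of SQL Server's memory usage.
dbcc memorystatus
go
```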
 

3. Next, compare this with the total amount of RAM installed on the server. If SQL Server seems to be taking most of the memory, or at least much more than it should, then we need to focus our attention on SQL Server. The exact specifics will vary with the environment, and with factors such as whether the box is a dedicated SQL Server machine, the number of SQL Server instances running on it, etc. If you have multiple instances of SQL Server, it is best to start with the instance consuming the most memory (or showing the largest deviation from "what it should be consuming"), tune it, and then move on to the next one.
 

4. One of the first things to check should be the value of the "max server memory" setting for SQL Server. You can check this by turning on the 'show advanced options' setting of sp_configure, or by right-clicking the instance in Object Explorer in SSMS, selecting Properties, and navigating to the Memory page. If the value is 2147483647, the setting has been left at its default and has not been changed since the instance was installed. It's absolutely vital to set max server memory to an optimal value. A general rule of thumb that you can use to set a starting value is as follows:
Total server memory - (Memory for other applications/instances+ OS memory)
The recommendation for the OS memory value is around 3-4 GB on 64-bit systems, and 1-2 GB on 32-bit systems. Please note that this is only a recommended starting value. You need to fine-tune it based on observations of the performance of both SQL Server and other applications (if any) on the server.
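As an illustration of applying the rule of thumb, the snippet below caps a hypothetical dedicated 16 GB x64 box at 12 GB, leaving roughly 4 GB for the OS. The 12288 figure is purely an example; compute your own from the formula above:

```sql
-- Illustrative starting value only: a dedicated 16 GB x64 server,
-- leaving ~4 GB for the OS per the rule of thumb above.
exec sp_configure 'show advanced options', 1;
reconfigure;
exec sp_configure 'max server memory (MB)', 12288;
reconfigure;
```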
 

5. Once you've determined that max server memory is set properly, the next step is to find out which component within SQL Server is consuming the most memory. The best place to start is, quite obviously, the good old "DBCC MEMORYSTATUS" command, unless you're using NUMA, in which case it is best to use perfmon counters to track page allocations across NUMA nodes, as outlined here.
I will try to break down most of the major components in the DBCC Memorystatus output here (I would recommend reading KB 907877 as a primer before this):

                   I. First up is the Memory Manager section. As discussed earlier, this section contains details about the overall memory consumption of SQL Server. An example:

Memory Manager                           KB
---------------------------------------- -----------
VM Reserved                                  4059416
VM Committed                                   43040
Locked Pages Allocated                         41600
Reserved Memory                                 1024
Reserved Memory In Use                             0

 

                  II. Next, we have the memory nodes, starting with node 0. As I mentioned, there is a known issue with the way DBCC MEMORYSTATUS displays the distribution of allocations across memory nodes, so it is best to study the distribution through the SQL Server performance counters. Here's a sample query:
 

select * from sys.dm_os_performance_counters
where object_name like '%Buffer Node%'
 

                 III. Next, we have the clerks. I've tried to outline the not-so-obvious ones in this table, along with their uses:

 

Clerk Name                               Used for
MEMORYCLERK_SQLUTILITIES                 Database mirroring, backups, etc.
MEMORYCLERK_SQLXP                        Extended stored procedures (loaded into SQL Server)
MEMORYCLERK_XE, MEMORYCLERK_XE_BUFFER    Extended Events

If you see any of the clerks hogging memory, you need to focus on that clerk and try to narrow down the possible causes.
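To get a quick ranking of the clerks before digging into any one of them, you can total both allocator types per clerk. Note that this sketch uses the single_pages_kb and multi_pages_kb columns, which are valid up to SQL Server 2008 R2; later versions merge them into a single pages_kb column:

```sql
-- Total memory per clerk, biggest first (SQL 2005 - 2008 R2 column names)
select type, name,
       (sum(single_pages_kb) + sum(multi_pages_kb))/1024 as total_mb
from sys.dm_os_memory_clerks
group by type, name
order by total_mb desc
```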

 

Another thing to watch out for is high values for the multipage allocator. If you see any clerk with extremely high multipage allocator values, it means that the non-BPool area is growing due to one of the following:

                                       i. CLR code: Check the errorlog for appdomain messages.

                                      ii. COM objects: Check the errorlog for messages related to sp_oacreate.

                                     iii. Linked servers: Can be checked using Object Explorer in SSMS.

                                    iv. Extended stored procedures: Check the errorlog for "loading extended stored procedure" messages. Alternatively, you can query the sys.extended_procedures view.

                                      v. Third-party DLLs: Third-party DLLs loaded into the SQL Server process space. Run the following query to check:
select * from sys.dm_os_loaded_modules where company <> 'Microsoft Corporation'
 

Here's a query to check for the biggest multipage consumers:

select type, name, sum(multi_pages_kb)/1024 as multi_pages_mb
from sys.dm_os_memory_clerks
where multi_pages_kb > 0
group by type, name
order by multi_pages_mb desc

 

Yet another symptom to watch out for is a high ratio of stolen pages from the Buffer Pool. You can check this in the 'Buffer Pool' section of the MEMORYSTATUS output. A sample:

Buffer Pool                              Value
---------------------------------------- -----------
Committed                                       4448
Target                                         25600
Database                                        2075
Dirty                                             50
In IO                                              0
Latched                                            0
Free                                             791
Stolen                                          1582
Reserved                                           0
Visible                                        25600
Stolen Potential                               22738
Limiting Factor                                   17
Last OOM Factor                                    0
Last OS Error                                      0
Page Life Expectancy                           87529

What this means is that Buffer Pool pages are being used for "other" purposes, and not for holding data and index pages in the BPool. This can lead to performance issues and a crunch on the BPool, thereby slowing down overall query performance (please refer to Part 1 for the consumers that "steal" pages from the BPool). You can use the following query to check for the biggest "steal" consumers:

select type, name, sum((single_pages_kb*1024)/8192) as stolen_pages
from sys.dm_os_memory_clerks
where single_pages_kb > 0
group by type, name
order by stolen_pages desc

 

               IV.   Next, we have the stores, namely the Cachestore, Userstore and Objectstore. Please refer to Part 1 for how, and by which components, these clerks are used. You can use the following queries to check for the biggest Cachestores, Userstores and Objectstores respectively:
 

select name, type, (sum(single_pages_kb)+sum(multi_pages_kb))/1024 as store_size_mb
from sys.dm_os_memory_cache_counters
where type like 'CACHESTORE%'
group by name, type
order by store_size_mb desc
go

 

select name, type, (sum(single_pages_kb)+sum(multi_pages_kb))/1024 as store_size_mb
from sys.dm_os_memory_cache_counters
where type like 'USERSTORE%'
group by name, type
order by store_size_mb desc
go

 

select name, type, (sum(single_pages_kb)+sum(multi_pages_kb))/1024 as store_size_mb
from sys.dm_os_memory_clerks
where type like 'OBJECTSTORE%'
group by name, type
order by store_size_mb desc
go

 

                 V.   Next, we have the gateways. The concept of gateways was introduced to throttle the use of query compilation memory. In plain English, this means that we do not want too many queries with high compilation-memory requirements running at the same time, as this would lead to consequences like internal memory pressure (i.e. one of the components of the buffer pool growing and creating pressure on the other components).

The concept basically works like this: when a query starts execution, it starts with a small amount of memory. As its consumption grows, it crosses the threshold for the small gateway, and must wait to acquire it. A gateway is basically implemented through a semaphore, which means that it allows up to a certain number of threads to acquire it, and makes threads beyond that limit wait. As the memory consumption of the query grows further, it must acquire the medium and big gateways before being allowed to continue execution. The exact thresholds depend on factors like the total memory on the server, the SQL Server max server memory setting, the memory architecture (x86 or x64), the load on the server, etc.

The number of queries allowed at each of the gateways is described in the following table:

    

Gateway    Dynamic/Static    Config Value
Small      Dynamic           Default is (number of CPUs SQL sees * 4)
Medium     Static            Number of CPUs SQL sees
Large      Static            1 per instance

 

So if you see a large number of queries waiting on the large gateway, you need to find out why so many queries require large amounts of compilation memory, and try to tune those queries. Such queries show up with the RESOURCE_SEMAPHORE_QUERY_COMPILE or RESOURCE_SEMAPHORE wait types in sysprocesses, sys.dm_exec_requests, etc.
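A quick way to spot such queries while they are waiting is to filter sys.dm_exec_requests on those wait types; a minimal sketch:

```sql
-- Requests currently waiting on query execution/compilation memory
select session_id, wait_type, wait_time, granted_query_memory
from sys.dm_exec_requests
where wait_type in ('RESOURCE_SEMAPHORE', 'RESOURCE_SEMAPHORE_QUERY_COMPILE')
```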

 

I am listing down some DMVs that might come in handy for SQL Server memory troubleshooting:

sysprocesses

sys.dm_exec_requests

sys.dm_os_process_memory: Usage shown above.

sys.dm_os_sys_memory: Will give you the overall memory picture for the server.

sys.dm_os_sys_info: Can be used to check OS-level information like the hyperthread ratio, CPU ticks, OS quantum, etc.
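As a quick illustration of the kind of picture sys.dm_os_sys_memory gives (column list trimmed for brevity):

```sql
-- Overall server memory state as SQL Server sees it (SQL 2008+)
select total_physical_memory_kb/1024 as total_physical_mb,
       available_physical_memory_kb/1024 as available_physical_mb,
       system_memory_state_desc
from sys.dm_os_sys_memory
```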

sys.dm_os_virtual_address_dump: Used to check VAS usage (reservations). The following query will give you VAS usage in descending order of reservations:

 

with vasummary(size, reserved, free) as
(select size = vadump.size,
        reserved = sum(case (convert(int, vadump.base) ^ 0) when 0 then 0 else 1 end),
        free = sum(case (convert(int, vadump.base) ^ 0x0) when 0 then 1 else 0 end)
 from
 (select convert(varbinary, sum(region_size_in_bytes)) as size,
         region_allocation_base_address as base
  from sys.dm_os_virtual_address_dump
  where region_allocation_base_address <> 0x0
  group by region_allocation_base_address
  UNION
  (select convert(varbinary, region_size_in_bytes),
          region_allocation_base_address
   from sys.dm_os_virtual_address_dump
   where region_allocation_base_address = 0x0)
 ) as vadump
 group by size)
select * from vasummary order by reserved desc
go

 

sys.dm_os_memory_clerks: Usage shown above.

sys.dm_os_memory_nodes: Just a select * will suffice; this DMV has one row for each memory node.

sys.dm_os_memory_cache_counters: Used above to find the sizes of the cachestores. Another sample query:

select (single_pages_kb + multi_pages_kb) as memusage, *
from sys.dm_os_memory_cache_counters
order by memusage desc

 

Once you have narrowed down the primary consumer and the specific component causing the memory bottleneck, the resolution steps should be fairly simple. For example, if you find some poorly written code, you can hound the developers to tune it. For other processes hogging memory at the OS level, you will need to investigate those processes. For high consumption by a particular clerk, check the corresponding component. For example, in the case of high usage by the SQLUtilities clerk, one of the first things to check is whether any mirroring is set up on the instance, and whether it's working properly.

Another thing I would strongly recommend is to watch out for memory-related KB articles, and to make sure you have the relevant fixes applied.

Hope this helps. Any feedback, questions or comments are welcome.