Troubleshooting Query Performance Issues Related to the Plan Cache in SQL 2005 RTM and SP1

 

If, after upgrading from SQL 2000 to SQL 2005 RTM or SP1, you notice decreased OLTP database application throughput, a bloated plan cache, or out-of-memory errors, you will need to gather machine configuration information and system performance data in order to perform a detailed analysis. Below, we outline the steps for gathering the relevant data to determine whether the problem is plan cache related.

 

1.0 Machine Configuration Information That Can Impact Plan Cache Size/Performance

 

Machine configuration and server settings can impact plan cache performance. This section covers a checklist of the settings and options that should be examined as a first step in diagnosing a performance regression.

 

1.1 Understanding the Machine Configuration:

 

  1. Processor:

 

(i) Processor Architecture: Recommendations for server settings for optimal plan cache behavior vary with the platform (x86/IA64/x64). The machine platform can be found by executing 'set PROCESSOR_ARCHITECTURE' at the command prompt. Also check the platform of SQL Server itself to see if it is running under WOW64. Note that on 64 bit the command prompt output depends on whether you are running %systemroot%\system32\cmd.exe or %systemroot%\SysWOW64\cmd.exe; the WOW version of the command console will show x86.
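The platform of the SQL Server instance itself can also be checked from T-SQL; the version string reports the build architecture, and an X86 build running on a 64 bit operating system is running under WOW64:

-- The last part of the version string shows the platform, e.g. (Intel X86), (IA64) or (X64).
select @@version as sql_server_version
go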

 

(ii) Total number of processors: The DMV sys.dm_os_sys_info has this information:

select cpu_count from sys.dm_os_sys_info

go

 

(iii) NUMA Architecture and Number of Nodes: The number of memory nodes on a machine can be found using:

select distinct top 1 cast(instance_name as int) + 1 as number_of_nodes

from sys.dm_os_performance_counters

where object_name = 'SQLServer:Buffer Node'

order by number_of_nodes desc

go

 

Also check the errorlog for entries like the one below to determine the number of CPU nodes:

 

TimeStamp Server Multinode configuration: node 0: CPU mask: 0x00000000000000f0 Active CPU mask: 0x00000000000000f0. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.

 

This should give the number of CPU nodes as well as the CPU Mask set.
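If you prefer to search the errorlog from T-SQL rather than opening the file, the undocumented sp_readerrorlog procedure can be used; the parameter meanings shown here are an assumption based on common usage and may differ between builds:

-- Read the current errorlog (log 0, log type 1 = SQL Server errorlog) and filter
-- for the multinode configuration messages written at startup.
-- sp_readerrorlog is undocumented; treat the search-string parameter as an assumption.
exec sp_readerrorlog 0, 1, 'node configuration'
go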

 

(iv) If NUMA Architecture, Number of Processors per Node: If soft NUMA has not been configured, the following query gives a mapping of CPUs to nodes:

select parent_node_id, scheduler_id from sys.dm_os_schedulers

where scheduler_id < 255

order by scheduler_id desc

go

 

If soft NUMA has been used, the number of CPU nodes is greater than the number of hardware NUMA nodes. The above query then maps the CPUs to CPU nodes, not hardware NUMA nodes. To check whether the machine has been configured with soft NUMA, search the registry for keys under:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\90\NodeConfiguration\. If keys are present here, they define the soft NUMA mapping of processors to soft NUMA nodes. For more information see the BOL article ‘How to: Configure SQL Server to Use Soft-NUMA’.
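A hedged way to check from T-SQL whether soft NUMA is in effect is to compare the number of CPU nodes (derived from the schedulers) with the number of buffer pool memory nodes; if soft NUMA is configured, the CPU node count will typically exceed the memory node count:

-- CPU node count vs. memory (buffer) node count. A mismatch usually indicates
-- a soft NUMA configuration.
select
    (select count(distinct parent_node_id)
     from sys.dm_os_schedulers
     where scheduler_id < 255) as cpu_node_count,
    (select count(distinct instance_name)
     from sys.dm_os_performance_counters
     where object_name = 'SQLServer:Buffer Node'
       and counter_name = 'Total pages') as memory_node_count
go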

 

  2. Memory:

 

(i) Total Physical Memory:

select physical_memory_in_bytes/(1024.0*1024.0*1024.0) as physical_mem_in_gb

from sys.dm_os_sys_info

go

 

(ii) If NUMA Architecture, Memory allocated per Node (in 8K pages):

select instance_name as node, cntr_value as total_pages

from sys.dm_os_performance_counters

where object_name = 'SQLServer:Buffer Node' and counter_name = 'Total pages'

go

 

The plan cache is logically on node 0 on a multi-node machine. This does not, however, imply that all the memory for the plan cache comes from node 0 only. To better explain this, consider the following example: a multi-node machine has 2 NUMA nodes, and the buffer pool has 2 memory nodes, nodes 0 and 1 respectively. Schedulers s1, s2, s3 are associated with NUMA node 0 and s4, s5, s6 with NUMA node 1. The plan cache memory clerk is on memory node 0. When scheduler s4 requests 2 pages for a new plan (PMO), the buffer pool tries to allocate the pages locally through the NUMA node that scheduler is associated with, i.e. through node 1. Even though these pages were allocated from NUMA node 1, the cachestore memory clerk, which is logically on node 0, will show a total of 2 pages. Therefore, allocating more memory to NUMA node 0 just because the plan cache is on node 0 will not help performance. It may actually end up negatively impacting performance due to a large number of remote memory accesses if queries are affinitized uniformly among nodes. If the memory allocated per node is not uniform, it is generally better to affinitize the workload to the nodes that have sufficient memory; ideally queries should be affinitized to nodes through specific ports (if possible). If SQL Server has been configured to run on a subset of the available NUMA nodes, the buffer pool will not automatically be limited to the memory on those nodes. In this case, use the max server memory option to limit the buffer pool.
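To see where the plan cache related memory clerks report their memory, a sketch like the following can be used against sys.dm_os_memory_clerks (column and clerk type names as in SQL 2005; the types listed are the main plan cache stores):

-- Plan cache memory clerks with the memory node they are associated with and
-- the single-page (buffer pool) and multi-page allocations in KB.
select memory_node_id, type, single_pages_kb, multi_pages_kb
from sys.dm_os_memory_clerks
where type in ('CACHESTORE_SQLCP', 'CACHESTORE_OBJCP', 'CACHESTORE_PHDR', 'CACHESTORE_XPROC')
order by memory_node_id, type
go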

 

(iii) Lock Pages In Memory Privilege: This is an operating system privilege that allows locking of physical memory, preventing the OS from paging out the locked memory. This privilege needs to be granted to the SQL Server service account. On 32 bit systems running SQL Server Standard, Enterprise or Developer edition, this privilege must be granted to the SQL Server service account in order to use the AWE mechanism. On 64 bit systems, granting this privilege to the SQL Server service account is recommended for any edition. Note that if the service runs under a local system account, a system restart is required after granting this privilege for it to take effect; for domain users a fresh login is required. A quick (though not conclusive) way to check whether this privilege has been granted correctly on 32 bit is to try to enable AWE as follows:

 

sp_configure 'show advanced options', 1

go

reconfigure with override

go

 

sp_configure 'awe enabled', 1

go

reconfigure with override

go

 

If lock pages in memory privilege has not been granted to the service account, sp_configure will fail with the following error message:

 

Msg 5845, Level 16, State 1, Line 1

Address Windowing Extensions (AWE) requires the 'lock pages in memory' privilege which is not currently present in the access token of the process.

 

On 64 bit machines, execute 'dbcc memorystatus'. If the AWE Allocated memory is 0, then the lock pages in memory privilege has not been granted to the service account or it has not taken effect. On 32 bit machines, if AWE Allocated is 0 in the dbcc memorystatus output and the sp_configure option 'awe enabled' is set to 1, then the lock pages in memory privilege has not been granted to the service account or it has not taken effect.
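As an alternative to scanning the dbcc memorystatus output, the buffer pool's AWE allocation can also be read from the memory clerk DMV; this is a sketch assuming the SQL 2005 column awe_allocated_kb and the clerk type MEMORYCLERK_SQLBUFFERPOOL:

-- A non-zero value here confirms that AWE allocations (and hence the lock pages
-- in memory privilege) are in effect.
select sum(awe_allocated_kb) / 1024 as awe_allocated_mb
from sys.dm_os_memory_clerks
where type = 'MEMORYCLERK_SQLBUFFERPOOL'
go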

 

For more information on how to grant this privilege to the SQL Server service account, please refer to the MSDN article ‘How to: Enable the Lock Pages in Memory Option (Windows)’. Restarting the machine typically helps permissions granted to the SQL Server service account take effect.

 

(iv) /3GB Boot Parameter: This is an operating system boot parameter that enlarges the user-mode virtual address space from 2 GB to 3 GB on a 32 bit system with less than 16 GB of total physical memory. On a 32 bit system with more than 16 GB of total physical memory, adding this parameter limits the usable physical memory to 16 GB. A 32 bit process running under WOW64 already has 4 GB of user-mode virtual address space.

 

(v) /UserVA Boot Parameter: Another switch with a similar function to the /3GB switch is /USERVA=X, where X is a number in MB.

 

The following query will tell you if either the /3GB or the /UserVA boot parameter has been enabled:

 

select

     case

           when virtual_memory_in_bytes / (1024 * 1024) <= 2048 then 'No switch'

           else '/3GB or /USERVA'

     end as boot_switch

from sys.dm_os_sys_info

go

 

1.2 SQL Server Configuration Options

 

  1. Max Worker Threads: The number of workers can be found using the sp_configure option:

 

sp_configure 'show advanced options', 1

go

reconfigure with override

go

 

sp_configure 'max worker threads'

go

 

Each worker has a stack associated with it. The stack size can be found using:

 

select stack_size_in_bytes/(1024.0 * 1024.0) as stack_size_in_MB from sys.dm_os_sys_info

go

 

The memory allocated for worker thread stacks needs to be subtracted from total physical memory when deciding what to set max server memory to. A large number of worker threads can therefore reduce the memory available for plan cache growth.
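A rough sketch (not a definitive calculation) of the maximum memory that worker thread stacks could consume combines the configured max worker threads value with the stack size from sys.dm_os_sys_info; note that a configured value of 0 means SQL Server sizes the worker pool automatically, in which case this estimate returns 0:

-- Estimated upper bound on worker stack memory: max worker threads x stack size, in MB.
select cast(c.value_in_use as bigint) * i.stack_size_in_bytes / (1024.0 * 1024.0) as max_stack_memory_in_mb
from sys.configurations c
cross join sys.dm_os_sys_info i
where c.name = 'max worker threads'
go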

 

  2. Max and Min Server Memory: Max server memory should be set less than or equal to total physical memory, minus 2 to 3 GB for the OS and kernel processes, minus the stack space required for workers. This memory is used for plan cache pages as well as database pages. Setting max server memory too low can have a negative impact on performance, since it will mean frequent flushing of database pages to disk under memory pressure. If SQL Server has been configured to run on a subset of the available NUMA nodes, the buffer pool will not automatically be limited to the memory on those nodes; in this case, use the max server memory option to limit the buffer pool. Typically min server memory is set less than or equal to max server memory; on a dedicated server, max server memory can be set equal to min server memory. Note that memory is allocated on demand only. Once allocated, though, it will not shrink below min server memory.
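The currently effective values (in MB) can also be read from sys.configurations without toggling 'show advanced options':

-- Current min/max server memory settings; value_in_use is the running value.
select name, cast(value_in_use as int) as value_in_use_mb
from sys.configurations
where name in ('min server memory (MB)', 'max server memory (MB)')
go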

 

  3. Awe Enabled: The awe enabled setting can be found using sp_configure:

 

sp_configure 'show advanced options', 1

go

reconfigure with override

go

 

sp_configure 'awe enabled'

go

 

On 32 bit systems it is recommended to set awe enabled to 1. This allows SQL Server to go beyond the process virtual address space limit on 32 bit platforms. It also allows the buffer pool to shrink its virtual address space (VAS) usage under VAS pressure, which may occur due to a large number of multi-page allocations or CLR. On 64 bit systems, this setting has no effect.

 

  4. CPU Affinity Mask:

 

It is recommended that the CPU affinity mask be set such that all processors are used uniformly. Note, though, that by setting this hard affinity, threads cannot move between CPUs. In the case of a NUMA machine, if one node has less memory than the other nodes, it is better to affinitize to the other nodes (CPUs). The affinity mask can be found using sp_configure:

 

sp_configure 'show advanced options', 1

go

reconfigure with override

go

 

sp_configure 'affinity mask'

go

 

  5. Startup Parameter -g: A command line startup parameter that can be used as follows:

 

sqlservr [-g memory_to_reserve]

 

Using this startup parameter, an integer number of megabytes (MB) of memory can be specified that SQL Server will leave available for memory allocations within the SQL Server process, but outside the SQL Server buffer pool. The memory outside of the buffer pool is the area used by SQL Server for multi-page allocations, among other things such as loading DLLs. The default is 256 MB on 32 bit systems and is reserved even if this startup option is not specified. Using a value lower than the default increases the amount of memory available to the buffer pool and thread stacks; this may, in turn, provide some performance benefit to memory-intensive workloads in systems that do not use many extended stored procedures, distributed queries, automation objects or SQLCLR. For applications that use SQLCLR heavily, limiting this memory may result in memory starvation; for such applications it is recommended that memory_to_reserve be set above the default of 256 MB if a lot of memory is allocated by the CLR. Specifying a higher than default value may also prove beneficial to workloads that have many multi-page allocations. Setting this startup parameter appropriately and tuning it based on the nature of the workload running on the server is recommended for optimal results.
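One hedged way to gauge how much memory is being consumed from the region outside the buffer pool (the area sized by -g) is to total the multi-page allocations reported by the memory clerks (column name as in SQL 2005):

-- Total multi-page allocations across all memory clerks, in MB. A consistently
-- large value suggests the default reservation may be under pressure.
select sum(cast(multi_pages_kb as bigint)) / 1024 as total_multi_page_allocations_mb
from sys.dm_os_memory_clerks
go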

 

For more information refer to the Books Online article ‘Using the SQL Server Service Startup Options’.

 

  6. CLR Enabled: Enabling CLR causes the buffer pool to shrink in some cases, in turn reducing the memory available for plan cache growth. Memory for loading assemblies, and for the garbage collector and JIT heaps, comes from outside the buffer pool. If the amount of memory allocated here is high, it can cause virtual address space pressure, causing the buffer pool to shrink. On WOW64, if CLR is enabled, the upper 1 GB of the 4 GB user-mode address space is blocked off from the CLR (and from all other consumers), leaving 3 GB of virtual address space. CLR can be enabled using sp_configure:

 

sp_configure 'show advanced options', 1

go

reconfigure with override

go

 

sp_configure 'clr enabled'

go
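If CLR is enabled, the memory taken by the CLR host outside the buffer pool can be examined with a sketch like the following (clerk type and column names as in SQL 2005):

-- Memory reported by the SQLCLR memory clerk, including committed virtual memory
-- used for the GC and JIT heaps and loaded assemblies.
select memory_node_id, single_pages_kb, multi_pages_kb, virtual_memory_committed_kb
from sys.dm_os_memory_clerks
where type = 'MEMORYCLERK_SQLCLR'
go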

 

  7. Max Degree of Parallelism: A high setting for max degree of parallelism can reduce the amount of memory available for plan cache growth. This is because, when a query runs in parallel, a significant portion of buffer pool memory is consumed by query execution, leaving less room for the plan cache to grow. The max degree of parallelism setting can be queried as follows:

 

sp_configure 'show advanced options', 1

go

reconfigure with override

go

 

sp_configure 'max degree of parallelism'

go

 

1.3 Database Options

 

  1. Database Parameterization - Simple or Forced. The database parameterization option can have a significant impact on plan cache performance. It can be queried as follows:

 

select name, is_parameterization_forced from sys.databases

go
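If analysis shows that a mostly adhoc workload would benefit from parameterization, forced parameterization can be enabled per database; this is only an illustrative sketch using a hypothetical database name:

-- Hypothetical database name: switch the database to forced parameterization.
alter database YourDatabase set parameterization forced
go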

 

1.4 Workload Characteristics

 

  1. Nature of the workload

 

(i) Query Mix: Whether the workload is purely adhoc, purely stored procedures, or a mix of adhoc queries and stored procedures can be analyzed using:

select usecounts, cacheobjtype, objtype, bucketid

from sys.dm_exec_cached_plans cp

where cacheobjtype = 'Compiled Plan'

order by objtype

go

 

Rows with objtype 'Adhoc' or 'Proc' indicate adhoc query and stored procedure executions respectively. Note that this query gives a snapshot of the plan cache at the time the query is executed; prior to its execution, entries may have been removed from the plan cache due to memory pressure.

 

(ii) Parameterization of Queries: In the output of the above query, rows with objtype 'Prepared' indicate that queries are getting parameterized. If there are a large number of queries differing only in parameter values, the parameterized plan will have a high value in the usecounts column. If there are no parameterized queries, but many rows with objtype 'Adhoc', then queries are not getting parameterized.

 

(iii) Degree of Reuse of Plans: The usecounts column in the above query indicates the degree of plan reuse. If there are adhoc queries (shell queries) with a large usecounts value, this indicates that the exact same adhoc query text is submitted several times for execution. If, on the other hand, usecounts is close to 1 for adhoc queries, the workload consists of adhoc queries with almost no reuse. If the usecounts of parameterized queries is low, this may indicate incorrect client side parameterization. Several parameterized plans for the same query text that differ only in the data types of the parameters may indicate that parameterized queries are being split across multiple plans (buckets).
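To summarize the degree of reuse by object type rather than scanning individual rows, the snapshot above can be aggregated; a sketch:

-- Plan counts and use counts per object type. Many 'Adhoc' plans with an average
-- use count near 1 indicate an adhoc workload with little or no plan reuse.
select objtype,
       count(*) as number_of_plans,
       sum(cast(usecounts as bigint)) as total_use_counts,
       avg(usecounts * 1.0) as avg_use_count
from sys.dm_exec_cached_plans
where cacheobjtype = 'Compiled Plan'
group by objtype
order by number_of_plans desc
go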

 

It is useful to gather all of the above information on both SQL Server 2005 and SQL Server 2000 if you are seeing a performance regression upon upgrading to SQL Server 2005 (note, though, that the scripts for SQL Server 2000 are different).

 

  2. Number of Connections to SQL Server: Record the number of connections to SQL Server executing this workload.

 

  3. CPU Utilization: CPU utilization over the duration of the test workload can be monitored by capturing the '\Processor(_Total)\% Processor Time' counter through logman. On a multi-node NUMA machine, it is important to capture CPU utilization per processor, especially if a uniform affinity has not been set.

 

  4. Throughput: Throughput can be measured using the 'SQLServer:SQL Statistics\Batch Requests/sec' performance counter over the duration of the test workload. It is important to watch for fluctuations or degradations in throughput.

     

  5. Plan Cache Size and Database Pages: Collect 'SQLServer:Plan Cache\Cache Pages(_Total)' and 'SQLServer:Buffer Manager\Database pages' over the duration of the workload. This data will point out performance issues due to aggressive plan cache growth at the cost of keeping database pages in memory. It may also indicate that the max server memory setting needs to be increased to keep more database pages in memory. It is useful to monitor these counters for both SQL Server 2005 and SQL Server 2000. This will help us understand whether we are caching queries in SQL Server 2005 that we were not caching in SQL Server 2000, and what impact, if any, this additional caching has on throughput.
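The same counters can also be sampled from inside SQL Server via sys.dm_os_performance_counters, which is convenient if logman data is not available; note that 'Batch Requests/sec' is exposed as a cumulative value, so two samples must be taken and differenced over time to compute a rate:

-- Snapshot of plan cache pages, database pages and the cumulative batch request
-- count (object_name values shown are for a default instance).
select object_name, counter_name, instance_name, cntr_value
from sys.dm_os_performance_counters
where (object_name = 'SQLServer:Plan Cache' and counter_name = 'Cache Pages' and instance_name = '_Total')
   or (object_name = 'SQLServer:Buffer Manager' and counter_name = 'Database pages')
   or (object_name = 'SQLServer:SQL Statistics' and counter_name = 'Batch Requests/sec')
go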