Query Performance issues associated with a large sized security cache

In the past couple of months, SQL Server support team has come across some customers running into performance issues attributed to TokenAndPermUserStore in SQL Server 2005. This blog post attempts to compile all the information we have so far regarding this problem.

TokenAndPermUserStore is one of the many caches present in the SQL Server 2005 memory architecture. As the name implies, this cache stores various security related information used by the SQL Server Engine. If you are curious to know the different types of tokens currently cached in this cache on your server, the following query will provide you the information:

SELECT COUNT(*) as TokenCount, *

FROM

(SELECT

       x.value('(//@name)[1]', 'varchar (100)') AS [Token Name],

       x.value('(//@class)[1]', 'bigint') AS [Class],

       x.value('(//@subclass)[1]', 'int') AS [SubClass]

FROM

       (SELECT CAST (entry_data as xml)

       FROM sys.dm_os_memory_cache_entries

       WHERE type = 'USERSTORE_TOKENPERM')

              AS R(x)

       ) a

GROUP BY [Token Name],[Class],[SubClass]

NOTE: Please do not run this command on a production server during peak load as it could take a while to finish depending upon the number of entries present in the cache.

The problems we have seen are specific to token type TokenAccessResult with a class of 65535. These tokens represent information about cumulative permission checks for queries. For more information on this, refer to the KB article: Queries take a longer time to finish running when the size of the TokenAndPermUserStore cache grows in SQL Server 2005.

There are several indicators you can monitor to determine if you are running into this class of problems.

1. The amount of memory used by this security token cache

2. The number of entries present in this security token cache

3. The extent of contention on this security token cache

To find out the amount of memory consumed by this token cache, you can query the DMV’s as follows:

SELECT SUM(single_pages_kb + multi_pages_kb) AS "SecurityTokenCacheSize(kb)"

FROM sys.dm_os_memory_clerks

WHERE name = 'TokenAndPermUserStore'

There is no specific threshold for this size beyond which the problem starts to happen. The characteristic you need to monitor is the rate at which this cache size is growing. If you are encountering problems with this cache, then you will notice that as the size of the cache grows, the nature of the problems you experience becomes worse. On a sample server that experienced this problem, the cache grew at a rate approximately 1MB per min to reach close to 1.2 GB. We have seen the problem starting to show up even when the size of this cache reaches several hundred MB.

Next, to understand if there is contention while accessing this cache, you have to execute commands like the following:

Servers running builds < SQL 2005 SP2

SET NOCOUNT ON

CREATE TABLE #spins([Spinlock Name] varchar(50),Collisions numeric,Spins numeric,[Spins/Collision] float)

INSERT INTO #spins EXECUTE ('DBCC SQLPERF (''SPINLOCKSTATS'')')

SELECT TOP 20 * FROM #spins ORDER BY Collisions DESC

DROP TABLE #spins

Servers running builds >= SQL 2005 SP2

SET NOCOUNT ON

CREATE TABLE #spins([Spinlock Name] varchar(50),Collisions numeric,Spins numeric,[Spins/Collision] float,[Sleep Time (ms)] numeric,Backoffs numeric)

INSERT INTO #spins EXECUTE ('DBCC SQLPERF (''SPINLOCKSTATS'')')

SELECT TOP 20 * FROM #spins ORDER BY Collisions DESC

DROP TABLE #spins

In this output, the row you need to pay attention to is the following one:

Spinlock Name

Collisions

Spins

Spins/Collision

Sleep Time (ms)

Backoffs

MUTEX

7,714,984

1,020,994,305,589

132,339

55,000,343

101,024,820

MUTEX

8,143,896

1,055,469,534,588

129,603

61,055,906

104,387,627

This sample output was captured across 30 minute duration on a server experiencing this problem. Basically this output shows that various threads inside the SQL Server process is contending for a spinlock named MUTEX. The values above are various attributes that indicate the extent and nature of contention. Spinlock is a very lightweight synchronization mechanism used with the SQL Server engine. Depending upon the data structure that a particular spinlock protects, it is given a unique name within the SQL Engine. MUTEX is the name of the spinlock which protects the security token cache (among a few other things) that we are discussing here. Very similar to the size, what you want to monitor is the rate at which the various values here increase.

The symptoms that you want to co