TempDB Monitoring and Troubleshooting: DDL Bottleneck

This blog continues the discussion on the common issues in TempDB that you may need to troubleshoot. In previous blogs, I discussed how to identify and troubleshoot IO and allocation bottleneck in TempDB. In this blog, I will describe how to indentify DDL bottleneck in TempDB and to troubleshoot it.

It will be useful to understand why DDL bottleneck is most commonly related to TempDB and not in user databases. Well, if you think about user database(s), they are created as part of application design. Once the application is tested and deployed, the schema in user databases does not change except may be to add/drop indexes or when the application is upgraded to a newer version. TempDB on the other hand is quite different and is used as a scratch pad both by user applications to store intermediate data in # and ## tables as well as table variables and also by SQL Server as part of processing queries. Please refer to the https://blogs.msdn.com/sqlserverstorageengine/archive/2008/12/22/tempdb-basics-cont.aspx for some background. Now, let us take a simple case where an application creates a #table containing 10 columns in a stored procedure. When this stored procedure is invoked, the SQL Server creates this #table by inserting one row in system table that stores meta-information about tables and 10 rows, each representing 1 column the #table, in the system table storing meta-information about column. There may be other system tables affected, for example when you create index(s) on the #table, but it is clear that at minimum, it will require 11 rows to be inserted into system tables at create time and then these rows to be deleted when the stored procedure terminates. Now for moments assume that this table is not cached and this stored procedure is executed thousands of times by concurrent threads. It will cause a large number of inserts/deletes in system tables in TempDB. I hope that this provides you a good understanding why and how the DDL contention, i.e. blocking when accessing system tables, can happen inside TempDB. Note, it is possible that an application may create another ‘user’ database as a scratch pad. In that case, this database can potentially incur DDL bottleneck but then you may wonder why create a scratch database when TempDB provides optimized logging and easy manageability.

Now let us discuss how to detect DDL contention and what you can do to minimize it.

Diagnosing DDL Contention

As I indicated in the previous paragraph, the DDL contention refers to the contention in pages belonging to system tables. For this specific case, we need to look at PAGELATCH contention in system tables within TempDB. You can use the following query to identify it

select  session_id, wait_duration_ms, resource_description

from sys.dm_os_waiting_tasks

where wait_type like 'PAGE%LATCH_%' and

    resource_description like '2:%'

You may recall that this query is very similar, in fact it is identical, to the DMV query to detect allocation bottleneck. The difference however is that you will need to look at pages other than GAM, SGAM and PFS that are incurring PAGELATCH waits and see, using undocumented DBCC PAGE command, if these pages indeed belong to system table(s) in TempDB. This will show you that you are indeed incurring DDL bottleneck. Now for any performance metrics, you will need to compare this number with baseline when the workload was performing normally and see if you are incurring excessive DDL bottleneck.

You can also look at the following PERFMON counters to see if you are encountering significant change in these numbers. Note, internal objects in TempDB don’t cause DDL contention.

· Temp Tables Creation Rate The number of temporary table or variables created/sec.

· Temp Tables For Destruction The number of temporary tables or variables waiting to be destroyed by cleanup system thread.

Troubleshooting:

OK, now you have determined that you indeed are encountering DDL bottleneck, what can you do? Clearly, the goal is to minimize creation/destruction of user objects (i.e. #, ## and table variables). Unfortunately, there are no easy fixes. My suggestions are

(1) Check if the user objects are indeed getting cached. Please refer to https://blogs.msdn.com/sqlserverstorageengine/archive/2008/12/22/tempdb-basics-cont.aspx for details on detecting user object caching.

(2) Look at query plans for queries that are creating/destroying user objects in TempDB and see if you can do something to minimize it. For example, a #table may be created in a loop and maybe it is possible to create it outside the loop.

Thanks

Sunil Agarwal