TempDB Monitoring and Troubleshooting: IO Bottleneck

I hope my earlier blogs on TempDB (https://blogs.msdn.com/sqlserverstorageengine/archive/tags/TempDB/default.aspx)  have given you a good background on the objects contained in TempDB and its configuration and have deployed your workload in production and it runs for ever without any glitches. Oh, really? Well, don’t we all wish it was so but as you may have already experienced, perhaps one time too many, that the performance/configuration problems do happen. In fact, most DBAs/Administrators spend 20-30% of the time in troubleshooting the issues. With that context, let us look into key issues with TempDB in production and steps to troubleshoot them.

TempDB is more vulnerable because it is a shared resource and any misbehaving application or adhoc query can probably cause extreme resource issues in TempDB thereby affecting other applications running on the same instance. Here is a list of 4 common issues that you may encounter with TempDB

· IO bottleneck

· Allocation Contention

· DDL Contention

· Running out of space

I will cover IO bottleneck in this blog and others in my next blogs with the emphasis on how to detect that this indeed is happening and the steps that you can take to address.

IO Bottleneck: Now, the IO bottleneck is nothing specific to TempDB. Any database can incur IO bottleneck if the IO bandwidth is not configured to meet the workload demands. It is possible, and in fact likely, that your workload has changed overtime such the initial configuration is not able to handle it.

When a customer hits an IO bottleneck, it is tempting to conclude that the IO subsystem needs to be upgrades. While this may in fact be the final solution, but it should not be the first and the only step. Here are the recommended actions

1. Check if you have memory pressure: If you don’t have sufficient physical memory allocated to the SQL Server process, you may incur additional IO that could be avoided. As a first step, you need to make sure you don’t have memory bottleneck by looking at the following counters

a. Buffer Cache Hit ratio

b. Page Life Expectancy

c. Checkpoint pages/sec

d. Lazywrites/sec

I will also advise you to look at DBCC Memorystatus command that gives details on how the memory is being used. You may want to refer to https://support.microsoft.com/kb/907877

2. Identify queries that are generating the most IOs: you can run the following DMV query to identify it. The output of the query shows the top 10 queries (the SQL text and the query plan handle) that are generating the most IOs across all its executions. Agreed that the IOs include IOs in TempDB and also in other databases, but it is a good place to start. You can investigate these queries to identify any bad query plan (perhaps an index was not chosen) and/or possibly look into rewriting the queries that minimize IOs.

SELECT TOP 10

    (total_logical_reads/execution_count) AS

                                 avg_logical_reads,

    (total_logical_writes/execution_count) AS

                                 avg_logical_writes,

    (total_physical_reads/execution_count)

                        AS avg_phys_reads,

    execution_count,

    statement_start_offset as stmt_start_offset,

    (SELECT SUBSTRING(text, statement_start_offset/2 + 1,

        (CASE WHEN statement_end_offset = -1

            THEN LEN(CONVERT(nvarchar(MAX),text)) * 2

                ELSE statement_end_offset

            END - statement_start_offset)/2)

     FROM sys.dm_exec_sql_text(sql_handle)) AS query_text,

         plan_handle

FROM sys.dm_exec_query_stats

ORDER BY

  (total_logical_reads + total_logical_writes) DESC

3. Once you feel satisfied that you have configured your memory correctly and the query plans are ok, then it makes sense to add more IO bandwidth for TempDB. Now, here is one catch. Say, you add 1 new file to TempDB on a faster disk. Guess what happens? Due to proportional fill algorithm, the access to new file will be skewed in its favor and this may in fact lead to more IO and allocation bottleneck. The recommended action is to add the new file(s) with the same size as existing files and then restart the SQL Server. This will guarantee that the IOs are distributed across all files. Other alternative will be to move all the files in the TempDB to a different disk subsystem.

Thanks

Sunil Agarwal