Troubleshooting Slow Disk I/O in SQL Server

Article
07/08/2011

If you suspect you are experiencing poor disk performance you can use internal DMVs combined with a Performance Monitor collection to get a good picture of the health of the disk I/O subsystem and any latency SQL Server is experiencing from its poor performance.

Performance Monitor Counters:

Below is a list of counters and thresholds to watch for when monitoring IO on your SQL Server:

Edit 7/19/2011 - Please monitor the below counters and compare average values over time to the thresholds provided. The counters are intended for OLTP workloads and larger workloads such as OLAP and batch activity may not fit within these thresholds.

Object	Counter	Instance	Threshold	Comments
Physical Disk / Logical Disk	Avg. Disk sec/Read	*	( < .005 excellent; .005 - .010 Good; .010 - .015 Fair; > .015 investigate)	Measures read latency on the disks
Physical Disk / Logical Disk	Avg. Disk sec/Write	*	( < .005 excellent; .005 - .010 Good; .010 - .015 Fair; > .015 investigate)	Measures write latency on the disks
Physical Disk / Logical Disk	Avg. Disk sec/Transfer	*	( < .005 excellent; .005 - .010 Good; .010 - .015 Fair; > .015 investigate)	Measures average latency for read or write operations
Physical Disk / Logical Disk	% Idle Time	*	> 50%	The disk should not be working constantly
Physical Disk / Logical Disk	Avg. Disk Queue Length	*	None	Be careful when using old thresholds of 2 with SQL Server and SANs. This value can be much higher than 2. Do not rely solely on this counter to diagnose an IO problem.
Physical Disk / Logical Disk	Disk reads/sec	*	None	Number of read IOPs
Physical Disk / Logical Disk	Disk writes/sec	*	None	Number of write IOPs
Physical Disk / Logical Disk	Disk Transfers/sec	*	None	Number of read and write IOPs (used to compare against capacity of storage subsystem)
Physical Disk / Logical Disk	Disk Read Bytes/sec	*	None	Used for determine bandwidth utilization for Read Ops
Physical Disk / Logical Disk	Disk Write Bytes/sec	*	None	Used to determine bandwidth utilization for Write Ops
Physical Disk / Logical Disk	Disk Bytes/sec	*	None	Used to determine bandwidth total bandwidth utilization
Processor	% Processor Time	_Total	< 80%	Amount of total CPU usage across all processors
Processor	% User Time	_Total	< 80%	Amount of total CPU usage in user mode across all processors
Processor	% Privileged Time	_Total	< 30%	Amount of total CPU usage in kernel mode across all processors
Memory	Available Mbytes	n/a	> 100MB	Available Physical RAM on the box
SQL Server:Buffer Manager	Buffer Cache Hit Ratio	n/a	> 98%	Indication of how often SQL Server is finding data pages in the buffer pool
SQL Server:Buffer Manager	Page Life Expectancy	n/a	> 300	Indicates pressure on memory as data pages are flushed to disk – inducing IO
SQL Server:Buffer Manager	Lazy Writes/sec	n/a	< 20	Measure of when the lazy writer thread must clear up buffers to free space for pages – inducing IO
SQL Server:Buffer Manager	Free Pages	n/a	> 640	Measure of how many free pages are in the buffer pool for use
SQL Server:Wait Statistics	Page IO latch waits	Average wait time (ms)	None	Average wait time spent waiting on Page IO between disk and memory

Additionally, by storing scalar values returned from any of the queries in this post into a variable and executing those queries in a loop or a job, you can then make them available to Performance Monitor as a counter to plot them with other metrics. This allows you to get more granular detailed information and track IO waits and other data on a particular database, application, or session if required.

If you can quantify the information from a query as a single numeric value, then you can pass it out to Performance Monitor to enhance your analysis. For instance, you may choose to find out how many outstanding IOPs exist on SQL Server waiting longer than 10ms with:

select COUNT(*) from sys.dm_io_pending_io_requests where io_type = 'disk' and io_pending_ms_ticks > 10

Or you could map the time spent waiting on PAGEIOLATCH_% with:

select SUM(wait_time_ms) from sys.dm_os_wait_stats where wait_type like 'PAGEIOLATCH%'

For more information on setting up a custom performance counter, check here:

Creating a custom performance monitor counter for SQL Server

https://blogs.msdn.com/b/askjay/archive/2010/02/18/creating-a-custom-performance-monitor-counter-for-sql-server.aspx

SQL Server Dynamic Management Views (DMVs):

The following DMVs will return extensive information that will help determine and diagnose an IO problem as seen from within SQL Server:

To check for outstanding IOPs from within SQL Server waiting longer than 10ms, start with:

select * from sys.dm_io_pending_io_requests where io_type = 'disk' and io_pending_ms_ticks > 10

To get information on IO stalls (a wait for IO that can’t be completed immediately), and the amount of traffic per file in a database, use:

select * from sys.dm_io_virtual_file_stats(db_id(‘<ENTER YOUR DATABASE NAME HERE>’), NULL)

To see if IO is the primary waiter for SQL Server, execute the following and look for wait types with a naming convention like PAGEIOLATCH_%%:

select top 10 * from sys.dm_os_wait_stats order by wait_time_ms desc

To monitor PAGEIOLATCHes for increases in time, periodically check and store these:

select * from sys.dm_os_wait_stats where wait_type like 'PAGEIOLATCH%'

To capture current executing statements that are waiting on IO or just finished waiting on IO:

select

r.session_id,

s.login_name,

s.program_name,

r.start_time,

r.status,

r.command,

r.wait_type,

r.wait_time,

r.last_wait_type,

r.logical_reads,

(r.logical_reads * 8192) as 'KB Read',

r.writes,

(r.writes * 8192) as 'KB Written',

t.[text]

from sys.dm_exec_requests r

cross apply sys.dm_exec_sql_text(sql_handle) t

inner join sys.dm_exec_sessions s

on r.session_id = s.session_id

where s.is_user_process = 1 and

(r.wait_type like 'PAGEIOLATCH%' or r.last_wait_type like
'PAGEIOLATCH%') and

r.session_id != @@SPID

Finally, suspect I/O capacity or bandwidth can be verified with SQLIO and the I/O integrity checked with SQLIOSim:

SQLIO Disk Subsystem Benchmark Tool (SQLIO is a tool provided by Microsoft which can also be used to determine the I/O capacity of a given configuration.)

https://www.microsoft.com/downloads/details.aspx?familyid=9A8B005B-84E4-4F24-8D65-CB53442D9E19&displaylang=en

How to use the SQLIOSim utility to simulate SQL Server activity on a disk subsystem

https://support.microsoft.com/kb/231619

- Jay

Troubleshooting Slow Disk I/O in SQL Server

Additional resources