Troubleshooting Slow Disk I/O in SQL Server

If you suspect you are experiencing poor disk performance you can use internal DMVs combined with a Performance Monitor collection to get a good picture of the health of the disk I/O subsystem and any latency SQL Server is experiencing from its poor performance.  

 

Performance Monitor Counters:

Below is a list of counters and thresholds to watch for when monitoring IO on your SQL Server:

 

Edit 7/19/2011 - Please monitor the below counters and compare average values over time to the thresholds provided. The counters are intended for OLTP workloads and larger workloads such as OLAP and batch activity may not fit within these thresholds.

 

Object

Counter

Instance

Threshold

Comments

Physical Disk / Logical Disk

Avg. Disk sec/Read

*

( < .005 excellent; .005 - .010 Good; .010 - .015 Fair; > .015 investigate)

Measures read latency on the disks

Physical Disk / Logical Disk

Avg. Disk sec/Write

*

( < .005 excellent; .005 - .010 Good; .010 - .015 Fair; > .015 investigate)

Measures write latency on the disks

Physical Disk / Logical Disk

Avg. Disk sec/Transfer

*

( < .005 excellent; .005 - .010 Good; .010 - .015 Fair; > .015 investigate)

Measures average latency for read or write operations

Physical Disk / Logical Disk

% Idle Time

*

> 50%

The disk should not be working constantly

Physical Disk / Logical Disk

Avg. Disk Queue Length

*

None

Be careful when using old thresholds of 2 with SQL Server and SANs. This value can be much higher than 2. Do not rely solely on this counter to diagnose an IO problem.

Physical Disk / Logical Disk

Disk reads/sec

*

None

Number of read IOPs

Physical Disk / Logical Disk

Disk writes/sec

*

None

Number of write IOPs

Physical Disk / Logical Disk

Disk Transfers/sec

*

None

Number of read and write IOPs (used to compare against capacity of storage subsystem)

Physical Disk / Logical Disk

Disk Read Bytes/sec

*

None

Used for determine bandwidth utilization for Read Ops

Physical Disk / Logical Disk

Disk Write Bytes/sec

*

None

Used to determine bandwidth utilization for Write Ops

Physical Disk / Logical Disk

Disk Bytes/sec

*

None

Used to determine bandwidth total bandwidth utilization

Processor

% Processor Time

_Total

< 80%

Amount of total CPU usage across all processors

Processor

% User Time

_Total

< 80%

Amount of total CPU usage in user mode across all processors

Processor

% Privileged Time

_Total

< 30%

Amount of total CPU usage in kernel mode across all processors

Memory

Available Mbytes

n/a

> 100MB

Available Physical RAM on the box

SQL Server:Buffer Manager

Buffer Cache Hit Ratio

n/a

> 98%

Indication of how often SQL Server is finding data pages in the buffer pool

SQL Server:Buffer Manager

Page Life Expectancy

n/a

> 300

Indicates pressure on memory as data pages are flushed to disk – inducing IO

SQL Server:Buffer Manager

Lazy Writes/sec

n/a

< 20

Measure of when the lazy writer thread must clear up buffers to free space for pages – inducing IO

SQL Server:Buffer Manager

Free Pages

n/a

> 640

Measure of how many free pages are in the buffer pool for use

SQL Server:Wait Statistics

Page IO latch waits

Average wait time (ms)

None

Average wait time spent waiting on Page IO between disk and memory

 

Additionally, by storing scalar values returned from any of the queries in this post into a variable and executing those queries in a loop or a job, you can then make them available to Performance Monitor as a counter to plot them with other metrics. This allows you to get more granular detailed information and track IO waits and other data on a particular database, application, or session if required. 

 

If you can quantify the information from a query as a single numeric value, then you can pass it out to Performance Monitor to enhance your analysis. For instance, you may choose to find out how many outstanding IOPs exist on SQL Server waiting longer than 10ms with:

 

             select COUNT(*) from sys.dm_io_pending_io_requests where io_type = 'disk' and io_pending_ms_ticks > 10

 

Or you could map the time spent waiting on PAGEIOLATCH_% with:

 

select SUM(wait_time_ms) from sys.dm_os_wait_stats where wait_type like 'PAGEIOLATCH%'

 

For more information on setting up a custom performance counter, check here: 

 

Creating a custom performance monitor counter for SQL Server

https://blogs.msdn.com/b/askjay/archive/2010/02/18/creating-a-custom-performance-monitor-counter-for-sql-server.aspx

SQL Server Dynamic Management Views (DMVs):

The following DMVs will return extensive information that will help determine and diagnose an IO problem as seen from within SQL Server:

 

To check for outstanding IOPs from within SQL Server waiting longer than 10ms, start with:

 

select * from sys.dm_io_pending_io_requests where io_type = 'disk' and io_pending_ms_ticks > 10

 

To get information on IO stalls (a wait for IO that can’t be completed immediately), and the amount of traffic per file in a database, use:

 

select * from sys.dm_io_virtual_file_stats(db_id(‘<ENTER YOUR DATABASE NAME HERE>’), NULL)

 

To see if IO is the primary waiter for SQL Server, execute the following and look for wait types with a naming convention like PAGEIOLATCH_%%:

 

select top 10 * from sys.dm_os_wait_stats order by wait_time_ms desc

 

To monitor PAGEIOLATCHes for increases in time, periodically check and store these:

 

            select * from sys.dm_os_wait_stats where wait_type like 'PAGEIOLATCH%'

 

 

To capture current executing statements that are waiting on IO or just finished waiting on IO:

 

select

      r.session_id,

      s.login_name,

      s.program_name,

      r.start_time,

      r.status,

      r.command,

      r.wait_type,

      r.wait_time,

      r.last_wait_type,

      r.logical_reads,

      (r.logical_reads * 8192) as 'KB Read',

      r.writes,

      (r.writes * 8192) as 'KB Written',

      t.[text]

from sys.dm_exec_requests r

      cross apply sys.dm_exec_sql_text(sql_handle) t

      inner join sys.dm_exec_sessions s

      on r.session_id = s.session_id

      where s.is_user_process = 1 and

      (r.wait_type like 'PAGEIOLATCH%' or r.last_wait_type like
 'PAGEIOLATCH%') and

      r.session_id != @@SPID

 

 

Finally, suspect I/O capacity or bandwidth can be verified with SQLIO and the I/O integrity checked with SQLIOSim:

 

SQLIO Disk Subsystem Benchmark Tool (SQLIO is a tool provided by Microsoft which can also be used to determine the I/O capacity of a given configuration.)

https://www.microsoft.com/downloads/details.aspx?familyid=9A8B005B-84E4-4F24-8D65-CB53442D9E19&displaylang=en

 

How to use the SQLIOSim utility to simulate SQL Server activity on a disk subsystem

https://support.microsoft.com/kb/231619

- Jay