I/O requests taking longer than 15 seconds to complete on file

SQL Server I/O Bottlenecks ( I/O requests taking longer than 15 seconds to complete on file )

New version of this post in available in This link

 

Do you see warnings like one below in your SQL Server error log?

SQL Server has encountered x occurrence(s) of I/O requests taking longer than 15 seconds to complete on file .

The OS file handle is 0x000006A4. The offset of the latest long I/O is: 0x00000

This indicates SQL Server I/O Bottlenecks. SQL Server performance highly relies on the Disk performance. SQL Server I/O Bottleneck can be identified through

1. PAGEIOLATCH_xx or WRITELOG wait types in Sys.Sysprocesses and other DMV's

2. I/O taking longer than 15 seconds in SQL Server Error log.

{

SQL Server has encountered X occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [ ] in database [IOTEST (7). The OS file handle is 0x000006A4. The offset of the latest long I/O is:
0x000001

}

3. By looking at I/O latch wait statistics in sys.dm_os_wait_stats

{

Select  wait_type,         waiting_tasks_count,         wait_time_ms  from    sys.dm_os_wait_stats where    wait_type like 'PAGEIOLATCH%'
order by wait_type

}

4. By looking at pending I/O requests and isolating the disks,File and database in which we have I/O Bottleneck.

{

select     database_id,     file_id,     io_stall,     io_pending_ms_ticks,     scheduler_address from    sys.dm_io_virtual_file_stats(NULL, NULL)t1,         sys.dm_io_pending_io_requests as t2
where    t1.file_handle = t2.io_handle

}

Following are common reasons for I/O Bottleneck in SQL Server.

1. SQL Server is spawning more I/O requests than what I/O disk subsystem could handle.

2 . There could be an Issue with I/O subsystem (or) driver/firmware issue (or) Misconfiguration in I/O Subsystem (or) Compression and  so the Disks are performing very slow and hence SQL Server is affected.

3. Some other process on the system is saturating the disks with I/O requests. Common application includes AV Scan,System Backup Etc. So I/O requests posted by SQL Server becomes slow.

 

How to troubleshoot?

1.  Exclude SQL Server files from antivirus scan.

2. Do not place SQL Server FILES on compressed drives.

3. Distribute SQL Server data files and transaction log files across drives.

4. If the “I/O request taking longer” warning is for tempdb , Enable trace flag 1118 and increase the tempdb data files refer:https://support.microsoft.com/kb/2154845

5. If none of the above resolves the issue collect the below perfmon counters.

 

Perfmon counters can help us in understanding “If disk is slow” or  “SQL Server is spawning more I/O then what disk could handle” or “Some other process is saturating disk with I/O”

Note:It is important to get  throughput of the disk subsystem in MB/SEC before we look at disk counters. Normally it will be more than 150 MB for SAN disk and greater 50 MB for Single disk .When you look at the perfmon counter look at Max value.

 

Avg. Disk sec/Transfer --> Time taken to perform the I/O operation

Ideal value for Disk sec/Transfer is 0.005-0.010 sec. If you consistently notice this counter is beyond 0.015 then there is a serious I/O bottleneck.

1. Look for Disk Bytes /sec when Avg. Disk sec/Transfer  is greater than 0.015. If it is below 200 MB for SAN disk and Below 50 MB for Single disk then the problem is with I/O subsystem Engage hardware vendor.

2. If the Disk Bytes /sec  is greater than  200 MB for SAN disk or greater than 50 MB for Single disk when the  Avg. Disk sec/Transfer  is greater than 0.015. Look at the Process:IO Data Bytes/Sec for the same time and identify which process is spawning I/O. If the identified process is not SQL Server involve the team which supports that process. If the  the identified process is SQL Server tune SQL Server queries which are I/O intensive by creating dropping indexes etc.

 

Disk Bytes /sec  --> Total read and write to disk per second in bytes.

Collect the values for each logical disks in which SQL Server files are placed and look at the Max value for this counter ideally it has to be greater than the throughput of the disk subsystem. If you don’t have the throughput for the disk then this value to be greater than 200MB for SAN or greater than 50 MB for single disk.

If it is below the expected value you can consider that your disks are not performing well. Involve the hardware vendor. 

Important: Value for this counter will be low when there is no I/O happening on the drives. So you have to look at the this counter during the time you see I/O warnings or When Disk sec/Transfer >0.010 for the same drive. 

 

Process:IO Data Bytes/Sec --> Total read and write to disk per second in bytes by each process.

Collect this counter for all the processes running on the server. This counter will help us understand if any other process is saturating the disk with excessive I/O.

Example: Let us consider a disk with max throughput of 250MB per second. If antivirus is spawning 200MB of I/O per second and if SQL Server data files are placed in same drive and SQL Server is spawning 150MB obviously there will be I/O waits.

 

Buffer Manager: Page Read/sec + Page Writes/sec -->Total read and write to disk per second in bytes by SQL Server process.

 

  

If (Disk sec/Transfer > ==0.015 Consistently) and ( (Disk Bytes /sec < 150MB (For San)) or (Disk Bytes /sec < 50MB (For Local) or (Disk Bytes /sec < Speed of disk as per Vendor ))

{

There is Issue with I/O subsystem (or) driver/firmware issue (or) Misconfiguration in I/O Subsystem.

}

If (Disk sec/Transfer > ==0.015 Consistently) and ( (Disk Bytes /sec >= 150 (For San)) or (Disk Bytes /sec >= 50MB (For Local) or (Disk Bytes /sec >= Speed of disk as per Vendor ))

{

Identify the process which is posting excessive I/O request using Process:IO Data Bytes/Sec.

If ( Identified process == SQLServer.exe )

{

Identify and tune the queries which is Spawning excessive I/O.

(Reads+Writes column in profiler, Dashboard reports or sys.dm_exec_query_stats and sys.dm_exec_sql_text

can be used to identify the query). Use DTA to tune the query

}

If ( Identified process != SQLServer.exe )

{

Engage the owner of application which is spawning excessive I/O

}

}

Many thanks to Joseph Pilov from whom I learned many techniques like the one above.

 

If you liked this post, do like us on FaceBook at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Thank you,

Karthick P.K  |My Facebook Page |My Site| Blog space| Twitter