The mistake I see people make is when looking at the SQL Server PAGE I/O waits and stalled I/O warnings is when comparing it to the Avg. Disk Seconds/Transfer. Everyone seems to forget that average means average and they look at the value and say I don't see any I/O taking longer than 15 seconds.
Take the following example over a 5 minute window.
2000 I/Os at 8ms = 16000ms
1 I/O at 15000ms = 15000 ms
TOTAL TIME: 31000ms
AVG: 15.49ms (31000 / 2001 I/O Requests)
You are not going to see a spike to 15000ms in the Avg Disk Sec/Transfer but instead a small change from the norm that is very hard to detect from just the Avg Disk Sec/Transfer.
Remember to pay close attention to the following SQL Server data points to help determine your I/O response rates.
- PAGE I/O* Waits (select * from sys.dm_os_wait_stats where wait_type like '%PAGEIO%')
- Virtual File Stats (select * from sys.dm_io_virtual_file_stats(-1, -1))
- Stalled I/O Warnings
- Additional disk based performance counters and available hardware utilities
Don't forget that SQL Server drives read ahead, checkpoint and other I/O patterns to depths beyond a disk queue length of 2 and this is expected.
SQL Server Senior Escalation Engineer