SQL Server I/O Bottleneck, I don't have one, YES YOU DO!

The mistake I see people make is when looking at the SQL Server PAGE I/O waits and stalled I/O warnings is when comparing it to the Avg. Disk Seconds/Transfer.    Everyone seems to forget that average means average and they look at the value and say I don't see any I/O taking longer than 15 seconds.

Take the following example over a 5 minute window.

2000 I/Os at 8ms   = 16000ms

1 I/O at 15000ms   = 15000 ms

TOTAL TIME: 31000ms

AVG: 15.49ms (31000 / 2001 I/O Requests)

You are not going to see a spike to 15000ms in the Avg Disk Sec/Transfer but instead a small change from the norm that is very hard to detect from just the Avg Disk Sec/Transfer.

Remember to pay close attention to the following SQL Server data points to help determine your I/O response rates.

  • PAGE I/O* Waits   (select * from sys.dm_os_wait_stats where wait_type like '%PAGEIO%')
  • Virtual File Stats (select * from sys.dm_io_virtual_file_stats(-1, -1))
  • Stalled I/O Warnings
  • Additional disk based performance counters and available hardware utilities

Don't forget that SQL Server drives read ahead, checkpoint and other I/O patterns to depths beyond a disk queue length of 2 and this is expected.

Bob Dorr
SQL Server Senior Escalation Engineer