Should I run SQLIOSim? – An e-mail follow-up from SQL PASS 2008

From: Robert Dorr []
Sent: Thursday, November 20, 2008 11:29 AM
To: A SQL PASS MEMBER – CSS First Aid Station Question About SQLIOSim

As we discussed at SQL PASS – SQLIOSim is an independent utility and has been used as part of the Windows Hardware Compatibility Testing (WHCL) suite for several years.   Many of the major hardware manufacturers use it as part of their release criteria testing as well.    I recently worked with a major hardware vendor.   They were getting ready to release a new HBA controller series and SQLIOSim was reporting errors.   After some investigation the root cause was a firmware bug that would have lead to SQL Server corruption.  The firmware was corrected before the product was released.

SQLIOSim performs various I/O patterns.  These include ones that mimic various SQL Server I/O requests (read ahead, bcp, backup, dbcc, random writers and readers, …)   It also performs stress testing by posting un-throttled amounts of I/O and other random patterns.   This is why it is perfect for data consistency checking but cannot be used for performance testing.

I had numerous inquiries at PASS about the output of SQLIOSim and I provided the same guidance.   If you are getting errors you should consider it an I/O path problem that will lead to SQL Server corruption with the potential of total database loss.   Here is a quote from another customer:  “Our file system administrators don’t like us to run SQLIOSim because it puts heavy stress on the SAN.   However, the last 3 times we have attempted to add a new machine to production it has triggered an error because the machines where incorrectly configured.   I won’t put a machine in production if SQLIOSim does not report success.”

The easiest way to make SQLIOSim completes with success is to set the configuration option to STOP ON ERROR.   This way if a data integrity error is encountered the utility logs extended details and shutdown.

For your specific issue I took a look at the page dumps and you really need to revisit the hardware and I/O path configuration.

If you have not read my white papers on “SQL Server I/O Basics” Chapter 1 and Chapter 2 please take the time to do so.   It outlines how a stale reads, lost writes, torn pages, checksum failures and other I/O failure conditions are detected and what this means to the data.

SQLIOSim used the secure crypto APIs to create random data patterns on the pages.    On the page is a small header containing the page id and the expected checksum value.  These are used to help determine the state of the page.

Diag File



Classic Stale Read Pattern


The PageId between on expected page image and the page read from disk is the only data that is the same.   All other data is different.   This is usually a condition where the hardware returned a stale image of the page from cache – firmware bug.


NOTE: We re-read this from disk 15 times and still received the older buffer.  Probably from hardware cache.


Data mismatch between the expected disk data and the read buffer:

File: F:sqliosim.mdf

Offset: 0xD2F9CE000

Expected FileId: 0x0

Received FileId: 0x0

Expected PageId: 0x697CE7

Received PageId: 0x697CE7

Expected CheckSum: 0xA6DDED06

Received CheckSum: 0x76E039CE (does not match expected)

Calculated CheckSum: 0x76E039CE

Expected Buffer Length: 0x2000

Received Buffer Length: 0x2000

Synchronous read was not successful after 15 attempts


Classic Stale Read Pattern


Classic Stale Read Pattern


Classic Stale Read Pattern


Classic Stale Read Pattern

To help track down the root cause of a stale read issue we often disable the hardware I/O cache.  If this the test runs successfully it is the handling of the hardware cache which should be investigated closely.

Bob Dorr
SQL Server Principal Escalation Engineer

Comments (5)

  1. StephenB says:

    I am getting an error on my windows 2008 r2 server, where no matter what drive I look to create files via sqliosim config, I get "unable to open <drive leter> to determine the sector size."

    All drives were formatted with NTFS 64k block size.?

    perhaps sqliosim is not supported on win 2008 r2?

  2. StephenB says:

    The solution is to run sqliosim as Adminstrator on win 2008 r2.  Even though I am a local admin on the box, I must use "run as".  Weird.

  3. Pete says:

    Interesting point of view. For more info I also recommend