Should I run SQLIOSim? - An e-mail follow-up from SQL PASS 2008
From: Robert Dorr [mailto:rdorr@microsoft.com]
Sent: Thursday, November 20, 2008 11:29 AM
To: A SQL PASS MEMBER - CSS First Aid Station Question About SQLIOSim
As we discussed at SQL PASS – SQLIOSim is an independent utility and has been used as part of the Windows Hardware Compatibility Testing (WHCL) suite for several years. Many of the major hardware manufacturers use it as part of their release criteria testing as well. I recently worked with a major hardware vendor. They were getting ready to release a new HBA controller series and SQLIOSim was reporting errors. After some investigation the root cause was a firmware bug that would have lead to SQL Server corruption. The firmware was corrected before the product was released.
SQLIOSim performs various I/O patterns. These include ones that mimic various SQL Server I/O requests (read ahead, bcp, backup, dbcc, random writers and readers, …) It also performs stress testing by posting un-throttled amounts of I/O and other random patterns. This is why it is perfect for data consistency checking but cannot be used for performance testing.
I had numerous inquiries at PASS about the output of SQLIOSim and I provided the same guidance. If you are getting errors you should consider it an I/O path problem that will lead to SQL Server corruption with the potential of total database loss. Here is a quote from another customer: “Our file system administrators don’t like us to run SQLIOSim because it puts heavy stress on the SAN. However, the last 3 times we have attempted to add a new machine to production it has triggered an error because the machines where incorrectly configured. I won’t put a machine in production if SQLIOSim does not report success.”
The easiest way to make SQLIOSim completes with success is to set the configuration option to STOP ON ERROR. This way if a data integrity error is encountered the utility logs extended details and shutdown.
For your specific issue I took a look at the page dumps and you really need to revisit the hardware and I/O path configuration.
If you have not read my white papers on “SQL Server I/O Basics” Chapter 1 and Chapter 2 please take the time to do so. It outlines how a stale reads, lost writes, torn pages, checksum failures and other I/O failure conditions are detected and what this means to the data.
https://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx
https://www.microsoft.com/technet/prodtechnol/sql/2005/iobasics.mspx
SQLIOSim used the secure crypto APIs to create random data patterns on the pages. On the page is a small header containing the page id and the expected checksum value. These are used to help determine the state of the page.
Diag File |
Condition/Characteristics |
SqlSimErrorDump01863 |
Classic Stale Read Pattern
The PageId between on expected page image and the page read from disk is the only data that is the same. All other data is different. This is usually a condition where the hardware returned a stale image of the page from cache – firmware bug.
NOTE: We re-read this from disk 15 times and still received the older buffer. Probably from hardware cache.
Data mismatch between the expected disk data and the read buffer: File: F:sqliosim.mdf Offset: 0xD2F9CE000 Expected FileId: 0x0 Received FileId: 0x0 Expected PageId: 0x697CE7 Received PageId: 0x697CE7 Expected CheckSum: 0xA6DDED06 Received CheckSum: 0x76E039CE (does not match expected) Calculated CheckSum: 0x76E039CE Expected Buffer Length: 0x2000 Received Buffer Length: 0x2000 Synchronous read was not successful after 15 attempts |
SqlSimErrorDump01864 |
Classic Stale Read Pattern |
SqlSimErrorDump01864 |
Classic Stale Read Pattern |
SqlSimErrorDump01888 |
Classic Stale Read Pattern |
SqlSimErrorDump01909 |
Classic Stale Read Pattern |
To help track down the root cause of a stale read issue we often disable the hardware I/O cache. If this the test runs successfully it is the handling of the hardware cache which should be investigated closely.
Bob Dorr
SQL Server Principal Escalation Engineer