I was asked a very specific question about the SQLIOSim.exe checksum validation logic. The logic is simple (on purpose) but effective, so here are the basics.
The key is that there are multiple memory locations used to hold the data and do the comparison.
1. Allocate an 8K buffer in memory.
Stamp the page with random data using the Crypto random function(s).
Save the page id, file id, random seed and calculated checksum values in the header of the page.
2. Send the page to stable media (async I/O).
Check that the write completed successfully.
3. Sometime after the successful write(s), allocate another buffer and read the data from stable media.
(Note: This is a separate buffer from that of the write call)
4. Validate the bytes read.
Do header checks for file id, page id, seed and checksum values
Expected CheckSum: 0xEFC6D39C <--- Checksum stored in the WRITE buffer
Received CheckSum: 0xEFC6D39C <--- Checksum stored in the READ buffer (what stable media is returning)
Calculated CheckSum: 0xFBD2A468 <--- Checksum as calculated on the READ buffer
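The four steps above can be sketched roughly as follows. This is a minimal illustration, not SQLIOSim's actual code: `zlib.crc32` stands in for the real checksum algorithm, `os.urandom` stands in for the Crypto random functions, the I/O is synchronous for brevity, and the names `build_page`, `validate` and `round_trip` are hypothetical. The `corrupt` flag simulates damage somewhere in the I/O path by flipping one data byte on disk, which reproduces the pattern in the sample output: Expected and Received match, but Calculated does not.

```python
import os
import struct
import tempfile
import zlib

PAGE_SIZE = 8192
# Assumed layout: four little-endian DWORDs (page number, file id, seed, checksum).
HEADER = struct.Struct("<IIII")


def build_page(page_no: int, file_id: int) -> bytes:
    """Step 1: stamp a page with random data and a header holding its checksum."""
    seed = int.from_bytes(os.urandom(4), "little")  # stored only; not used to derive data here
    data = os.urandom(PAGE_SIZE - HEADER.size)
    checksum = zlib.crc32(data)  # stand-in; the real algorithm may differ
    return HEADER.pack(page_no, file_id, seed, checksum) + data


def validate(write_buf: bytes, read_buf: bytes) -> dict:
    """Step 4: header checks plus the three checksum values."""
    w_page, w_file, w_seed, expected = HEADER.unpack_from(write_buf)
    r_page, r_file, r_seed, received = HEADER.unpack_from(read_buf)
    calculated = zlib.crc32(read_buf[HEADER.size:])
    ok = ((w_page, w_file, w_seed) == (r_page, r_file, r_seed)
          and expected == received == calculated)
    return {"expected": expected, "received": received,
            "calculated": calculated, "ok": ok}


def round_trip(page_no: int = 3, file_id: int = 1, corrupt: bool = False) -> dict:
    write_buf = build_page(page_no, file_id)
    offset = page_no * PAGE_SIZE
    with tempfile.TemporaryFile() as f:
        # Step 2: send the page to the file at its page offset.
        f.seek(offset)
        f.write(write_buf)
        f.flush()
        if corrupt:
            # Simulated damage: flip one byte of the data area on disk.
            f.seek(offset + HEADER.size + 100)
            f.write(bytes([write_buf[HEADER.size + 100] ^ 0xFF]))
            f.flush()
        # Step 3: read into a separate buffer (a new bytes object).
        f.seek(offset)
        read_buf = f.read(PAGE_SIZE)
    return validate(write_buf, read_buf)


if __name__ == "__main__":
    for label, result in (("clean", round_trip()), ("damaged", round_trip(corrupt=True))):
        print(label, {k: (hex(v) if k != "ok" else v) for k, v in result.items()})
```

Keeping the WRITE buffer around until validation is what makes the Expected value available; without it, only Received and Calculated can be compared.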
The detailed (.TXT) output file(s) show the WRITE image, the READ image and the DIFFERENCES found between them (think memcmp). When only a single buffer image is added to the detailed TXT file, this indicates that the received header data was damaged or the WRITE buffer is no longer present in memory, so only the on-disk checksum and the calculated checksum are being compared.
If there appears to be damage, SQLIOSim will attempt to read the same data 15 more times and validate it before triggering the error condition. Studies from SQL Server and Exchange showed that read-retries succeed in some situations. SQL Server and Exchange will perform up to 4 read-retries in the same situation.
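The read-retry behavior can be sketched as a small loop. This is only an illustration under assumptions: `read_with_retries`, `read_page` and `is_valid` are hypothetical names, and the real tool's timing, logging and error reporting are not modeled. The only detail taken from the text is the retry count of 15.

```python
from typing import Callable, Tuple

MAX_READ_RETRIES = 15  # from the text; SQL Server and Exchange use up to 4


def read_with_retries(read_page: Callable[[], bytes],
                      is_valid: Callable[[bytes], bool],
                      retries: int = MAX_READ_RETRIES) -> Tuple[bytes, int]:
    """Return (page, attempts); raise OSError if every attempt fails validation."""
    attempts = 0
    for _ in range(1 + retries):  # the initial read plus the retries
        attempts += 1
        page = read_page()  # each attempt re-reads the same data
        if is_valid(page):
            return page, attempts
    raise OSError(f"page failed validation after {attempts} reads")


if __name__ == "__main__":
    # A reader that returns damaged data twice, then good data.
    reads = iter([b"bad", b"bad", b"good"])
    page, attempts = read_with_retries(lambda: next(reads), lambda p: p == b"good")
    print(page, attempts)
```

A retry that succeeds still points at a problem in the I/O path, so a real harness would likely want to record how many attempts were needed rather than silently accept the page.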
The window for damage runs from the time the checksum is calculated to the time the read is validated. While the cause could be SQLIOSim itself, the historical evidence shows this is a low probability. Most of that time is spent in kernel and I/O path components, and the majority of bugs over the last 8 years have been non-SQL related.
For vendor debugging, the detailed TXT file contains the various page images as well as the sequence of Win32 API calls, thread ids and other information. Using techniques such as a bus analyzer or detailed I/O tracing, the vendor can assist in pinpointing the component causing the damage.
The header at the top of the 8K page is currently the following (Note: This may change with future versions of SQLIOSim.exe)
DWORD Page Number (Multiply by 8192 for the file offset)
DWORD File Id
DWORD Seed value
DWORD Checksum CRC value
BYTE Data[8192 - size(HEADER)] <--- Checksum protects this data
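When looking at a page image from the detailed TXT file, the header layout above can be decoded with a few lines. This sketch assumes the four DWORDs are packed little-endian with no padding (typical for Windows on x86/x64, but not stated in the layout), and `parse_header` is a hypothetical helper name.

```python
import struct

PAGE_SIZE = 8192
HEADER = struct.Struct("<4I")  # assumed: four little-endian DWORDs, no padding


def parse_header(page: bytes) -> dict:
    """Decode the header fields and derive the file offset of the page."""
    page_no, file_id, seed, checksum = HEADER.unpack_from(page)
    return {
        "page_number": page_no,
        "file_id": file_id,
        "seed": seed,
        "checksum": checksum,
        "file_offset": page_no * PAGE_SIZE,      # page number * 8192
        "data_length": len(page) - HEADER.size,  # bytes protected by the checksum
    }


if __name__ == "__main__":
    # Build a dummy page: page 5 of file 1, with a zero-filled data area.
    page = HEADER.pack(5, 1, 0x1234, 0xEFC6D39C) + bytes(PAGE_SIZE - HEADER.size)
    info = parse_header(page)
    print(info["file_offset"], info["data_length"], hex(info["checksum"]))
```

With this layout the header occupies 16 bytes, leaving 8176 bytes of data under the checksum.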
Bob Dorr – Principal SQL Server Escalation Engineer