SQLIOSim – Outstanding I/O Warnings


I have encountered some questions pertaining to the SQLIOSim outstanding IO warnings and I am posting some details.

 

The Display monitor thread is responsible for updating the display (progress outputs) and checking for long IOs  (CheckForLongIO/ReportLongIo).   Each time it wakes (from 1 to 10 second range depending on state) it fires the check.

 

Each logical CPU (scheduler) in the simulator has an IO list.   The CheckForLongIO loops over each logical CPU and checks the pending IO list of each logical scheduler.

 

When the IO is posted (async) we grab the current time and add the IO to the logical scheduler specific list.  When the check is performed each IO in the list is checked with HasOverlappedIoCompleted.   This is a WIN32 macro that checks the status stored in the OVERLAPPED structure of the IO to see if the system has reported the IO as complete.  When FALSE is returned and the duration of the IO is > 15 seconds the condition is flagged as a ‘stalled’ IO as the OS and sub-system have not completed the specific IO.

 

The number of these outstanding IOs is accumulated and reported after all logical CPUs have been checked.

 

The following output shows 2 logical CPUs being used by the simulator.

 

03/10/07

15:35:58

3264

System

CPUCount = 2, Affinity = 0, IOAffinity = 0, MaxMemory = 630 MB, StopOnError = Yes, TestCycles = 1, TestCycleDuration = 300 s, CacheHitRatio = 100.00%

 

At 15:36:46 you have 5 IO requests that exceeded 15 seconds.  One second later another 72 requests exceeded the 15 second mark.  (The same IO is not reported again, once flagged as long it won’t be counted on secondary passes.)

 

What this does not show is how many IOs have completed.  SQLIOSim could have completed 1000s of IOs during this 15 second window but some of the requests are being stalled. 

 

03/10/07

15:36:45

3764

Display Monitor

Error: 0x00000000
Error Text:
Description: 5 IO requests are outstanding for more than 15 sec.

03/10/07

15:36:46

3764

Display Monitor

Error: 0x00000000
Error Text:
Description: 72 IO requests are outstanding for more than 15 sec.

 

The files are open with NO buffering so they should not involve system cache components.  You should review filter drivers and the like.  Likely something like elevator seek or something of this nature is at play when you see the outstanding IO warnings.

 

 

03/10/07

15:35:58

3264

System

NoBuffering = Yes, WriteThrough = Yes, UseScatterGather = Yes, ForceReadAhead = No, MaxOutstandingIO = 0, TargetIODuration = 100 ms

LOGICALCONFIG::PrintConfigurationToLog

e:yukonsosbranchsqlntdbmsstorengutilsqliosimlogicalconfig.cpp

228

 

03/10/07

15:35:58

3264

System

AllowIOBursts = Yes, DeleteFilesAtStartup = Yes, DeleteFilesAtShutdown = No, StampFiles = No

     

 

The following error text is reported for each, individual file no more than every 30 seconds, indicating the number of total IOs identified as taking longer than 15 seconds.

 

The following indicates that 826 IO requests for the file sqliosim.mdx exceeded 15 seconds to complete in the last 30 second processing window.   The last IO that is currently STATUS_PENDING (256) is at the specified offset on the specified handle.

 

 

03/10/07

15:37:16

3764

Display Monitor

Comments (8)

  1. Kevin Kline says:

    One of the things that’s great about my job is the fact that I get to meet people who are a lot smarter

  2. abhigupta22 says:

    I had run the test for HP MSA drive wiht one local drive. But for MSA drvie it is printing following errors:-

    19/07 12:43:56 11688 CreateFileStream Error: 0x80070032

    Error Text: The request is not supported.

    Description: Unable to get disk cache info for Z:

  3. psssql says:

    The utility attempts to check the caching state.   When it can’t it logs an error.  This should be a warning and will be updated in future versions.

    [RDORR Jan 24, 2008] This is correct and is already updated in the SQL 2008 CTP builds.

  4. Vivien Xing says:

    "[RDORR Jan 24, 2008] This is correct and is already updated in the SQL 2008 CTP builds."

    In SQL2008 on Windows 2008, I still get a similar error message from "CreateFileStream":

       Error: 0x80070001

       Error Text: Incorrect function.

       Description: Unable to get disk cache info for C:

    Version: 2007.100.1600.22

    Is it safe to say that it is just a warning? not an error?

  5. babuganesh2000 says:

    According to this post, "The CheckForLongIO calls ReportLongIO but ReportLongIO has the 30 second reporting throttle." In my case it looks like it is in the reverse order. Can you please help me to understand is this is a potential problem.

    10:47:09 Description: 1 long IO requests are outstanding for file E:sqliosim.mdx. Handle 0x598. Offset of latest long IO 0xab5f40000, duration 15366 ms CLogicalFile::ReportLongIO

    10:47:09 Description: 298 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO

    10:47:11 Description: 385 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO

    10:47:13 Description: 282 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO

    10:47:14 Description: 915 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO

    10:47:15 Description: 830 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO

    10:47:16 Description: 723 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO

    10:47:17 Description: 1446 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO

    10:47:18 Description: 101 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO

  6. chicago medical billing says:

    wonderful blog thanks for sharing such useful post.

    http://www.billergeeks.com

  7. Crystal limousine says:

    Thanks for sharing this information, its quite helpful.

    http://www.crystal-limousine.com

  8. Tim Myers says:

    It's a world of SQL pain out there sometimes trying ot interpret these results; I've recently experienced issues with these long I/O timeouts in a client's SharePoint environment, here's a summary of my findings and thoughts:

    http://www.boxesandarrows.co.uk/testing-sql-io-for-sharepoint