How It Works: SQLIOSim - Running Average, Target Duration, Discarded Buffers ...

The following is from a lengthy conversation I had with Kevin Kline. I believe he will be posting more details along with his "Understanding SQLIOSim Output - https://sqlblog.com/blogs/kevin_kline/archive/2007/06/28/understanding-sqliosim-output.aspx)

 

********** Final Summary for file E:\sqliosim.mdx **********

File Attributes: Compression = No, Encryption = No, Sparse = No

Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 17, Number of times IO throttled = 9006, IO request blocks = 33

Reads = 75414, Scatter Reads = 88286, Writes = 3295, Gather Writes = 54845, Total IO Time (ms) = 44936446

Running Average IO Duration (ms):    This value is stored in the variable m_dwRunningAvgDuration. The Running Average IO Duration is response time in Milliseconds that was noticed during the life of the stress test. The running average duration is initially set to the target duration at the time the logical file is created which is 100ms. The values are updated in ReturnIOFromCompletion which is called as soon as an IO has completed. If the duration of any IO exceeds 15 seconds, a warning is printed. The running average is not exactly what you might think of when you think of average. It is similar to that used in the Checkpoint logic of SQL Server. We take the current average * the outstanding IOs -1 and add the duration of the current IO completing to it and divide. This keeps the target from getting a big swing because one I/O takes a long time. If you use a standard average approach like current average + duration a single bad entry swings the value to an extreme quickly. Here is the formula used to set the Running Average Duration.

 

// dwDuration is duration in ms of the IO that was completed.

m_dwRunningAvgDuration = ((m_dwRunningAvgDuration * (MaxIOs - 1)) + dwDuration) / MaxIOs;

 

Here is a table illustrating how Running Average Io Duration is calculated. As we can see, the standard Average calculation is affected severely with one IO ( Row # 4) taking long time to complete which is avoided by using the formula as above to avoid the big swing.

IO request #

Duration of the IO in ms. (dWDuration)

Cumulative Duration of all IOs.

Cumulative IO requests Completed.

Running Average IO Duration ( m_dwRunningAvgDuration)

Standard Average Calculation.

1

100

100

1

100

100

2

200

300

2

103.125

150

3

400

700

3

115.625

233

4

15000

15700

4

580.76

3925

5

100

15800

5

565.73

3160

We use the average value to control the number of IO requests that can be outstanding. We do this by adding or removing IO request from the IO Request list. If we meet or beat the target we add another request that can be outstanding. At the completion of every IO request the running average is updated and compared to the target. If the average exceeds the target then the IO request block is removed and the throttle count is incremented; otherwise it is given out to the next waiter, and if there is another waiter then another block is given out.

 

A large number of throttled requests usually stems from the fact that at certain points of the test throttling is disabled which results in posting of thousands of simultaneous requests. As soon as throttling is re-enabled all those requests get removed according to the rules above. This behavior is controlled by the AllowIOBursts parameter.

  

ADDITIONAL RESEARCH

===========================================================

I went back and looked closer at the "Running Average IO Duration." Tthe Target value and formula as outlined and the throttling is as described. 

    static const long MinIOs = 1;
   static const long MaxIOs = 32;<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /> 

The MaxIOs limit the degree of the average we are using but this is not the limit for the how much I/O per file can be outstanding. The running average is used for this so in following message 45 I/Os for this file could be posted, right now and the response is expected to be less than 100ms. However, the number of total I/Os is capped by the max outstanding I/O configuration settings. So it is possible this file could keep more than 45 I/Os outstanding but the overall limit of I/Os is in play for all the files. (MaxOutstandingIO)

Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 0, Number of times IO throttled = 13718, IO request blocks = 45

So in the message above it would appear that at some point during the test 13718 I/Os took longer than 100ms for the file. However, 1000s of others completed in less than that time and those are probably very fast. So if those I/Os are spread out the average won't move that much. Say we have a running average of 1ms and we encounter an I/O at 120m, this results in ((1 * 31) + 120) / 32 = 4.7ms. We don't swing the average that much. So if I can complete I/Os quickly but I can't get more I/O requests SQLIOSim will continue to lower the average and since this is integer division you end up with an average duration of 0ms.

Complicating the understanding is the use of GetTickCount for timing. You can read my other blogs on RDTSC if you want the gory details. So the granularity of this becomes 10 to 15ms on most systems and could be up to 50ms on others. This is one of the reasons we use a 100ms target.  

Number of times IO throttled: The Number of times IO throttled represents the number of times we removed a request because the duration exceeded and the running average IO duration was going above the target IO duration. For example, consider the following output.

 

Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 121, Number of times IO throttled = 1167, IO request blocks = 13

 

In this case, we used the default target IO Duration of 100 but the Running Average IO Duration was 121 which exceeded the target by 21ms for this file and so we had to throttle the IO 1167 times.

 

The condition that is used to increment the counter is

 

      if (m_dwRunningAvgDuration > TargetIODuration && m_lRequestNodes > MinIOs) //MinIos is 1

 

IO request blocks: The Io Request blocks indicate the number of IO request blocks the logical file has at the time of report (or at the time test ended). The number of IO Request blocks also shows the number of concurrent outstanding IO requests the file had to meet the specified target IO duration. For the following output we only had 13 requests available. It is not uncommon to have 100 or more IOs for a file in SQL Server. On Enterprise Edition we can issue lots of I/O for read ahead and such.

Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 121, Number of times IO throttled = 1167, IO request blocks = 13

 

Reads: The counter that tracks this is m_lReads. This value is initialized to 0 when the logical file is created and is incremented in FPostRead. This counter is incremented each time a request to read a buffer is posted. This counter does not include the Scatter Reads.

 

Scatter Reads: The counter that tracks this is m_lScatterReads. This value is initialized to 0 when the logical file is created and is incremented in FPostScatterRead. This counter is incremented each time a request to read a buffer is posted. This counter does not include the Non Scatter Reads.

  

Writes:   The counter that tracks this is m_lWrites. This value is initialized to 0 when the logical file is created and is incremented in FPostWrite. This counter is incremented each time a request to write a buffer is posted. This counter does not include the Gather Writes.

 

Gather Writes: The counter that tracks this is m_lGatherWrites. This value is initialized to 0 when the logical file is created and is incremented in FPostGatherWrite. This counter is incremented each time a request to read a buffer is posted. This counter does not include the Non Gather Writes.

 

Total IO Time (ms): The total IO Time shows the cumulative time taken by all IO operations on the file. If we divide the Total IO Time with the total number of IOs(reads+writes_scatterreads+gatherwrites) we will get the traditional average (sum/total) type average over the period of the whole test where as the Running Average IO Duration is only for the last 32 requests as discussed in previous sections.

Validation

The following message is logged when the buffer pool is shutdown.

Buffer Pool: validated buffers 327317, pages 438912, discarded buffers 375356

Validated buffers: The buffer pool for SQLIOSim is similar to that in the SQL Server proper. Each 8K page in memory has an overhead structure (BUF) that has the status and other tracking information. These are also the used as part of the hashing mechanisms. Validating a buffer means that the buffer status matches the page. For example the page number in the buffer matches that on the page. Commonly seen in SQL Server as a 605 error when a mismatch is detected.

Validated pages:   This is thenumber of data pages that have been validated for correct content during read IO completion. The checksum and other checks are performed in much the same way that SQL Server checks for 823 error conditions.

Discarded buffers:   These are the number of buffers that have been removed from the buffer pool but are not validated and were not directly dirtied. This can occur as part of a read ahead, just like it does in SQL and I outline in my I/O whitepapers. Say we want to read in pages 1, 2 and 3. However, page 2 is already in the buffer pool. We don't issue 2 I/Os. Instead we issue a single I/O for pages 1, 2 and 3 but we immediately discard page 2. This is a very good counter because it shows the original test that SQLIOStress was designed for - Stale Reads from hardware cache.

 

Reference: The following outlines the target duration a bit from the SQL Server engine: https://blogs.msdn.com/psssql/Default.aspx?p=3

Bob Dorr

SQL Server Principal Escalation Engineer