SQL Server 2005 introduced a background worker to flush trace event streams. A trace buffer is flushed when it becomes fully populated, but a partially populated trace buffer remains in memory until events fill the buffer or the background worker flushes the events to the stream. Periodically the background worker wakes up, checks outstanding trace buffers and flushes them as required.
A small trend of 17883 reports has surfaced in which trace output takes too long and stalls a scheduler. The issue occurs when the background worker attempts to flush a buffer and the IO subsystem stalls.
Using the public symbols for SQL Server, the stack looks like the following.
sqlservr!DiskWriteAsync+0xee <-- This should take very little time, but a stalled scheduler indicates it took 60+ seconds
The trace file location can make a big difference.
- The always on trace, controlled by the sp_configure value 'default trace enabled', is placed in the LOG directory.
- Other trace destinations are controlled at the time the trace is defined. You want to use high speed disk locations rather than single disks or UNC paths.
The stack shown above came from a trace written to a UNC path (a bad idea). The network had stalled, which prevented the IO from going asynchronous and put the scheduler into a stalled state. The Scheduler Monitor detected the situation and generated the 17883 warning.
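As a rough illustration of the UNC point, the sketch below flags trace destinations that sit on a network share. It is only a sketch under stated assumptions: the helper names and the sample paths are hypothetical, and the list of paths is assumed to have been collected separately (for example, from the `path` column of `sys.traces`).

```python
from pathlib import PureWindowsPath


def is_unc_path(trace_path: str) -> bool:
    """Return True when a Windows path points at a network share."""
    # PureWindowsPath parses Windows paths on any platform; for a UNC path
    # the drive component is the leading server-and-share prefix.
    return PureWindowsPath(trace_path).drive.startswith("\\\\")


def risky_trace_destinations(trace_paths):
    """Return the trace paths that are written to UNC locations."""
    # Entries without a file path (None or empty) are skipped.
    return [p for p in trace_paths if p and is_unc_path(p)]


# Hypothetical paths, for illustration only.
paths = [
    r"C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\log_12.trc",
    r"\\fileserver\traces\audit.trc",
]
print(risky_trace_destinations(paths))
```

A check like this only identifies network destinations; it says nothing about whether a local disk is fast enough, which still has to be judged from the IO subsystem itself.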
To help avoid a stall, make sure trace destinations are on a high speed disk subsystem that won't encounter IO stall conditions. This may include, for example, moving the LOG directory so the default trace does not remain on the same drive as the page file.
To move the LOG directory, follow these steps. (Use this technique with caution and test it before using it in production.)
- Create a new destination path. The path must exist.
- Change the SQL Server startup parameter (-e) to use the new path instead of the old path.
- Restart SQL Server.
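The second step can be sketched as a small string rewrite. This is a minimal sketch, not a supported tool: it assumes the startup parameters are available as one semicolon-separated string, the function name and all paths are hypothetical, and it does not touch the service configuration itself. It does illustrate one detail that is easy to miss: the -e value is the full path of the error log file, so the file name has to be carried over to the new directory.

```python
from pathlib import PureWindowsPath


def move_errorlog_parameter(startup_params: str, new_log_dir: str) -> str:
    """Point the -e startup parameter at new_log_dir, keeping the log file name."""
    updated = []
    for param in startup_params.split(";"):
        # Match lowercase -e only; the switch is treated as case-sensitive here.
        if param.startswith("-e"):
            # -e carries the full path of the error log file, not just the directory.
            file_name = PureWindowsPath(param[2:]).name
            param = "-e" + str(PureWindowsPath(new_log_dir) / file_name)
        updated.append(param)
    return ";".join(updated)


# Hypothetical parameter string and destination, for illustration only.
old = r"-dC:\SQLData\master.mdf;-eC:\SQLData\LOG\ERRORLOG;-lC:\SQLData\mastlog.ldf"
print(move_errorlog_parameter(old, r"E:\SQLLogs"))
```

The new directory still has to exist before the restart, as the first step says.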
SQL Server Principal Escalation Engineer