What is WRITELOG waittype and how to troubleshoot and fix this wait in SQL Server

<<This blog has been updated on 12/23/2014>>

What does WRITELOG waittype mean?

SQL Server Transaction Log entries are first written serially to in-memory structure called Log block and due to various conditions listed below in this blog, then the entire log block is flushed to disk. The size of this log block can be between 512 bytes and 64 kilobytes. For more information about the size limit of different types of I/O performed by SQL Server, visit https://blogs.msdn.com/b/sqlcat/archive/2005/11/17/493944.aspx

Now, WRITELOG waits represents the time, SQL Server is waiting on the I/O to complete after a Log block in cache is flushed from Memory to Disk (Thus why having a ultra fast disk like Flash based disks to store your LDF can speed up this I/O and it reduces your transaction latency)

Before we continue further, please read an Excellent article from SQL Customer Advisory Team at https://blogs.msdn.com/b/sqlcat/archive/2013/09/10/diagnosing-transaction-log-performance-issues-and-limits-of-the-log-manager.aspx which talks about a throttle limit called "Outstanding I/O limit" and "Outstanding log I/O Limit".  So this is another condition where you will notice WRITELOG wait type when outstanding I/O limit is reached.

So, given below are the conditions (but not limited to) when a log block will be flushed to disk:

1. COMMIT XSN (or ABORT XSN)

2. CHECKPOINT

3. LOG block is full

4. Outstanding I/O limit or Outstanding log I/O Limit reached

Transaction commits when you run a COMMIT TRAN (incase of Explicit transaction) or whenever SQL Server commits the transaction on behalf of you (Auto commit)

It is important to note that multiple active transactions can be interleaved in the log block, and if any one transaction commits, the buffer memory is written to the transaction log file by the log writer. No need to worry about what happens to data integrity when multiple transactions interleave at same time because SQL Server implements synchronization objects like spinlocks, mutex to control the access to log buffer.

To monitor your database log buffer flush and it's size, refer the the perfmon counters given below:

SQLServer: Database object counter Displays
Log Bytes Per Flush Number of bytes in the log buffer when the buffer is flushed.
Log Flushes/sec Number of log flushes per second.
Log Flush Wait Time Total wait time (milliseconds) to flush the log.
Log Flush Waits/sec Number of commits per second that are waiting on log flush.

So, now you know that when you see WRITELOG wait type in your system, all the wait time shown for this wait type is time spent by SQL Server waiting for the Log I/O to finish and anything you can do to improve the performance of transaction log I/O will reduce the wait time. Also, remember that having multiple transaction log files for one database doesn't help here because log I/O is a serial operation. Given that WRITELOG is a disk based wait, you can do some design changes so you make SQL Server to do less LOG I/O, one such design change is given below.

LOG I/O depends on whether you choose IMPLICIT or EXPLICIT Transaction:

During a troubleshooting scenario with one of Microsoft customer, where INSERT'ing 60000 records to a 2 column sample table took 8 minutes on a specific server while the same operation takes ,< 20 seconds in two other identical server. I was asked to to proove that it is a Disk problem and not a configuration issue within SQL Server that is different between these 3 servers.

This insert was running in a WHILE loop without any EXPLICIT Transaction. When I looked into the customer wait Stats (using sys.dm_os_wait_stats), WRITELOG was the top wait type in terms of total wait time on the server where this command is taking 8 minutes.

I explained about the WRITELOG wait type and conditions when LOG Blocks will be flushed to disk to the Customer.

To show a demo about difference between Autocommit and EXPLICIT transaction which changes the behavior of Log Flush, changed his code:

Before:

WHILE @i < 60000

Begin

INSERT INTO table values (1,'Name')

End

After:

BEGIN TRAN

WHILE @i < 60000

Begin

INSERT INTO table values (1,'Name')

End

COMMIT TRAN

Second batch with EXPLICIT transaction completed in just 2 seconds (compared to 8 mins when Auto commit transaction) in the same problematic environment. The reason is because every Auto Commit is flushing the transaction log buffer to disk for every INSERT but in the case of EXPLICIT XSN, log buffers are flushed only when they are full or due to any of the 4 conditions mentioned above, so the less log block flush, less the time we wait for I/O which decreases transaction latency. (Note that this was a test environment so almost, no other requests were running at that time)

Also we knew that the comparison might be skewed if there is a autogrowth happening in the background so we ensured that Initial Size for LDF was set before the testing started.

There is another difference in the case of EXPLICIT XSN which we noticed:

1. In Auto commit XSN, CHECKPOINT will run quite often and will complete soon because the amount of dirty buffers will be less when it runs so often

2. In EXPLICIT XSN, CHECKPOINT will run after a longer interval and it will run for longer duration because it has more work to do because of huge backlog of dirty buffers so you will see a spike in CHECKPOINT I/O if you monitor using Perfmon counters

 

Now, it made sense to the customer that instead of 60,000 LOG I/O, we just be sending <10 LOG I/O to disk when we use Explicit transaction which made transaction to complete sooner.

DMV sys.dm_tran_database_transactions helped us to understand WRITELOG wait time for every transaction w.r.t  the transaction begin time. Also sys.dm_io_pending_io_requests helped us to understand the outstanding I/O and whether the I/O is still pending from Disk subsytem or is it pending from OS and how long it is pending.

Please refer to the blog https://blogs.msdn.com/b/sql_pfe_blog/archive/2013/03/28/observing-sql-server-transaction-log-flush-sizes-using-extended-events-and-process-monitor.aspx to understand how Extended events can help you to monitor the log flushes.

So to summarize,

If you see WRITELOG wait type, before adding more disk, check how frequently you transactions commits and how good is your disk response time. Now, decide whether you need to optimize current disk holding transaction log or using Flash arrays or any other ultra fast disk subsystem.

Feel free to post your comments if you have any questions or if you need more information.

Further Reads:

https://blogs.msdn.com/b/psssql/archive/2011/01/07/discussion-about-sql-server-i-o.aspx

https://msdn.microsoft.com/en-us/library/cc917726.aspx

https://blogs.msdn.com/cfs-filesystemfile.ashx/__key/CommunityServer-Components-PostAttachments/00-09-98-45-35/Microsoft-SQL-Server-IO-Internals.pptx

Sakthivel Chidambaram, SQL Server Support