Recommendations for Index Maintenance with AlwaysOn Availability Groups

 

SYMPTOMS

Consider the following scenario

  • The database is part of AlwaysOn Availability Groups
  • You run long and log-intensive transactions like Index maintenance/rebuilds

You observe one or more of the following symptoms:

  • Poor performing DML operations in availability databases on the primary replica if synchronous secondary replicas are present.
  • Huge log send queue in the primary.
  • Considerable log growth in those availability databases in which the index rebuild occurs
  • Redo backlog in secondary replicas.

 

CAUSE

Large maintenance operations like ALTER INDEX or CREATE INDEX can generate huge amounts of logged changes by the nature of the operation. These transactions can utilize parallelism to use multiple threads generating logged changes to accomplish the large transaction. This is in addition to the log generated by the regular day to day operations of your application.

For synchronous commit environments, these large transactions can create contention with other production changes, reducing the overall performance of the primary replica. Synchronous-commit mode emphasizes high availability over performance, at the cost of increased transaction latency. Under synchronous commit mode, transactions wait to send the transaction confirmation to the client until the secondary replica has hardened the log to disk.  All this logged activity is being captured by a per-database single threaded log capture thread. In addition encryption and compression routines are also single threaded. When these single-threaded routines are processing very large amounts of logged activity, your application transactions may suffer performance degradation.

Log-intensive transactions like rebuilding indexes can cause the log file to grow significantly as the log cannot be truncated until redo has completed the changes in all secondary replicas.

 

MITIGATION

Transactions like ALTER INDEX or CREATE INDEX are log intensive by nature. We cannot eliminate the log generation but we can do intelligent maintenance to reduce the impact of the index rebuild on production activities.  Here are some steps to minimize the impact:

Mitigation Steps From Index Maintenance Perspective

The following strategies may reduce the contention index rebuilding has on your production environment:

  • Run index maintenance during off peak period if there is any.
  • Frequency of the maintenance should be optimized with the goal of minimizing impact of log generation. You can do maintenance in phases, like each phase doing a subset of indexes at a particular time, instead of considering all indexes at a single go.
  • Rebuild indexes based on true need / impact to your production environment. Article Script to appropriate rebuild/reorganize database indexes - SQL Server 2005 shows how to achieve following steps to optimize the index maintenance process -
    • Ignore heaps and small tables
    • Check the fragmentation level of each index
    • If the fragmentation level is low (say less than 10%) - do nothing
    • If the fragmentation level is medium (say in between 10 to 30%) - reorganize the index
    • If the fragmentation level is high (say more than 30%) - rebuild the index
  • Use MAXDOP setting in the ALTER INDEX command to reduce the concurrent index alteration activity. Operations such as creating, rebuilding, or dropping indexes can be resource intensive and can cause insufficient resources for other applications and database operations for the duration of the index operation. When this problem occurs, you can manually configure the maximum number of processors that are used to run the index statement by limiting the number of processors to use for the index operation. Lower MADOP setting can also reduce fragmentation with online Index rebuild operation. Below is an example of ALTER INDEX  REBUILD with MAXDOP =1:
     USE [AdventureWorks2014]
    GO
  ALTER INDEX [PK_Employee_BusinessEntityID] 
 ON [HumanResources].[Employee] 
 REBUILD WITH (MAXDOP=1, ONLINE= ON);
    GO

 

  • Consider using table partitioning. This way you can rebuild portions of the index piece by piece by using ALTER INDEX…REBUILD PARTITION. SQL Server 2014 supports the ONLINE rebuild of partitioned indexes. Below is a sample script to rebuild only on partition 1:
     USE [AdventureWorks2014]
    GO
  ALTER INDEX [IX_TransactionHistory_ProductID] 
  ON [Production].[TransactionHistory] 
   REBUILD PARTITION = 1 
  WITH (ONLINE = ON);
 GO
  • For SQL server 2014 additional ALTER INDEX parameters like WAIT_AT_LOW_PRIORITY, MAX_DURATION, and ABORT_AFTER_WAIT can also be used. Following example rebuilds index with the ONLINE option including the low priority lock options:
     USE [AdventureWorks2014]
    GO
  ALTER INDEX ALL ON Production.Product
   REBUILD WITH
    (ONLINE = ON ( WAIT_AT_LOW_PRIORITY ( MAX_DURATION = 4  MINUTES, ABORT_AFTER_WAIT = BLOCKERS )));
   GO

Mitigation Steps From Availability Group Perspective

For synchronous commit environments, before issuing long and log-intensive transactions like ALTER INDEX or CREATE INDEX, you may additionally consider to make all synchronous replicas asynchronous to help reduce the transactional latency. Once the index rebuild transactions are completed, the commit mode should be switched back to synchronous.

For both SYNCHRONOUS AND Asynchronous environments, in general any step that can help with redo performance will positively impact long and log-intensive transactions on AG environments. Here are some key points to keep in mind:

  • A busy secondary such as, secondary with resource bottleneck or a large reporting workload on the secondary replica, can slow down the performance of the secondary replica because of resource contention, and the redo thread can fall behind. The redo thread on the secondary replica can also be blocked from making data definition language (DDL) changes by a long-running read-only query. The diagnosis and mitigations steps for these issues are discussed in the article Troubleshoot: Availability Group Exceeded RTO.
  • You need to periodically check AG databases to make sure they do not have too many Virtual Log Files (VLFs) which can severely impact redo process of the secondary. Diagnosis and corrective steps are discussed in the article Too Many Virtual Log Files (VLFs) Can Cause Slow Database Recovery.
  • We recommend that you always switch to the High Performance power plan for all replica machines for all operating systems. In some circumstances this can have better performance for the single threaded redo process.

  

REFERENCES