Index Rebuild vs Reorganize: The transaction log edition

Recently a friend and fellow PFE Dave Levy (Twitter/Blog) and I were discussing a question someone posted in a forum. The question was in regards to database mirroring latency when a reorganization occurs. Dave mentioned that he has seen in the past that switching from performing a reorganization to a rebuild resulted in better throughput with mirroring. So, I thought it would be a really interesting and really geeky (and, as it turns out really relevant) blog post to contrast the two operations in regards to the transaction log throughput. So, the goal of this post is to compare and contrast the log generations rates for a rebuild vs a reorg operation for a perfectly fragmented index.  You should always consider the fragmentation levels for the index along with the requirements for your application and workload. For more information regarding which operation you should be choosing for rebuilding your indexes, check out this blog post by Paul Randal:  https://www.sqlskills.com/blogs/paul/post/Where-do-the-Books-Online-index-fragmentation-thresholds-come-from.aspx

Before I get to the demo, let's compare the two methods.  Here is a snippet taken from this link:  https://technet.microsoft.com/en-us/library/ms189858.aspx

"You can remedy index fragmentation by reorganizing or rebuilding an index. For partitioned indexes built on a partition scheme, you can use either of these methods on a complete index or a single partition of an index. Rebuilding an index drops and re-creates the index. This removes fragmentation, reclaims disk space by compacting the pages based on the specified or existing fill factor setting, and reorders the index rows in contiguous pages. When ALL is specified, all indexes on the table are dropped and rebuilt in a single transaction. Reorganizing an index uses minimal system resources. It defragments the leaf level of clustered and nonclustered indexes on tables and views by physically reordering the leaf-level pages to match the logical, left to right, order of the leaf nodes. Reorganizing also compacts the index pages. Compaction is based on the existing fill factor value."

OK, now that we are all on the same page I'll start the demo.  First, I'll create a database to use for the example.  If it exists, drop it and then create it.

USE [master]
SET NOCOUNT ON
GO
IF DB_ID('RebuildReorg') IS NOT NULL
BEGIN
ALTER DATABASE RebuildReorg
SET SINGLE_USER WITH ROLLBACK IMMEDIATE  

DROP DATABASE RebuildReorg
END
GO

CREATE DATABASE RebuildReorg ON PRIMARY
( NAME = N'RebuildReorg_Data', FILENAME = N'C:\SQL\RebuildReorg_Data.mdf' ,
SIZE = 2048MB , MAXSIZE = UNLIMITED, FILEGROWTH = 50MB )
LOG ON
( NAME = N'RebuildReorg_Log', FILENAME = N'C:\SQL\RebuildReorg_log.LDF' ,
SIZE = 2048MB , MAXSIZE = 5GB, FILEGROWTH = 200MB)
GO

Next I want to make sure I am in the full recovery model.  If I happened to have the DB in the Simple recovery mode then the log would be truncated when a checkpoint occurs.  I want the log records that I generate to stay in the log so I can see how much log is generated for a rebuild vs a reorganization.

ALTER DATABASE RebuildReorg
SET RECOVERY FULL
GO

Next I will create a table named Sales and pump some data into it.  I've added a clustered index on a uniqueidentifier column on the table.  This should leave the index in a very fragmented state once the data has been entered.  This is useful because I want to simulate a "worst case" scenario so that the rebuild and reorganize routines have a lot of work to do.

USE RebuildReorg
GO

IF OBJECT_ID('dbo.Sales') IS NOT NULL
DROP TABLE dbo.Sales
GO

 CREATE TABLE dbo.Sales
(
[SalesOrderID] [int] NOT NULL,
[SalesOrderDetailID] [int] NOT NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[OrderQty] [smallint] NOT NULL,
[ProductID] [int] NOT NULL,
[SpecialOfferID] [int] NOT NULL,
[UnitPrice] [money] NOT NULL,
[UnitPriceDiscount] [money] NOT NULL,
[LineTotal] AS (isnull(([UnitPrice]*((1.0)-[UnitPriceDiscount]))*[OrderQty],(0.0))),
[ModifiedDate] [datetime] NOT NULL,
GUIDCol UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWID())
) ON [PRIMARY]
GO

CREATE CLUSTERED INDEX cixSales ON dbo.Sales (GUIDCol)
GO

INSERT INTO dbo.Sales
(
SalesOrderID,
SalesOrderDetailID,
CarrierTrackingNumber,
OrderQty,
ProductID,
SpecialOfferID,
UnitPrice,
UnitPriceDiscount,
ModifiedDate
)
SELECT
SalesOrderID,
SalesOrderDetailID,
CarrierTrackingNumber,
OrderQty,
ProductID,
SpecialOfferID,
UnitPrice,
UnitPriceDiscount,
ModifiedDate
FROM AdventureWorks2012.Sales.SalesOrderDetail
GO 20

I need to take a backup to start the recovery chain.  I just created this database and while it is in the full recovery mode, the database will be treated as though it is in the simple recovery mode since I have not yet taken a full backup.  In other words, the log will be truncated when a checkpoint occurs right now.  As soon as I take a full backup, the log will only be truncated when a log backup occurs.  In the full backup and log backups I am taking below, I am using the "NUL" backup device.  I'm just using this because I don't really want to write the backup out to disk.  I don't plan on using the backup in later operations.  You really never want to use this in a production environment unless you are testing how long it takes to read the data from the data files for the backup or another special case which I won't go into here.

BACKUP DATABASE RebuildReorg
TO DISK = 'NUL'
GO

BACKUP LOG RebuildReorg
TO DISK = 'NUL'
GO

Now rebuild the clustered index on the table.

ALTER INDEX cixSales ON dbo.Sales
REBUILD

I can use the system function sys.fn_dblog to find out how many log records were generated by the rebuild statement.  Remember, I took a log backup before I performed the rebuild, so I was working with an "empty" transaction log when the rebuild occurred.

SELECT LogRecordsGenerated = COUNT(*)
FROM sys.fn_dblog(NULL, NULL)
WHERE AllocUnitName = 'dbo.Sales.cixSales'

 

So, it looks like rebuilding the index resulted in 35516 transaction log records.  Cool.  What kind of log generation will I be looking at if I would have performed a reorganize instead?  To find out, I'll need to recreate the above scenario again. 

USE RebuildReorg
GO

IF OBJECT_ID('dbo.Sales') IS NOT NULL
DROP TABLE dbo.Sales
GO

 CREATE TABLE dbo.Sales
(
[SalesOrderID] [int] NOT NULL,
[SalesOrderDetailID] [int] NOT NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[OrderQty] [smallint] NOT NULL,
[ProductID] [int] NOT NULL,
[SpecialOfferID] [int] NOT NULL,
[UnitPrice] [money] NOT NULL,
[UnitPriceDiscount] [money] NOT NULL,
[LineTotal] AS (isnull(([UnitPrice]*((1.0)-[UnitPriceDiscount]))*[OrderQty],(0.0))),
[ModifiedDate] [datetime] NOT NULL,
GUIDCol UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWID())
) ON [PRIMARY]
GO

CREATE CLUSTERED INDEX cixSales ON dbo.Sales (GUIDCol)
GO

INSERT INTO dbo.Sales
(
SalesOrderID,
SalesOrderDetailID,
CarrierTrackingNumber,
OrderQty,
ProductID,
SpecialOfferID,
UnitPrice,
UnitPriceDiscount,
ModifiedDate
)
SELECT
SalesOrderID,
SalesOrderDetailID,
CarrierTrackingNumber,
OrderQty,
ProductID,
SpecialOfferID,
UnitPrice,
UnitPriceDiscount,
ModifiedDate
FROM AdventureWorks2012.Sales.SalesOrderDetail
GO 20

Make sure to backup the log again so that we can compare the two operations.

BACKUP LOG RebuildReorg
TO DISK = 'NUL'
GO

Now perform the reorganize.

ALTER INDEX cixSales ON dbo.Sales
REORGANIZE

And run the query against the transaction log to see how much log was generated by the reorganize operation.

SELECT LogRecordsGenerated = COUNT(*)
FROM sys.fn_dblog(NULL, NULL)
WHERE AllocUnitName = 'dbo.Sales.cixSales'

As you can see, from a log generation perspective the reorganize statement created a SIGNIFICANTLY (35516 vs 2.7M) higher number of transaction log records as compared to the rebuild.  This is by design.  For large, heavily fragmented indexes the reorganization operation is much less efficient than the index rebuild operation.  This is important to know because if you're using a technology that relies on reading from the transaction log, such as mirroring, transactional replication/CDC then performing a reorg is going to generate a lot more activity that needs to be sorted through.  Of course, a reorg does have its benefits, such as being a progressive operation.  A reorg can be stopped while it is occurring and you won't lose the work it has already completed, as compared to a rebuild which is an atomic operation.  Like everything in technology, there are benefits and drawbacks to both operations. 

Hope this helps!

Tim