Visualizing Index Fragmenation

We all probably realize that fragmentation can be a performance hindrance, especially for scanning operations.  But, what does fragmentation really mean?  What does it look like?  I've been doing a lot of training for customers lately in terms of fragmentation so I wanted to do a blog post and dig a bit further into what index fragmentation really looks from a SQL Server page perspective.  I'll create a fragmented index and show you how the logical page linkages look like vs. what they should be if they were not fragmented at all. 

To begin, I'll create a database named FragmentsDB.

use master
go
SET STATISTICS XML OFF
SET STATISTICS IO OFF
SET NOCOUNT ON
go
IF DB_ID('FragmentsDB') IS NOT NULL
BEGIN
     ALTER DATABASE FragmentsDB
     SET SINGLE_USER WITH ROLLBACK IMMEDIATE  

     DROP DATABASE FragmentsDB
END
GO
CREATE DATABASE FragmentsDB
GO
ALTER DATABASE FragmentsDB
SET RECOVERY SIMPLE
GO

Next, create a table named NumbersTable and load some data into it.  The manner in which I am loading data will ensure that once I'm done the non-clustered index that I've created on the table will be noticeably fragmented.  The way I am doing this is to load numbers into the table, skipping every 3rd number.  Then, I'll load back into the table some of the data that I skipped before.  In doing this, the data that I load in the second insert will need to be placed in the appropriate pages (remember, an index is ordered and we have to put the rows into the page where the sorted values fit logically), causing pages to split and logical fragmentation to occur. Note that I am not making the pages completely full, as is the default fillfactor. This will ensure that for the purposes of this test that the index isn't 100% fragmented.

USE FragmentsDB
GO
CREATE TABLE NumbersTable
(
     NumberValue BIGINT NOT NULL,
     BiggerNumber BIGINT NOT NULL,
     CharacterColumn CHAR(50)
)
GO

ALTER TABLE NumbersTable
ADD CONSTRAINT PK_NumbersTable PRIMARY KEY CLUSTERED (NumberValue)
GO  

CREATE INDEX idx_NumbersTable ON NumbersTable(BiggerNumber)
WITH(FILLFACTOR=75)
GO

INSERT INTO NumbersTable
(
NumberValue, BiggerNumber, CharacterColumn
)
SELECT
     NumberValue,
     NumberValue + 5000000,
     LEFT(REPLICATE((CAST(NumberValue as VARCHAR(50))),50),50)
FROM
(
     SELECT
NumberValue = row_number() over(order by newid() asc)
FROM master..spt_values a
CROSS APPLY master..spt_values b
WHERE a.type = 'P' AND a.number <= 500 AND a.number > 0 AND
b.type = 'P' AND b.number <= 500 AND b.number > 0
) a
WHERE NumberValue % 3 != 0
GO  

INSERT INTO NumbersTable
(
     NumberValue, BiggerNumber, CharacterColumn
)
SELECT TOP(30000)
     NumberValue,
     NumberValue + 5000000,
     LEFT(REPLICATE((CAST(NumberValue as VARCHAR(50))),50),50)
FROM
(
     SELECT
           NumberValue = row_number() over(order by newid() asc)
     FROM master..spt_values a
     CROSS APPLY master..spt_values b
     WHERE a.type = 'P' AND a.number <= 500 AND a.number > 0 AND
     b.type = 'P' AND b.number <= 500 AND b.number > 0
) a
WHERE NumberValue % 3 = 0
GO

To look at the amount of logical fragmentation in the leaf layer of the index, use the DMV sys.dm_db_index_physical_stats.  We can see that the fragmentation is right at 53.32%.

SELECT avg_fragmentation_in_percent, index_level, record_count, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('NumbersTable'), 2, NULL, 'Detailed')

  

So, what does a logical fragmentation of 53.32% really mean?  In the simplest of terms, it means that just over half of the pages in the leaf level of the index are not physically located in the order in which the keys in the index indicate that they should be.  In an index that had 0 fragmentation, the leaf level pages would be listed right next to each other logically and physically.  It would look something similar to this:

 

Note: for simplicity purposes I am leaving out the fact that the rows on a given page aren't stored in physical order. An offset array is used at the page level to track the ordering of the rows on a given page.

However, when logical fragmentation comes into play, the tree can look like something totally different.  The physical order (the allocation order) doesn't match up to the logical order of the keys (the sequential numbers in this case).  So, unless SQL Server is running an allocation order scan then it is very difficult to perform a scan operation in an efficient manner. 

 

SQL Server 2012 exposes a new DMV, sys.dm_db_database_page_allocations, that we can use to see the page linkages for a particular entity.  (The output from this DMV is very similar to the output to the DBCC IND command.  It is just more easily consumable).

In this case, I will try to calculate the fragmentation in the index based upon checking the allocation order linkages in the DMV vs. logical page linkages that I calculate using the new LEAD windowing function in 2012.  This new function allows you to look at a value that came just before the current record in a resultset.

SELECT
     Fragmentation = ((COUNT(*)*1.00 - SUM(Case When NextPhysicalPageID = NextLogicalPageID then 1 else 0 end))/COUNT(*)*1.00)*100,
     PageCount = COUNT(*)
FROM

(
     SELECT allocated_page_page_id, NextLogicalPageID = next_page_page_id,
NextPhysicalPageID = LEAD(allocated_page_page_id) over(partition by object_id, index_id order by allocated_page_page_id)
FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID('NumbersTable'), 2, NULL, 'DETAILED')
WHERE page_type_desc = 'INDEX_PAGE' and page_level = 0
) a

Looking at the results from the query above, the fragmentation output that is the same as produced from the sys.dm_db_index_physical_stats DMV.  There are a few outlying cases I've seen so far where the output between these 2 didn't match up exactly (usually off < 1 %).  I need to dig into the source code to determine why. 

  

At this point, you may be thinking…"Big Deal.  I can already find the fragmentation of an index.  Why would I need a new way to do it?"  The answer is – you don't.  You should still use the regular method of identifying logical fragmentation.  The aggregated query above is a sanity check to make sure the two methods match up.  The real reason for this blog post is below.  In this query I show the detail around my fragmented index.  The logic is the same as the aggregated query above, but is this time showing the detail associated with the page linkages.

SELECT
     *, PageLinkMatch = Case When NextPhysicalPageID = NextLogicalPageID then 1 else 0 end
FROM
(
     SELECT allocated_page_page_id, NextLogicalPageID = next_page_page_id,
     NextPhysicalPageID = LEAD(allocated_page_page_id) over(partition by object_id, index_id order by allocated_page_page_id)
     FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID('NumbersTable'), 2, NULL, 'DETAILED')
     WHERE page_type_desc = 'INDEX_PAGE' and page_level = 0
) a

 

I am checking to see if the NextLogicalPage (as is returned by the DMV) matches up to the NextPhysicalPage that I am calculating.  If the pages match, then that specific linkage is not fragmented.  If they do not match, then the link IS fragmented.  Let's take a look at a specific example from my output (your mileage will vary with the page numbers, etc).  In the case of Page 997 the next physical page matches to the next logical page.  The page before 997 pointed logically to page 3019.  This was due to the page splits caused by the insert statements created above.  However, the next physical page (the order of the page allocation) is page 997.  This "logical fragmentation" is the cause of SQL Server having to break up IOs into multiple operations for scanning purposes. 

 

In fact, we can further investigate the page using DBCC PAGE.  By supplying the page number, we can see that the previous page and next page in the page header matches up to the output that we are seeing in the DMV.

DBCC TRACEON(3604, -1)
DBCC PAGE(FragmentsDB, 1, 997, 2)

 

I hope this helps in visualizing what logical fragmentation looks at from the SQL Server point of view. From a performance point of view, fragmentation isn't really that big of a deal for operations that are doing singleton style lookups – a few pages returned isn't going to be hindered by fragemented index. Scan operations will suffer because the engine will need to break up the IO for the scan into multiple IOs when fragmentation is present.