Good Page Splits and Sequential GUID Key Generation

 

It’s well-known that inserts using monotonically increasing key values at the end of an index minimize page splits, and that using a randomly generated key value causes lots of expensive page splits.  But a colleague and I were recently discussing what happens when you use monotonically increasing key values in the middle of an index.

The motivation for this is that when you use NEWSEQUENTIALID for GUID (aka UNIQUEIDENTIFIER) generation, the GUIDs generated are monotonically increasing only on that particular server, and the sequence may reset after a reboot.  After a reboot, or if your workload moves to another server via restore, log shipping, AlwaysOn failover, etc., new GUIDs will be generated and inserted in the “middle” of the range of existing key values.  You can hit the same issue if you generate key values on an application server and then change application servers.  So would that create a large number of expensive page splits?  If so, it’s a _bad thing_ for GUID key generation in general, since you basically can’t ever guarantee that your database will run on the same server forever without a reboot.
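As a quick refresher, here’s a minimal sketch of that generation behavior (seq_demo is just a throwaway name; NEWSEQUENTIALID() is only legal as a column default, so you observe it by inserting and reading back):

-- seq_demo is a throwaway illustration table.  On a given server, between
-- reboots, the generated ids ascend, so the rows come back in insert order.
create table seq_demo (id uniqueidentifier default newsequentialid(), n int);
insert into seq_demo (n) values (1), (2), (3);
select id, n from seq_demo order by id;   -- n comes back 1, 2, 3
drop table seq_demo;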

A short aside here: I like GUID keys, even as clustered index keys.  Many SQL Server professionals don’t, since there are several well-known performance problems that can be caused by GUID keys.  And I would never use a GUID key when I could use an int or bigint key, or a domain natural key (even a compound one, which I also like), but they have their place.  In particular, GUIDs represent a good solution to a hard problem: key values can be generated on multiple computers and reliably merged, and the key values can be generated by application code on the client or middle tier.  This is powerful functionality that many application designs rely on.  And I really don’t like changing a design for performance reasons, especially since there are so many incorrect or poorly understood performance “best practices” out there.  Here are the common performance objections to GUID keys:

1) They are large (16 bytes), and enlarge all non-clustered indexes.

This is absolutely true.  Using a larger key is more expensive in disk space and IOs.  But if your performance requirements are not extreme, it’s a bearable expense, and not one that suddenly appears only when you scale up users, throughput, or database size.

2) GUID clustered index keys cause new rows to be uniformly distributed across the clustered index, causing expensive page splits, poor cache performance and about 30% unused space on every database page.

Again, absolutely true of randomly generated GUID keys.  How true it is of sequentially generated GUID keys, we shall see.  This is an annoyance for small databases, but because of the poor cache performance it’s a real scalability problem for large ones.  Most of the serious performance problems I’ve seen with GUID keys are in databases that started out small but grew to be large.  They then required huge amounts of disk IO or giant memory sizes to achieve reasonable throughput.

3) If you use a GUID for a Primary Key, you should make it a non-clustered index to minimize the problems with 1) and 2).

I really don’t buy this one.  Turning the table into a heap or clustering on an alternate key makes operations involving the Primary Key more expensive.  For instance, a single-row lookup using the Primary Key costs 1-4 extra logical IOs if the PK is non-clustered.  Additionally, the GUID PK index still suffers from 2).  So unless there’s a really good alternate clustering key, or a large number of non-clustered indexes, it’s not obvious that this yields a net improvement.
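If you want to measure that lookup cost yourself, here’s a minimal sketch (the table names pk_clust and pk_heap are mine, and the exact read counts depend on the depth of your indexes): build the same table twice, once with the GUID PK clustered and once with it non-clustered on a heap, then compare SET STATISTICS IO output for a single-row seek.

-- Hypothetical illustration tables: same schema, clustered vs non-clustered GUID PK.
if object_id('pk_clust') is not null drop table pk_clust
if object_id('pk_heap') is not null drop table pk_heap
create table pk_clust (id uniqueidentifier default newsequentialid() primary key clustered, data char(134) not null default 'x')
create table pk_heap (id uniqueidentifier default newsequentialid() primary key nonclustered, data char(134) not null default 'x')

insert into pk_clust (data) select top (100000) 'x' from sys.objects s1, sys.objects s2, sys.objects s3
insert into pk_heap (data) select top (100000) 'x' from sys.objects s1, sys.objects s2, sys.objects s3

declare @id1 uniqueidentifier = (select top 1 id from pk_clust)
declare @id2 uniqueidentifier = (select top 1 id from pk_heap)

set statistics io on
select * from pk_clust where id = @id1   -- one seek down the clustered index
select * from pk_heap  where id = @id2   -- PK index seek plus a RID lookup into the heap
set statistics io off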

So I am inclined to think that you have to avoid random GUID key generation, instead using NEWSEQUENTIALID or UuidCreateSequential() on the middle tier, and that if you do, a GUID clustered primary key index will perform reasonably well.

But that brings us back to the issue of sequential GUID inserts into the “middle” of an index, since that’s what you will get after a restore, failover, or migration of the server generating the sequential GUIDs.  Technically, page splits occur whenever a page fills up and more space is needed for rows.  So even with an integral-type identity column, a page split occurs every time the last page fills up with rows and a new page is allocated.  This end-of-index page split, however, does not cause any data movement: the old page (i.e. the page to be split) remains 100% full, and the new page starts off 0% full and then gets a single row.  We’ll call this a “Good Page Split”.  This contrasts with a “Bad Page Split”, by which we’ll mean a page split that moves 50% of the rows from the old page to the new page and leaves empty space on both pages.  So with sequential GUID inserts into the middle of an index, will we get Good Page Splits or Bad Page Splits?
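(An aside: in the test below we’ll count page splits with before/after page-count arithmetic, which is the simplest approach, but sys.dm_db_index_operational_stats also keeps a cumulative count of leaf-page allocations, which for a B-tree correspond to page splits — Good or Bad alike, the counter doesn’t distinguish.  A sketch against the test table we’re about to create:)

-- leaf_allocation_count accumulates leaf-page allocations for the index since
-- its metadata was last cached; for a B-tree each one corresponds to a page
-- split, good or bad.
select leaf_insert_count, leaf_allocation_count
from sys.dm_db_index_operational_stats(db_id(), object_id('ps_test'), 1, null)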

The answer: Good Page Splits. 

The first insert into the middle of the index causes a Bad Page Split, but as you keep inserting rows at the same insertion point you get Good Page Splits from then on.  Specifically, a page split caused by a clustered index insert for a key value greater than the last key value on the old page, but less than the first key value on the next logical page, will be a Good Page Split.  All the rows are left on the old page, the new page is written in the next spot on the extent, the page pointers are updated on the old page and the next logical page, and a single row is written to the new page.  The only difference between this page split and an end-of-index page split is that the new page is not the last in the page chain, so the forward and back page pointers need to be updated on two pages instead of one.

And the net result of this is that generating key values with NEWSEQUENTIALID performs fine, even when you move the database from server to server and the insertion point changes.

Here’s how to test this.  First we’ll create a table with a GUID key and size it to fit exactly fifty rows per page.  To do this I just used fixed-width types and trial-and-error (each row works out to roughly 16 bytes of GUID, 134 bytes of char data, and about 9 bytes of row and slot overhead, or about 160 bytes; 8,096 bytes of page body / 160 ≈ 50).  This table has a clustered primary key index on a GUID column.  The column has a default of NEWSEQUENTIALID(), so if we insert rows without specifying an id value, a server-generated sequential GUID will be used.  If we instead insert rows with an explicit NEWID() value for id, the key will land at a random location in the sort order.  The idea is to perform both sequential and random GUID key inserts into this table and compare the cost of the operations and the effects on the number of pages.  Using average-sized rows and performing a large number of inserts will let us not only observe the effects of the page splits, but also estimate the cost delta between random and sequential GUID key inserts.

 

use tempdb
set nocount on

if object_id('ps_test') is not null
    drop table ps_test

create table ps_test
(
  id uniqueidentifier default newsequentialid(),
  constraint pk_ps_test primary key clustered (id),
  data char(134) not null default 'sequential'
)
go

 

Now let’s put some data in the table with random GUIDs and rebuild the PK index to ensure that all the pages are completely filled.

 

insert into ps_test (id, data)
select top (100000) newid(), 'random'
from sys.objects s1, sys.objects s2, sys.objects s3

alter index pk_ps_test on ps_test rebuild with (fillfactor = 100)

select index_level, page_count, record_count, record_count/(1.0*page_count) avg_records_per_page
from sys.dm_db_index_physical_stats(db_id(), object_id('ps_test'), null, null, 'detailed')
where index_level = 0

 

This outputs:

 

page_count           record_count         avg_records_per_page
-------------------- -------------------- ---------------------
2005                 100000               49.875311720698254364

 

Which is just what we expect: 100,000 rows, with close to a 100% effective fill factor.  Now we’ll insert 100,000 new rows with random GUID keys and examine the results.  We’ll count the number of leaf-level pages before and after the inserts to measure the page splits, and also measure the CPU time and transaction log used:

begin transaction

print 'using random GUIDs'

declare @cpu bigint = @@CPU_BUSY * cast(@@TIMETICKS as float) / 1000
select @cpu

declare @pages int =
  (
   select page_count
   from sys.dm_db_index_physical_stats(db_id(), object_id('ps_test'), null, null, 'detailed')
   where index_level = 0
   and index_id = 1
  )

declare @rows int = 100000;
declare @i int = 0

while @i < @rows
begin
  insert into ps_test (id, data)
  values (newid(), 'random')
  set @i = @i + 1
end

select
  sum(t.database_transaction_log_bytes_used) transaction_log_bytes,
  sum(t.database_transaction_log_bytes_used) / @rows log_bytes_per_row,
  (@@CPU_BUSY * cast(@@TIMETICKS as float) / 1000) - @cpu cpu_time_used_ms
from sys.dm_tran_session_transactions st
join sys.dm_tran_database_transactions t
  on t.transaction_id = st.transaction_id
where st.session_id = @@spid;

select page_count, page_count - @pages page_splits,
       record_count, @rows / (nullif(page_count, 0) - @pages) rows_per_page_split,
       record_count/(1.0*page_count) avg_records_per_page
from sys.dm_db_index_physical_stats(db_id(), object_id('ps_test'), null, null, 'detailed')
where index_level = 0

commit

 

That outputs:

using random GUIDs

transaction_log_bytes log_bytes_per_row    cpu_time_used_ms
--------------------- -------------------- ----------------------
42258624              422                  1844.5

page_count           page_splits          record_count         rows_per_page_split  avg_records_per_page
-------------------- -------------------- -------------------- -------------------- ------------------------
5826                 3822                 200000               26                   34.328870580157912804

Then rebuild the index again and insert 100,000 rows using sequential GUID keys.  The sequential GUIDs should have an insertion point somewhere in the middle of the 200,000 random GUIDs we’ve already inserted into the table (which we’ll verify).

alter index pk_ps_test on ps_test rebuild with (fillfactor = 100)

select page_count, record_count, record_count/(1.0*page_count) avg_records_per_page
from sys.dm_db_index_physical_stats(db_id(), object_id('ps_test'), null, null, 'detailed')
where index_level = 0
go

 

begin transaction

print 'using sequential GUIDs'

declare @cpu bigint = @@CPU_BUSY * cast(@@TIMETICKS as float) / 1000

declare @pages int =
  (
   select page_count
   from sys.dm_db_index_physical_stats(db_id(), object_id('ps_test'), null, null, 'detailed')
   where index_level = 0
   and index_id = 1
  )

declare @rows int = 100000;
declare @i int = 0

while @i < @rows
begin
  insert into ps_test (data)
  values ('sequential')
  set @i = @i + 1
end

select
  sum(t.database_transaction_log_bytes_used) transaction_log_bytes,
  sum(t.database_transaction_log_bytes_used) / @rows log_bytes_per_row,
  (@@CPU_BUSY * cast(@@TIMETICKS as float) / 1000) - @cpu cpu_time_used_ms
from sys.dm_tran_session_transactions st
join sys.dm_tran_database_transactions t
  on t.transaction_id = st.transaction_id
where st.session_id = @@spid;

select page_count, page_count - @pages page_splits,
       record_count, @rows / (nullif(page_count, 0) - @pages) rows_per_page_split,
       record_count/(1.0*page_count) avg_records_per_page
from sys.dm_db_index_physical_stats(db_id(), object_id('ps_test'), null, null, 'detailed')
where index_level = 0

commit

 

Which outputs:

using sequential GUIDs

transaction_log_bytes log_bytes_per_row    cpu_time_used_ms
--------------------- -------------------- ----------------------
24751792              247                  1625.75

page_count           page_splits          record_count         rows_per_page_split  avg_records_per_page
-------------------- -------------------- -------------------- -------------------- -----------------------
6006                 2002                 300000               49                   49.950049950049950049

Finally, verify that the sequential GUIDs were actually inserted somewhere in the middle of the sort order; if so, the highest key values in the table should still be random GUIDs.

select top 4 *
from ps_test
order by id desc

Which outputs:

id                                   data
------------------------------------ -----------
6FDAD671-F0FD-4B4F-9EEE-FFFFF5CFB76F random
F16F6F58-D198-494B-A991-FFFFAD5ABC36 random
0CE89E63-8517-457C-ABDA-FFFF9F167DB0 random
FE5F9287-4363-490E-9CC1-FFFF6E249453 random

So the sequential GUIDs generated 41% less log (24,751,792 vs 42,258,624 bytes) and 12% less CPU (1,625.75 vs 1,844.5 ms) than the random GUIDs.  Sequential GUID inserts into the middle of the index are only slightly more expensive than inserts purely at the end of the index.  Also, the effective fill factor stays near 100% for the sequential GUID inserts (49.95 of 50 rows per page), where it converges to roughly 69% for the random GUID inserts (34.3 of 50 rows per page).  This indicates that the sequential GUID inserts simply don’t require row migration during page splits: we are getting almost entirely Good Page Splits.

We can verify the page-split behavior by dumping the page headers of the old and new pages after a page split.  To do this we’ll insert rows one at a time using PAGLOCK inside a transaction; when our transaction holds two exclusive page locks, that insert performed a page split.

while (1=1)
begin
  begin tran

  insert into ps_test with (paglock) (data)
  values ('sequential');

  declare @pages int =
  (
   select count(*)
   from sys.dm_tran_locks
   where request_session_id = @@spid
   and resource_type = 'PAGE'
  )

  if @pages = 2
  begin
      select resource_description
      from sys.dm_tran_locks
      where request_session_id = @@spid
      and resource_type = 'PAGE';
      commit;
      break;
  end

  commit
end

That will dump the page numbers for the old and new pages. Then we can paste those into DBCC PAGE to see the page headers:

 

DBCC TRACEON(3604)

print ('--------------------------OLD PAGE---------------------------------')
dbcc page(TempDb, 1, 66892, 0)

print ('--------------------------NEW PAGE---------------------------------')
dbcc page(TempDb, 1, 66893, 0)

Which outputs:

--------------------------OLD PAGE---------------------------------

PAGE: (1:66892)

BUFFER:

BUF @0x000000047FD87680

bpage = 0x000000043FFDC000 bhash = 0x0000000000000000 bpageno = (1:66892)
bdbid = 2 breferences = 0 bcputicks = 240
bsampleCount = 1 bUse1 = 22503 bstat = 0x10b
blog = 0x1c9acc bnext = 0x0000000000000000         

PAGE HEADER:

Page @0x000000043FFDC000

m_pageId = (1:66892) m_headerVersion = 1 m_type = 1
m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x0
m_objId (AllocUnitId.idObj) = 100709 m_indexId (AllocUnitId.idInd) = 8704
Metadata: AllocUnitId = 2449958203889614848
Metadata: PartitionId = 7277817001927835648 Metadata: IndexId = 1
Metadata: ObjectId = 885578193 m_prevPage = (1:66891) m_nextPage = (1:66893)
pminlen = 154 m_slotCnt = 50 m_freeCnt = 146
m_freeData = 7946 m_reservedCnt = 0 m_lsn = (802:1816:158)
m_xactReserved = 0 m_xdesId = (0:0) m_ghostRecCnt = 0
m_tornBits = 0 DB Frag ID = 1                     

Allocation Status

GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED
PFS (1:64704) = 0x40 ALLOCATED 0_PCT_FULL DIFF (1:6) = NOT CHANGED
ML (1:7) = NOT MIN_LOGGED          

DBCC execution completed. If DBCC printed error messages, contact your system administrator.
--------------------------NEW PAGE---------------------------------

PAGE: (1:66893)

BUFFER:

BUF @0x000000047FD87740

bpage = 0x000000043FFDE000 bhash = 0x0000000000000000 bpageno = (1:66893)
bdbid = 2 breferences = 0 bcputicks = 0
bsampleCount = 0 bUse1 = 22503 bstat = 0x10b
blog = 0x1c9acc bnext = 0x0000000000000000         

PAGE HEADER:

Page @0x000000043FFDE000

m_pageId = (1:66893) m_headerVersion = 1 m_type = 1
m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0xc000
m_objId (AllocUnitId.idObj) = 100709 m_indexId (AllocUnitId.idInd) = 8704
Metadata: AllocUnitId = 2449958203889614848
Metadata: PartitionId = 7277817001927835648 Metadata: IndexId = 1
Metadata: ObjectId = 885578193 m_prevPage = (1:66892) m_nextPage = (1:359)
pminlen = 154 m_slotCnt = 1 m_freeCnt = 7937
m_freeData = 253 m_reservedCnt = 0 m_lsn = (802:1816:162)
m_xactReserved = 0 m_xdesId = (0:0) m_ghostRecCnt = 0
m_tornBits = 0 DB Frag ID = 1                     

Allocation Status

GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED
PFS (1:64704) = 0x40 ALLOCATED 0_PCT_FULL DIFF (1:6) = NOT CHANGED
ML (1:7) = NOT MIN_LOGGED          

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

The old page has 50 rows on it (you can dump the rows too if you want with DBCC PAGE), and the new page has only one row.  You can also see that the new page (1:66893) is physically contiguous with the old page (1:66892), and that the new page’s m_nextPage pointer (1:359) points back to a physically earlier page.
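(To actually see the rows, use DBCC PAGE print option 3 instead of the 0 we used above; it includes per-row detail along with the header:)

DBCC TRACEON(3604)
dbcc page(TempDb, 1, 66892, 3)   -- print option 3: page header plus per-row detail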

That’s it.  Cool, isn’t it?  My thanks to my colleagues Cristopher Benge from Microsoft Consulting Services and Scott Hulke, the Technical Director of the Microsoft Technology Center in Dallas, for helping me work through this.