Data Compression and Heaps

I think we can all agree that Data Compression is a great new functionality in SQL Server 2008.  I know many of you have been experimenting with it but there was one little catch I wanted you to be aware of.

When you apply page compression to a table the pages get compressed when they are full but there is an exception as you would have expected.  When you have a heap the newly allocated pages are NOT compressed until you rebuild the table (or remove and reapply compression or add and remove a clustered index) unless you are using BULK INSERT. 

I have added a test script so you can see it with your own eyes.

First the clustered table:

SET NOCOUNT ON
GO

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[myClusteredTable]') AND type in (N'U'))
DROP TABLE [dbo].[myClusteredTable]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[myClusteredTable](
 [ID] [int] IDENTITY(1,1) NOT NULL,
 [TheDate] [date] NOT NULL,
 [TheTime] [time](7) NOT NULL,
 [SomeFiller] [varchar](256) NOT NULL,
 CONSTRAINT [PK_myClusteredTable] PRIMARY KEY CLUSTERED
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

--Add some records to the table with varying length
INSERT INTO dbo.myClusteredTable
 VALUES (SYSDATETIME(), SYSDATETIME(), REPLICATE('W', CEILING(RAND() * 100) + 1))
 
GO 15000

--Check the space used
sp_spaceused myClusteredTable

GO

--Now apply compression
ALTER TABLE [dbo].[myClusteredTable] REBUILD PARTITION = ALL
WITH
(DATA_COMPRESSION = PAGE
)
GO

--Check the space used
sp_spaceused myClusteredTable
GO

--Add some more records to the table
INSERT INTO dbo.myClusteredTable (TheDate, TheTime, SomeFiller)
 SELECT TheDate, TheTime, SomeFiller FROM myClusteredTable
GO

--Check the space used again
sp_spaceused myClusteredTable
GO

DROP TABLE myClusteredTable

As you can see the compression keeps going even when you insert new rows.
But what about the heap?

SET NOCOUNT ON
GO

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[myHeap]') AND type in (N'U'))
DROP TABLE [dbo].[myHeap]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[myHeap](
 [ID] [int] IDENTITY(1,1) NOT NULL,
 [TheDate] [date] NOT NULL,
 [TheTime] [time](7) NOT NULL,
 [SomeFiller] [varchar](256) NOT NULL,
 CONSTRAINT [PK_myHeap] PRIMARY KEY NONCLUSTERED
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

--Add some records to the table with varying length
INSERT INTO dbo.myHeap
 VALUES (SYSDATETIME(), SYSDATETIME(), REPLICATE('W', CEILING(RAND() * 100) + 1))
 
GO 15000

--Check the space used
sp_spaceused myHeap

GO

--Now apply compression
ALTER TABLE [dbo].[myHeap] REBUILD PARTITION = ALL
WITH
(DATA_COMPRESSION = PAGE
)
GO

--Check the space used
sp_spaceused myHeap
GO

--Add some more records to the table
INSERT INTO dbo.myHeap (TheDate, TheTime, SomeFiller)
 SELECT TheDate, TheTime, SomeFiller FROM myHeap
GO

--Check the space used again
sp_spaceused myHeap
GO

--Rebuild the heap?
ALTER TABLE myHeap REBUILD
GO

--Check the space used again
sp_spaceused myHeap
GO

--What about a bulk insert?
BULK INSERT TestDB.dbo.MyHeap
   FROM 'c:\temp\test.txt'
   WITH
      (
   FORMATFILE='c:\temp\myHeap.xml',
   KEEPIDENTITY,
   TABLOCK
      )
GO

--Check the space used again
sp_spaceused myHeap
GO

--Rebuild the heap?
ALTER TABLE myHeap REBUILD
GO

--Check the space used again
sp_spaceused myHeap
GO

--Seems compression was indeed applied with BULK INSERT, go ahead and drop the test table
DROP TABLE myHeap

This was tested on SQL Server 2008 SP1 + CU2 (build 10.0.2714).