Table Partitioning Sliding Window Case

As I mentioned in my last post I'm currently working on a complex partitioning project for a customer. They need partitioning in order to speed up inserts on the table. Lots of inserts take place on this table. Right now they have around 2 million inserts per day and this will grow quickly to 4 million inserts/day. The table is heavily queried too, so there are a lot of indexes on the table. I know this isn't the most desirable design but that's how it is at the moment and at Premier Field Engineering our projects are usually not long enough to make judgments on architecture and design. Initial testing with partitioning looked promising so now we will implement it.

 

Let me get started. We start with the partition function.

First we set boundaries.

· They want partitions for each day of the current week. Weeks start on Monday.

· They want partitions for the last four weeks.

· They want partitions for the last 6 months.

· For the rest of the data they want one big partition.

 

There desired boundaries have some problems:

· Last four weeks do usually not end on the first of the month. Do we add these rows to the last month partition or do we make a partition for this month. I choose the latter because sooner or later this month would become a partition anyway.

· The archive partition is going to be huge! Maybe its better to move it to an archive-table. I have to discuss this with the customer, for now it's ok.

 

The table is partitioned at the moment with a RANGE RIGHT function. You can do 2 things when you have a table on a partition scheme you don't want:

· You can drop the clustered index and recreate it with the MOVE TO partitionscheme(partitionColumn) command.

· You can drop the table and recreate it on the partition scheme you want.

I choose number 2 because with SSIS you can quickly move data around and recreating a clustered index on a big table takes forever, especially with the MOVE TO directive.

 

I start with the initial situation in the partition function:

I'll load up the SQL2008 management studio and generate the scripts with the wizard. The wizard is also available when you connect to a SQL 2005 instance.

 

Here's the script I got:

 

BEGIN TRANSACTION

CREATE PARTITION FUNCTION [pfEvent](datetime)

AS RANGE LEFT

FOR VALUES (N'2008-07-06T00:00:00'

, N'2008-07-05T00:00:00'

, N'2008-07-04T00:00:00'

, N'2008-07-03T00:00:00'

, N'2008-07-02T00:00:00'

, N'2008-07-01T00:00:00'

, N'2008-06-30T00:00:00'

, N'2008-06-23T00:00:00'

, N'2008-06-16T00:00:00'

, N'2008-06-09T00:00:00'

, N'2008-06-02T00:00:00'

, N'2008-06-01T00:00:00'

, N'2008-05-31T00:00:00'

, N'2008-04-30T00:00:00'

, N'2008-03-31T00:00:00'

, N'2008-02-29T00:00:00'

, N'2008-01-31T00:00:00'

, N'2007-12-31T00:00:00');

 

CREATE PARTITION SCHEME [psEvent]

AS PARTITION [pfEvent]

TO ([PRIMARY]

, [PRIMARY]

, [PRIMARY]

, [PRIMARY]

, [PRIMARY]

, [PRIMARY]

, [PRIMARY]

, [PRIMARY]

, [PRIMARY]

, [PRIMARY]

, [PRIMARY]

, [PRIMARY]

, [PRIMARY]

, [PRIMARY]

, [PRIMARY]

, [PRIMARY]

, [PRIMARY]

, [ARCHIVE]

, [ARCHIVE]);

 

 

CREATE CLUSTERED INDEX [ClusteredIndex_on_psEvent_633504211030911271] ON [dbo].[Event]

(

[EvtDatTim]

) ON [psEvent]([EvtDatTim]);

 

 

DROP INDEX [ClusteredIndex_on_psEvent_633504211030911271] ON [dbo].[Event] WITH ( ONLINE = OFF );

 

 

COMMIT TRANSACTION

 

I'll execute the script and the empty recreated table will be partitioned.

 

Now I can recreate all indexes again and load the data back with SSIS. After that foreign keys can be restored.

Now the table is partitioned I have to implement a mechanism to automate the sliding window. I'm going to implement this with stored procedures that are executed in jobs.

 

A couple of stored procedures are needed:

1. A stored procedure, run every Sunday, that creates the daily partitions of next week.

2. A stored procedure that merges the daily partitions of last week into 1 week partition. Also run every Sunday.

3. A stored procedure that merges the fifth week partition into the corresponding month partitions. Also run every Sunday.

4. A stored procedure that merges the seventh month partition into the archive partition. For simplicity I run this one every Sunday as well. Otherwise I need a second job.

 

Off course I could do all this in one stored procedure but I like small procedures. That debugs a lot easier!

1 & 2 are easy, 3 & 4 are a bit more complicated. Let's get going.

 

1. You can't simply create partitions with GetDate(). You would get the current time down to milliseconds into the boundary range. That's why I created a function to prepare the date:

CREATE FUNCTION MSPartitionPrepareDate

(

@Date SmallDateTime

)

RETURNS SmallDateTime

AS

BEGIN

DECLARE @Result SmallDateTime;

SELECT @Result = CAST(CONVERT(Varchar(23),@Date,101 ) AS SmallDateTime);

 

RETURN @Result;

 

END

 

The procedure that creates partitions for the days of the next week:

CREATE PROCEDURE [dbo].[MSPartitioningAddDays]

AS

BEGIN

DECLARE @SplitDate AS DateTime;

 

ALTER PARTITION SCHEME psEvent

NEXT USED [PRIMARY];

 

SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() + 1);

ALTER PARTITION FUNCTION pfEvent()

SPLIT RANGE (@SplitDate);

 

ALTER PARTITION SCHEME psEvent

NEXT USED [PRIMARY];

 

SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() + 2);

ALTER PARTITION FUNCTION pfEvent()

SPLIT RANGE (@SplitDate);

 

ALTER PARTITION SCHEME psEvent

NEXT USED [PRIMARY];

 

SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() + 3);

ALTER PARTITION FUNCTION pfEvent()

SPLIT RANGE (@SplitDate);

 

ALTER PARTITION SCHEME psEvent

NEXT USED [PRIMARY];

 

SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() + 4);

ALTER PARTITION FUNCTION pfEvent()

SPLIT RANGE (@SplitDate);

 

ALTER PARTITION SCHEME psEvent

NEXT USED [PRIMARY];

 

SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() + 5);

ALTER PARTITION FUNCTION pfEvent()

SPLIT RANGE (@SplitDate);

 

ALTER PARTITION SCHEME psEvent

NEXT USED [PRIMARY];

 

SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() + 6);

ALTER PARTITION FUNCTION pfEvent()

SPLIT RANGE (@SplitDate);

 

ALTER PARTITION SCHEME psEvent

NEXT USED [PRIMARY];

 

SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() + 7);

ALTER PARTITION FUNCTION pfEvent()

SPLIT RANGE (@SplitDate);

END

 

That's not too hard. Very simple and very readable. First you add room on the partition scheme and then you create the new day partition on the function. Do that 7 times and you're done.

 

2. Now we have to merge the data of the partitions of last week into one week partition. You can do that in 2 ways: Either you merge all days except Sunday, or you switch out all data to a staging table and switch it back in after you split out the days. Since a day is a relatively small partition I merge them directly.

 

CREATE PROCEDURE [dbo].[MSPartitioningMergeDaysInWeek]

AS

BEGIN

 

-- Merge the last 7 days partitions in one week partition.

-- Last Monday

EXEC [dbo].[MSPartitioningMergeDay] 1

-- Last Tuesday

EXEC [dbo].[MSPartitioningMergeDay] 2

-- Last Wednesday

EXEC [dbo].[MSPartitioningMergeDay] 3

-- Last Thursday

EXEC [dbo].[MSPartitioningMergeDay] 4

-- Last Friday

EXEC [dbo].[MSPartitioningMergeDay] 5

-- Last Saturday

EXEC [dbo].[MSPartitioningMergeDay] 6

END;

 

CREATE PROCEDURE [dbo].[MSPartitioningMergeDay]

@MinusDays int

AS

BEGIN

DECLARE @MergeDate AS DateTime;

 

SELECT @MergeDate = dbo.MSPartitionPrepareDate (GetDate() - @MinusDays);

 

ALTER PARTITION FUNCTION pfEvent()

MERGE RANGE (@MergeDate) ;

END;

 

2 simple stored procedures that get this job done. I work from 1 to 6 in order to merge the smallest set. If I work from 6 to 1 the set to merge would be larger after every merge.

 

3. The simple stuff is done now. After you run stored procedure 2 you have 5 week partitions. The oldest needs to be merged with the already existing month partition. If no partition exists yet one has to be created.

Simple design:

· First find the oldest week partition. GetDate() - 35 will get you there.

· Switch out this partition

· Look if a new month starts between GetDate() - 35 and GetDate() - 28

· If yes create new month partition that ends on GetDate() - 28

· If no alter the existing month partition so that it ends on GetDate() - 28

· Switch partition back in.

 

Here's the code:

CREATE PROCEDURE [dbo].[MSPartitioningMergeOldestWeekPartition]

AS

BEGIN

DECLARE @PartitionNumber AS int;

DECLARE @Boundary AS DateTime;

DECLARE @DateDiff AS int;

DECLARE @SplitDate AS DateTime;

DECLARE @ErrorNumber AS int;

 

-- Partition number and boundary value

SELECT @PartitionNumber = boundary_id

,@Boundary = CAST(value AS DateTime)

FROM sys.partition_range_values rv

INNER JOIN sys.partition_functions pf

ON rv.function_Id = pf.function_id

WHERE pf.name = 'pfEvent'

AND CAST(value AS DateTime) = (SELECT CAST(CAST(DATEPART("YEAR",getdate()-35) AS Varchar(4))

+ '-' + Right('0' + Convert(VarChar(2), Month(GetDate()-35)), 2)

+ '-' + Right('0' + Convert(VarChar(2), Day(GetDate()-35)), 2)

+ ' 00:00:00.000' AS DateTime))

-- Switching out.....

ALTER TABLE [dbo].[Event]

SWITCH PARTITION @PartitionNumber

TO Staging_Event_Proto

Print 'Geswitched'

--Getting rid of the now empty partition....

SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() - 35);

ALTER PARTITION FUNCTION pfEvent()

MERGE RANGE (@SplitDate)

SELECT @DateDiff = DATEDIFF("mm",getdate()-35,GETDATE()-28)

IF @DateDiff <> 0

BEGIN

-- Find last day of the month and make partitionboundary of it

ALTER PARTITION SCHEME psEvent

NEXT USED [PRIMARY];

ALTER PARTITION FUNCTION pfEvent()

SPLIT RANGE (DATEADD("dd", -1, CAST(CAST(DATEPART("YYYY" ,Getdate()-30) AS Varchar(4)) + '-' + CAST(DATEPART("mm",Getdate()-28) AS Varchar(2)) + '-01' AS DATETIME)));

-- Starting day of next partition is endday of this partition.

BEGIN TRY

ALTER PARTITION SCHEME psEvent

NEXT USED [PRIMARY];

SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() - 28);

ALTER PARTITION FUNCTION pfEvent()

SPLIT RANGE (@SPlitDate);

END TRY

BEGIN CATCH

-- Partition might be there already

SELECT @ErrorNumber = ERROR_NUMBER();

IF @ErrorNumber <> 7721 -- Errornumber of overlapping partitions

BEGIN

SELECT

ERROR_NUMBER() AS ErrorNumber,

ERROR_SEVERITY() AS ErrorSeverity,

ERROR_STATE() AS ErrorState,

ERROR_PROCEDURE() AS ErrorProcedure,

ERROR_LINE() AS ErrorLine,

ERROR_MESSAGE() AS ErrorMessage;

END

END CATCH

END

-- Switching in....

-- Can't use the switch command since the parititions don't match.

BEGIN TRANSACTION

INSERT INTO [dbo].[Event]

SELECT * FROM dbo.Staging_Event_Proto;

TRUNCATE TABLE dbo.Staging_Event_Proto;

COMMIT TRANSACTION

 

END;

 

That's quite a piece of code. I have tested it now and it works. If it doesn't on your machine give me a shout.

 

4. For the last procedure I take the easy way and just merge the partition. The partition is not queried a lot and it can run in the background without bothering anybody.

Here we go:

 

CREATE PROCEDURE [dbo].[MSPartitioningMergeOldestMonth]

AS

BEGIN

DECLARE @Boundary DateTime;

-- Boundary value: it's always partion 1

SELECT @Boundary = CAST(value AS DateTime)

FROM sys.partition_range_values rv

INNER JOIN sys.partition_functions pf

ON rv.function_Id = pf.function_id

WHERE pf.name = 'pfEvent'

AND rv.boundary_id = 1

--Merge this partition....

ALTER PARTITION FUNCTION pfEvent()

MERGE RANGE (@Boundary)

END;

 

So that's it. Now we only have to make a job with a weekly schedule to run it on Sunday and we're done. Tomorrow I'm going to test all this code. This case took me almost 7 hours and a lot of energy :). But it was great fun. I usually don't code all day long anymore but every once and while it's great fun!

Expect a post on testing soon.

I know I'm not the best T-SQL Coder there is: suggestions to optimize the code are therefore welcome.