Table Partitioning Sliding Window Case


As I mentioned in my last post I’m currently working on a complex partitioning project for a customer. They need partitioning in order to speed up inserts on the table. Lots of inserts take place on this table. Right now they have around 2 million inserts per day  and this will grow quickly to 4 million inserts/day. The table is heavily queried too, so there are a lot of indexes on the table. I know this isn’t the most desirable design but that’s how it is at the moment and at Premier Field Engineering our projects are usually not long enough to make judgments on architecture and design. Initial testing with partitioning looked promising so now we will implement it.


 


Let me get started. We start with the partition function.


First we set boundaries.


·         They want partitions for each day of the current week. Weeks start on Monday.


·         They want partitions for the last four weeks.


·         They want partitions for the last 6 months.


·         For the rest of the data they want one big partition.


 


There desired boundaries have some problems:


·         Last four weeks do usually not end on the first of the month. Do we add these rows to the last month partition or do we make a partition for this month. I choose the latter because sooner or later this month would become a partition anyway.


·         The archive partition is going to be huge! Maybe its better to move it to an archive-table. I have to discuss this with the customer, for now it’s ok.


 


The table is partitioned at the moment with a RANGE RIGHT function. You can do 2 things when you have a table on a partition scheme you don’t want:


·         You can drop the clustered index and recreate it with the MOVE TO partitionscheme(partitionColumn) command.


·         You can drop the table and recreate it on the partition scheme you want.


I choose number 2 because with SSIS you can quickly move data around and recreating a clustered index on a big table takes forever, especially with the MOVE TO directive.


 


I start with the initial situation in the partition function:


I’ll load up the SQL2008 management studio and generate the scripts with the wizard. The wizard is also available when you connect to a SQL 2005 instance.


 


Here’s the script I got:


 


BEGIN TRANSACTION


CREATE PARTITION FUNCTION [pfEvent](datetime)


AS RANGE LEFT


FOR VALUES (N’2008-07-06T00:00:00′


, N’2008-07-05T00:00:00′


, N’2008-07-04T00:00:00′


, N’2008-07-03T00:00:00′


, N’2008-07-02T00:00:00′


, N’2008-07-01T00:00:00′


, N’2008-06-30T00:00:00′


, N’2008-06-23T00:00:00′


, N’2008-06-16T00:00:00′


, N’2008-06-09T00:00:00′


, N’2008-06-02T00:00:00′


, N’2008-06-01T00:00:00′


, N’2008-05-31T00:00:00′


, N’2008-04-30T00:00:00′


, N’2008-03-31T00:00:00′


, N’2008-02-29T00:00:00′


, N’2008-01-31T00:00:00′


, N’2007-12-31T00:00:00′);


 


CREATE PARTITION SCHEME [psEvent]


AS PARTITION [pfEvent]


TO ([PRIMARY]


, [PRIMARY]


, [PRIMARY]


, [PRIMARY]


, [PRIMARY]


, [PRIMARY]


, [PRIMARY]


, [PRIMARY]


, [PRIMARY]


, [PRIMARY]


, [PRIMARY]


, [PRIMARY]


, [PRIMARY]


, [PRIMARY]


, [PRIMARY]


, [PRIMARY]


, [PRIMARY]


, [ARCHIVE]


, [ARCHIVE]);


 


 


CREATE CLUSTERED INDEX [ClusteredIndex_on_psEvent_633504211030911271] ON [dbo].[Event]


(


[EvtDatTim]


) ON [psEvent]([EvtDatTim]);


 


 


DROP INDEX [ClusteredIndex_on_psEvent_633504211030911271] ON [dbo].[Event] WITH ( ONLINE = OFF );


 


 


COMMIT TRANSACTION


 


I’ll execute the script and the empty recreated table will be partitioned.


 


Now I can recreate all indexes again and load the data back with SSIS. After that foreign keys can be restored.


Now the table is partitioned I have to implement a mechanism to automate the sliding window. I’m going to implement this with stored procedures that are executed in jobs.


 


A couple of stored procedures are needed:


1.       A stored procedure, run every Sunday, that creates the daily partitions of next week.


2.       A stored procedure that merges the daily partitions of last week into 1 week partition. Also run every Sunday.


3.       A stored procedure that merges the fifth week partition into the corresponding month partitions. Also run every Sunday.


4.       A stored procedure that merges the seventh month partition into the archive partition. For simplicity I run this one every Sunday as well. Otherwise I need a second job.


 


Off course I could do all this in one stored procedure but I like small procedures. That debugs a lot easier!


1 & 2 are easy, 3 & 4 are a bit more complicated. Let’s get going.


 


1.       You can’t simply create partitions with GetDate(). You would get the current time down to milliseconds into the boundary range. That’s why I created a function to prepare the date:


CREATE FUNCTION MSPartitionPrepareDate


(


@Date SmallDateTime


)


RETURNS SmallDateTime


AS


BEGIN


DECLARE @Result SmallDateTime;


SELECT @Result = CAST(CONVERT(Varchar(23),@Date,101 ) AS SmallDateTime);


 


RETURN @Result;


 


END


 


The procedure that creates partitions for the days of the next week:


CREATE PROCEDURE [dbo].[MSPartitioningAddDays]


AS


BEGIN


DECLARE @SplitDate AS DateTime;


 


ALTER PARTITION SCHEME psEvent


NEXT USED [PRIMARY];


 


SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() + 1);


ALTER PARTITION FUNCTION pfEvent()


SPLIT RANGE (@SplitDate);


 


ALTER PARTITION SCHEME psEvent


NEXT USED [PRIMARY];


 


SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() + 2);


ALTER PARTITION FUNCTION pfEvent()


SPLIT RANGE (@SplitDate);


 


ALTER PARTITION SCHEME psEvent


NEXT USED [PRIMARY];


 


SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() + 3);


ALTER PARTITION FUNCTION pfEvent()


SPLIT RANGE (@SplitDate);


 


ALTER PARTITION SCHEME psEvent


NEXT USED [PRIMARY];


 


SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() + 4);


ALTER PARTITION FUNCTION pfEvent()


SPLIT RANGE (@SplitDate);


 


ALTER PARTITION SCHEME psEvent


NEXT USED [PRIMARY];


 


SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() + 5);


ALTER PARTITION FUNCTION pfEvent()


SPLIT RANGE (@SplitDate);


 


ALTER PARTITION SCHEME psEvent


NEXT USED [PRIMARY];


 


SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() + 6);


ALTER PARTITION FUNCTION pfEvent()


SPLIT RANGE (@SplitDate);


 


ALTER PARTITION SCHEME psEvent


NEXT USED [PRIMARY];


 


SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() + 7);


ALTER PARTITION FUNCTION pfEvent()


SPLIT RANGE (@SplitDate);


END


 


That’s not too hard. Very simple and very readable. First you add room on the partition scheme and then you create the new day partition on the function. Do that 7 times and you’re done.


 


2.       Now we have to merge the data of the partitions of last week into one week partition. You can do that in 2 ways: Either you merge all days except Sunday, or you switch out all data to a staging table and switch it back in after you split out the days. Since a day is a relatively small partition I merge them directly.


 


CREATE PROCEDURE [dbo].[MSPartitioningMergeDaysInWeek]


AS


BEGIN


 


— Merge the last 7 days partitions in one week partition.


— Last Monday


EXEC [dbo].[MSPartitioningMergeDay] 1


— Last Tuesday


EXEC [dbo].[MSPartitioningMergeDay] 2


— Last Wednesday


EXEC [dbo].[MSPartitioningMergeDay] 3


— Last Thursday


EXEC [dbo].[MSPartitioningMergeDay] 4


— Last Friday


EXEC [dbo].[MSPartitioningMergeDay] 5


— Last Saturday


EXEC [dbo].[MSPartitioningMergeDay] 6


END;


 


CREATE PROCEDURE [dbo].[MSPartitioningMergeDay]


@MinusDays int


AS


BEGIN


DECLARE @MergeDate AS DateTime;


 


SELECT @MergeDate = dbo.MSPartitionPrepareDate (GetDate() @MinusDays);


 


ALTER PARTITION FUNCTION pfEvent()


MERGE RANGE (@MergeDate) ;


END;


 


2 simple stored procedures that get this job done. I work from 1 to 6 in order to merge the smallest set. If I work from 6 to 1 the set to merge would be larger after every merge.


 


3.       The simple stuff is done now. After you run stored procedure 2 you have 5 week partitions. The oldest needs to be merged with the already existing month partition. If no partition exists yet one has to be created.


Simple design:


·         First find the oldest week partition. GetDate() – 35 will get you there.


·         Switch out this partition


·         Look if a new month starts between GetDate() – 35 and GetDate() – 28


·         If yes create new month partition that ends on GetDate() – 28


·         If no alter the existing month partition so that it ends on GetDate() – 28


·         Switch partition back in.


 


Here’s the code:


CREATE PROCEDURE [dbo].[MSPartitioningMergeOldestWeekPartition]


AS


BEGIN


DECLARE @PartitionNumber AS int;


DECLARE @Boundary AS DateTime;


DECLARE @DateDiff AS int;


DECLARE @SplitDate AS DateTime;


DECLARE @ErrorNumber AS int;


 


— Partition number and boundary value


SELECT @PartitionNumber = boundary_id


,@Boundary = CAST(value AS DateTime)


FROM sys.partition_range_values rv


INNER JOIN sys.partition_functions pf


ON rv.function_Id = pf.function_id


WHERE pf.name = ‘pfEvent’


AND CAST(value AS DateTime) = (SELECT CAST(CAST(DATEPART(“YEAR”,getdate()-35) AS Varchar(4))


+ ‘-‘ + Right(‘0’ + Convert(VarChar(2), Month(GetDate()-35)), 2)


+ ‘-‘ + Right(‘0’ + Convert(VarChar(2), Day(GetDate()-35)), 2)


+ ‘ 00:00:00.000’ AS DateTime))


— Switching out…..


ALTER TABLE [dbo].[Event]


SWITCH PARTITION @PartitionNumber


TO Staging_Event_Proto


Print ‘Geswitched’


–Getting rid of the now empty partition….


SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() 35);


ALTER PARTITION FUNCTION pfEvent()


MERGE RANGE (@SplitDate)


SELECT @DateDiff = DATEDIFF(“mm”,getdate()-35,GETDATE()-28)


IF @DateDiff <> 0


BEGIN


— Find last day of the month and make partitionboundary of it


ALTER PARTITION SCHEME psEvent


NEXT USED [PRIMARY];


ALTER PARTITION FUNCTION pfEvent()


SPLIT RANGE (DATEADD(“dd”, 1, CAST(CAST(DATEPART(“YYYY” ,Getdate()-30) AS Varchar(4)) + ‘-‘ + CAST(DATEPART(“mm”,Getdate()-28) AS Varchar(2)) + ‘-01’ AS DATETIME)));


— Starting day of next partition is endday of this partition.


BEGIN TRY


ALTER PARTITION SCHEME psEvent


NEXT USED [PRIMARY];


SELECT @SplitDate = dbo.MSPartitionPrepareDate(GetDate() 28);


ALTER PARTITION FUNCTION pfEvent()


SPLIT RANGE (@SPlitDate);


END TRY


BEGIN CATCH


— Partition might be there already


SELECT @ErrorNumber = ERROR_NUMBER();


IF @ErrorNumber <> 7721 — Errornumber of overlapping partitions


BEGIN


SELECT


ERROR_NUMBER() AS ErrorNumber,


ERROR_SEVERITY() AS ErrorSeverity,


ERROR_STATE() AS ErrorState,


ERROR_PROCEDURE() AS ErrorProcedure,


ERROR_LINE() AS ErrorLine,


ERROR_MESSAGE() AS ErrorMessage;


END


END CATCH


END


— Switching in….


— Can’t use the switch command since the parititions don’t match.


BEGIN TRANSACTION


INSERT INTO [dbo].[Event]


SELECT * FROM dbo.Staging_Event_Proto;


TRUNCATE TABLE dbo.Staging_Event_Proto;


COMMIT TRANSACTION


 


END;


 


That’s quite a piece of code. I have tested it now and it works. If it doesn’t on your machine give me a shout.


 


4.       For the last procedure I take the easy way and just merge the partition. The partition is not queried a lot and it can run in the background without bothering anybody.


Here we go:


 


CREATE PROCEDURE [dbo].[MSPartitioningMergeOldestMonth]


AS


BEGIN


DECLARE @Boundary DateTime;


— Boundary value: it’s always partion 1


SELECT @Boundary = CAST(value AS DateTime)


FROM sys.partition_range_values rv


INNER JOIN sys.partition_functions pf


ON rv.function_Id = pf.function_id


WHERE pf.name = ‘pfEvent’


AND rv.boundary_id = 1


–Merge this partition….


ALTER PARTITION FUNCTION pfEvent()


MERGE RANGE (@Boundary)


END;


 


So that’s it. Now we only have to make a job with a weekly schedule to run it on Sunday and we’re done. Tomorrow I’m going to test all this code. This case took me almost 7 hours and a lot of energy :). But it was great fun. I usually don’t code all day long anymore but every once and while it’s great fun!


Expect a post on testing soon.


I know I’m not the best T-SQL Coder there is: suggestions to optimize the code are therefore welcome.

Comments (5)

  1. KBE says:

    So how did the testing work out?   :)

    (Have more or less the same conditions as you descibe here and wanted to know how your project ended.)

  2. MKG says:

    Just curious to know how much the Merge and Split operations take.  If these are done on loaded partitions then it is expected to be time consuming, so wanted get some input on this.

  3. Ram says:

    Hi

    Have you tested this script. If every thing success please share code to mail id

    ramkiransqldba@gmail.com

    I want to implement on my environment

    thank you

  4. Dhaval Shah says:

    Hi,

    Thanks a lot for sharing such a great solution.

    As I m not a developer I request to modify this one with as our requirement. Hence I can understand the purpose of this but can't modify store Procedure.

    CASE STUDY:

    We have one table name by Transaction with (Time-stamp column & many)

    We need to create 3 Partition on this table-(Dynamic Partition creation for today & merge to 6 month partition on next day)

    1) Today's – Most of all action from application work on it.

    2) 6 Months- It will use for only Report view.

    3) Rest of Date – Not at all use just for record.

    Kindly Guide me….

    Regards Dhaval Shah

    dhaval.shah@ibsind.com

    +91 9825611047