Managing data motion during your deployments (Part 2)


This post finishes off a two-part series. Sorry about the delay getting this second part out: I went on vacation back in May and am just now getting caught up.



In Part 1 we reviewed how to use the pre- and post-deployment scripts to manage the data motion required to deploy schema changes. In Part 2 we will look at how to implement your data motion scripts so that they only run when necessary.


 


You typically will not want your data motion scripts to run on every deployment. There are various reasons: the data motion is a one-time upgrade; a script may only be appropriate for a specific target; a script may only apply to a specific version of the database schema; and so on. The pre and post deployment scripts in VSTS: DB are retained in the project after deployment and are checked into source code control. They remain part of the project unless a developer removes them from the code base. If you are maintaining multiple versions of an application, you will want to keep these scripts to enable upgrades from earlier versions of the database schema. Also, while data motion scripts are usually not reusable across versions of a schema, they can serve as templates when the same or similar objects change in later versions of the database schema.


 


Enabling scripts to run only when needed requires the developer to choose a strategy for detecting when a script should run. There are many strategies out there, but a few are common, with the others being variants of these:


- Expected schema conditions
- Database version history table
- Database version stamp
- Combinations of the previous


 


At the heart of these strategies is the goal of making data motion scripts idempotent, or simply re-runnable. To keep these scripts in the project and part of every database deployment, they must be re-runnable. Even for one-shot DDL/DML scripts I would still make them re-runnable, since you will likely run them multiple times during trial deployments and database upgrade tests. You may not choose to keep every script around for the life of the project in the latest source, but experience has taught me that a little up-front work to make scripts re-runnable pays dividends before the end of the project.


 


Expected schema conditions is probably the most common strategy, and most seasoned database developers include it as a standard part of their DDL scripts. It is simply good practice: it improves the reusability of scripts and achieves re-runnability. The strategy checks for an expected state of the schema. If the schema is in the expected state, the script runs; if not, the script does nothing. I have sometimes heard database developers call these scripts non-destructive or non-invasive. I would steer clear of that terminology when talking about databases (especially when speaking to customers :)). Using our example from Part 1, here is the pre-deployment data motion script that employs an expected schema condition:


 


Note: You would need to make similar modifications in the post deployment script, but instead of checking for the existence of the column you check that the xxxDATAMOTOxxxCustomersTable table exists.


 


PRINT 'STARTING PREDEPLOYMENT SCRIPT...'
SET NOCOUNT ON
--Add IsKeyCustomer column to the Customer table. Required for the version 2->3 upgrade.
IF NOT EXISTS(
      SELECT 1 FROM [sys].[syscolumns]
            WHERE [sys].[syscolumns].[id] = OBJECT_ID('[Customer].[Customers]')
                  AND [sys].[syscolumns].[name] = 'IsKeyCustomer')
BEGIN
      PRINT 'STARTING ADDIsKeyCustomer - PREDATAMOTO_CUSTOMERS_TABLE'
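As the note above says, the post-deployment counterpart gates on the backup table rather than the column. A minimal sketch of its opening check (it reuses the backup-table naming from Part 1, so treat the exact table name as an assumption):

```sql
PRINT 'STARTING POSTDEPLOYMENT SCRIPT...'
SET NOCOUNT ON
--Restore customer data. Required for the version 2->3 upgrade.
IF EXISTS(
      SELECT * FROM [sys].[tables]
            WHERE object_id = OBJECT_ID('[dbo].[xxxDATAMOTOxxxCustomersTable]'))
BEGIN
      PRINT 'STARTING ADDIsKeyCustomer - POSTDATAMOTO_CUSTOMERS_TABLE'
      --...restore rows from the backup table, then drop it...
END
```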



 


Database version history tables are pretty common as well. They became especially popular after one showed up as part of the AdventureWorks sample database that shipped with SQL Server 2005 (dbo.AWBuildVersion). The database version history strategy records that a script has run, or that the database schema is at a particular version. It typically includes a table persisted as part of the maintained database schema, plus upgrade scripts that query this table to determine whether they need to run. When they do run, they update the table with a new version or add a record indicating that the script has run. These tables are usually very simple, including only the bare essentials. Here is an example of a database version history table:


 


CREATE TABLE [dbo].[DBBuildVersion](
      [Id]          [TINYINT] IDENTITY(1,1) NOT NULL
            CONSTRAINT PK_Id PRIMARY KEY,
      [DBVersion]   [NVARCHAR](25) NOT NULL,
      [VersionDate] [DATETIME] NOT NULL
            CONSTRAINT DF_VersionDate DEFAULT GETDATE())


 


And here is our example using the database version history table:


 


PRINT 'STARTING PREDEPLOYMENT SCRIPT...'
SET NOCOUNT ON
--Add IsKeyCustomer column to the Customer table. Required for the version 2->3 upgrade.
IF EXISTS(
      SELECT 1 FROM [dbo].[DBBuildVersion]
            WHERE [VersionDate] =
                  (SELECT MAX([VersionDate]) FROM [dbo].[DBBuildVersion])
                  AND RTRIM(LTRIM([DBVersion])) = N'2')
BEGIN
      PRINT 'STARTING ADDIsKeyCustomer - PREDATAMOTO_CUSTOMERS_TABLE'



 


At the end of your deployment, as the last statement of your post deployment script, you add the new version to the DBBuildVersion table.


 


INSERT [dbo].[DBBuildVersion]([DBVersion]) VALUES(N'3')
GO
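If you want that final insert to be re-runnable as well, guard it the same way the upgrade check does (a sketch; it assumes the [DBVersion] values are the plain version strings used above):

```sql
--Only record version 3 if it has not been recorded already
IF NOT EXISTS(
      SELECT 1 FROM [dbo].[DBBuildVersion]
            WHERE RTRIM(LTRIM([DBVersion])) = N'3')
      INSERT [dbo].[DBBuildVersion]([DBVersion]) VALUES(N'3')
GO
```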


 


Database version stamping is not very common, but it is gaining popularity. This strategy includes objects as part of the database schema that indicate the version of the schema. Like the database version history table, the version stamp indicates the current version of the database schema, but instead of persisting the version as data, it is persisted as an object. The object can be a stored procedure that returns the version from a literal, a view that provides version history from a SELECT statement of literals, a CLR object that reads information from the registry, or extended properties on the database or database objects. There are many ways to implement this approach. It typically includes an object, maintained as part of the database schema, that provides the current version, plus upgrade scripts that query that version to determine whether they need to run. Unlike the database version history table, no data is persisted to the database once an upgrade script has run. Here is an example of the database version stamp using an extended property.


 


I added CustomerDB.extendedproperties.sql to the project, which contains:


 


EXEC sp_addextendedproperty @name = 'DBBuildVersion',
    @value = '3',
    @level0type = NULL,
    @level0name = NULL,
    @level1type = NULL,
    @level1name = NULL,
    @level2type = NULL,
    @level2name = NULL


 


Notice that I specified '3' as the value. If you are not ignoring extended properties, deployment will update the extended property to whatever is defined in the project. It is important to understand this: by the time the post deployment script executes, the version is what is defined in the project, not the original version value in the database. This means that in your post deployment script you use the state changed by the pre-deployment script to determine whether your post deployment action needs to run. That is actually a good thing, since the post deployment action is dependent on the pre deployment action. Alternatively, you can manage the extended property and its value in the pre/post deployment scripts yourself and keep the target's original version value around until the very end. I have another post coming that uses an approach where the version value comes from the build process rather than the project.
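That alternative, managing the stamp yourself, might look like this as the final statement of the post deployment script (a sketch; it assumes the property already exists on the target, since sp_updateextendedproperty fails if it does not):

```sql
--Sketch: last statement of the post deployment script when managing the stamp yourself.
--Assumes the DBBuildVersion extended property already exists on the target database.
EXEC sp_updateextendedproperty @name = 'DBBuildVersion',
    @value = '3',
    @level0type = NULL,
    @level0name = NULL,
    @level1type = NULL,
    @level1name = NULL,
    @level2type = NULL,
    @level2name = NULL
GO
```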


 


And here is our example using the database version extended property:


 


PRINT 'STARTING PREDEPLOYMENT SCRIPT...'
SET NOCOUNT ON
--Required for the version 2->3 upgrade
IF EXISTS(
      SELECT 1 FROM sys.extended_properties
            WHERE class_desc = 'DATABASE'
                  AND name = 'DBBuildVersion'
                  AND (value = '2' OR value = '1'))
BEGIN
      PRINT 'STARTING ADDIsKeyCustomer - PREDATAMOTO_CUSTOMERS_TABLE'



 


BTW, I know the table-valued function fn_listextendedproperty exists; I simply prefer to query the sys.extended_properties catalog view. I am hoping the SQL team adds first-class T-SQL support for extended properties eventually. When I think about it, the following just feels right.


 


CREATE PROPERTY 'DBBuildVersion' ON DATABASE WITH VALUE '?'
GO
SELECT * FROM DATABASE.PROPERTIES
GO
UPDATE DATABASE.PROPERTIES SET DBBuildVersion = '3'
GO
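For comparison, here is what reading the stamp looks like with the existing table-valued function; database-level properties are selected by passing NULL for all of the object-level arguments:

```sql
--Read the DBBuildVersion database-level extended property
SELECT [name], [value]
FROM sys.fn_listextendedproperty('DBBuildVersion', NULL, NULL, NULL, NULL, NULL, NULL)
```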


 


Back to topic…


I have seen many customer implementations of data motion and upgrade scripts. There are reasons why one strategy may be more appealing than another, and each affords benefits the others do not. I believe the strategies are most effective in combination. I often recommend that customers consider combining the expected schema conditions and database version stamp strategies. These two approaches work very well for teams using source code control, where the database version is managed at design time. This combination also does not rely on the existence of data, which may or may not be there for a given target, or which might get stepped on by database users or other code running against the database. The last thing you want is for the database history table to be tampered with and scripts to run when you did not intend them to. A database version history table can still be useful for many things, but it should not be what ultimately controls the execution of your data motion or upgrade scripts. Using persisted data to control execution actually complicates your data motion scripts, because you have to account for the non-existence of that data.


 


Now, let's update our example to use the database version stamp and expected schema conditions together. First we create our extended property in version 2 of the database schema, hosted on my sandbox database instance.


 


EXEC sp_addextendedproperty @name = 'DBBuildVersion',
    @value = '2',
    @level0type = NULL,
    @level0name = NULL,
    @level1type = NULL,
    @level1name = NULL,
    @level2type = NULL,
    @level2name = NULL
GO


 


I also add the Extended Property to the database project like so:


 


[CustomerDB.extendedproperties.sql]
EXEC sp_addextendedproperty @name = 'DBBuildVersion',
    @value = '3',
    @level0type = NULL,
    @level0name = NULL,
    @level1type = NULL,
    @level1name = NULL,
    @level2type = NULL,
    @level2name = NULL
GO


 


I then update the pre and post deployment scripts to include the expected schema conditions and version checks. In the pre-deployment script I move the version 3 upgrade into a new file in a sub-directory and include that file from the pre-deployment script. The scripts look like this:


 


[Script.PreDeployment.sql]
PRINT 'STARTING PREDEPLOYMENT SCRIPT...'
:r .\DataMotionUpgrades\Version3Upgrade.sql
PRINT 'PREDEPLOYMENT SCRIPT COMPLETE'


 


[.\Scripts\Pre-Deployment\DataMotionUpgrades\Version3Upgrade.sql]
--Version 3 pre-deployment data motion required to upgrade from version 1 or 2
IF EXISTS(
      SELECT 1 FROM sys.extended_properties
            WHERE class_desc = 'DATABASE'
                  AND name = 'DBBuildVersion'
                  AND (value = '2' OR value = '1'))
BEGIN
      PRINT 'STARTING ADDIsKeyCustomer - PREDATAMOTO_CUSTOMERS_TABLE'
      SET NOCOUNT ON
      IF NOT EXISTS(
            SELECT 1 FROM [sys].[syscolumns]
                  WHERE [sys].[syscolumns].[id] = OBJECT_ID('[Customer].[Customers]')
                        AND [sys].[syscolumns].[name] = 'IsKeyCustomer')
            AND EXISTS(
            SELECT * FROM [sys].[tables]
                  WHERE object_id = OBJECT_ID('[Customer].[Customers]'))
      BEGIN
            IF (SELECT COUNT(*) FROM [Customer].[Customers]) > 0
            BEGIN
                  BEGIN TRY
                        BEGIN TRAN PREDATAMOTO_CUSTOMERS_TABLE
                              PRINT 'BACKING UP [Customer].[Customers]'
                              SELECT [Id],[FName],[LName],[Email],[Created],[Updated],[Deleted]
                                    INTO [dbo].[v3xDATAMOTOxxxCustomers]
                              FROM [Customer].[Customers]
                              PRINT 'DELETING [Customer].[Customers]'
                              ALTER TABLE [Accounting].[CustomerInvoices] NOCHECK CONSTRAINT [FK_CustomerInvoices_Customer];
                              DELETE [Customer].[Customers]
                        COMMIT TRAN PREDATAMOTO_CUSTOMERS_TABLE;
                  END TRY
                  BEGIN CATCH
                        ROLLBACK TRAN PREDATAMOTO_CUSTOMERS_TABLE;
                        PRINT 'PREDATAMOTO_CUSTOMERS_TABLE TRANSACTION ROLLED BACK'
                        PRINT CAST(ERROR_NUMBER() AS CHAR(5)) + ERROR_MESSAGE()
                  END CATCH
            END
            ELSE
                  PRINT 'NO RECORDS TO WORRY ABOUT!!'
      END
      ELSE
            PRINT 'WARNING!!!!!: SCHEMA IS IN AN UNEXPECTED STATE FOR THIS UPGRADE SCRIPT AND DATABASE VERSION'
      SET NOCOUNT OFF
END


 


[Script.PostDeployment.sql]
PRINT 'STARTING POSTDEPLOYMENT SCRIPT...'
:r .\DataMotionUpgrades\Version3Upgrade.sql
PRINT 'POSTDEPLOYMENT SCRIPT COMPLETE'


 


[.\Scripts\Post-Deployment\DataMotionUpgrades\Version3Upgrade.sql]
--Version 3 post-deployment data motion required to upgrade from version 1 or 2
SET NOCOUNT ON
IF EXISTS(
      SELECT 1 FROM sys.extended_properties
            WHERE class_desc = 'DATABASE'
                  AND name = 'DBBuildVersion'
                  AND value = '3')
BEGIN
      IF EXISTS(
            SELECT * FROM [sys].[tables]
                  WHERE object_id = OBJECT_ID('[dbo].[v3xDATAMOTOxxxCustomers]'))
      BEGIN
            BEGIN TRAN POSTDATAMOTO_CUSTOMERS_TABLE;
            BEGIN TRY
                  PRINT 'RESTORING [Customer].[Customers]'
                  IF (SELECT COUNT(*) FROM [dbo].[v3xDATAMOTOxxxCustomers]) > 0
                  BEGIN
                        SET IDENTITY_INSERT [Customer].[Customers] ON
                        INSERT [Customer].[Customers]
                              ([Id],[FName],[LName],[Email],[IsKeyCustomer],[Created],[Updated],[Deleted])
                              SELECT
                                    [Id],
                                    [FName],
                                    [LName],
                                    [Email],
                                    CASE
                                          WHEN EXISTS
                                          (SELECT SUM(InvoiceTotal) FROM [Accounting].[CustomerInvoices]
                                                WHERE [Accounting].[CustomerInvoices].[CustomerId] =
                                                      [dbo].[v3xDATAMOTOxxxCustomers].[Id]
                                                HAVING SUM(InvoiceTotal) >= 500000)
                                          THEN 1
                                          ELSE 0
                                    END,
                                    [Created],
                                    [Updated],
                                    [Deleted]
                              FROM [dbo].[v3xDATAMOTOxxxCustomers]
                                    ORDER BY [dbo].[v3xDATAMOTOxxxCustomers].[Id]
                        SET IDENTITY_INSERT [Customer].[Customers] OFF
                  END
                  ELSE
                  BEGIN
                        PRINT 'WARNING!!!!!: SCHEMA IS IN AN UNEXPECTED STATE FOR THIS UPGRADE SCRIPT'
                  END
                  PRINT 'DROPPING [dbo].[v3xDATAMOTOxxxCustomers]'
                  DROP TABLE [dbo].[v3xDATAMOTOxxxCustomers]
                  COMMIT TRAN POSTDATAMOTO_CUSTOMERS_TABLE;
            END TRY
            BEGIN CATCH
                  ROLLBACK TRAN POSTDATAMOTO_CUSTOMERS_TABLE;
                  PRINT 'POSTDATAMOTO_CUSTOMERS_TABLE TRANSACTION ROLLED BACK'
                  PRINT CAST(ERROR_NUMBER() AS CHAR(5)) + ERROR_MESSAGE()
            END CATCH
      END
END
SET NOCOUNT OFF


 


Here is what the project system looks like now with these new pre and post deployment files:



[Screenshot: "Data Moto P2" - Solution Explorer showing the new pre and post deployment files]


 


OK, I am now ready to test my updated data motion against previous versions of the database. To do this I pull previous versions of the database project from source code control, or from build drops if you have them around. I do this for both version 2 and version 3 of the schema, as described in a previous post. I then deploy my latest database project against the previous versions of the database. Here is the output from both deployments.


 


Deployment output of the latest and greatest against version 2. Our data motion runs, which is just what we want.


 


------ Deploy started: Project: CustomerDB, Configuration: Debug Any CPU ------
CustomerDB.dbschema(0,0)Warning TSD01269: The column [Customer].[Customers].[IsKeyCustomer] on table [Customer].[Customers] must be added, but the column has no default value and does not allow NULL values. If the table contains data, the ALTER script will not work. To avoid this issue, you must add a default value to the column or mark it as allowing NULL values.
    Deployment script generated to:
C:\Temp\Data Motion\CustomerDB\sql\debug\CustomerDB.sql

    STARTING PREDEPLOYMENT SCRIPT...
    STARTING ADDIsKeyCustomer - PREDATAMOTO_CUSTOMERS_TABLE
    BACKING UP [Customer].[Customers]
    DELETING [Customer].[Customers]
    PREDEPLOYMENT SCRIPT COMPLETE
    Dropping DBBuildVersion...
    Starting rebuilding table Customer.Customers...
    Caution: Changing any part of an object name could break scripts and stored procedures.
    Caution: Changing any part of an object name could break scripts and stored procedures.
    Creating DBBuildVersion...
    STARTING POSTDEPLOYMENT SCRIPT...
    RESTORING [Customer].[Customers]
    DROPPING [dbo].[v3xDATAMOTOxxxCustomers]
    POSTDEPLOYMENT SCRIPT COMPLETE
========== Build: 1 succeeded or up-to-date, 0 failed, 0 skipped ==========
========== Deploy: 1 succeeded, 0 failed, 0 skipped ==========


 


Deployment output against version 3. No modifications. Again, just what we want.


 


------ Deploy started: Project: CustomerDB, Configuration: Debug Any CPU ------
    Deployment script generated to:
C:\Temp\Data Motion\CustomerDB\sql\debug\CustomerDB.sql

    STARTING PREDEPLOYMENT SCRIPT...
    PREDEPLOYMENT SCRIPT COMPLETE
    STARTING POSTDEPLOYMENT SCRIPT...
    POSTDEPLOYMENT SCRIPT COMPLETE
========== Build: 1 succeeded or up-to-date, 0 failed, 0 skipped ==========
========== Deploy: 1 succeeded, 0 failed, 0 skipped ==========


 


Finally, I also need to test a clean deployment so I deploy to a new target. Here is the output.


 


------ Deploy started: Project: CustomerDB, Configuration: Debug Any CPU ------
    Deployment script generated to:
C:\Temp\Data Motion\CustomerDB\sql\debug\CustomerDB.sql

    Creating CustomerDB...
    STARTING PREDEPLOYMENT SCRIPT...
    PREDEPLOYMENT SCRIPT COMPLETE
    Creating Accounting...
    Creating Accounting.CustomerInvoices...
    Creating Customer...
    Creating Customer.Customers...
    STARTING POSTDEPLOYMENT SCRIPT...
    POSTDEPLOYMENT SCRIPT COMPLETE
========== Build: 1 succeeded or up-to-date, 0 failed, 0 skipped ==========
========== Deploy: 1 succeeded, 0 failed, 0 skipped ==========


 


As you can see, managing data motion takes up-front planning and effort, but provides many benefits. You can now deploy with more confidence and keep your version upgrade scripts as part of the latest source. Deploying upgrades to earlier versions of your database is now a breeze.


 


One thing I want to point out is that this is a very simple example. More complicated schema changes against more sophisticated schemas work with the same approaches, but take additional planning and consideration. I have gotten a lot of feedback and questions on the previous posts. One question I have gotten a few times is how to manage data motion when the schema changes involve tables with constraints between them and the modifications touch all of the tables within those dependencies. This is where advance planning pays off. These approaches work just as well for such deployments, but you will have more work to do up front as the number of data motion steps and dependencies to manage during deployment increases.
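To make that concrete, the usual pattern around foreign keys (shown here with the constraint from our example; the pattern generalizes to each dependent table) is to disable the dependent constraint before the data motion and re-enable it afterwards with WITH CHECK so SQL Server re-validates the rows:

```sql
--Disable the dependent foreign key before moving data
ALTER TABLE [Accounting].[CustomerInvoices] NOCHECK CONSTRAINT [FK_CustomerInvoices_Customer];

--...perform the data motion across the dependent tables here...

--Re-enable the constraint; WITH CHECK makes SQL Server re-validate the existing rows
ALTER TABLE [Accounting].[CustomerInvoices] WITH CHECK CHECK CONSTRAINT [FK_CustomerInvoices_Customer];
```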


 


In my next post I will show you how to setup your version stamp so that you can increment the version from a build process.


 


 


 

Comments (7)

  1. tomholl says:

    Awesome info – just what I was looking for.

  2. steve_baker says:

    In March, in "Offline Schema Development" you tell us that DBPro will simplify our lives by getting rid of the "Upgrade Scripts" that we manually have to maintain which become "high maintenance".

    just my two cents, but this solution is now higher maintenance, and lacks the simplicity of maintaining a set of source controlled update scripts that manage both schema and data updates.

    now instead of one script to check in when i want to apply a schema update to a database (and migrate data accordingly), i’ll have to check in 3 files. one for the schema change, a pre-deployment script, and post-deployment script. by breaking the script into 3 different sections you now need to check if the pre-deployment script has run and if the automated schema change was applied successfully in the post-deployment script just making things more complicated.

    before switching to DBPro i could write scripts that updated schema and data in the same script, which would be run inside a transaction. when updating my database i could update from v1 to vN and know that if any one script failed along the way my database would be in a known good state of the last update script version that succeeded.

    by separating these updates into three scripts if something fails it seems to me an all or nothing transaction, or a mess to clean up afterword.

    until DBPro comes up with an ingenious way to manage data motion like it has to manage schema changes i’ll just have to pass on recommending it to clients, its just not the simplest approach to agile database management.

  3. Steve Brownett says:

    Thanks for the articles, they helped me understand database projects much better.

    Unfortunately I have hit a snag which I was hoping that you would be able to help me with.

    I need my predeploy script to do something similar to the following (I will be replacing the select with inserts and updates). Unfortunately it causes a compile time error because the column does not exist when it compiles the query.

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'Customer' AND COLUMN_NAME = 'Title')

    begin

       select id, title from customer

    end

    Interestingly this does not happen when tables do not exist, only columns.

    This issue seems to undermine the whole purpose of being able to have a freeflowing schema between versions.

    The only way I can think of around this (and the same idea everyone has on forums) is to run the select as dynamic SQL, which is far from ideal.

    I am sure there is a better way and was hoping that you might be able to help?

  4. bahill says:

    Hi Steve,

The pre and post deployment scripts are not checked against the schema defined in the project. They always run unvalidated from an object-existence perspective. You are on the right track with the IF EXISTS check you have. Are you saying it fails when you deploy your project to the database?

    Thanks.

  5. jamiet says:

    Steve,

    Check the Build Action property of your script. if it is set to 'Build' then its wrong.

If your script is the "root" of your pre deployment stuff (i.e. Script.PreDeployment.sql) then Build Action should be 'PreDeploy'. If your script is a script that is called from Script.PreDeployment.sql using the :r syntax then Build Action should be 'Not in Build'.

    Regards

    Jamie

  6. Cody says:

    The reason I prefer to use a table for storing the database version instead of a stored procedure or extended property is that you can also store a date with it, so you know when it was done, and also the incremental steps that were applied along the way (or accidentally missed). Just having one number on a large database isn't enough…

  7. Florian says:

    Thank you! It's very helpful!