Managing data motion during your deployments (Part 2)


This post finishes off a two-part series. Sorry about the delay getting this second part out: I went on vacation back in May and am just now getting caught up.



In Part 1 we reviewed how to use the pre- and post-deployment scripts to manage the data motion required to deploy schema changes. In Part 2 we will look at how to implement your data motion scripts so that they only run when necessary.


 


You typically will not want your data motion scripts to run on every deployment. There are various reasons: the data motion is a one-time upgrade; a script may only be appropriate for a specific target; a script may only apply to a specific version of the database schema; and so on. The pre and post deployment scripts in VSTS: DB are retained in the project after deployment and are checked into source code control. They remain part of the project unless a developer removes them from the code base. If you are maintaining multiple versions of an application, you will want to keep these scripts to enable upgrades from earlier versions of the database schema. Also, while data motion scripts are usually not reusable across versions of a schema, they can serve as templates when the same or similar objects change in later versions of the database schema.


 


Enabling scripts to run only when needed requires the developer to choose a strategy for detecting when a script should run. There are many strategies out there, but a few are common, with the others being variants of these:


- Expected schema conditions
- Database version history table
- Database version stamp
- Combinations of the previous


 


At the heart of these strategies is the goal of making data motion scripts idempotent, or simply re-runnable. To keep these scripts in the project and part of every database deployment, they must be re-runnable. Even for one-shot DDL/DML scripts I would still make them re-runnable, since you will likely run them multiple times during trial deployments and database upgrade tests. You may not choose to keep every script around for the life of the project in the latest source, but experience has taught me that a little up-front work to make scripts re-runnable pays dividends before the end of the project.


 


Expected schema conditions is probably the most common strategy, and most seasoned database developers include it as a standard part of their DDL scripts. It is simply good practice: it improves the reusability of scripts and achieves re-runnability. The strategy checks for an expected state of the schema. If the schema is in the expected state, the script runs; if not, the script does nothing. I have sometimes heard database developers call these scripts non-destructive or non-invasive. I would steer clear of that terminology when talking about databases (especially when speaking to customers :)). Using our example from Part 1, here is the pre-deployment data motion script that employs an expected schema condition:


 


Note: You would need to make similar modifications in the post deployment script, but instead of checking for the existence of the column you check that the xxxDATAMOTOxxxCustomersTable table exists.


 


PRINT 'STARTING PREDEPLOYMENT SCRIPT...'
SET NOCOUNT ON
--Add IsKeyCustomer column to the Customer table. Required for the version 2->3 upgrade.
IF NOT EXISTS(
      SELECT 1 FROM [sys].[syscolumns]
            WHERE [sys].[syscolumns].[id] = OBJECT_ID('[Customer].[Customers]')
                  AND [sys].[syscolumns].[name] = 'IsKeyCustomer')
BEGIN
      PRINT 'STARTING ADDIsKeyCustomer - PREDATAMOTO_CUSTOMERS_TABLE'
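As the note above says, the post-deployment counterpart gates on the backup table rather than the column. A minimal sketch of its opening check (it reuses the backup-table naming from Part 1, so treat the exact table name as an assumption):

```sql
PRINT 'STARTING POSTDEPLOYMENT SCRIPT...'
SET NOCOUNT ON
--Restore customer data. Required for the version 2->3 upgrade.
IF EXISTS(
      SELECT * FROM [sys].[tables]
            WHERE object_id = OBJECT_ID('[dbo].[xxxDATAMOTOxxxCustomersTable]'))
BEGIN
      PRINT 'STARTING ADDIsKeyCustomer - POSTDATAMOTO_CUSTOMERS_TABLE'
      --...restore rows from the backup table, then drop it...
END
```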



 


Database version history tables are pretty common as well. They became especially popular after one showed up as part of the AdventureWorks sample database that shipped with SQL Server 2005 (dbo.AWBuildVersion). The database version history strategy records that a script has run, or that the database schema is at a particular version. It typically includes a table persisted as part of the maintained database schema, plus upgrade scripts that query this table to determine whether they need to run. When they do run, they update the table with a new version or add a record indicating that the script has run. These tables are usually very simple, including only the bare essentials. Here is an example of a database version history table:


 


CREATE TABLE [dbo].[DBBuildVersion](
      [Id]          [TINYINT] IDENTITY(1,1) NOT NULL
            CONSTRAINT PK_Id PRIMARY KEY,
      [DBVersion]   [NVARCHAR](25) NOT NULL,
      [VersionDate] [DATETIME] NOT NULL
            CONSTRAINT DF_VersionDate DEFAULT GETDATE())


 


And here is our example using the database version history table:


 


PRINT 'STARTING PREDEPLOYMENT SCRIPT...'
SET NOCOUNT ON
--Add IsKeyCustomer column to the Customer table. Required for the version 2->3 upgrade.
IF EXISTS(
      SELECT 1 FROM [dbo].[DBBuildVersion]
            WHERE [VersionDate] =
                  (SELECT MAX([VersionDate]) FROM [dbo].[DBBuildVersion])
                  AND RTRIM(LTRIM([DBVersion])) = N'2')
BEGIN
      PRINT 'STARTING ADDIsKeyCustomer - PREDATAMOTO_CUSTOMERS_TABLE'



 


At the end of your deployment, as the last statement of your post deployment script, you add the new version to the DBBuildVersion table.


 


INSERT [dbo].[DBBuildVersion]([DBVersion]) VALUES(N'3')
GO
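If you want that final insert to be re-runnable as well, guard it the same way the upgrade check does (a sketch; it assumes the [DBVersion] values are the plain version strings used above):

```sql
--Only record version 3 if it has not been recorded already
IF NOT EXISTS(
      SELECT 1 FROM [dbo].[DBBuildVersion]
            WHERE RTRIM(LTRIM([DBVersion])) = N'3')
      INSERT [dbo].[DBBuildVersion]([DBVersion]) VALUES(N'3')
GO
```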


 


Database version stamping is not very common, but it is gaining popularity. This strategy includes objects as part of the database schema that indicate the version of the schema. Like the database version history table, the version stamp indicates the current version of the database schema, but instead of persisting the version as data, it is persisted as an object. The object can be a stored procedure that returns the version from a literal, a view that provides version history from a SELECT statement of literals, a CLR object that reads information from the registry, or extended properties on the database or database objects. There are many ways to implement this approach. It typically includes an object, maintained as part of the database schema, that provides the current version, plus upgrade scripts that query that version to determine whether they need to run. Unlike the database version history table, no data is persisted to the database once an upgrade script has run. Here is an example of the database version stamp using an extended property.


 


I added CustomerDB.extendedproperties.sql to the project, which contains:


 


EXEC sp_addextendedproperty @name = 'DBBuildVersion',
    @value = '3',
    @level0type = NULL,
    @level0name = NULL,
    @level1type = NULL,
    @level1name = NULL,
    @level2type = NULL,
    @level2name = NULL


 


Notice that I specified '3' as the value. If you are not ignoring extended properties, deployment will update the extended property to whatever is defined in the project. It is important to understand this: by the time the post deployment script executes, the version is what is defined in the project, not the original version value in the database. This means that in your post deployment script you use the state changed by the pre-deployment script to determine whether your post deployment action needs to run. That is actually a good thing, since the post deployment action is dependent on the pre deployment action. Alternatively, you can manage the extended property and its value in the pre/post deployment scripts yourself and keep the target's original version value around until the very end. I have another post coming that uses an approach where the version value comes from the build process rather than the project.
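That alternative, managing the stamp yourself, might look like this as the final statement of the post deployment script (a sketch; it assumes the property already exists on the target, since sp_updateextendedproperty fails if it does not):

```sql
--Sketch: last statement of the post deployment script when managing the stamp yourself.
--Assumes the DBBuildVersion extended property already exists on the target database.
EXEC sp_updateextendedproperty @name = 'DBBuildVersion',
    @value = '3',
    @level0type = NULL,
    @level0name = NULL,
    @level1type = NULL,
    @level1name = NULL,
    @level2type = NULL,
    @level2name = NULL
GO
```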


 


And here is our example using the database version extended property:


 


PRINT 'STARTING PREDEPLOYMENT SCRIPT...'
SET NOCOUNT ON
--Required for the version 2->3 upgrade
IF EXISTS(
      SELECT 1 FROM sys.extended_properties
            WHERE class_desc = 'DATABASE'
                  AND name = 'DBBuildVersion'
                  AND (value = '2' OR value = '1'))
BEGIN
      PRINT 'STARTING ADDIsKeyCustomer - PREDATAMOTO_CUSTOMERS_TABLE'



 


BTW, I know the table-valued function fn_listextendedproperty exists; I simply prefer to query the sys.extended_properties catalog view. I am hoping the SQL team adds first-class T-SQL support for extended properties eventually. When I think about it, the following just feels right.


 


CREATE PROPERTY 'DBBuildVersion' ON DATABASE WITH VALUE '?'
GO
SELECT * FROM DATABASE.PROPERTIES
GO
UPDATE DATABASE.PROPERTIES SET DBBuildVersion = '3'
GO
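For comparison, here is what reading the stamp looks like with the existing table-valued function; database-level properties are selected by passing NULL for all of the object-level arguments:

```sql
--Read the DBBuildVersion database-level extended property
SELECT [name], [value]
FROM sys.fn_listextendedproperty('DBBuildVersion', NULL, NULL, NULL, NULL, NULL, NULL)
```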


 


Back to topic…


I have seen many customer implementations of data motion and upgrade scripts. There are reasons why one strategy may be more appealing than another, and each affords benefits the others do not. I believe the strategies are most effective in combination. I often recommend that customers consider combining the expected schema conditions and database version stamp strategies. These two approaches work very well for teams using source code control, where the database version is managed at design time. This combination also does not rely on the existence of data, which may or may not be there for a given target, or which might get stepped on by database users or other code running against the database. The last thing you want is for the database history table to be tampered with and scripts to run when you did not intend them to. A database version history table can still be useful for many things, but it should not be what ultimately controls the execution of your data motion or upgrade scripts. Using persisted data to control execution actually complicates your data motion scripts, because you have to account for the non-existence of that data.


 


Now, let's update our example to use the database version stamp and expected schema conditions together. First we create our extended property in version 2 of the database schema, hosted on my sandbox database instance.


 


EXEC sp_addextendedproperty @name = 'DBBuildVersion',
    @value = '2',
    @level0type = NULL,
    @level0name = NULL,
    @level1type = NULL,
    @level1name = NULL,
    @level2type = NULL,
    @level2name = NULL
GO


 


I also add the Extended Property to the database project like so:


 


[CustomerDB.extendedproperties.sql]
EXEC sp_addextendedproperty @name = 'DBBuildVersion',
    @value = '3',
    @level0type = NULL,
    @level0name = NULL,
    @level1type = NULL,
    @level1name = NULL,
    @level2type = NULL,
    @level2name = NULL
GO


 


I then update the pre and post deployment scripts to include the expected schema conditions and version checks. In the pre-deployment script I move the version 3 upgrade into a new file in a sub-directory and include that file from the pre-deployment script. The scripts look like this:


 


[Script.PreDeployment.sql]
PRINT 'STARTING PREDEPLOYMENT SCRIPT...'
:r .\DataMotionUpgrades\Version3Upgrade.sql
PRINT 'PREDEPLOYMENT SCRIPT COMPLETE'


 


[.\Scripts\Pre-Deployment\DataMotionUpgrades\Version3Upgrade.sql]
--Version 3 pre-deployment data motion required to upgrade from version 1 or 2
IF EXISTS(
      SELECT 1 FROM sys.extended_properties
            WHERE class_desc = 'DATABASE'
                  AND name = 'DBBuildVersion'
                  AND (value = '2' OR value = '1'))
BEGIN
      PRINT 'STARTING ADDIsKeyCustomer - PREDATAMOTO_CUSTOMERS_TABLE'
      SET NOCOUNT ON
      IF NOT EXISTS(
            SELECT 1 FROM [sys].[syscolumns]
                  WHERE [sys].[syscolumns].[id] = OBJECT_ID('[Customer].[Customers]')
                        AND [sys].[syscolumns].[name] = 'IsKeyCustomer')
            AND EXISTS(
            SELECT * FROM [sys].[tables]
                  WHERE object_id = OBJECT_ID('[Customer].[Customers]'))
      BEGIN
            IF (SELECT COUNT(*) FROM [Customer].[Customers]) > 0
            BEGIN
                  BEGIN TRY
                        BEGIN TRAN PREDATAMOTO_CUSTOMERS_TABLE
                              PRINT 'BACKING UP [Customer].[Customers]'
                              SELECT [Id],[FName],[LName],[Email],[Created],[Updated],[Deleted]
                                    INTO [dbo].[v3xDATAMOTOxxxCustomers]
                              FROM [Customer].[Customers]
                              PRINT 'DELETING [Customer].[Customers]'
                              ALTER TABLE [Accounting].[CustomerInvoices] NOCHECK CONSTRAINT [FK_CustomerInvoices_Customer];
                              DELETE [Customer].[Customers]
                        COMMIT TRAN PREDATAMOTO_CUSTOMERS_TABLE;
                  END TRY
                  BEGIN CATCH
                        ROLLBACK TRAN PREDATAMOTO_CUSTOMERS_TABLE;
                        PRINT 'PREDATAMOTO_CUSTOMERS_TABLE TRANSACTION ROLLED BACK'
                        PRINT CAST(ERROR_NUMBER() AS CHAR(5)) + ERROR_MESSAGE()
                  END CATCH
            END
            ELSE
                  PRINT 'NO RECORDS TO WORRY ABOUT!!'
      END
      ELSE
            PRINT 'WARNING!!!!!: SCHEMA IS IN AN UNEXPECTED STATE FOR THIS UPGRADE SCRIPT AND DATABASE VERSION'
      SET NOCOUNT OFF
END


 


[Script.PostDeployment.sql]
PRINT 'STARTING POSTDEPLOYMENT SCRIPT...'
:r .\DataMotionUpgrades\Version3Upgrade.sql
PRINT 'POSTDEPLOYMENT SCRIPT COMPLETE'


 


[.\Scripts\Post-Deployment\DataMotionUpgrades\Version3Upgrade.sql]
--Version 3 post-deployment data motion required to upgrade from version 1 or 2
SET NOCOUNT ON
IF EXISTS(
      SELECT 1 FROM sys.extended_properties
            WHERE class_desc = 'DATABASE'
                  AND name = 'DBBuildVersion'
                  AND value = '3')
BEGIN
      IF EXISTS(
            SELECT * FROM [sys].[tables]
                  WHERE object_id = OBJECT_ID('[dbo].[v3xDATAMOTOxxxCustomers]'))
      BEGIN
            BEGIN TRAN POSTDATAMOTO_CUSTOMERS_TABLE;
            BEGIN TRY
                  PRINT 'RESTORING [Customer].[Customers]'
                  IF (SELECT COUNT(*) FROM [dbo].[v3xDATAMOTOxxxCustomers]) > 0
                  BEGIN
                        SET IDENTITY_INSERT [Customer].[Customers] ON
                        INSERT [Customer].[Customers]
                              ([Id],[FName],[LName],[Email],[IsKeyCustomer],[Created],[Updated],[Deleted])
                              SELECT
                                    [Id],
                                    [FName],
                                    [LName],
                                    [Email],
                                    CASE
                                          WHEN EXISTS
                                          (SELECT SUM(InvoiceTotal) FROM [Accounting].[CustomerInvoices]
                                                WHERE [Accounting].[CustomerInvoices].[CustomerId] =
                                                      [dbo].[v3xDATAMOTOxxxCustomers].[Id]
                                                HAVING SUM(InvoiceTotal) >= 500000)
                                          THEN 1
                                          ELSE 0
                                    END,
                                    [Created],
                                    [Updated],
                                    [Deleted]
                              FROM [dbo].[v3xDATAMOTOxxxCustomers]
                                    ORDER BY [dbo].[v3xDATAMOTOxxxCustomers].[Id]
                        SET IDENTITY_INSERT [Customer].[Customers] OFF
                  END
                  ELSE
                  BEGIN
                        PRINT 'WARNING!!!!!: SCHEMA IS IN AN UNEXPECTED STATE FOR THIS UPGRADE SCRIPT'
                  END
                  PRINT 'DROPPING [dbo].[v3xDATAMOTOxxxCustomers]'
                  DROP TABLE [dbo].[v3xDATAMOTOxxxCustomers]
                  COMMIT TRAN POSTDATAMOTO_CUSTOMERS_TABLE;
            END TRY
            BEGIN CATCH
                  ROLLBACK TRAN POSTDATAMOTO_CUSTOMERS_TABLE;
                  PRINT 'POSTDATAMOTO_CUSTOMERS_TABLE TRANSACTION ROLLED BACK'
                  PRINT CAST(ERROR_NUMBER() AS CHAR(5)) + ERROR_MESSAGE()
            END CATCH
      END
END
SET NOCOUNT OFF


 


Here is what the project system looks like now with these new pre and post deployment files:



[Screenshot: "Data Moto P2" - Solution Explorer showing the new pre and post deployment files]


 


OK, I am now ready to test my updated data motion against previous versions of the database. To do this I pull previous versions of the database project from source code control, or from build drops if you have them around. I do this for both version 2 and version 3 of the schema, as described in a previous post. I then deploy my latest database project against the previous versions of the database. Here is the output from both deployments.


 


Deployment output of the latest and greatest against version 2. Our data motion runs, which is just what we want.


 


------ Deploy started: Project: CustomerDB, Configuration: Debug Any CPU ------
CustomerDB.dbschema(0,0)Warning TSD01269: The column [Customer].[Customers].[IsKeyCustomer] on table [Customer].[Customers] must be added, but the column has no default value and does not allow NULL values. If the table contains data, the ALTER script will not work. To avoid this issue, you must add a default value to the column or mark it as allowing NULL values.
    Deployment script generated to:
C:\Temp\Data Motion\CustomerDB\sql\debug\CustomerDB.sql

    STARTING PREDEPLOYMENT SCRIPT...
    STARTING ADDIsKeyCustomer - PREDATAMOTO_CUSTOMERS_TABLE
    BACKING UP [Customer].[Customers]
    DELETING [Customer].[Customers]
    PREDEPLOYMENT SCRIPT COMPLETE
    Dropping DBBuildVersion...
    Starting rebuilding table Customer.Customers...
    Caution: Changing any part of an object name could break scripts and stored procedures.
    Caution: Changing any part of an object name could break scripts and stored procedures.
    Creating DBBuildVersion...
    STARTING POSTDEPLOYMENT SCRIPT...
    RESTORING [Customer].[Customers]
    DROPPING [dbo].[v3xDATAMOTOxxxCustomers]
    POSTDEPLOYMENT SCRIPT COMPLETE
========== Build: 1 succeeded or up-to-date, 0 failed, 0 skipped ==========
========== Deploy: 1 succeeded, 0 failed, 0 skipped ==========


 


Deployment output against version 3. No modifications. Again, just what we want.


 


------ Deploy started: Project: CustomerDB, Configuration: Debug Any CPU ------
    Deployment script generated to:
C:\Temp\Data Motion\CustomerDB\sql\debug\CustomerDB.sql

    STARTING PREDEPLOYMENT SCRIPT...
    PREDEPLOYMENT SCRIPT COMPLETE
    STARTING POSTDEPLOYMENT SCRIPT...
    POSTDEPLOYMENT SCRIPT COMPLETE
========== Build: 1 succeeded or up-to-date, 0 failed, 0 skipped ==========
========== Deploy: 1 succeeded, 0 failed, 0 skipped ==========


 


Finally, I also need to test a clean deployment so I deploy to a new target. Here is the output.


 


------ Deploy started: Project: CustomerDB, Configuration: Debug Any CPU ------
    Deployment script generated to:
C:\Temp\Data Motion\CustomerDB\sql\debug\CustomerDB.sql

    Creating CustomerDB...
    STARTING PREDEPLOYMENT SCRIPT...
    PREDEPLOYMENT SCRIPT COMPLETE
    Creating Accounting...
    Creating Accounting.CustomerInvoices...
    Creating Customer...
    Creating Customer.Customers...
    STARTING POSTDEPLOYMENT SCRIPT...
    POSTDEPLOYMENT SCRIPT COMPLETE
========== Build: 1 succeeded or up-to-date, 0 failed, 0 skipped ==========
========== Deploy: 1 succeeded, 0 failed, 0 skipped ==========


 


As you can see, managing data motion takes up-front planning and effort, but provides many benefits. You can now deploy with more confidence and keep your version upgrade scripts as part of the latest source. Deploying upgrades to earlier versions of your database is now a breeze.


 


One thing I want to point out is that this is a very simple example. More complicated schema changes against more sophisticated schemas work with the same approaches, but take additional planning and consideration. I have gotten a lot of feedback and questions on the previous posts. One question I have gotten a few times is how to manage data motion when the schema changes involve tables with constraints between them and the modifications touch all of the tables within those dependencies. This is where advance planning pays off. These approaches work just as well for such deployments, but you will have more work to do up front as the number of data motion steps and dependencies to manage during deployment increases.
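To make that concrete, the usual pattern around foreign keys (shown here with the constraint from our example; the pattern generalizes to each dependent table) is to disable the dependent constraint before the data motion and re-enable it afterwards with WITH CHECK so SQL Server re-validates the rows:

```sql
--Disable the dependent foreign key before moving data
ALTER TABLE [Accounting].[CustomerInvoices] NOCHECK CONSTRAINT [FK_CustomerInvoices_Customer];

--...perform the data motion across the dependent tables here...

--Re-enable the constraint; WITH CHECK makes SQL Server re-validate the existing rows
ALTER TABLE [Accounting].[CustomerInvoices] WITH CHECK CHECK CONSTRAINT [FK_CustomerInvoices_Customer];
```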


 


In my next post I will show you how to setup your version stamp so that you can increment the version from a build process.


 


 


 

Comments (7)

  1. tomholl says:

    Awesome info – just what I was looking for.

  2. steve_baker says:

    In March, in "Offline Schema Development" you tell us that DBPro will simplify our lives by getting rid of the "Upgrade Scripts" that we manually have to maintain which become "high maintenance".

    just my two cents, but this solution is now higher maintenance, and lacks the simplicity of maintaining a set of source controlled update scripts that manage both schema and data updates.

    now instead of one script to check in when i want to apply a schema update to a database (and migrate data accordingly), i’ll have to check in 3 files. one for the schema change, a pre-deployment script, and post-deployment script. by breaking the script into 3 different sections you now need to check if the pre-deployment script has run and if the automated schema change was applied successfully in the post-deployment script just making things more complicated.

    before switching to DBPro i could write scripts that updated schema and data in the same script, which would be run inside a transaction. when updating my database i could update from v1 to vN and know that if any one script failed along the way my database would be in a known good state of the last update script version that succeeded.

    by separating these updates into three scripts if something fails it seems to me an all or nothing transaction, or a mess to clean up afterword.

    until DBPro comes up with an ingenious way to manage data motion like it has to manage schema changes i’ll just have to pass on recommending it to clients, its just not the simplest approach to agile database management.

  3. Steve Brownett says:

    Thanks for the articles, they helped me understand database projects much better.

    Unfortunately I have hit a snag which I was hoping that you would be able to help me with.

    I need my predeploy script to do something similar to the following (I will be replacing the select with inserts and updates). Unfortunately it causes a compile time error because the column does not exist when it compiles the query.

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'Customer' AND COLUMN_NAME = 'Title')

    begin

       select id, title from customer

    end

    Interestingly this does not happen when tables do not exist, only columns.

    This issue seems to undermine the whole purpose of being able to have a freeflowing schema between versions.

    The only way I can think of around this (and the same idea everyone has on forums) is to run the select as dynamic SQL, which is far from ideal.

    I am sure there is a better way and was hoping that you might be able to help?

  4. bahill says:

    Hi Steve,

The pre and post deployment scripts are not checked against the schema defined in the project. They always run unvalidated from an object-existence perspective. You are on the right track with the IF EXISTS check you have. Are you saying it fails when you deploy your project to the database?

    Thanks.

  5. jamiet says:

    Steve,

    Check the Build Action property of your script. if it is set to 'Build' then its wrong.

If your script is the "root" of your pre deployment stuff (i.e. Script.PreDeployment.sql) then Build Action should be 'PreDeploy'. If your script is a script that is called from Script.PreDeployment.sql using the :r syntax then Build Action should be 'Not in Build'.

    Regards

    Jamie

  6. Cody says:

    The reason I prefer to use a table for storing the database version instead of a stored procedure or extended property is that you can also store a date with it, so you know when it was done, and also the incremental steps that were applied along the way (or accidentally missed). Just having one number on a large database isn't enough…

  7. Florian says:

    Thank you! It's very helpful!