Maintaining and synchronizing your reference data through the database project


For database application developers working on data-driven applications, reference data is as important and meaningful as the database schema itself. In fact, the database schema is incomplete when the reference data is absent. When I say reference data, I am speaking of the non-transactional data within the database. This data goes by many different names in different organizations. Here are some of the more popular ones: Static Data, Domain Data, Seed Tables, Meta Data, and Lookups.


 


I have worked on many projects where we managed the reference data along with the rest of the application’s source code artifacts. It was always critical that this reference data was managed under the same configuration management process as the rest of the application. The reasoning behind this was that I/we (depending on team and gig) typically implemented data-driven applications with flexible data models. This often afforded the customer flexibility to accommodate new business requirements down the road without expensive code modifications. As a result, without the correct or appropriate reference data these applications would not be very functional or, worse, would misbehave. I have also worked on projects where there was reference data, but in addition the application used hard-coded values that controlled the execution of the application. Without predetermined reference data these applications would not run and often threw runtime errors. While not good design, I have seen it many times. This typically happens when apps go through a top-down design, but I digress…


 


We often receive questions from the database development community as to how to best manage reference data. Different strokes for different folks comes to mind, but I thought I would share an approach that works really well for average-sized reference data tables. This approach is specific to SQL Server 2008, but can be adapted to previous versions of SQL Server with some simple modifications.


 


Let’s assume you have a table called Products in your project like this (Products.table.sql):


 


CREATE TABLE [dbo].[Products]
(
      Id INT IDENTITY(1,1) NOT NULL CONSTRAINT PkProductsId PRIMARY KEY,
      Name NVARCHAR(50) NOT NULL CONSTRAINT UqProductsName UNIQUE,
      Description NVARCHAR(250) NULL,
      Created DATETIME NOT NULL,
      Updated DATETIME NOT NULL,
      Deleted DATETIME NULL
)


 


Next, you need a way to synchronize changes to this data. For this example, our company sells beer. Guess where my mind is? So you create the following script file (SyncProducts.sql):


 


PRINT 'Starting [dbo].[Products] Synchronization'
GO
IF EXISTS(SELECT 1 FROM INFORMATION_SCHEMA.TABLES
          WHERE TABLE_NAME = N'xxxSyncxxProducts')
      DROP TABLE [dbo].[xxxSyncxxProducts]
GO
CREATE TABLE [dbo].[xxxSyncxxProducts]
      (Name NVARCHAR(50) NOT NULL,
       Description NVARCHAR(250) NULL)
GO
SET NOCOUNT ON
INSERT [dbo].[xxxSyncxxProducts](Name, Description)
      VALUES
            (N'Guinness', 'Irish-Style Stout')
           ,(N'Carlsberg', 'Danish Pilsner')
           ,(N'Scuttlebutt', 'India Pale Ale')
           ,(N'Corona', 'Mexican Lager')
         --,(N'Your', 'Favorite Beer')
SET NOCOUNT OFF
GO
MERGE [dbo].[Products] AS Target
USING [dbo].[xxxSyncxxProducts] AS Source ON (Target.Name = Source.Name)
      WHEN MATCHED
            -- update changed descriptions, and resurrect rows that were
            -- previously marked deleted but have reappeared in the source;
            -- note that <> skips rows where either Description is NULL
            AND (Target.Description <> Source.Description
                 OR Target.Deleted IS NOT NULL)
            THEN
                  UPDATE
                        SET   Target.Description = Source.Description,
                              Target.Updated = GETDATE(),
                              Target.Deleted = NULL
      WHEN NOT MATCHED BY TARGET
            THEN
                  INSERT      (Name, Description, Created, Updated)
                  VALUES      (Source.Name, Source.Description, GETDATE(), GETDATE())
      WHEN NOT MATCHED BY SOURCE
            AND Target.Deleted IS NULL  -- don't re-stamp rows already retired
            THEN
                  UPDATE SET Target.Deleted = GETDATE();
GO
DROP TABLE [dbo].[xxxSyncxxProducts];
GO
PRINT 'Done Synchronizing [dbo].[Products]'
GO


 


Finally, you need a way to execute the synchronization as part of deployment. You do this by including the script file in the post-deployment script like so (Script.PostDeployment.sql):


 


:r .\RefDataSync\SyncProducts.sql
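As the project grows, the post-deployment script simply accumulates one :r include per reference-data script. A sketch of what that looks like (the second file name is hypothetical, just to show the pattern):

```sql
-- Script.PostDeployment.sql
:r .\RefDataSync\SyncProducts.sql
:r .\RefDataSync\SyncOrderStatuses.sql  -- hypothetical second reference table
```

Each included script is self-contained, so the order only matters when one reference table has foreign keys into another.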


 


Here’s what it looks like when it is deployed.


 


  


 


During development, if marketing tells you they have a new beer to drink, I mean sell, you simply check out the SyncProducts.sql file, make the necessary edits, and check it back in. Every time you deploy your project, incremental changes to the database’s reference data will be synchronized. This also includes fully populating the table when you deploy a new database. You can also run the script by opening the file in Visual Studio and clicking “Execute SQL”. This allows you to make changes out-of-band from the release yet still have the changes captured in SCC.
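For example, adding that new beer is a single new row in the VALUES list of SyncProducts.sql (the added product below is just an illustration):

```sql
INSERT [dbo].[xxxSyncxxProducts](Name, Description)
      VALUES
            (N'Guinness', 'Irish-Style Stout')
           ,(N'Carlsberg', 'Danish Pilsner')
           ,(N'Scuttlebutt', 'India Pale Ale')
           ,(N'Corona', 'Mexican Lager')
           ,(N'Pilsner Urquell', 'Czech Pilsner')  -- new row; picked up on the next deploy
```

The MERGE takes care of the rest: the new row is inserted on the next deployment, and existing rows are left untouched.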


 


Note that I am using a logical-delete approach, which is a personal design preference. You could also blast the records if you like, but then you have to worry about constraints that may get violated. The logical delete works well because you can retire a row without deleting the data. This allows historical data to keep its relationships without denormalizing the table into an archive.
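With the logical delete in place, consumers that should only see active products filter on the Deleted column; a minimal sketch:

```sql
-- Active products only; retired rows keep their Deleted timestamp
-- and remain available for historical joins.
SELECT Id, Name, Description
FROM   [dbo].[Products]
WHERE  Deleted IS NULL
```

A filtered view or a `WHERE Deleted IS NULL` predicate in your data-access layer keeps this convention in one place.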


 


Let me know what you think about this approach. How could this be made easier and what is it missing?

Comments (11)

  1. Hi Barclay,

    I’ve felt for a long time that Datadude needs to do more to make this easier for us. I’ve fed this back ad infinitum, both on the datadude forum and when I’ve met members of your team personally.

    The Datadude project needs to be data-aware which it isn’t at the moment. Let me explain what I mean by that:

    -I want the project to understand what data I want to put into the tables

    -I want the data insertion to be a built-in part of the build/deploy process where a data compare happens automatically in the same way that schema compare currently happens as part of a deploy.

    -I want the project build to fail when it knows that the data insertion will fail (e.g. if the value that I am going to insert into a varchar field is too wide for it)

    -The data lives in a dedicated datadude object rather than in a post-deployment script. That object should become invalid if a schema change would cause the data insertion to fail

    -Datadude should check for unique key/foreign key violations in the data

    I remember Gert Drapers asking during the 2007 MVP Summit "Do you want datadude to manage and inherently understand your reference data as well?" (or words to that effect) and the overwhelming feedback was "Yes, please please please build this in!". Still waiting.

    -Jamie Thomson

  2. johnwelch says:

    I’ll second Jamie’s comments – this seems like missing functionality. The practice described in this post may be a workaround, but it’s not really meeting the need.

  3. SkyAddict says:

    The only thing that is a bit of a pain is that the schema definition of the table is essentially duplicated. It would be nice if your "temp" table could be created based on the schema of the target.

  4. anathe says:

    I also think that this is a workaround to the problem, not a real solution.

    I asked about the content management because I already saw it in another small product that manages databases, and I’m missing it in the VSTS Database edition.

    http://www.dbmaestro.com/TeamWorkMovies/dbmaestroTeamWorkDemo.html

    Anat Hertzman

  5. vaccac says:

    Here is a tool I’ve been using to automatically generate the static data scripts:

    http://leon.mvps.org/StaticData/

    A data-dude user wrote it and made it available for free. Works like a charm and saves me lots of time.

  6. Jon_K says:

    Barclay, any proposal on doing this same thing here, but with 'environment dependent' data? I want a global sqlcmd variable for selecting $(environment). I could point it to a path execution :r .\$(environment)\envconfigdata.sql, but sqlcmd doesn't handle variables in the path this way, at least in no way that I can see.

  7. EdL says:

    I've been using T4 and xml file blobs to generate the update, insert and delete statements. Seems to work well and you have a good divide of data and the code.

  8. Anonymous says:

    Great post.  I am implementing this approach right now.  One quick suggestion.  Instead of using temp tables, I would use table variables.  That way, your tempdb database won't have to be bothered while these scripts are running.  Again, thanks for the post.

  9. CR Senior,

    The statement "table variables never use tempdb" is a myth. See here for details: blogs.msdn.com/…/sql-server-table-variable-vs-local-temporary-table.aspx

    JT

  10. Anonymous says:

    Since the code was written to work only in SQL 2008 and later, you could eliminate the intermediate table and use the multi-row VALUES syntax directly in the MERGE statement. It also eliminates the duplicate table definition, though the column list must still be kept in sync regardless.

    PRINT 'Starting [dbo].[Products] Synchronization'

    GO

    MERGE [dbo].[Products] trg

    USING ( VALUES ( N'Guinness', 'Irish-Style Stout' ),

              ( N'Carlsberg', 'Danish Pilsner' ),

              ( N'Scuttlebutt', 'India Pale Ale' ),

              ( N'Corona', 'Mexican Lager' )

    ) src ( [Name], [Description] )

        ON trg.[Name] = src.[Name]

    WHEN MATCHED

            AND trg.[Description] <> src.[Description] THEN

        UPDATE

            SET trg.[Description] = src.[Description],

                trg.[Updated] = GETDATE(),

                trg.[Deleted] = NULL

    WHEN NOT MATCHED BY TARGET THEN

        INSERT ( [Name], [Description], [Created], [Updated] )

        VALUES ( src.[Name], src.[Description], GETDATE(), GETDATE())

    WHEN NOT MATCHED BY SOURCE THEN

        UPDATE

            SET trg.[Deleted] = GETDATE();

    GO

    PRINT 'Done Synchronizing [dbo].[Products]'

    GO
