Managing Schema Changes (Part 2)


 

Question: What are best practices for managing schema changes in SQL Server 2005?  What are the fastest, least intrusive techniques for high availability? 

 

Answer:  In Part 1, we outlined the behavior of SQL Server 2005 schema changes.  In Part 2, we will look at best-practice techniques to de-couple the batch update required for data type and most length changes (see “Managing Schema Changes (Part 1)”).  These techniques allow us to make schema changes quickly and with the least impact on performance and concurrency for large tables, such as the billion-row scenario.

 

Scenario:   Let’s assume we have a 5-character ZipCode column that we want to increase to a 10-character format, e.g. 12345-0001.  We can change the ZipCode length from char(5) to char(10) using:

 

alter table MyTable alter column ZipCode char(10)

 

However, this will incur a concurrency-crippling batch update (as seen in Profiler) to change the column length in our billion-row table scenario.  If you’d like to capture this in Profiler, see “Managing Schema Changes (Part 1)”.

Best Practice

Adding columns with the NULL property (excluding DEFAULT WITH VALUES, of course) is a best practice insofar as the columns can be added quickly.  In the case where you do need to set a value for the column, the NULL property also gives us the flexibility of de-coupling the update from the alter table statement.

 

An approach to accomplish this: 

  1. Add a char(10) column, NewZip
  2. Set the piecemeal batch size
  3. Populate NewZip from the existing ZipCode column
  4. Drop the old ZipCode column (optional)
  5. Rename NewZip to ZipCode (optional)

 

Step 1: Add new column

 

alter table MyTable
add NewZip char(10) NULL

Upon the fast completion of this alter table (about 1 second in my tests), NewZip contains NULL for all rows in the existing table.  You may need to populate NewZip with values from another column or with other values.  In Step 2, SET ROWCOUNT is used to minimize the concurrency impact of write locks by limiting the number of rows affected at one time.  An alternative to SET ROWCOUNT is a WHERE clause that specifies key ranges, again limiting the number of rows affected at one time (a sketch of this alternative appears after Step 3).

 

Step 2: Set Batch Size for update

 

-- (2) set piecemeal batch size
set nocount on
SET ROWCOUNT 1000

 

Next, the piecemeal update is repeated until there are no NULL values left in NewZip.  This example sets all rows from the char(5) ZipCode column.  Alternatively, you can selectively set the value of an added column using a lookup or join.

 

Step 3: De-coupled Piecemeal Update

declare @rowcount int, @batchcount int, @rowsupdated int
select @rowcount = 1, @batchcount = 0, @rowsupdated = 0
while @rowcount > 0
begin
      begin tran
            update MyTable
            set NewZip = ZipCode
            where NewZip is NULL
            set @rowcount = @@rowcount
            select @rowsupdated = @rowsupdated + @rowcount
            if @rowcount > 0 select @batchcount = @batchcount + 1
      commit
end
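
The WHERE-clause alternative mentioned in Step 1 works the same way, stepping through key ranges instead of relying on SET ROWCOUNT.  A minimal sketch, assuming an integer clustered key column named CustomerID (an illustrative name that is not part of the original example):

declare @lo int, @hi int, @maxkey int, @step int
select @lo = min(CustomerID), @maxkey = max(CustomerID) from MyTable
set @step = 1000
while @lo <= @maxkey
begin
      set @hi = @lo + @step - 1
      begin tran
            -- touch only the rows whose keys fall in the current range
            update MyTable
            set NewZip = ZipCode
            where CustomerID between @lo and @hi
              and NewZip is NULL
      commit
      set @lo = @hi + 1
end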

Step 4: Drop the char(5) column ZipCode (optional)

alter table MyTable
drop column ZipCode
go

Step 5: Rename the NewZip column to ZipCode (optional)

exec sp_rename 'MyTable.NewZip', 'ZipCode', 'COLUMN'
go
-- Observe that the data type for ZipCode is now char(10)
sp_help MyTable

Trigger for ZipCode

What happens if a user changes a customer ZipCode AFTER the NewZip update?   If the user application updates ZipCode char(5) during the piecemeal update process, NewZip may have already been updated with the old zip code value, and thus is out of synch with ZipCode.  An update trigger can be used to keep ZipCode and NewZip in synch during the piecemeal update process.  If you intend to drop the old char(5) ZipCode (in Step 4), and then rename NewZip to ZipCode (in Step 5), the update trigger will no longer be needed after the piecemeal update.
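
A minimal sketch of such a trigger, assuming the table’s key column is CustomerID (an illustrative name; substitute your table’s key):

create trigger trg_MyTable_SyncZip on MyTable
after update
as
begin
      set nocount on
      -- if the application changed ZipCode, copy the new value into NewZip
      if update(ZipCode)
            update t
            set NewZip = i.ZipCode
            from MyTable t
            join inserted i on t.CustomerID = i.CustomerID
end
go
-- drop the trigger once the piecemeal update is complete and the columns are swapped
-- drop trigger trg_MyTable_SyncZip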

Explicit data type conversion

Some data type changes require explicit conversion.  That is, they cannot be implicitly converted using alter table alter column.   For example, an explicit conversion is required to change a datetime datatype to an int or numeric(8) YYYYMMDD format.  This scenario can be handled using a convert (or datepart) function in exactly the same manner as the piecemeal update scenario above.
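
As a sketch, the Step 3 loop stays the same and only the SET clause changes.  The column names below (OrderDate, NewOrderDateInt) are illustrative, and the example assumes OrderDate is NOT NULL so the loop terminates:

update MyTable
set NewOrderDateInt = convert(int, convert(char(8), OrderDate, 112))   -- style 112 = yyyymmdd
where NewOrderDateInt is NULL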

Dropping columns

Dropping columns is a schema-only operation.  There are no concurrency issues unless you rebuild the object (for example, by rebuilding its clustered index) to reclaim the space from the dropped column.

 

Unique and Primary Key Constraints

Adding UNIQUE and PRIMARY KEY constraints involves reads, sorts, and writes.  By default, a UNIQUE constraint adds a nonclustered index, while a PRIMARY KEY adds a clustered index.  Obvious performance and concurrency issues occur when you add a PRIMARY KEY (and its default clustered index) to a large table that already has UNIQUE constraints (and their nonclustered indexes).  This is because adding the primary key’s clustered index rebuilds the nonclustered indexes, replacing RIDs (row IDs) with the clustering key.

 

Alter table, create index, and alter index include ONLINE options for managing unique and primary key constraints and indexes.  Where possible, for best performance and the highest concurrency and availability, the ONLINE options should be used when managing indexes.  Otherwise, it would be wise to postpone these operations to batch maintenance windows.
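
A sketch of the ONLINE syntax, assuming SQL Server 2005 Enterprise Edition (required for ONLINE = ON) and an illustrative key column CustomerID:

-- add a primary key (clustered index) online
alter table MyTable
add constraint PK_MyTable primary key clustered (CustomerID)
with (online = on)
go
-- build a nonclustered index online
create nonclustered index IX_MyTable_ZipCode on MyTable (ZipCode)
with (online = on)
go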

 

Conclusions and best practices:

SQL Server 2005 does support schema changes; however, some schema changes are faster than others.  The fastest schema changes occur in the following cases (a brief illustrative script follows the list):

  1. Adding column with NULL property
  2. Adding column with NULL property and DEFAULT
  3. Changing NOT NULL to NULL property
  4. Adding DEFAULT constraint
  5. Dropping CHECK or DEFAULT constraint
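
A brief illustrative script for these cases; the table and column names are hypothetical (item 3 assumes an existing NOT NULL char(10) Phone column), and each statement is a metadata-only change:

alter table MyTable add Comment varchar(100) NULL                                        -- (1) NULL column
alter table MyTable add Region char(2) NULL constraint DF_MyTable_Region default 'NA'    -- (2) NULL column with DEFAULT (no WITH VALUES)
alter table MyTable alter column Phone char(10) NULL                                     -- (3) NOT NULL to NULL
alter table MyTable add constraint DF_MyTable_Zip default '00000' for ZipCode            -- (4) DEFAULT constraint
alter table MyTable drop constraint DF_MyTable_Zip                                       -- (5) drop a DEFAULT constraint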

 

A schema change best practice for performance and concurrency on large tables is adding columns with the NULL property.  In the case where you need to set a value for the newly added column, starting with a NULL property provides the flexibility to de-couple the update from the alter table statement.  Far preferable to a batch update that can significantly affect concurrency on large tables, a piecemeal update can be used to limit the number of rows updated at one time.  One technique to limit the number of rows is SET ROWCOUNT.  This greatly reduces (or effectively eliminates) the blocking impact of the piecemeal update.  If it is important to disallow NULL values after the piecemeal update completes, the application would have to enforce this.

 

Adding a CHECK constraint requires a read of the entire table to validate the existing rows against the constraint.  The accompanying table scan can impact write activity on a large table.
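
For example, the following illustrative constraint must validate every existing row (a full table read) before it is accepted:

alter table MyTable
add constraint CK_MyTable_ZipCode check (len(ZipCode) in (5, 10))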

 

Other schema changes present performance and concurrency challenges for large tables because of the accompanying batch update operations.  These include:

  1. Adding NULL column with DEFAULT and WITH VALUES clauses
  2. Adding NOT NULL column
  3. Changing NULL to NOT NULL property

 

Using the piecemeal update best practice with NULL columns allows us to avoid the undesirable activities that accompany some schema changes, such as unintended reads and writes, and enables us to use the fastest, least intrusive techniques for high concurrency and availability.

 

Tom Davidson

SQL Server Customer Advisory Team