High Volume Update Performance


SQL Server does a great job of optimizing processing across a wide range of runtime scenarios, while also providing a flexible configuration and execution environment. One of the tradeoffs associated with this level of flexibility is that different configurations and implementations, while logically equivalent in terms of results, can yield very different performance.


Recently, while performing a data conversion on a large database, my team encountered a good example of this situation. As part of the conversion we needed to UPDATE several VARCHAR columns and one DATETIME column on a 500 million row table. Due to the highly variable string lengths in the target VARCHAR columns some page splits were expected, and because every value of the target columns needed to be touched a table scan was unavoidable. As this was a one-time conversion I didn't initially worry about its performance, and because I was using a new server with 16 CPUs and 128 GB of RAM, I assumed SQL Server would provide good performance, even with an in-place update of this size.
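
Conceptually, the in-place statement had roughly the following shape. This is only a minimal sketch; the table name, column names, and conversion expressions are hypothetical stand-ins, since the actual schema is not shown here.

```sql
-- Minimal sketch of the in-place conversion (hypothetical schema).
-- Several VARCHAR columns and one DATETIME column are rewritten,
-- which forces a scan and touches every one of the ~500 million rows.
UPDATE dbo.BigTable
SET    LegacyCode   = UPPER(LTRIM(RTRIM(LegacyCode))),
       Description  = LEFT(Description, 100),
       ModifiedDate = GETDATE();
```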

After running for 24 hours, I took a look at the performance monitor counters. The process was serialized on a single CPU, and disk bound on the volume containing the target table. Given the time invested I decided to let the update finish, which it did in 2 days, 11 hours.

Discussing the latency with other team members, we speculated that physical disk contention might account for a significant portion of the run time. Because we were changing all the rows in the table, we had the option of using INSERT instead of UPDATE to split the read and write I/O across different disks. To set this up we scripted the table and created a work table with the same schema. We also needed to place one physical data file, on a separate disk, in its own filegroup. Of course, following the update, the table also needed to be copied back to the original filegroup.
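
A minimal sketch of that setup, assuming a hypothetical database, filegroup, file path, and table (the real names and sizes were of course different):

```sql
-- Add a filegroup whose data file sits on a different physical disk,
-- so reads come from one volume and writes go to another (hypothetical names).
ALTER DATABASE ConversionDB ADD FILEGROUP FG_Work;
ALTER DATABASE ConversionDB ADD FILE
(
    NAME = ConversionDB_Work,
    FILENAME = 'E:\Data\ConversionDB_Work.ndf',   -- separate spindle/volume
    SIZE = 50GB
) TO FILEGROUP FG_Work;
GO

-- Work table scripted from the original schema, created on the new filegroup.
CREATE TABLE dbo.BigTable_Work
(
    Id           BIGINT       NOT NULL,
    LegacyCode   VARCHAR(50)  NULL,
    Description  VARCHAR(200) NULL,
    ModifiedDate DATETIME     NULL
) ON FG_Work;
GO

-- Fully logged INSERT that applies the conversion while copying the rows.
INSERT INTO dbo.BigTable_Work (Id, LegacyCode, Description, ModifiedDate)
SELECT Id,
       UPPER(LTRIM(RTRIM(LegacyCode))),
       LEFT(Description, 100),
       GETDATE()
FROM   dbo.BigTable;
```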

At this point we were ready to test the INSERT, which ran in 8 hours, an 86% reduction in run time. Additional execution time can be saved by using SELECT INTO (our recovery model was simple), which is minimally logged. You cannot specify a filegroup when using SELECT INTO, so to retain the split-I/O benefit you must set the target filegroup as the database default before running the insert, and any existing indexes must then be rebuilt (see the sketch after the results table). Results are summarized in the table below:

| Statement   | Runtime (H:M) | % Runtime Reduction |
|-------------|---------------|---------------------|
| UPDATE      | 59:11         | n/a                 |
| INSERT      | 8:23          | 86.8                |
| SELECT INTO | 5:02          | 91.5                |
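
The SELECT INTO variant might look roughly like this (same hypothetical names as above); the default-filegroup switch is what keeps the writes on the second disk:

```sql
-- SELECT INTO always creates its target on the default filegroup,
-- so make the work filegroup the default first (hypothetical names).
ALTER DATABASE ConversionDB MODIFY FILEGROUP FG_Work DEFAULT;
GO

-- Minimally logged under the simple (or bulk-logged) recovery model.
SELECT Id,
       UPPER(LTRIM(RTRIM(LegacyCode))) AS LegacyCode,
       LEFT(Description, 100)          AS Description,
       GETDATE()                       AS ModifiedDate
INTO   dbo.BigTable_Work
FROM   dbo.BigTable;
GO

-- Restore the original default filegroup and rebuild any indexes
-- needed on the new table.
ALTER DATABASE ConversionDB MODIFY FILEGROUP [PRIMARY] DEFAULT;
CREATE UNIQUE CLUSTERED INDEX IX_BigTable_Work_Id ON dbo.BigTable_Work (Id);
```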

See the following for more information regarding recovery models and minimally-logged operations:

BOL: ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/8cfea566-8f89-4581-b30d-c53f1f2c79eb.htm

MSDN: http://msdn2.microsoft.com/en-us/library/ms191244.aspx

This exercise provided an example of how certain operations can be non-optimal even in a great database engine like SQL Server. More importantly, it showed how a targeted optimization (in this case for I/O, by directing all reads to one disk and all writes to another) can result in great performance. If you think about it, this is the same reason we put the log files on a different drive from the data files: the log drive can then perform simple sequential writes. Thanks to Umachandar Jayachandran and Len Wyatt for contributing to this article.

— Robin

Comments (7)

  1. I did a VARCHAR(20) -> CHAR(2) modification on a 1 billion row table; it took 3+ hours and ran in a single thread.

    5+ or 8+ hours for 500 million rows; can we do better than that? Life is short, play more!

    Basically your solution is dump + bulk load. For the dump, do you have any suggestions on how we can utilize multiple cores?

    For the bulk load, one should prepare files (ideally in native format) equal in number to the threads the server can run, switch the DB to BULK_LOGGED (not SIMPLE), drop all indexes, rebuild all indexes afterwards, and so on. Which means it is not suitable for an online application.

    Making the table partitioned may be a good idea: you can prepare a partition from the dump/from scratch, then switch it in within a second.
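
    A minimal sketch of that partition-switch approach, with hypothetical names (the staging table must match the target's schema and indexes, sit on the partition's filegroup, and carry a matching check constraint):

    ```sql
    -- Load and index the staging table with the converted data first, then
    -- switch it into the partitioned table; the switch is metadata-only and
    -- effectively instant (hypothetical table and partition number).
    ALTER TABLE dbo.BigTable_Staging
        SWITCH TO dbo.BigTable PARTITION 3;
    ```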

    Dong

  2. sqlperf says:

    Hi,

    Your example of altering a column is a slightly different use of the bulk insert technique (it doesn't matter which one). Altering a column can be a simple metadata change (when you modify, say, int to bigint) or a complete rewrite of the row. It is the latter that can be a real problem to manage. SQL Server doesn't yet have the features that would make ALTER TABLE less intrusive. You can find more details on ALTER TABLE at the link below:

    http://www.sqlmag.com/Article/ArticleID/40538/Inside_ALTER_TABLE.html

    Now, as for your problem of how to optimize the column modification, the best way is to drop & recreate the table. This can be achieved without dumping the data externally to a file by doing the following:

    1. Use the SELECT…INTO command to create a new table. This allows SQL Server to utilize more CPUs, since the SELECT part can run in parallel. This operation is also minimally logged, depending on your recovery model.

    2. Next, you need to swap the new and old tables. This part is tricky, depending on the references to the table and how many transactions happened on the old table. You will also have to recreate any indexes, defaults, and check constraints.
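
    A bare-bones sketch of those two steps, with hypothetical names (index, default, and constraint re-creation omitted):

    ```sql
    -- Step 1: rebuild the table via SELECT...INTO, applying the type change.
    -- The scan can run in parallel, and the operation is minimally logged
    -- under the simple or bulk-logged recovery model.
    SELECT Id,
           CAST(StatusCode AS CHAR(2)) AS StatusCode   -- the column being altered
    INTO   dbo.BigTable_New
    FROM   dbo.BigTable;
    GO

    -- Step 2: swap the new and old tables, then recreate indexes, defaults,
    -- and check constraints on the new table.
    BEGIN TRANSACTION;
        EXEC sp_rename 'dbo.BigTable',     'BigTable_Old';
        EXEC sp_rename 'dbo.BigTable_New', 'BigTable';
    COMMIT;
    ```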

    You can partition the table especially for large tables like you quoted. But that doesn’t help in this case since you will have to touch every row to modify the metadata anyway. Partitioning does help for incremental bulk loads and manageability.

    Optimizing bulk load to use multiple cores is a different scenario and that doesn’t apply here because you don’t want to export the data outside the database (unless you are space constrained) & import it back. It will be much slower. If you are interested in knowing more about how to optimize bulk load to use multiple cores by either partitioning the file or using different parameters (BULK INSERT, OPENROWSET BULK, BCP) let me know and I will post a separate article about it.

    Thanks

    Umachandar

  3. Michael Brönnimann says:

    Parallelization for queries (SELECT) is covered quite well by the SQL engine itself, but when it comes to large-volume data modifications (UPDATE, INSERT, DELETE), the standard engine does not parallelize towards the best use of all available resources (disk, multiple CPU cores, etc.).

    Therefore you may have a look into the approach of SQL Parallel Boost at

    http://www.ibax.ch/…/default.aspx

    This approach can also be used to execute multiple SQL statements in parallel.

    A purely SQL-engine-based parallelisation solution benefits from minimized complexity and involves no 'external' components like SSIS. Furthermore, it is the best-performing solution regarding task splitting and synchronization, as it has no potential connection and communication overhead. The overall performance gain through parallelisation with SQL Parallel Boost is up to a factor of 10!

    In case you don't want to build your own solution, SQL Parallel Boost provides a self-contained, pure T-SQL based solution which can be easily embedded in existing applications and ETL process tasks.

  4. Michael Brönnimann says:

    SQL Parallel Boost improves large-volume data modification operations by up to a factor of 10 (!). The free 'Community Edition' can be downloaded at sqlparallelboost.codeplex.com.

  5. benstaylor says:

    One way I have done this BIG task is to take advantage of BCP.

    You can easily parallelize the process by using BCP QUERYOUT. You write a query that extracts the data, modifies the necessary column data in the extract, and sends it to a file.

    You can easily have multiple BCP queries executing against different parts of the table in parallel (break it down by ranges of rows) and send the output to multiple export BCP files, preferably on different spindles.

    Truncate the table, alter the schema as desired (if that is what is being done).

    De-activate all your non-clustered indexes.

    Then you can set up multiple BCP processes importing the massaged data back into the table from multiple files, on multiple disks.

    Now that the data has been restored, re-build all your non-clustered indexes to re-activate them.
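
    For illustration, one export/import pair might look roughly like the commands below (server, database, table, file paths, and the row range are all hypothetical); several such pairs, each covering a different range and file, would run in parallel:

    ```cmd
    rem Export one range of rows in native format, applying the column change
    rem in the extract query (names, paths, and the range are hypothetical).
    bcp "SELECT Id, CAST(StatusCode AS CHAR(2)), Description FROM ConversionDB.dbo.BigTable WHERE Id BETWEEN 1 AND 125000000" queryout D:\extract\part1.dat -n -S MyServer -T

    rem After truncating the table, altering the schema, and de-activating the
    rem non-clustered indexes, import each file back in its own bcp session.
    bcp ConversionDB.dbo.BigTable in D:\extract\part1.dat -n -S MyServer -T
    ```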

    BTW – SQL Parallel Boost really does work too!

  6. Ricardo says:

    In my case I could not use SELECT INTO, because my destination table (empty) had to be partitioned and compressed, so I used INSERT…SELECT with the TABLOCK hint to minimize logging.
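
    A minimal sketch of that pattern, with hypothetical names; the TABLOCK hint is what enables minimal logging for the INSERT…SELECT into the empty target under the simple or bulk-logged recovery model:

    ```sql
    -- Load an existing empty, partitioned, compressed target with minimal
    -- logging (hypothetical table and column names).
    INSERT INTO dbo.BigTable_Partitioned WITH (TABLOCK)
           (Id, LegacyCode, Description, ModifiedDate)
    SELECT Id, LegacyCode, Description, ModifiedDate
    FROM   dbo.BigTable;
    ```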