Clustered Columnstore Index: Data Load Optimizations - Minimal Logging

When bulk importing a large number of rows, you want to make sure it can be done efficiently with minimal resource usage so that you can finish the data load quickly and start running your analytics. SQL Server provides the following three key optimizations during bulk import:

  • Minimal logging
  • Reduced overhead of locks
  • Concurrent Inserts

Just like regular tables, where data is physically stored as rows, these optimizations are available in tables with columnstore indexes as well, but there are some differences. This series of blogs explains each of these optimizations in the context of tables with columnstore indexes.

Minimal Logging

Contrary to common misconception, minimal logging does not mean ‘no logging’. Minimal logging logs the allocation of pages and the changes to system tables, but not the data itself. For example, if you are bulk importing 10 million rows, the actual data rows are not logged, only the page allocations. You may wonder why SQL Server could not eliminate logging altogether, since you could simply ‘reload’ the data. The issue is that if the data load fails, SQL Server needs to roll back the transaction and would have no way of knowing which pages had been allocated. At this point, hopefully I have convinced you that minimal logging is a good way to minimize any logging-related bottleneck. Let us now look at minimal logging for tables with a clustered columnstore index (CCI).
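As a quick way to see this for yourself, the sketch below summarizes the log records that a load generated, grouped by operation; it assumes the t_bulkload table used in the examples that follow, and it should be run before a checkpoint truncates the log. With minimal logging you will mostly see allocation and system-table operations rather than LOP_INSERT_ROWS records carrying the data rows.

-- sketch: summarize log records generated for a given table
select operation, count(*) as log_record_count
from sys.fn_dblog(NULL, NULL)
where allocunitname like '%t_bulkload%'
group by operation
order by log_record_count desc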

When you import data into a CCI, the data is loaded either into a delta rowgroup or into a compressed rowgroup based on the rule: if the batch size is < 102400 rows, the rows are loaded into a delta rowgroup; otherwise they are directly loaded into a compressed rowgroup. Minimal logging is only supported when data is loaded directly into a compressed rowgroup. This means that to get minimal logging with a CCI, you must use a batch size >= 102400.

Here are a couple of examples.

-- create the table
create table t_bulkload (
       accountkey                int not null,
       accountdescription        char (500),
       accounttype               char(500),
       AccountCodeAlternatekey   int)

-- Let us prepare the data
-- insert 110K rows into a regular table
begin tran
declare @i int = 0
while (@i < 110000)
begin
       insert into t_bulkload values (@i, 'description', 'dummy-accounttype', @i*2)
       set @i = @i + 1
end
commit

-- bcp out the data... run the following in a command window
bcp columnstore..t_bulkload out c:\temp\t_bulkload-1.dat -c -T

-- truncate the table
truncate table t_bulkload

-- create clustered columnstore index
CREATE CLUSTERED COLUMNSTORE index t_bulkload_cci on t_bulkload

-- clean up the transaction log (note, the database is in
-- simple recovery mode)
Checkpoint

-- now bulk load the data
begin tran
-- this loads the data in 1 batch
bulk insert t_bulkload
   FROM 'c:\temp\t_bulkload-1.dat'

-- show the log records generated
-- note, I am using 'log record length 1104' because I know the length of the
-- insert log record.
select count(*)
from sys.fn_dblog(NULL, NULL) as t_log
where allocunitname like '%t_bulkload%' and operation like '%LOP_INSERT_ROWS%' and [Log Record Length] = 1104

You will see that the query returns 0, as no data rows were logged. Before the next test, abort the transaction and run the checkpoint command to clean up the log records; the two commands are sketched below. Then let us run the same command but with a batch size of 10000. Recall that this will load the rows into a delta rowgroup.
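This is a minimal sketch of that cleanup, assuming the bulk insert transaction from the first example is still open:

-- abort the open transaction from the first bulk insert
rollback tran

-- the database is in simple recovery mode, so a checkpoint allows the
-- log records from the aborted load to be truncated
checkpoint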

begin tran
-- Now use a smaller batchsize of 10000
-- I am even using TABLOCK
bulk insert t_bulkload
   FROM 'c:\temp\t_bulkload-1.dat'
   WITH
      (
          BATCHSIZE = 10000
          ,TABLOCK
      )

-- show the log records generated
select count(*)
from sys.fn_dblog(NULL, NULL) as t_log
where allocunitname like '%t_bulkload%' and operation like '%LOP_INSERT_ROWS%' and [Log Record Length] = 1104

You will see that the count is 110000, exactly the number of rows we inserted. In other words, the data rows were fully logged (i.e. no minimal logging). So the question is: why did we not get minimal logging with the delta rowgroup? There are two reasons. First, the delta rowgroup is organized as a B-tree, not a heap. Second, we expect a large number of rows to be loaded into a CCI, so we have chosen to optimize the data load directly into compressed rowgroups. One additional advantage of loading data directly into a compressed rowgroup is that it eliminates the extra step of loading data into a delta rowgroup and then migrating it into compressed rowgroups. You can verify where the rows landed in each test yourself, as sketched below.
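Here is a minimal sketch using the sys.column_store_row_groups catalog view (available since SQL Server 2014); run it after each of the two loads above:

-- sketch: list the rowgroups of the clustered columnstore index
-- delta rowgroups show up as OPEN or CLOSED; directly loaded rowgroups show up as COMPRESSED
select row_group_id, state_description, total_rows
from sys.column_store_row_groups
where object_id = object_id('t_bulkload')

With the single 110000-row batch you should see a COMPRESSED rowgroup, whereas the 10000-row batches end up in OPEN delta rowgroups.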

There you have it; our recommendation is to choose a batch size of >= 102400 to get the benefits of minimal logging with a clustered columnstore index, as in the sketch below. In the next blog, I will discuss parallel bulk import and locking optimizations.
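For reference, a minimal sketch of the recommended pattern, reusing the data file from the examples above (the WITH options are illustrative):

-- each 102400-row batch is large enough to be loaded directly into a
-- compressed rowgroup and therefore qualifies for minimal logging
bulk insert t_bulkload
   FROM 'c:\temp\t_bulkload-1.dat'
   WITH
      (
          BATCHSIZE = 102400
          ,TABLOCK
      )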

Please refer to https://www.microsoft.com/en-us/research/publication/enhancements-to-sql-server-column-stores/ for more details on data load into a columnstore index: "Bulk load rates for clustered column store have been measured at about 600GB/hour on a 16 core machine, using 16 concurrent bulk load jobs (one per core) targeting the same table. We did a trickle load test on a single thread whereby we inserted 3.93 million rows, one at a time in independent transactions, into an empty column store index. This was done on a machine with 4 cores and 8 hardware threads. The test was implemented with a cursor reading from a source table and inserting one row at a time into the target. The data was drawn from the Purchase table mentioned earlier. The test took 22 minutes and 16 seconds. The insertion rate was 2,944 rows/second"

 Thanks

Sunil Agarwal