Data Warehouse, Clustered Columnstore Indexes and SAS


Clustered Columnstore Indexes (CCI) is a game changer for data warehouse workload, which is available from SQL Server 2014 Enterprise Edition. I worked for a PoC project last year to adopt Clustered Columnstore Indexes for data warehouse. The PoC method is simple but the improvement is very significant. Below is some detailed.

The Pain and the Result

Customer run a 353GB data warehouse on SQL 2008 R2, max memory is 20GB. The ELT tool is SAS, SSAS cube is built on top of data warehouse. The SSAS model is changed frequently and full process is ran every day due to their dynamical business nature. As the physical memory size is small, there is heavy IO loading to storage during data process and customer plan to increase physical memory to 200GB.

By changing the top 30 tables to CCI tables, the database size reduces from 353GB to 75GB, the largest table size reduces from 46GB to 4.5GB! The duration of daily ELT reduces from 35 seconds to 121 seconds when the SQL max memory is 20GB, reduces from 292 seconds to 49 seconds when the SQL max memory is 200GB. The duration of cube process reduces from 35 min to 15 min when the SQL max memory is 20GB, reduces from 21 min to 14 min when the SQL max memory is 200GB. So we are happy to see the huge improvement after adopting CCI without new hardware expense.

Migration Method

Step 1: Find out top tables

In this PoC there is no unsupported data type, no unique key, primary key or foreign key in every fact table as it's a well design data warehouse, I use below SQL find out top 30 tables in terms of size:

use [db_name]
go

select top 30
	i.object_id,
    t.name as table_name,
    s.name as [schema_name],
    p.rows as rowc_ounts,
    sum(a.total_pages) / 128.0 as total_space_mb, 
    sum(a.used_pages) / 128.0 as used_space_mb, 
    (sum(a.total_pages) - sum(a.used_pages)) / 128.0 as unused_space_mb
into #t_top_tables
from 
    sys.tables t
inner join      
    sys.indexes i on t.object_id = i.object_id
inner join 
    sys.partitions p on i.object_id = p.object_id and i.index_id = p.index_id
inner join 
    sys.allocation_units a on p.partition_id = a.container_id
left outer join 
    sys.schemas s on t.schema_id = s.schema_id
where 
    t.name not like 'dt%' 
    and t.is_ms_shipped = 0
    and i.object_id > 255 
group by 
    i.object_id,t.name, s.name, p.rows
order by 
    total_space_mb desc


select * from #t_top_tables

Step 2: Generate T-SQL to drop all existing indexes and create Clustered Columnstore Indexes(CCI)

Below T-SQL is ran in same session in generate drop all existing indexes and create CCI statement:

declare 
@schema sysname,
@table sysname,
@index sysname,
@sql nvarchar(4000)

declare cur cursor for
select o.schema_name, o.table_name, i.name as index_name from sys.indexes i 
inner join #t_top_tables o on i.object_id=o.object_id
where (i.index_id>0 and i.index_id<255) and i.name not like '_WA_Sys_%'

open cur
fetch next from cur into @schema,@table,@index
while @@fetch_status=0
begin
 set @sql='drop index ['+@schema+'].['+@table+'].['+@index +']'
 select @sql 
 fetch next from cur into @schema,@table,@index
end
close cur
deallocate cur

select 'CREATE CLUSTERED COLUMNSTORE INDEX cl_'+table_name+' ON '+schema_name+'.'+table_name
from #t_top_tables 

drop table  #t_top_tables 

Step 3: Run the generated script

Before you run the generated script in production, I recommend you to test in testing environment to measure the duration. Another suggestion is to turn database recovery model to simple to avoid growth of t-log.

SAS ODBC Error

Customer hit below error when they testing the SAS code:

ERROR: CLI describe error: [Microsoft][ODBC SQL Server Driver][SQL Server]Cursors are not supported on a table which has a clustered columnstore index. : [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared.

The statement is a simple select, but SAS translates it to a cursor. With SAS tech support, we finally find out this is related to a SAS optimization configuration: Block read buffer size.

Below is how to set the READBUFF=0 in SAS Management Console:

1. Right click the Library in SMC -> Properties -> Options -> Advanced Options – > Optimization -> Block read buffer size
2. set this property to 0.
http://support.sas.com/documentation/cdl/en/bidsag/61236/HTML/default/viewer.htm#a002668484.htm

Posted by Shiyang, Aug 12 2016


Comments (0)

Skip to main content