SQL Server Change Data Capture with SAP

Hello I am pleased to introduce Clas Hortien as new contributor to this blog site. He is located in the SAP development facilities in Germany. For many years

his focus is to support SAP customers running on SQL Server. Some of you might know him by his contributions to the SDN SQL Server forum. Clas will

start a series of articles looking at new features of SQL Server 2008 in conjunction with SAP. As a first feature we start with “Change Data Capture” and

how it could be leveraged in combination with an SAP system.

Change Data Capture (CDC) provides information about DML changes on a table and a database. By using change data capture, you eliminate expensive

techniques such as user triggers, timestamp columns, and join queries in order to detect changes. Change Data Capture records modifications on tables on

which CDC got enabled. The details of the changes are available in a normal relational table for selection. CDC only tracks the columns which got modified as

we see later on in this article. Change Data Capture functionality is available only in SQL Server 2008 Enterprise, Developer, and Evaluation editions.

So far SAP does not provide any GUI or interface to setup or control the CDC feature. Since Basis release 720, the DBA cockpit provides some monitoring

functionality, mainly to check the status and to see which tables are monitored through CDC.

 

 

clip_image002

You can find this new screen in the NON-SAP Features section of the Configuration branch in the DBA Cockpit (see the screenshot).

When you plan on using CDC, you should consider the following limitations and impacts:

  • You may not use CDC, when your SAP System ID (SID) is CDC, because the Change Data Capture feature will interfere massively with your SAP system.
  • CDC will have an impact on CPU and space usage in your system. Especially the space usage can be tremendously, when you monitor multiple tables which a high change rate.
  • Please never ever try to monitor the SAP queuing tables like VBDATA, VBHDR and VBMOD as the amount of data is enormous and not useful anyway.
  • Do not monitor the SAP RFC tables like ARFC*, QRFC* and TRFC*.
  • Never monitor all tables of an SAP system simultaneous.
  • Be extremely selective what table to capture changes on and keep the number of tables limited.

To use CDC you have to manually enable it in a SQL Query Window with the following T-SQL commands:

USE PAX

EXEC sp_cdc_enable_db – enabling it on the DB

-- start it for some tables

EXEC sp_cdc_enable_table 'pax',
'REPOSRC', @role_name = null, @supports_net_changes =1;

EXEC sp_cdc_enable_table 'pax',
'SNAP', @role_name = null, @supports_net_changes =1;

EXEC sp_cdc_enable_table 'pax',
'sap_tabstats', @role_name = null, @supports_net_changes =1;

GO

The first command (sp_cdc_enable_db) enables CDC for the current database. All the necessary structures will be created and the database gets prepared

for the CDC usage.

In the example we started the monitoring for three tables (SNAP – Short dumps, REPOSRC – SAP Report sources, sap_tabstats – DBA Cockpit history

data). With the option @support_net_changes an additional function gets created, which shows the net changes on the table. By setting the @role_name

option to null the security of the tables which contains the changes is not restricted. 

Change Data Capture will create a new schema called ‘cdc’ in the SAP database (therefore never use CDC with a SAP system using the SID of CDC). At least

three additional objects are created in the new cdc schema in the SAP database for each table which is in the scope of CDC:

Object name

Type

Example

fn_cdc_get_all_changes_<schema>_<table name>

function

fn_cdc_get_all_changes_pax_SNAP

fn_cdc_get_net_changes_<schema>_<table name>

function

fn_cdc_get_net_changes_pax_SNAP

<schema>_<table name>_CT

table

pax_SNAP_CT

The table <schema>_<table_name>_CT contains the data of the changed rows. It is an exact copy of the table structure of the source table (e.g. SNAP),

with some additional fields (__$start_lsn, __$seqval, __$operation, __$update_mask) at the beginning of the table.

From SQL Server Books online:

Column name

Data type

Description

__$start_lsn

binary(10)

Log sequence number (LSN) associated with the commit transaction for the change. All changes committed in the same transaction share the same commit LSN. For example, if a delete operation on the source table removes two rows, the change table will contain two rows, each with the same __$start_lsn value.

__$end_lsn

binary(10)

Identified for informational purposes only. Not supported. Future compatibility is not guaranteed. In SQL Server 2008, this column is always NULL.

__$seqval

binary(10)

Sequence value used to order the row changes within a transaction.

__$operation

int

Identifies the data manipulation language (DML) operation associated with the change. Can be one of the following: 1 = delete 2 = insert 3 = update (old values) Column data has row values before executing the update statement. 4 = update (new values)

Column data has row values after executing the update statement.

__$update_mask

varbinary(128)

A bit mask based upon the column ordinals of the change table identifying those columns that changed.

<captured source table columns>

varies

The remaining columns in the change table are the columns from the source table that were identified as captured columns when the capture instance was created. If no columns were specified in the captured column list, all columns in the source table are included in this table.

The additional objects can be seen in SQL Server Management Studio under System Tables of the SAP database:

clip_image004

You can use the two functions to show the changes on the table. The function fn_cdc_get_all_changes_<schema>_<table name> returns one row for every change on a table:

DECLARE @begin_time datetime;

DECLARE @end_time datetime:

DECLARE @begin_lsn binary(10);

DECLARE @end_lsn binary(10);

SET @begin_time = '2010-03-01 12:00:00.000';

SET @end_time = '2010-03-10 12:00:00.000';

-- Map the dates

SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn ('smallest greater than', @begin_time);

SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn ('largest less than or equal', @end_time);

SELECT __$start_lsn, __$seqval, __$operation, __$update_mask,PROGNAME, R3STATE, TYPE FROM cdc.fn_cdc_get_all_changes_pax_REPOSRC( @begin_lsn, @end_lsn, 'all');

GO

clip_image006

Above you can see the changes of the column TYPE in the table REPOSRC from space to ‘A’ to ‘B’.

The function fn_cdc_get_net_changes_<schema>_<table name> shows only the very last version of the row:

clip_image008

With this information you will be able to manually revert or replay the changes against data in a table. For long term auditing or monitoring purposes, it is

essential, that the data is moved outside the CDC monitoring tables (e.g. pax_REPOSRC_CT), because the data is kept in those tables for a short period of

time only.

The data in the CDC system tables is automatically purged by an SQL Agent job ( cdc.<DB Name>_Cleanup, e.g. cdc.PAX_cleanup) every night at 2:00 am.

In the default configuration the retention period is 4320 minutes (3 days). Means, only changes younger than 3 days will be available in the cdc monitoring

tables. However one cleanup run will only delete up to 5000 entries per table. You can view this setting in msdb.dbo.cdc_job by running

SELECT retention, threshold, * FROM msdb.dbo.cdc_jobs WHERE job_type = 'cleanup'

Deleting 5000 rows in one cleanup might not be good enough to keep the volume of the CDC tables in check. Therefore if CDC tables grow day after day,

you might want to change these settings. To switch off CDC completely for a given table, you have to run the sp_cdc_disable_table procedure for the

created capture instance:

EXECUTE sys.sp_cdc_disable_table @source_schema = N'pax', @source_name = N'REPOSRC', @capture_instance = N'pax_REPOSRC'

GO

To de-activate CDC for an entire database you have to use the sp_cdc_disable_db procedure.

EXEC sp_cdc_disable_db

GO

This will remove all created objects in the cdc schema within the database.

Since the CDC functionality is based on the SQL Server Replication Log Sniffing Framework, you have to set the configuration option “max text repl size” to

its maximum of 2147483647, otherwise you will get truncation errors for tables with blob fields(e.g. REPOSRC).

The analysis and storage of the collected data has to be done outside of the SAP system. CDC can be used to get historical data about all changes on critical

tables within the system. In contradiction to other data monitoring solutions, CDC offers you the ability to get and store the information about the changes,

not only about the old and new values of columns.

Some useful Microsoft Knowledgebase articles for CDC:

FIX: A backup operation on a SQL Server 2008 database fails if you enable change tracking on this database

FIX: You cannot disable the CDC feature if you drop the cdc.change_tables CDC system table in SQL Server 2008

MSDN Change data capture