Change Data Capture quick start

By Mirek on (tags: audit log, Change Data Capture, categories: code)

Change Data Capture is a feature of Sql Server (available since version 2008) which allows to capture all insert, update and delete operations on database tables. Here I am going to show you the simple way of enabling CDC on database, table and specific columns.

The first thing to notice is that the Change Data Capture is available only on the Enterprise, Developer, and Evaluation editions of SQL Server.
Let’s assume we have a following tables



And we want to capture and log all inserts, updates and deletes made on table Orders. First we need to enable CDC on the database.

EXEC sys.sp_cdc_enable_db

This function also creates a cdc schema and all necessary tables, such as cdc.change_tables which will hold all so called  capture instances. Basically a capture instance is an object which holds the information about the table and specific columns which are going to be captured during single database transaction. We can create at most two capture instances per source table.
Next we need to enable a specific table and columns to be captured using sys.sp_cdc_enable_table function. Assume we want to capture table Orders with all its columns.

EXEC sys.sp_cdc_enable_table 
  @source_schema = N'dbo',
  @source_name = N'Orders',
  @role_name = NULL,
  @capture_instance = NULL,
  @supports_net_changes = 1,
  @captured_column_list = N'Id, ProductEAN, CustomerId, Quantity, OrderDate',
  @filegroup_name = N'PRIMARY';

The detailed description of all parameters is available in the documentation (see references below), but let me briefly comment on the most important. You can set the name of the capture instance by providing a parameter @capture_instance. When this is left as null the CDC will automatically name the instance as the source table prefixed with the schema name and underscore, so here we get dbo_Orders. Next @captured_column_list defines columns to be captured. Leaving this null will include all columns in the source table as default.
Executing above function takes a while. This is because two Sql Server Agent jobs are created during this call. One is for capturing data (cdc.[databasename]_capture) and one for cleaning the logs (cdc.[databasename]_cleanup). Notice that Sql Server Agent must be running for this to complete. Also new system table is created with name cdc.dbo_Orders_CT. This table will contain all captured data from table Orders.

Let’s now add, update and delete some data from table Orders and see how the data is captured by CDC

INSERT INTO Customers (Name) VALUES ('John Nowak');
INSERT INTO Orders (ProductEAN, CustomerId, Quantity, OrderDate) 
VALUES ('1234', 1, 3, GETDATE());
UPDATE Orders SET ProductEAN = '5678' WHERE ProductEAN = '1234';
DELETE FROM Orders WHERE ProductEAN = '5678';

To get the changes we need to query the dbo_Orders_CT table

SELECT * FROM cdc.dbo_Orders_CT


As you can see the CDC has captured four operations. The column __$operation contains the operation type which indicates as following 1 = delete, 2 = insert, 3 = before update, 4 = after update. So for each update on the source table we get two rows in the ct table. One is before update and the second is after the update. Notice that those two rows has the same value in first column __$start_lsn which basically identifies the same database transaction.
The __$update_mask column indicates which columns on the source table were modified. This mask can be easily resolved by sys.fn_cdc_is_bit_set function

SELECT __$update_mask ,
        ( SELECT    CC.column_name + ','
          FROM      cdc.captured_columns CC
                    INNER JOIN cdc.change_tables CT ON CC.[object_id] = CT.[object_id]
          WHERE     capture_instance = 'dbo_Orders'
                    AND sys.fn_cdc_is_bit_set(CC.column_ordinal,
                                              PD.__$update_mask) = 1
          XML PATH('')
        ) AS changedcolumns
FROM    cdc.dbo_Orders_CT PD

Which returns