About Change Data Tracking
SQL Server Change Tracking, also known as CT, is a lightweight tracking mechanism introduced for the first time in SQL Server 2008. It can be used to track the DML changes performed in SQL Server database tables. SQL Change Tracking can be configured in all SQL Server editions, including the free Express edition.
SQL Server Change Tracking is a synchronous tracking mechanism. The changes on information will be available directly once the DML change is committed, without the need for a delay while reading the changes from the Transaction Log file, such as the Change Data Capture asynchronous mechanism.
Change Data Tracking is only possible when the source database is Microsoft SQL Server.
Read about other change replication techniques available in Etlworks.
How it works
This technique is a variation of change replication using high watermark (HWM). It automatically generates and executes HWM queries to get changes from the source table since the last run. It also supports the initial (full) load.
The first run (full load) automatically generates a query that selects all records from the source table, sets the change version to 0, and action to c
(insert). Example:
select demo.dbo.city.*,
'c' debezium_cdc_op,
0 debezium_cdc_timestamp,
0 hwm_change_version
from demo.dbo.city
For each consecutive run, it uses SQL Server system functions CHANGETABLE
and CHANGE
to get the new/updated/deleted records based on the stored or calculated change version (HWM). Example:
SELECT ct.city_id,demo.dbo.city.*,
case CT.SYS_CHANGE_OPERATION
when 'I' then 'c'
when 'U' then 'u'
when 'D' then 'd' end debezium_cdc_op,
CT.SYS_CHANGE_VERSION debezium_cdc_timestamp,
CT.SYS_CHANGE_VERSION hwm_change_version
FROM CHANGETABLE (CHANGES demo.dbo.city,{HWM}) as CT
LEFT JOIN demo.dbo.city ON ct.city_id=demo.dbo.city.city_id
ORDER BY SYS_CHANGE_VERSION
Columns debezium_cdc_op
and debezium_cdc_timestamp
are added to the SQL to support automatic MERGE into the destination table. They are not added to the destination table. Column hwm_change_version
is used as highwatermark (HWM) and is automatically added to the destination table when it is created. If the destination table already exist you need to alter it by adding hwm_change_version
and populating it with the highest value returned by CHANGES {table}
(or 0 if you want to reload the data).
Prerequisites
ALTER TABLE dbo.test_table_with_ct
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = OFF)
The source table must have the PRIMARY KEY
for CT to work.
Process
Step 1. Create any flow where the source is Database, for example Database to database
or Database to Snowflake
.
Step 2. Select source FROM
and destination TO
databases and tables. To ETL multiple tables using a single source-to-destination transformation set FROM and TO to a wildcard. Read more about wildcard processing.
Step 3. Click MAPPING
, then select Change replication
tab.
Step 4. Select either Change Tracking with calculated HWM
(recommended) or Change Tracking
in Type
. Read about the difference between these two options.
Step 5. Select Parameters
tab.
If the destination is a cloud data warehouse or the flow is one of the available bulk load flows:
Select CDC MERGE
in Action
and enable Predict Lookup Fields
.
If the destination is another relational database:
Select Record with Merge
in Action
and enable Predict Lookup Fields
. If deletes can be ignored, it is highly recommended to enable which significantly improves performance.
Step 6. Select Parameters
tab and enable Create new database connection for source
.
Override Change Tracking SQL in Source query
As was explained here the change tracking SQL is generated automatically, separately for initial full load and for each consecutive run.
To override the SQL simply enter it in the Source query. You must provide SQLs for the initial load and consecutive runs by separating SQL statements with ";".
The following tokens and fragments of code are expected in the Source query:
{table}
: the source table name. You can use token{table}
as is, it will be replaced on actual table name at runtime. This token or actual table name must be in SQL statements for initial and consecutive runs.SELECT {keys},
: the part of the SQL statement for consecutive run which selects the values from the primary key columns.FROM CHANGETABLE (CHANGES {table},{HIGH_WATERMARK}) as CT
: the part of the SQL statement for consecutive run which joins source table with
LEFT JOIN {table}
ON {join_cond}CHANGETABLE
usingHIGH_WATERMARK
.{debezium_cdc_op}
: by default this column is populated asCT.SYS_CHANGE_OPERATION
and contains the type of change operation ('c' for INSERT, 'u' for UPDATE and 'd' for DELETE). This column is recommended but it is required for MERGE only. Default it to 0 for initial load.{debezium_cdc_timestamp}
: the numeric column which contains theCT.SYS_CHANGE_VERSION
for that row. This column is recommended but it is required for MERGE only. Default it to 0 for initial load.
Example:
-- initial
select {table}.*,
'c' debezium_cdc_op,
0 debezium_cdc_timestamp,
0 hwm_change_version
from {table};
-- consecutive runs
SELECT {keys},
{table}.*,
case CT.SYS_CHANGE_OPERATION
when 'I' then 'c'
when 'U' then 'u'
when 'D' then 'd'
end debezium_cdc_op,
CT.SYS_CHANGE_VERSION debezium_cdc_timestamp,
CT.SYS_CHANGE_VERSION hwm_change_version
FROM CHANGETABLE (CHANGES {table},{HIGH_WATERMARK}) as CT
LEFT JOIN {table}
ON {join_cond}
ORDER BY SYS_CHANGE_VERSION
Types of Change Tracking
There are two types of change tracking
Change Tracking with calculated HWM
: when this type is selected, the flow automatically enables calculated HWM. Hence, it queries the destination database usingselect max(hwm_change_version) from dest_table
to get the highest value of the change version stored in the destination table during previous loads and uses it to get the latest changes from the source database. We recommend setting type toChange Tracking with calculated HWM
for all cases except when querying the destination database is too slow.Change Tracking
: when this type is selected, the flow stores the highest value of the change version in the internal storage and uses it to get the latest changes from the source database.
Use Change Tracking with calculated HWM
whenever possible. Use Change Tracking
if executing select max(hwm_change_version)
from dest_table
on a destination table is slow or when the destination is not a relational database.
Comments
0 comments
Please sign in to leave a comment.