About Change Data Tracking
SQL Server Change Tracking, also known as CT, is a lightweight tracking mechanism introduced for the first time in SQL Server 2008. It can be used to track the DML changes performed in SQL Server database tables. SQL Change Tracking can be configured in all SQL Server editions, including the free Express edition.
SQL Server Change Tracking is a synchronous tracking mechanism. The changes on information will be available directly once the DML change is committed, without the need for a delay while reading the changes from the Transaction Log file, such as the Change Data Capture asynchronous mechanism.
Change Data Tracking is only possible when the source database is Microsoft SQL Server.
Read about other change replication techniques available in the Etlworks Integrator.
How it works
This technique is a variation of change replication using high watermark (HWM). It automatically generates and executes HWM queries to get changes from the source table since the last run. It also supports the initial (full) load.
The first run (full load) automatically generates a query that selects all records from the source table, sets the change version to 0, and action to c
(insert). Example:
select demo.dbo.city.*, 0 hwm_change_version, 'c' action
from demo.dbo.city
For each consecutive run, it uses SQL Server system functions CHANGETABLE
and CHANGE
to get the new/updated/deleted records based on the stored or calculated change version (HWM). Example:
SELECT ct.city_id,demo.dbo.city.*,
case CT.SYS_CHANGE_OPERATION
when 'I' then 'c'
when 'U' then 'u'
when 'D' then 'd' end action,
CT.SYS_CHANGE_VERSION hwm_change_version
FROM CHANGETABLE (CHANGES demo.dbo.city,{HWM}) as CT
LEFT JOIN demo.dbo.city ON ct.city_id=demo.dbo.city.city_id
ORDER BY SYS_CHANGE_VERSION
Prerequisites
ALTER TABLE dbo.test_table_with_ct
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = OFF)
The source table must have the PRIMARY KEY
for CT to work.
Process
Step 1. Create any flow where the source is Database, for example Database to database
or Database to Snowflake
.
Step 2. Select source FROM
and destination TO
databases and tables. To ETL multiple tables using a single source-to-destination transformation set FROM and TO to a wildcard. Read more about wildcard processing.
Step 3. Click MAPPING
, then select Change replication
tab.
Step 4. Select either Change Tracking with calculated HWM
(recommended) or Change Tracking
in Type
. Read about the difference between these two options.
Step 5. Select Parameters
tab.
If the destination is a cloud data warehouse or the flow is one of the available bulk load flows:
Select CDC MERGE
in Action
and enable Predict Lookup Fields
.
If the destination is another relational database:
Select Record with Merge
in Action
and enable Predict Lookup Fields
. If deletes can be ignored, it is highly recommended to enable which significantly improves performance.
Types of Change Tracking
There are two types of change tracking
-
Change Tracking with calculated HWM
: when this type is selected, the flow automatically enables calculated HWM. Hence, it queries the destination database usingselect max(hwm_change_version) from dest_table
to get the highest value of the change version stored in the destination table during previous loads and uses it to get the latest changes from the source database. We recommend setting type toChange Tracking with calculated HWM
for all cases except when querying the destination database is too slow. -
Change Tracking
: when this type is selected, the flow stores the highest value of the change version in the internal storage and uses it to get the latest changes from the source database.
Comments
0 comments
Please sign in to leave a comment.