About Change Data Tracking
SQL Server Change Tracking, also known as CT, is a lightweight tracking mechanism introduced for the first time in SQL Server 2008. It can be used to track the DML changes performed in SQL Server database tables. SQL Change Tracking can be configured in all SQL Server editions, including the free Express edition.
SQL Server Change Tracking is a synchronous tracking mechanism. The changes on information will be available directly once the DML change is committed, without the need for a delay while reading the changes from the Transaction Log file, such as the Change Data Capture asynchronous mechanism.
How to synchronize databases using Change Data Tracking
Overview
Change Data Tracking is only possible when the source database is Microsoft SQL Server.
This technique is a flavor of change replication using high watermark (HWM). It uses SQL Server system functions CHANGETABLE
and CHANGE
, together with built-in HWM logic.
Read about other change replication techniques available in the Etlworks Integrator.
Setup
ALTER TABLE dbo.test_table_with_ct
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = OFF)
The source table must have the PRIMARY KEY
for CT to work.
Change Data Tracking Flow
Step 1. Create a Database to database
Flow.
Step 2. Select source FROM
and destination TO
databases and tables.
Step 3. Click MAPPING
and enter the following Source query
:
SELECT CT.id,
dbo.test_table_with_ct.*,
CT.SYS_CHANGE_VERSION debezium_cdc_timestamp,
CT.SYS_CHANGE_OPERATION debezium_cdc_op
FROM CHANGETABLE (CHANGES dbo.test_table_with_ct,{HIGH_WATERMARK}) as CT
LEFT JOIN dbo.test_table_with_ct
ON CT.id = dbo.test_table_with_ct.id
ORDER BY SYS_CHANGE_VERSION
Replace id
with the actual primary key columns.
The query uses SQL Server system functions CHANGETABLE
and CHANGES
, together with OUTER JOIN
(the OUTER JOIN
is needed to include deletes; otherwise, you can do JOIN
).
Notice the {HIGH_WATERMARK}
token. It will be replaced at runtime on the actual value of the HWM
which is set to SYS_CHANGE_VERSION
automatically incremented for each committed transaction.
Step 4. Select the Change Replication
tab, configure High Watermark Field
, and the logic to calculate the field.
High Watermark Field
:debezium_cdc_timestamp
Change High Watermark Field Value
:highWatermark == null ? 0 : java.lang.Math.round(highWatermark)
The code above sets the initial HWM
value to 0
and then returns the long value of highWatermark
(internally, it is stored as a decimal number).
Basically, on the first run, the {HIGH_WATERMARK}
is going to be set to 0
, and on all consecutive runs to the MAX(SYS_CHANGE_VERSION)
calculated for all extracted records.
Step 5. Select the Parameters
tab and configure how the Flow is going to be applying changes to the destination table.
Action
: one of the following:Record
: when theAction
is set toRecord
, the Flow will be looking for the value of the fielddebezium_cdc_op
to create and execute the appropriate SQL (I
-INSERT
,U
-UPDATE
,D
-DELETE
). It will also automatically remove columnsdebezium_cdc_op
anddebezium_cdc_timestamp
from the dataset so they will not appear in the destination table.Record with MERGE
: same asRecord
but eliminates duplicates.Record with IfExist
: same asMERGE
but slower. Use it only if the nativeMERGE
is not supported by the destination database or is not possible for the destination table.
Lookup Fields
: comma-separated list of fields that uniquely identify the record.
Alternatively, to set the Lookup Fields
, you can enable the Predict Lookup Fields
.
Step 6. Select Mappings
and exclude primary keys from the Mapping.
Excluding primary keys is needed because the Source query
SELECT CT.id, dbo.test_table_with_ct.*,
will populate two sets of key fields. The first one is from CHANGETABLE
and the second one is from the actual table. The second set of primary keys will have a suffix 1
after the name, for example, id1
.
Comments
0 comments
Please sign in to leave a comment.