Change data capture (CDC) is an approach to data integration that is based on the identification, capture, and delivery of the changes made to the source database and stored in the database redo log (also called transaction log).
Etlworks supports native log-based CDC for PostgreSQL, SQL Server, MySQL, and Oracle. Read about other change replication techniques available in Etlworks Integrator.
Setup CDC for the source database
Enabling CDC is different for each database. Please use the following tutorials:
Create CDC flow
Step 1. Create a source CDC connection for the source database.
Step 2. Create a connection to the destination database. Disable the auto-commit.
Step 3. Create a new flow by selecting any of the CDC flows from the gallery.
Step 4. Add a new source-to-destination transformation where:
- the FROM connection is the CDC connection created in step 1.
- the FROM is a fully qualified source table name (test.dbo.patient) or a wildcard table name (test.dbo.*). If the wildcard table name is used the system will perform a CDC change replication for all tables in the source with a name that matches the wildcard pattern.
- the TO is a destination table name. If you have configured a wildcard table name in the from, configure the destination name in the to using either only asterisk character (
*) or any part of the destination name with the asterisk character (
public.*). The system will substitute the asterisk character with the actual source object name (excluding database and schema names). For example, if the actual source name is
test.dbo.patientand the destination is
public.*the system will load data into the
- the TO connection is the destination database connection created in step 2.
Step 5. Click the
MAPPING button and select the Parameters tab. Set Action to
Record and Lookup fields to the comma-separated list of fields which uniqule identify the record in the source database.
Typically, the CDC flow will perform a full load in the first run and will automatically switch to the CDC mode for all consecutive runs.
You can enable option Switch to Snapshot if Error when configuring the CDC connection.
If an error occurred and this option is enabled the system will automatically switch to the snapshot mode and attempt to reload the whole table in the next run. Use it with caution.