- Startup
- Business
- Enterprise
- On-Premise
- Add-on
Overview
Change Data Capture (CDC) enables real-time replication of changes—INSERT, UPDATE, and DELETE—from source databases. In Etlworks, CDC pipelines are fully managed, prebuilt, and require no external software or configuration.
Each CDC pipeline begins with an initial snapshot of the source tables. Once complete, the flow automatically switches to streaming mode, capturing changes from the transaction log as they happen.
This article provides a high-level overview of how CDC works in Etlworks, supported topologies, and next steps based on your use case.
Learn about other change replication techniques available in Etlworks.
About the CDC Engine in Etlworks
Etlworks includes a built-in, deeply integrated CDC engine based on a customized version of Debezium.
There is nothing to install or manage separately—the CDC engine runs as part of the flow execution. It has been significantly enhanced to:
- Support additional databases and formats
- Improve performance and stability
- Eliminate operational overhead
Although based on Debezium, the CDC engine in Etlworks is tightly coupled with the platform and includes features not available in standalone deployments.
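For orientation, standalone Debezium wraps each change in an envelope with `op`, `before`, `after`, `source`, and `ts_ms` fields. The sketch below assumes that envelope shape. The customized engine in Etlworks may serialize events differently, and the helper name `describe_event` and the sample values are hypothetical.

```python
import json

# Debezium operation codes: c = create, u = update, d = delete, r = snapshot read.
OP_NAMES = {"c": "INSERT", "u": "UPDATE", "d": "DELETE", "r": "SNAPSHOT"}


def describe_event(raw: str) -> str:
    """Summarize one Debezium-style change event (hypothetical helper)."""
    event = json.loads(raw)
    op = OP_NAMES.get(event["op"], "UNKNOWN")
    table = event["source"]["table"]
    # Deletes carry the row image in "before"; other operations carry it in "after".
    row = event["before"] if event["op"] == "d" else event["after"]
    return f"{op} on {table}: {row}"


# Illustrative event only; the table and column names are made up.
sample = json.dumps({
    "op": "u",
    "before": {"id": 1, "status": "new"},
    "after": {"id": 1, "status": "shipped"},
    "source": {"table": "orders"},
    "ts_ms": 1700000000000,
})
print(describe_event(sample))
```

The `r` code marks rows read during the initial snapshot, which is how a consumer can tell snapshot rows apart from streamed changes.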
Supported CDC topologies
CDC in Etlworks supports cloud, hybrid-cloud, and fully on-premise deployments:
| Deployment Model | Details and Links |
|---|---|
| Cloud | Secure access to on-prem databases via SSH tunnels → Using SSH tunnels |
| Hybrid-cloud | Integration Agents installed in private networks → About Integration Agents |
| Fully on-premise | No cloud access required → On-prem deployment options |
Getting Started with CDC in Etlworks
Each CDC pipeline follows a consistent pattern:
- Initial Snapshot: Captures a consistent point-in-time copy of all included tables.
- Streaming Mode: Captures real-time changes (INSERT, UPDATE, DELETE) as they are committed.
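The snapshot-then-stream pattern can be illustrated with a minimal in-memory sketch. It is not how Etlworks implements the pipeline; the function `run_pipeline`, the `id` key column, and the sample rows are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]
Change = Tuple[str, Row]  # (operation, row); operation is INSERT, UPDATE, or DELETE


def run_pipeline(snapshot: Iterable[Row], changes: Iterable[Change]) -> Dict[object, Row]:
    """Build a replica keyed by "id": load the snapshot, then apply changes in commit order."""
    replica: Dict[object, Row] = {}

    # Phase 1: initial snapshot, a point-in-time copy of the existing rows.
    for row in snapshot:
        replica[row["id"]] = dict(row)

    # Phase 2: streaming, applying each committed change as it arrives.
    for op, row in changes:
        if op == "DELETE":
            replica.pop(row["id"], None)
        else:
            # INSERT and UPDATE are both applied as an upsert on the key.
            replica[row["id"]] = dict(row)
    return replica


snapshot = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
changes = [
    ("UPDATE", {"id": 1, "name": "a2"}),
    ("DELETE", {"id": 2}),
    ("INSERT", {"id": 3, "name": "c"}),
]
replica = run_pipeline(snapshot, changes)
print(replica)
```

Applying inserts and updates as upserts keeps the replica correct even if a change for a row already captured by the snapshot is delivered again.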
Enable CDC in the Source Database
Before creating CDC flows, you must enable CDC in the source database:
- Enable CDC for Microsoft SQL Server
- Enable CDC for MySQL
- Enable CDC for Oracle
- Enable CDC for PostgreSQL
- Enable CDC for DB2
- Enable CDC for MongoDB
- Enable CDC for AS400 (IBM i platforms)
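As a rough illustration of what enabling CDC involves, the sketch below checks a few server settings that log-based CDC with Debezium commonly relies on: row-based binary logging for MySQL and logical WAL level for PostgreSQL. The database-specific articles listed above remain the authoritative source; the `REQUIRED_SETTINGS` table and the `missing_settings` helper are hypothetical, and the check works on a plain dictionary rather than a live connection.

```python
from typing import Dict, List

# Assumption: standard Debezium prerequisites; not an exhaustive list.
REQUIRED_SETTINGS: Dict[str, Dict[str, str]] = {
    "mysql": {"binlog_format": "ROW", "binlog_row_image": "FULL"},
    "postgresql": {"wal_level": "logical"},
}


def missing_settings(db_type: str, current: Dict[str, str]) -> List[str]:
    """Return a message for each required setting that is absent or has the wrong value."""
    problems = []
    for name, expected in REQUIRED_SETTINGS.get(db_type, {}).items():
        actual = current.get(name)
        # Compare case-insensitively because servers report values in varying case.
        if actual is None or actual.upper() != expected.upper():
            problems.append(f"{name} should be {expected}, found {actual}")
    return problems


# Values as they might be read from SHOW VARIABLES on a MySQL server.
print(missing_settings("mysql", {"binlog_format": "MIXED", "binlog_row_image": "FULL"}))
```

An empty list means the listed settings match; it does not confirm that permissions or table-level CDC options are in place.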
CDC Connectors
Learn about all available source connectors and their parameters in the article: Connectors for Change Data Capture (CDC)
CDC Flow Types
Once CDC is enabled on the source, Etlworks provides prebuilt, directional flows tailored for each destination type.
| Destination Type | Recommended Flow Type(s) |
|---|---|
| File Storage (CSV, JSON) | Stream CDC events, Create Files |
| Message Queue (Kafka, etc.) | Stream CDC events into Message Queue |
| NoSQL database (MongoDB, etc.) | Stream CDC events into NoSQL database |
| Relational database | |
| Snowflake | Stream CDC events into Snowflake |
| Amazon Redshift | Stream CDC events into Redshift |
| Google BigQuery | Stream CDC events into BigQuery |
| Azure Synapse Analytics and Azure Fabric Warehouse | Stream CDC events into Synapse Analytics or Fabric Warehouse |
| Databricks | Stream CDC events into Databricks |
| Vertica | Stream CDC events into Vertica |
| Greenplum | Stream CDC events into Greenplum |
| Web Service (API) | Stream CDC Events into Web Service |
More CDC Resources
Snapshot Management
Reload tables, add new ones, and manage incremental or partial snapshots.
Read about Snapshot Management.
Database-Specific CDC Cases
Setup tips, edge cases, and snapshot behavior for supported databases.
Read about Database-Specific CDC Cases
CDC Configuration and Monitoring
Learn how to configure and monitor CDC flows.
Tips and Tricks for CDC Flows
Real-world guidance on encoding, NULL handling, schemas, and troubleshooting.
Real-World Use Cases
- Intertek Alchemy: Streamed CDC events from 1,500 MySQL databases into Snowflake in real time → Read the case study
- Ambyint: CDC from MongoDB and IoT (MQTT brokers) into Snowflake → Read the case study
- Tallink: Real-time CDC and batch ETL from MySQL, Oracle, APIs, and files → Read the case study