Overview
Change data capture (CDC) is an approach to data integration based on identifying, capturing, and delivering the changes made to the source database, as recorded in the database redo log (also called the transaction log).
The changes are captured without making application-level changes and without having to scan transactional tables, which makes CDC a perfect solution for non-intrusive, low-impact real-time data integration.
Etlworks supports native log-based change data capture for PostgreSQL, SQL Server, MySQL, Oracle, DB2, and MongoDB.
We use a heavily modified embedded Debezium engine for CDC.
Read about other change replication techniques available in Etlworks Integrator.
Enable change data capture for the source database
Enabling CDC is different for each database. Please use the following tutorials:
CDC pipeline
The end-to-end CDC pipeline extracts data from a source CDC-enabled database and loads it into any supported destination:
- another relational database,
- a cloud data warehouse such as Snowflake, Redshift, or Azure Synapse,
- a web service,
- a data lake,
- a file storage system.
In Etlworks, the end-to-end pipeline typically includes two flows:
- Extract flow - this flow extracts data from a CDC-enabled database
- Load flow - this flow loads data into any supported destination
The extract and load flows run in parallel, which guarantees high processing speed and low latency. The pipeline can also include multiple independent extract and load flows, which allows it to scale horizontally across multiple processing nodes.
The Extract
This flow extracts data from a CDC-enabled database.
There are two types of CDC extract flows in Etlworks:
Stream CDC events, create files
In this scenario, the CDC events (INSERTS, UPDATES, and DELETES) are extracted from the transaction log of the source database and stored as CSV files in the local or cloud storage.
This flow type does not require any additional elements of the infrastructure, such as a message queue. It is also the fastest type of CDC supported by Etlworks.
A separate load flow reads the files, then transforms and loads the data into the destination. The load flow must delete the processed files.
Creating a flow that streams CDC events and creates files in the local or cloud storage
There is no need to create a separate flow for the initial load. The first time the flow connects to a CDC-enabled source database, it reads a consistent snapshot of all of the whitelisted databases. When that snapshot is complete, the flow continuously reads the changes that were committed to the transaction log and generates the corresponding insert, update, and delete events.
Step 1. Create a CDC connection for the source database. Configure the following parameters:
- Enable Serialize CDC events as CSV files
- Include Database(s) - the comma-separated list of database(s) to poll CDC events from.
- Include Table(s) - the comma-separated list of fully qualified table names or regular expressions to match the table names. You can override this in the FROM attribute of the source-to-destination transformation. If you are planning to poll data from tens or hundreds of tables, consider using regular expressions or tokenizing the Include Table(s) or FROM.
Step 2. Create a new flow using the flow type Stream CDC events, create files.
Step 3. Add a single source-to-destination transformation and set the following parameters:
- Source connection - CDC connection created in step 1
- FROM - the following options are available:
- * - the system will get the list of the whitelisted tables from the CDC connection, or
- a regular expression to match the table names, or
- a comma-separated list of tables to poll data from, or
- {token}
Example with a comma-separated list of whitelisted tables:
- CDC key -
[db]_[table]_cdc_stream
- Include Table(s) - test.inventory,test.payment,test.customer
- FROM - *
- The following files will be created in the local storage:
- {app.data}/debezium_data/events/test_inventory_cdc_stream_uuid.csv
- {app.data}/debezium_data/events/test_payment_cdc_stream_uuid.csv
- {app.data}/debezium_data/events/test_customer_cdc_stream_uuid.csv
Example with a regular expression:
- CDC key -
[db]_[table]_cdc_stream
- Include Table(s) - ^[t][e][s][t][.][aA][a-zA-Z\d_\s]+
- FROM - *
- The following files will be created in the local storage:
- {app.data}/debezium_data/events/test_inventory_cdc_stream_uuid.csv
- {app.data}/debezium_data/events/test_payment_cdc_stream_uuid.csv
- {app.data}/debezium_data/events/test_customer_cdc_stream_uuid.csv
{app.data}/debezium_data is the default root location for the CDC events serialized as CSV files. You can override it by changing the value of the Location of the CDC events property under CDC connection->Storage.
Step 4. Schedule the extract flow. We recommend using a continuous run schedule type. The idea is that the extract flow runs until there are no CDC events to poll, stops for the configured number of seconds, then starts again.
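To picture the continuous run behavior, here is a minimal sketch of the scheduling loop (an illustration only, not product code; the function names and the idle interval are hypothetical):
async function continuousRun(runExtractFlow, idleSeconds) {
    // runExtractFlow stands in for the scheduled extract flow
    for (;;) {
        await runExtractFlow(); // runs until there are no CDC events left to poll
        // stop for the configured number of seconds, then start again
        await new Promise(function (resolve) { setTimeout(resolve, idleSeconds * 1000); });
    }
}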
Stream CDC events to a message queue
In this scenario, the CDC events (INSERTS, UPDATES, and DELETES) are extracted from the transaction log of the source database and sent to a message queue, such as Kafka or Azure Event Hubs.
Message queues such as Kafka and Azure Event Hubs support transactional polling, which makes the entire process very reliable and highly scalable. The only downside is the requirement to have a message queue, which is a separate element of the infrastructure not managed by Etlworks. Here is a list of fully managed message queue services compatible with the Kafka consumer and producer APIs:
- Azure Event Hubs - read how to create a connection for Azure Event Hub.
- Confluent Platform
- CloudKafka - the only service with a free forever tier. The free tier is very limited but can be used for testing.
- Aiven for Apache Kafka
- AWS managed Kafka (MSK) - note that due to the limitation of the MSK, it is only possible to connect to MSK endpoints from the same VPC. Read more here.
The separate load flow polls the events from the topics in the message queue, then transforms and loads data into the destination.
Creating a flow that streams CDC events to a message queue
There is no need to create a separate flow for the initial load. The first time the flow connects to a CDC-enabled source database, it reads a consistent snapshot of all of the whitelisted databases. When that snapshot is complete, the flow continuously reads the changes that were committed to the transaction log and generates the corresponding insert, update, and delete events.
Step 1. Create a CDC connection for the source database. Configure the following parameters:
- Include Database(s) - the comma-separated list of database(s) to poll CDC events from.
- Include Table(s) - the comma-separated list of fully qualified table names or regular expressions to match the table names. You can override this in the FROM attribute of the source-to-destination transformation. If you are planning to poll data from tens or hundreds of tables, consider using regular expressions or tokenizing the Include Table(s) or FROM.
Step 2. Create a Kafka or Azure Event Hub connection.
Step 3. Create a JSON (recommended) or Avro format.
Step 4. Start creating a flow by selecting Stream CDC events to message queue from the gallery.
Step 5. Add a single source-to-destination transformation where:
- Source connection - CDC connection created in step 1
- FROM - the following options are available:
- * - the system will get the list of the whitelisted tables from the CDC connection, or
- a regular expression to match the table names, or
- a comma-separated list of tables to poll data from, or
- {token}
- Destination connection - Kafka or Azure Event Hub connection created in step 2
- Format - the JSON or Avro format created in step 3
- TO - any name, for example, topic
Example:
- CDC key -
[db]_[table]_cdc_stream
- Include Table(s) - test.inventory,test.payment,test.customer
- FROM - *
The following topics will be created and the flow will send CDC events to these topics:
- test_inventory_cdc_stream
- test_payment_cdc_stream
- test_customer_cdc_stream
Step 6. Configure mapping and other transformations if needed.
Step 7. Schedule the extract flow. We recommend using a continuous run schedule type. The idea is that the extract flow runs until there are no CDC events to poll, stops for the configured number of seconds, then starts again.
The Load
This flow loads data files created by the extract flow into any supported destination.
Just like with the extract flow, there are two types of load flows:
- Flows that load data from files
- Flows that load data from a message queue
Load data from files
Load data from CSV files when the destination is Snowflake
This flow reads CDC events from the CSV files in local or cloud storage, then transforms and loads data into Snowflake.
The CSV files staged by the extract flow are processed from that location and deleted after processing.
Step 1. Create a Server storage connection for the staged CSV files. The default location is {app.data}/debezium_data/events.
Step 2. Create a CSV format with all default settings. Enable the Save Metadata parameter.
Step 3. Create all connections and formats required for Snowflake flows.
Step 4. Create a new flow using the flow type file to snowflake.
Follow the same steps as suggested in this article.
Step 5. Add as many source-to-destination transformations as there are distinct filename patterns to poll data from.
For example, if the following files are created by the extract flow:
- {app.data}/debezium_data/events/test_inventory_cdc_stream_uuid.csv
- {app.data}/debezium_data/events/test_payment_cdc_stream_uuid.csv
- {app.data}/debezium_data/events/test_customer_cdc_stream_uuid.csv
you will need 3 pairs of source-to-destination transformations, where the sources (FROM) will need to be set to the following wildcard file names:
- test_inventory_cdc_stream_*.csv
- test_payment_cdc_stream_*.csv
- test_customer_cdc_stream_*.csv
Alternatively, if you want to process all files without creating separate pairs of source-to-destination transformations:
- Set the FROM to the wildcard file name that matches all files to load, for example *_*_cdc_stream_*.csv
- Set the TO to schema.*, for example public.*
- Set Calculate Destination Object Name under MAPPING->Parameters to
var start = name.indexOf('-');
var end = name.indexOf('_cdc_stream');
value = name.substring(start + 1, end);
You can also use the regular expression _(.*?)_cdc_stream to match the table names (see the illustration after this list).
Read more about configuring TO when processing files by a wildcard name.
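For illustration, here is a standalone sketch (not the Calculate Destination Object Name script itself) of what the suggested regular expression extracts from one of the staged file names above, assuming the calculated value replaces the * in TO:
var name = 'test_inventory_cdc_stream_uuid.csv'; // sample staged file name from the example above
var match = name.match(/_(.*?)_cdc_stream/);     // the lazy group captures the table name
var value = match ? match[1] : name;             // 'inventory'; with TO set to public.*, the destination resolves to public.inventory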
For each transformation set the following parameters:
- Source connection - the connection created in step 1.
- Source format - the CSV format created in step 2.
- FROM - a wildcard filename as described above.
- Destination connection - a connection for the Snowflake stage (internal or external)
- Destination format - CSV format created in step 2
- TO - a destination table name or a wildcard table name
Step 6. If you are not planning to transform the data before loading it into Snowflake, enable Ignore Transformations under Parameters. This will have a positive impact on performance.
Step 7. Optionally, configure the number of files to process in one batch. This is useful if you expect the extract flow to create a large number of files (many thousands) in a short period of time and you don't want the load flow to be overwhelmed. Processing a smaller number of files (hundreds) in micro-batches is more efficient.
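For example (hypothetical numbers): if the extract flow stages 5,000 files during a burst of activity and the batch size is set to 500, the load flow will drain the backlog over 10 consecutive micro-batch runs instead of attempting to process all 5,000 files in a single run.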
Step 8. Enable Delete loaded source files under Parameters.
Step 9. All other parameters are the same as in the regular Snowflake flow, except the Action, which can be set to one of the following:
- CDC MERGE - the system will use the event type stored in the record: "c" for INSERT, "u" for UPDATE, and "d" for DELETE. This option requires setting the Lookup Fields to the comma-separated list of fields that uniquely identify the record in the target database. Optionally, you can enable the Predict Lookup Fields option. If neither works, you can specify the list of table=fields pairs in the Lookup Fields. Use the fully qualified table names and ';' as a separator between table=fields pairs (see the example after this list).
- COPY INTO - in some cases, it makes sense to always insert unmodified CDC events into the staging table. Then load events from the staging table(s) into the final table using SQL. When this option is enabled, the following fields will be automatically added to the staging table:
- debezium_cdc_op - event type, "c" for create, "u" for update and "d" for delete.
- debezium_cdc_timestamp - the event timestamp.
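Example of a Lookup Fields value in the table=fields format described above (the table and column names are illustrative):
test.inventory=inventory_id;test.payment=payment_id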
Step 10. Schedule the load flow to run in parallel with the extract flow. Just like with the extract flow, we recommend using a continuous run schedule type.
Load data from CSV files when the destination is Amazon Redshift
This flow reads CDC events from the CSV files in local or cloud storage, then transforms and loads data into Amazon Redshift.
The CSV files staged by the extract flow are processed from that location and deleted after processing.
Step 1. Create a Server storage connection for the staged CSV files. The default location is {app.data}/debezium_data/events.
Step 2. Create a CSV format. Enable the Save Metadata parameter.
Step 3. Create all connections and formats required for Redshift flows.
Step 4. Create a new flow using the flow type file to redshift.
Follow the same steps as suggested in this article.
Step 5. Add as many source-to-destination transformations as there are distinct filename patterns to poll data from.
For example, if the following files are created by the extract flow:
- {app.data}/debezium_data/events/test_inventory_cdc_stream_uuid.csv
- {app.data}/debezium_data/events/test_payment_cdc_stream_uuid.csv
- {app.data}/debezium_data/events/test_customer_cdc_stream_uuid.csv
you will need 3 pairs of source-to-destination transformations, where the sources (FROM) will need to be set to the following wildcard file names:
- test_inventory_cdc_stream_*.csv
- test_payment_cdc_stream_*.csv
- test_customer_cdc_stream_*.csv
Alternatively, if you want to process all files without creating separate pairs of source-to-destination transformations:
- Set the FROM to the wildcard file name that matches all files to load, for example *_*_cdc_stream_*.csv
- Set the TO to schema.*, for example public.*
- Set Calculate Destination Object Name under MAPPING->Parameters to
var start = name.indexOf('-');
var end = name.indexOf('_cdc_stream');
value = name.substring(start + 1, end);
You can also use the regular expression _(.*?)_cdc_stream to match the table names.
Read more about configuring TO when processing files by a wildcard name.
For each transformation set the following parameters:
- Source connection - connection created in step 1.
- Source format - the CSV format created in step 2.
- FROM - a wildcard filename as described above.
- Destination connection - a connection for the Redshift stage.
- Destination format - CSV format created in step 2
- TO - a destination table name or a wildcard table name
Step 6. If you are not planning to transform the data before loading it into Redshift, enable Ignore Transformations under Parameters. This will have a positive impact on performance.
Step 7. Optionally, configure the number of files to process in one batch. This is useful if you expect the extract flow to create a large number of files (many thousands) in a short period of time and you don't want the load flow to be overwhelmed. Processing a smaller number of files (hundreds) in micro-batches is more efficient.
Step 8. Enable Delete loaded source files under Parameters.
Step 9. All other parameters are the same as in the regular Redshift flow, except the Action, which can be set to one of the following:
- CDC MERGE - the system will use the event type stored in the record: "c" for INSERT, "u" for UPDATE, and "d" for DELETE. This option requires setting the Lookup Fields to the comma-separated list of fields that uniquely identify the record in the target database. Optionally, you can enable the Predict Lookup Fields option. If neither works, you can specify the list of table=fields pairs in the Lookup Fields. Use the fully qualified table names and ';' as a separator between table=fields pairs.
- COPY - in some cases, it makes sense to always insert unmodified CDC events into the staging table. Then load events from the staging table(s) into the final table using SQL. When this option is enabled, the following fields will be automatically added to the staging table:
- debezium_cdc_op - event type, "c" for create, "u" for update and "d" for delete.
- debezium_cdc_timestamp - the event timestamp.
Step 10. Schedule the load flow to run in parallel with the extract flow. Just like with the extract flow, we recommend using a continuous run schedule type.
Load data from CSV files when the destination is Azure Synapse Analytics
This flow reads CDC events from the CSV files in local or cloud storage, then transforms and loads data into Azure Synapse Analytics.
The CSV files staged by the extract flow are processed from that location and deleted after processing.
Step 1. Create a Server storage connection for the staged CSV files. The default location is {app.data}/debezium_data/events.
Step 2. Create a CSV format with all default settings. Enable the Save Metadata parameter.
Step 3. Create all connections required for Synapse Analytics flows.
Step 4. Create a new flow using the flow type file to azure synapse.
Follow the same steps as suggested in this article.
Step 5. Add as many source-to-destination transformations as there are distinct filename patterns to poll data from.
For example, if the following files are created by the extract flow:
- {app.data}/debezium_data/events/test_inventory_cdc_stream_uuid.csv
- {app.data}/debezium_data/events/test_payment_cdc_stream_uuid.csv
- {app.data}/debezium_data/events/test_customer_cdc_stream_uuid.csv
you will need 3 pairs of source-to-destination transformations, where the sources (FROM) will need to be set to the following wildcard file names:
- test_inventory_cdc_stream_*.csv
- test_payment_cdc_stream_*.csv
- test_customer_cdc_stream_*.csv
Alternatively, if you want to process all files without creating separate pairs of source-to-destination transformations:
- Set the FROM to the wildcard file name that matches all files to load, for example *_*_cdc_stream_*.csv
- Set the TO to schema.*, for example public.*
- Set Calculate Destination Object Name under MAPPING->Parameters to
var start = name.indexOf('-');
var end = name.indexOf('_cdc_stream');
value = name.substring(start + 1, end);
You can also use the regular expression _(.*?)_cdc_stream to match the table names.
Read more about configuring TO when processing files by a wildcard name.
For each transformation set the following parameters:
- Source connection - the connection created in step 1.
- Source format - the CSV format created in step 2.
- FROM - a wildcard filename as described above.
- Destination connection - a connection for the Synapse Analytics stage
- Destination format - CSV format created in step 2
- TO - a destination table name or a wildcard table name
Step 6. If you are not planning to transform the data before loading it into Synapse Analytics, enable Ignore Transformations under Parameters. This will have a positive impact on performance.
Step 7. Optionally, configure the number of files to process in one batch. This is useful if you expect the extract flow to create a large number of files (many thousands) in a short period of time and you don't want the load flow to be overwhelmed. Processing a smaller number of files (hundreds) in micro-batches is more efficient.
Step 8. Enable Delete loaded source files under Parameters.
Step 9. All other parameters are the same as in the regular Synapse Analytics flow, except the Action, which can be set to one of the following:
- CDC MERGE - the system will use the event type stored in the record: "c" for INSERT, "u" for UPDATE, and "d" for DELETE. This option requires setting the Lookup Fields to the comma-separated list of fields that uniquely identify the record in the target database. Optionally, you can enable the Predict Lookup Fields option. If neither works, you can specify the list of table=fields pairs in the Lookup Fields. Use the fully qualified table names and ';' as a separator between table=fields pairs.
- COPY INTO - in some cases, it makes sense to always insert unmodified CDC events into the staging table. Then load events from the staging table(s) into the final table using SQL. When this option is enabled, the following fields will be automatically added to the staging table:
- debezium_cdc_op - event type, "c" for create, "u" for update and "d" for delete.
- debezium_cdc_timestamp - the event timestamp.
Step 10. Schedule the load flow to run in parallel with the extract flow. Just like with the extract flow, we recommend using a continuous run schedule type.
Load data from CSV files when the destination is a relational database
This flow reads CDC events from the CSV files in local or cloud storage, then transforms and loads data into the destination database.
The CSV files staged by the extract flow are processed and then deleted after processing.
Step 1. Create a Server storage connection for the staged CSV files. The default location is {app.data}/debezium_data/events.
Step 2. Create a CSV format. Enable the Save Metadata parameter.
Step 3. Create a connection to the destination database.
Step 4. Create a new flow using the flow type file to database (assuming that the destination is a database).
Step 5. Add as many source-to-destination transformations as there are distinct filename patterns to poll data from.
For example, if the following files are created by the extract flow:
- {app.data}/debezium_data/events/test_inventory_cdc_stream_uuid.csv
- {app.data}/debezium_data/events/test_payment_cdc_stream_uuid.csv
- {app.data}/debezium_data/events/test_customer_cdc_stream_uuid.csv
you will need 3 pairs of source-to-destination transformations, where the sources (FROM) will need to be set to the following wildcard file names:
- test_inventory_cdc_stream_*.csv
- test_payment_cdc_stream_*.csv
- test_customer_cdc_stream_*.csv
Alternatively, if you want to process all files without creating separate pairs of source-to-destination transformations:
- Set the FROM to the wildcard file name that matches all files to load, for example *_*_cdc_stream_*.csv
- Set the TO to schema.*, for example public.*
- Set Calculate Destination Object Name under MAPPING->Parameters to
var start = name.indexOf('-');
var end = name.indexOf('_cdc_stream');
value = name.substring(start + 1, end);
You can also use the regular expression _(.*?)_cdc_stream to match the table names.
Read more about configuring TO when processing files by a wildcard name.
Step 6. For each transformation set the following parameters:
- Source connection - the connection created in step 1.
- Source format - the CSV format created in step 2.
- FROM - a wildcard filename as described above.
- Destination connection - a destination database connection created in step 3.
- TO - a destination table name or a wildcard table name as described above
- Enable Delete loaded source files
- Set the Action under MAPPING/Parameters - the action defines the SQL command to generate. Set the Action to one of the following:
- Record - the system will automatically generate an INSERT, UPDATE, or DELETE SQL statement based on the value of the debezium_cdc_op field (c, u, d) populated during the extract from the CDC-enabled database.
- Record with MERGE - same as Record, except the system will generate a MERGE SQL statement for inserts and updates. Note that not all databases support native MERGE; also, some databases, for example MySQL and PostgreSQL, require a unique index on the fields that are used to uniquely identify the record.
- Record with IfExist - same as Record with MERGE, except it uses a SELECT statement to check if the record exists and then conditionally generates either an INSERT or an UPDATE.
- Set the Lookup Fields under MAPPING/Parameters - a comma-separated list of fields that uniquely identify the record. Alternatively, you can enable Predict Lookup Fields, which forces the flow to use various algorithms to automatically predict the fields that uniquely identify the record. Note that it is not always possible to correctly detect the unique fields. If neither works, you can specify the list of table=fields pairs in the Lookup Fields. Use the fully qualified table names and ';' as a separator between table=fields pairs.
Step 7. Optionally, configure the number of files to process in one batch. This is useful if you expect the extract flow to create a large number of files (many thousands) in a short period of time and you don't want the load flow to be overwhelmed. Processing a smaller number of files (hundreds) in micro-batches is more efficient.
Step 8. Schedule the load flow to run in parallel with the extract flow. Just like with the extract flow, we recommend using a continuous run schedule type.
Load data from a message queue
Load data from a message queue when the destination is Snowflake
This flow reads CDC events from the topics in the message queue, then transforms and loads data into Snowflake.
Step 1. Create a Kafka or Azure Event Hub connection. You can reuse the same connection that you created for the extract flow.
You must enter a unique name in the Group ID field.
Also, consider changing the default values for the following parameters:
- Number of retries before stop polling - the number of retries before the flow stops polling if a poll returns no records. By increasing this number, you can give the flow more time to read the events from the queue.
- Max number of records to poll - The maximum number of records to poll in one call. By changing this number, you can tune the flow for maximum performance or minimum RAM consumption.
- Max number of records to read - The total maximum number of records to read from the queue. Set it to a reasonable number to allow the system to process records in micro-batches. If nothing is entered, the system will read records from the queue until there are no more records. By changing this number, you are either increasing or decreasing the size of the micro-batches.
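For example (hypothetical numbers): with Max number of records to poll set to 1,000 and Max number of records to read set to 10,000, each run of the load flow processes a micro-batch of at most 10,000 records, fetched from the queue in polls of up to 1,000 records each.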
Step 2. Create all required connections and formats.
Step 3. Start creating a flow by typing in queue to snowflake.
Step 4. Select Queue to Snowflake.
Follow the same steps as suggested in this article.
Step 5. Add as many source-to-destination transformations as there are topics in the queue to poll data from. Most likely, you are going to have one topic per source table.
For each transformation set the following parameters:
- Source connection - queue connection created in step 1.
- Source format - a format used in the extract flow, most likely JSON.
- FROM - a topic name to poll data from.
- Destination connection - a connection for the Snowflake stage.
- Destination format - CSV format
- TO - a destination table name
Step 6. All parameters are the same as in the regular Snowflake flow, except the Action, which can be set to one of the following:
- CDC MERGE - the system will use the event type stored in the record: "c" for INSERT, "u" for UPDATE, and "d" for DELETE. This option requires setting the Lookup Fields to the comma-separated list of fields that uniquely identify the record in the target database. Optionally, you can enable the Predict Lookup Fields option. If neither works, you can specify the list of table=fields pairs in the Lookup Fields. Use the fully qualified table names and ';' as a separator between table=fields pairs.
- COPY INTO - in some cases, it makes sense to always insert unmodified CDC events into the staging table. Then load events from the staging table(s) into the final table using SQL. When this option is enabled, the following fields will be automatically added to the staging table:
- debezium_cdc_op - event type, "c" for create, "u" for update and "d" for delete.
- debezium_cdc_timestamp - the event timestamp.
Step 7. Schedule the load flow to run in parallel with the extract flow. Just like with the extract flow, we recommend using a continuous run schedule type.
Load data from a message queue when the destination is Amazon Redshift
This flow reads CDC events from the topics in the message queue, then transforms and loads data into Amazon Redshift.
Step 1. Create a Kafka or Azure Event Hub connection. You can reuse the same connection that you created for the extract flow.
You must enter a unique name in the Group ID field.
Also, consider changing the default values for the following parameters:
- Number of retries before stop polling - the number of retries before the flow stops polling if a poll returns no records. By increasing this number, you can give the flow more time to read the events from the queue.
- Max number of records to poll - The maximum number of records to poll in one call. By changing this number, you can tune the flow for maximum performance or minimum RAM consumption.
- Max number of records to read - The total maximum number of records to read from the queue. Set it to a reasonable number to allow the system to process records in micro-batches. If nothing is entered, the system will read records from the queue until there are no more records. By changing this number you are either increasing or decreasing the size of the micro-batches.
Step 2. Create all required connections and formats.
Step 3. Start creating a flow by typing in queue to redshift.
Step 4. Select Queue to Redshift.
Follow the same steps as suggested in this article.
Step 5. Add as many source-to-destination transformations as there are topics in the queue to poll data from. Most likely, you are going to have one topic per source table.
For each transformation set the following parameters:
- Source connection - queue connection created in step 1.
- Source format - a format used in the extract flow, most likely JSON.
- FROM - a topic name to poll data from.
- Destination connection - a connection for the Redshift stage.
- Destination format - CSV format
- TO - a destination table name
Step 6. All parameters are the same as in the regular Redshift flow, except the Action, which can be set to one of the following:
- CDC MERGE - the system will use the event type stored in the record: "c" for INSERT, "u" for UPDATE, and "d" for DELETE. This option requires setting the Lookup Fields to the comma-separated list of fields that uniquely identify the record in the target database. Optionally, you can enable the Predict Lookup Fields option. If neither works, you can specify the list of table=fields pairs in the Lookup Fields. Use the fully qualified table names and ';' as a separator between table=fields pairs.
- COPY INTO - in some cases, it makes sense to always insert unmodified CDC events into the staging table. Then load events from the staging table(s) into the final table using SQL. When this option is enabled, the following fields will be automatically added to the staging table:
- debezium_cdc_op - event type, "c" for create, "u" for update and "d" for delete.
- debezium_cdc_timestamp - the event timestamp.
Step 7. Schedule the load flow to run in parallel with the extract flow. Just like with the extract flow, we recommend using a continuous run schedule type.
Load data from a message queue when the destination is Synapse Analytics
This flow reads CDC events from the topics in the message queue, then transforms and loads data into Synapse Analytics.
Step 1. Create a Kafka or Azure Event Hub connection. You can reuse the same connection that you created for the extract flow.
You must enter a unique name in the Group ID field.
Also, consider changing the default values for the following parameters:
- Number of retries before stop polling - the number of retries before the flow stops polling if a poll returns no records. By increasing this number, you can give the flow more time to read the events from the queue.
- Max number of records to poll - The maximum number of records to poll in one call. By changing this number, you can tune the flow for maximum performance or minimum RAM consumption.
- Max number of records to read - The total maximum number of records to read from the queue. Set it to a reasonable number to allow the system to process records in micro-batches. If nothing is entered, the system will read records from the queue until there are no more records. By changing this number you are either increasing or decreasing the size of the micro-batches.
Step 2. Create all required connections and formats.
Step 3. Start creating a flow by typing in queue to azure synapse.
Step 4. Select Queue to Azure Synapse Analytics.
Follow the same steps as suggested in this article.
Step 5. Add as many source-to-destination transformations as there are topics in the queue to poll data from. Most likely, you are going to have one topic per source table.
For each transformation set the following parameters:
- Source connection - queue connection created in step 1.
- Source format - a format used in the extract flow, most likely JSON.
- FROM - a topic name to poll data from.
- Destination connection - a connection for the Synapse Analytics stage.
- Destination format - CSV format
- TO - a destination table name
Step 6. All parameters are the same as in the regular Synapse Analytics flow, except the Action, which can be set to one of the following:
- CDC MERGE - the system will use the event type stored in the record: "c" for INSERT, "u" for UPDATE, and "d" for DELETE. This option requires setting the Lookup Fields to the comma-separated list of fields that uniquely identify the record in the target database. Optionally, you can enable the Predict Lookup Fields option. If neither works, you can specify the list of table=fields pairs in the Lookup Fields. Use the fully qualified table names and ';' as a separator between table=fields pairs.
- COPY INTO - in some cases, it makes sense to always insert unmodified CDC events into the staging table. Then load events from the staging table(s) into the final table using SQL. When this option is enabled, the following fields will be automatically added to the staging table:
- debezium_cdc_op - event type, "c" for create, "u" for update and "d" for delete.
- debezium_cdc_timestamp - the event timestamp.
Step 7. Schedule the load flow to run in parallel with the extract flow. Just like with the extract flow, we recommend using a continuous run schedule type.
Load data from a message queue when the destination is a relational database
This flow reads CDC events from the topics in the message queue, then transforms and loads data into the destination database.
Step 1. Create a Kafka or Azure Event Hub connection. You can reuse the same connection that you created for the extract flow.
You must enter a unique name in the Group ID field.
Also, consider changing the default values for the following parameters:
- Number of retries before stop polling - the number of retries before the flow stops polling if a poll returns no records. By increasing this number, you can give the flow more time to read the events from the queue.
- Max number of records to poll - The maximum number of records to poll in one call. By changing this number, you can tune the flow for maximum performance or minimum RAM consumption.
- Max number of records to read - The total maximum number of records to read from the queue. Set it to a reasonable number to allow the system to process records in micro-batches. If nothing is entered, the system will read records from the queue until there are no more records. By changing this number you are either increasing or decreasing the size of the micro-batches.
Step 2. Create a connection to the destination database.
Step 3. Start creating a flow by typing in queue to.
Step 4. Select Queue to the database (assuming that your destination is a relational database).
Step 5. Add as many source-to-destination transformations as there are topics in the queue to poll data from. Most likely, you are going to have one topic per source table.
For each transformation set the following parameters:
- Source connection - queue connection created in step 1.
- Source format - a format used in the extract flow, most likely JSON.
- FROM - a topic name to poll data from.
- Destination connection - a destination database connection created in step 2.
- TO - a destination table name
- Set the Action under MAPPING/Parameters - the action defines the SQL command to execute. Set the Action to one of the following:
- Record - the system will automatically generate an INSERT, UPDATE, or DELETE SQL statement based on the value of the debezium_cdc_op field (c, u, d) populated during the extract from the CDC-enabled database.
- Record with MERGE - same as Record, except the system will generate a MERGE SQL statement for inserts and updates. Note that not all databases support native MERGE; also, some databases, for example MySQL and PostgreSQL, require a unique index on the fields that are used to uniquely identify the record.
- Record with IfExist - same as Record with MERGE, except it uses a SELECT statement to check if the record exists and then conditionally generates either an INSERT or an UPDATE.
- Set the Lookup Fields under MAPPING/Parameters - a comma-separated list of fields that uniquely identify the record.
- Alternatively, you can enable Predict Lookup Fields which, if enabled, forces the flow to use various algorithms to automatically predict the fields that uniquely identify the record. Note that it is not always possible to correctly detect the unique fields.
- When Predict Lookup Fields (which is not always accurate) is not an option, you can specify the list of table=fields pairs in the Lookup Fields. Use the fully qualified table names and ';' as a separator between table=fields pairs.
Example:
test1.inventory=inventory_id,database_name;
test1.payment=payment_id,database_name;
test1.rental=rental_id,database_name;
Step 6. Schedule the load flow to run in parallel with the extract flow. Just like with the extract flow, we recommend using a continuous run schedule type.
Handling SQL NULL and empty ('') values when loading files into databases
If SQL NULL is not the same as empty ('')
Step 1. When creating a CDC connection, keep Convert null to empty disabled. Note that it is disabled by default.
Step 2. When creating a CSV format for FROM and TO:
- Enable Convert 'null' to null
- Disable Convert empty string to null
- Set Value for null to
'null'
Step 3. When creating a flow to load files into Snowflake, set String used to convert to and from SQL NULL to ('NULL','null').
If SQL NULL is the same as empty ('')
Step 1. When creating a CDC connection, enable Convert null to empty. Note that it is disabled by default.
Step 2. When creating a CSV format for FROM and TO:
- Disable Convert 'null' to null
- Enable Convert empty string to null
- Set Value for null to empty
Step 3. When creating a flow to load files into Snowflake, set String used to convert to and from SQL NULL to ('NULL','null','').
Handling source table DDL changes
If you are using a flow that stores CDC events as CSV files and you expect that the structure of the source table can change while the flow is still streaming data from this table, enable Close CSV file if header has changed when creating the CDC connection.
Preserving the exact data types of the columns in the source table when creating a destination table
Etlworks can automatically create a destination table when loading CSV files. By default, the flow will sample the file and set the data types of the columns in the destination table. This mechanism is not 100% accurate, so the destination table might end up with columns that have different data types compared to the source table.
To prevent this from happening:
Step 1. When creating the CDC connection, enable Save Metadata.
Step 2. When creating a CSV format for FROM and TO:
- Enable Save Metadata
- Keep All fields are strings disabled
In addition, when enabled, this feature allows the CDC load flow to automatically set the Lookup Fields required for the CDC MERGE by extracting the information about the primary keys from the source database. It significantly enhances the accuracy of the Predict Lookup Fields feature, essentially making it reliable at very low cost.
Create all columns as strings (VARCHAR)
Etlworks can automatically create a destination table when loading CSV files. If you wish to create all columns as strings (VARCHAR) then:
Step 1. When creating the CDC connection, disable Save Metadata.
Step 2. When creating a CSV format for FROM and TO:
- Disable Save Metadata
- Enable All fields are strings
Implementing soft deletes
By default, CDC flow creates, updates, and deletes records in the destination table based on the type of the CDC event - 'c' for create, 'u' for update, and 'd' for delete.
If you need to implement a soft delete, add an extra column with the specific function to the CDC connection (using the Extra columns property). The following functions are available:
- cdc_boolean_soft_delete - the boolean column with a user-defined name will be added to the CDC stream. Values returned by the function: true for delete, false for all other events
- cdc_timestamp_soft_delete(yyyy-MM-dd HH:mm:ss.SSS) - the timestamp column with a user-defined name will be added to the CDC stream. Values returned by the function: timestamp for delete, null for all other events
For example, if the Extra columns property is set to deleted_at=cdc_timestamp_soft_delete(yyyy-MM-dd HH:mm:ss.SSS), the column deleted_at will be added to the CDC stream, and for deleted records its value will be set to the current timestamp with millisecond precision.
If any of the functions above are used, the CDC event type is automatically switched to 'u' (update).
Adding columns to the CDC stream
By default, the CDC stream contains all columns in the source table. You can add columns with user-defined names to the stream using the Extra columns property of the CDC connection.
Format of the property (see the example after the list of functions below): column_name[=function[(parameter)]],column_name[=function[(parameter)]]
Available functions:
- cdc_op - the CDC event type: 'c' for create, 'u' for update, and 'd' for delete
- cdc_key - the CDC key in the user-defined format
- cdc_timestamp(yyyy-MM-dd HH:mm:ss.SSS) - the timestamp of the CDC event. It is not guaranteed that this value will be unique for each event in the sequence of events.
- cdc_event_seq - the unique 'number' of the event. This number grows sequentially.
- cdc_boolean_soft_delete - the boolean column: true for delete, false for all other events
- cdc_timestamp_soft_delete(yyyy-MM-dd HH:mm:ss.SSS) - the timestamp column: timestamp for delete, null for all other events
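Example of an Extra columns value combining several of the functions above (the column names are arbitrary):
op=cdc_op,event_seq=cdc_event_seq,deleted_at=cdc_timestamp_soft_delete(yyyy-MM-dd HH:mm:ss.SSS)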
Creating files in cloud storage
By default, the extract flow creates files with CDC events in the local (server) storage. It is possible to configure the CDC connection to create files directly in any of the following cloud storage services:
- Amazon S3
- Azure Storage
- Google Cloud Storage
When files are created in cloud storage, it opens the possibility of using serverless data pipelines for loading data into databases and data warehouses. For example, read about loading data continuously using Snowflake Snowpipe.
The other advantage of storing files directly in the cloud is eliminating the step that copies the files to the cloud in the first place. For example, Amazon Redshift (or any RDS database) can bulk-load files from S3; when the files are already in S3, there is no need for an additional flow that moves them from the local storage to S3. The same is true for Azure databases and data warehouses: Azure Synapse Analytics can bulk-load files from Azure Storage, so creating the files in Azure Storage makes the pipeline much faster by eliminating the step that moves the files from the local storage to Azure Blob.
Read how to configure the storage, other than local (server), when creating a CDC connection.
Creating CSV and JSON files using the specified character encoding
If the source database has tables with data encoded using any character set other than ASCII (for example, UTF-8), you can configure the connector to create files using that specific encoding.
To preserve the encoding when loading files into the databases you will need to set the same character encoding for the CSV format.
If the encoding is not specified or is set to "no encoding", the JSON files are created with UTF-8 encoding.