- Startup
- Business
- Enterprise
- On-Premise
- Add-on
Overview
This article explains how to configure CDC source connections in Etlworks.
CDC (Change Data Capture) is a high-performance method of identifying and streaming row-level changes—INSERT, UPDATE, and DELETE—directly from the database transaction log.
Etlworks supports native log-based CDC for the following source databases:
-
Microsoft SQL Server
-
MySQL
-
Oracle
-
PostgreSQL
-
DB2
-
MongoDB
-
AS400 (IBM i)
Etlworks uses a deeply integrated, enhanced version of Debezium for its CDC engine. There is no need to install or manage Debezium directly.
To learn more about CDC pipelines, see: Change Data Capture (CDC) in Etlworks
1. Enabling CDC in the Source Database
Before creating a CDC flow, you must enable CDC on the source database.
Refer to the step-by-step guides for your database:
- Enable CDC for Microsoft SQL Server
- Enable CDC MySQL
- Enable CDC for Oracle
- Enable CDC for PostgreSQL
- Enable CDC for DB2
- Enable CDC for MongoDB
- Enable CDC for AS400 (IBMI platfroms)
2. Creating a CDC Connection
To create a new CDC Connection:
-
Go to Connections, click +, and search for cdc.
-
Select the source database type.
-
Enter connection parameters (host, port, credentials, etc.).
-
Click Test Connection.
Each connector includes common properties as well as additional configuration options specific to that database:
3. Connection Parameters (Overview)
Common Parameters
These parameters are supported by all CDC connectors:
-
Logical Connector Name – Unique ID used to track schema history and log position. Changing this will trigger a new snapshot.
-
Database Server Logical Name – Optional. Used for namespacing when running multiple connectors.
-
Include Tables – Comma-separated list of regex patterns to include.
-
Excluded Columns – Columns to exclude from emitted events.
-
CDC Key – Template used to generate filenames or Kafka topics. Can be overridden in flows.
Authentication
-
User / Password – Required database credentials.
Metadata (UI only)
-
Schema – Schema name for populating object lists in the UI.
-
Current DB and Schema Only – Filters object list to current database/schema.
Snapshot Options
-
Snapshot Mode – Determines when/if the connector captures an initial snapshot. Varies by database.
-
Ad-hoc Snapshots – Automatically trigger snapshots for new tables added later.
-
Snapshot Locking – Controls locking behavior and isolation level.
4. Serialization and Output
CDC events can be written to:
-
Files – CSV and/or JSON files, written to local or cloud storage.
-
Message Queues – Kafka, Azure Event Hubs, etc.
- NoSQL databases - MongoDB, etc.
- Directly into destination such as relational databases, cloud warehouses, etc.
You can configure:
-
File format and structure (CSV, JSON, event flattening)
-
Metadata preservation (column types, null handling, encoding)
-
Filename and partition key templates
-
Output location and storage credentials
Read more:
5. Offsets and Recovery
Each CDC connection maintains its offset and DDL history in local files:
-
Offset File – Tracks the last known transaction log position.
-
DDL History File – Stores schema changes for restart recovery.
You can customize:
-
File names and paths
-
Offset commit intervals
-
Recovery behavior on restart
6. Monitoring and Restart Behavior
CDC flows can run continuously or stop after a threshold:
-
Retry Count – Max retries before stopping
-
Retry Duration – Max time to retry
-
Run Duration Limit – Automatically stop after N minutes
You can also manually stop the flow from the UI.
7. Transactions and Metadata
Enable transaction markers and transaction-level metadata by setting:
-
Provide Transaction Metadata
-
Log Transaction Metadata
-
Log Transaction Start/End
This ensures downstream systems can group and process CDC events transactionally.
8. Error Handling and Auto-Restart
To improve reliability:
-
Max Retry Attempts – Limit retries on connection failure
-
Auto-stop on Error Match – Define specific error messages that will stop all connectors for a given host/port
9. Using SSH Tunnels
To connect to on-prem or firewalled databases:
-
Enable Connect over SSH tunnel
-
Configure:
-
SSH Host, Port
-
SSH User / Password
-
Private Key file (optional)
-
CDC File Output Settings
Serialization
The parameters in this group configure the format and structure of the CDC stream. These options are ignored when streaming to message queues.
-
Add Event Type and Unique Sequence: Adds two columns to the stream:
-
debezium_cdc_op – indicates the operation type (c for insert, u for update, d for delete).
-
debezium_cdc_timestamp – a globally unique, sequential number for each CDC event.
This setting is required when loading into cloud data warehouses such as Amazon Redshift or Snowflake.
-
-
Extra columns: Appends specified extra columns to the CDC stream. These can also be used to implement soft deletes.
-
Preprocessor: Optional JavaScript that modifies each event before it is written to the file.
-
event is a Jackson JsonNode representing the CDC event and is mutable.
-
key is the immutable CDC key.
-
-
Flatten CDC events: Flattens CDC events encoded in Extended JSON format. The connector decodes fields like $oid, $numberInt, $date, and transforms arrays to strings.
-
Convert JSON array to comma-separated list: Converts arrays such as [1,2,3] to 1,2,3.
-
Preserve square brackets: If enabled alongside the previous setting, keeps square brackets when converting arrays of objects:
[{"first": "joe"}, {"first": "jane"}] → [{"first": "joe"}, {"first": "jane"}].
Settings for Creating Files
This section controls how the connector generates CDC files.
-
Serialize CDC events as CSV files: Writes CDC events as CSV files. Filenames follow the pattern [cdc_key]_[uuid].csv, where uuid is a globally unique identifier.
-
Save Metadata: Creates an accompanying XML file containing column data types and primary key info. This metadata is used to correctly define destination table schemas.
-
Capture NOT NULL constraints: When enabled (requires Save Metadata), captures NOT NULL constraints for columns.
-
Columns in CSV File: Only the listed columns will be written. The list is case-insensitive. Columns such as extra fields, debezium_cdc_op, and debezium_cdc_timestamp are always included.
-
Serialize CDC events as JSON files: Writes CDC events as JSON files using the same naming convention as CSV files. This option can be enabled together with CSV output.
-
Use Unique Sequence as a suffix: If enabled, filenames include the sequential number instead of UUID: e.g., table_1234.csv.
-
Convert Geometry WKB to JSON: Converts Well-Known Binary (WKB) geometry data to human-readable JSON. Especially useful for PostgreSQL or other spatial databases.
File Rotation and Lifecycle
-
Max Rows in CSV file: Maximum number of rows per file. File is closed once the limit is reached.
-
Max File Size in Bytes: Maximum size per file. The file is closed when the size limit is hit.
-
Keep file open (ms): Time to keep the file open while waiting for additional updates.
-
Close CSV file if header is different: Closes the current file and opens a new one if column headers change for the same CDC key. Recommended when source schemas may change (e.g., MongoDB collections).
-
Close File After (ms): Maximum time to keep a file open, even if still receiving data.
-
Escape double-quotes: Escapes " characters by doubling them: " → "".
-
Convert null to empty: Converts SQL NULL to an empty string when writing to CSV.
-
Do not enclose null in double quotes: If enabled, nulls are written as blank (not "").
-
Convert boolean true/false to numeric 1/0: Useful for destinations expecting numeric booleans.
-
Remove EOL characters: Removes newlines or carriage returns from field values.
-
Encoding: Character encoding used for CSV file output.
Storage
These parameters configure where and how CDC files are stored.
IMPORTANT: All storage parameters can be overridden when creating a CDC flow (recommended).
-
Storage type: Select between local file system or cloud storage (S3 SDK, Azure Storage SDK, Google Cloud Storage).
-
Archive files: Optionally compresses files using the selected archive format before storing (ignored for file system).
-
Location of CDC events: Output directory for generated files. Use tokens such as {app.data} or CDC key parts ([db]/[schema]/[table]). Required when using cloud storage.
-
AWS Region: Required for S3.
-
Azure Authentication Type: Select between Access Key and SAS token.
-
Access Key / IAM Role / Account Name:
-
S3: Access Key ID or IAM Role
-
Azure: Storage Account Name
-
Google: Access Key
-
-
Secret Key / SAS Token:
-
S3: Secret Access Key (or empty if using IAM Role)
-
Azure: Access Key
-
Google: Secret
-
Settings for Streaming to Message Queues and NoSQL databases
The parameters in this group are used to configure the CDC connector to stream CDC events to a message queue, such as Kafka, Azure Event Hubs, or similar and NoSQL databases such as MongoDB.
These parameters are ignored when the connector is configured to create files.
Structure of the CDC Event:
A typical CDC event emitted by Debezium contains the following elements:
-
before: (optional) the state of the row before the change.
-
after: (optional) the state of the row after the change.
-
source: metadata about the source database and table.
-
op: the operation type (c for insert, u for update, d for delete).
-
ts_ms: (optional) the timestamp (in milliseconds) when the connector processed the event.
-
transaction: transaction metadata.
-
schema: the schema describing the structure of the payload.
-
fields: the JDBC-compatible schema.
By default, the connector converts only the after state of the event into flat JSON and sends it to the queue. However, you can modify the structure of the streamed event by including additional parts of the original Debezium event. This behavior is ignored when the connector is writing files.
Update fields for each event:
If enabled, the connector refreshes the list of fields for every CDC event, even if the CDC key remains unchanged. This setting is recommended when streaming data from multiple tables into a single topic.
Partition Key:
Used to determine the target partition when streaming events to Kafka or Azure Event Hubs. If the Partition Key is not set, events will be sent to random partitions. It is strongly recommended to configure this value.
For example, to group events by table, set the Partition Key to:
[table]
The same connector-specific tokens available for the CDC Key can also be used for the Partition Key.
Database-specific parameters
This section lists the configuration parameters for each supported CDC source: SQL Server, MySQL, Oracle, PostgreSQL, DB2, MongoDB, and AS400.
SQL Server
SQL Server CDC Connector Parameters
Host: the host or IP address of the source database.
Port: the port the source database is listening to.
Database: the name of the database to connect on startup.
Include Databases(s): an optional, comma-separated list of database names or regular expressions that match the names of the databases for which to capture changes. The connector does not capture changes in any database whose name is not in. If this parameter is empty the connector will use the value of the parameter Database. Setting the value of this property to more than one database will make it impossible to add tables to snapshot on the fly (ad-hoc snapshots).
Include Table(s): a comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be monitored; any table not included in the whitelist will be excluded from monitoring. Each identifier is of the form databaseName.tableName. By default, the connector will monitor every non-system table in each monitored database. This parameter will be overridden in the TO field of the CDC Flow.
Excluded Columns(s): an optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form databaseName.tableName.columnName, or databaseName.schemaName.tableName.columnName.
CDC Key: a template for CDC key. The CDC key is used as a file name when streaming CDC events and creating files and as a topic name when streaming CDC events to a message queue such as Kafka.
Example of the CDC key: testDB_dbo_customers_cdc_stream.
Available tokens:
-
[db]: database name
-
[schema]: schema name
-
[table]: table name
-
[op]: the CDC operation: c (create), d (delete), u (update)
-
[timestamp]: the timestamp of the event in ms
-
[version]: the connector version
-
[connector]: the connector class name
-
[name]: the connector name
-
[ts_ms]: the timestamp of the event in ms
-
[snapshot]: true for a snapshot
-
[change_lsn]: change lsn
-
[commit_lsn]: commit lsn
-
[event_serial_no]: event serial number
JDBC Fetch Size: the JDBC Fetch Size field can be used to set the maximum number of records which will be retrieved in one database call when fetching the data in a snapshot mode, thus limiting the memory consumption within the JVM. The default is 10000.
Max Rows to Read: the maximum number of rows to read when fetching the data in a snapshot mode. The default is empty, hence no limit.
Max Field Size: sets the limit for the maximum number of bytes that can be returned for character and binary column values in a ResultSet object produced by this Statement object. The default is empty, hence there is no limit.
SQL Server Snapshot
Snapshot Mode: a mode for taking an initial snapshot of the structure and optional data of captured tables. Supported values are:
-
ad-hoc initial: the connector takes a snapshot of the structure and data of captured tables, then starts streaming change events from the captured tables. This mode allows adding new tables and triggering incremental snapshots at runtime.
-
ad-hoc schema only: the connector takes a snapshot of the structure of captured tables only, then starts streaming change events from the captured tables. This mode allows adding new tables and triggering incremental snapshots at runtime.
-
initial: the connector takes a snapshot of the structure and data of captured tables, then starts streaming change events from the captured tables. It is useful if a complete representation of the data from the included tables should be captured.
-
initial_schema_only: the connector takes a snapshot of the structure of captured tables only. It is useful if only changes happening from now onwards should be propagated downstream. Once the snapshot is complete, the connector will continue reading change events from the database’s redo logs.
-
initial only: the connector performs a database snapshot and stops before streaming any change event records.
Snapshot Isolation Mode: mode to control which transaction isolation level is used and how long the connector locks the monitored tables. There are five possible values:
-
read_uncommitted
-
read_committed
-
repeatable_read
-
snapshot
-
exclusive (in fact, exclusive mode uses repeatable_read isolation level, however, it takes the exclusive lock on all tables to be read)
It is worth documenting that snapshot, read_committed, and read_uncommitted modes do not prevent other transactions from updating table rows during the initial snapshot, while exclusive and repeatable_read do.
Another aspect is data consistency. Only exclusive and snapshot modes guarantee full consistency — that is, initial snapshot and streaming logs constitute a linear history.
In the case of repeatable_read and read_committed modes, it might happen that, for instance, a record added appears twice — once in the initial snapshot and once in the streaming phase.
For read_uncommitted there are no data consistency guarantees at all (some data might be lost or corrupted).
Debezium SQL Server Properties
Read about all properties available for Debezium SQL Server connector. The properties can be set as key-value pairs in Other Parameters.
MySQL
MySQL CDC connector parameters
Host: the host or IP address of the source database.
Port: the port the source database is listening to.
Database: the name of the database to connect on startup.
Include Database(s): optional comma-separated list of regular expressions that match database names to be monitored; any database name not included in the whitelist will be excluded from monitoring. By default, all databases will be monitored.
Include Table(s): a comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be monitored; any table not included in the whitelist will be excluded from monitoring. Each identifier is of the form databaseName.tableName. By default, the connector will monitor every non-system table in each monitored database. This parameter will be overridden in the TO field of the CDC Flow.
Excluded Column(s): an optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form databaseName.tableName.columnName, or databaseName.schemaName.tableName.columnName.
Case sensitive matching: by default, the system will use the case-insensitive pattern to whitelist the db.table to stream. For example, if there is a database test and the tables abc and Abc, the stream will include records from test.abc and test.Abc. By enabling this option you can enforce the case-sensitive matching of the db.table defined for the source.
CDC Key: a template for CDC key. The CDC key is used as a file name when streaming CDC events and creating files and as a topic name when streaming CDC events to a message queue such as Kafka.
Example of the CDC key:
inventory_customers_cdc_stream
Available tokens:
-
[db]: database name
-
[table]: table name
-
[op]: CDC operation: c (create), d (delete), u (update)
-
[timestamp]: the timestamp of the event in ms
-
[version]: the connector version
-
[connector]: the connector class name
-
[name]: the connector name
-
[ts_ms]: the timestamp of the event in ms
-
[snapshot]: true for a snapshot
-
[server_id]: the MySQL server ID
-
[gtid]: the GTID
-
[file]: the binlog filename
-
[pos]: the position in the binlog
-
[row]: the row number
JDBC Fetch Size: sets the maximum number of records that will be retrieved in one database call when fetching the data in a snapshot mode. Helps limit memory consumption within the JVM. Default and recommend: empty.
Max Rows to Read: the maximum number of rows to read during snapshot. Default is empty (no limit).
Max Field Size: sets the limit for the maximum number of bytes that can be returned for character and binary column values in a ResultSet. Default is empty (no limit).
MySQL snapshot
Snapshot Mode: specifies the criteria for running a snapshot upon connector startup. Supported values:
-
ad-hoc initial: runs a snapshot upon startup when needed, then starts streaming from binlog. Allows adding new tables and triggering incremental snapshots.
-
ad-hoc schema only: same as above, but captures only table structures. Allows adding new tables.
-
initial: snapshot is taken only if no offsets exist, then streams from binlog.
-
initial_only: snapshot is taken once, and the connector stops without streaming.
-
schema_only: does not capture data, only structure. Starts streaming from now.
-
when_needed: similar to ad-hoc initial, but more deterministic.
-
schema_only_recovery: used to recover corrupted/lost database history or to clean it up periodically.
-
always_recover_schema: like when_needed but always refreshes schema on startup.
-
schema_only_always_recover_schema: like schema_only but always refreshes schema on startup.
Snapshot Locking Mode: controls how the global MySQL read lock is handled during snapshot:
-
minimal: holds lock briefly to read schemas and metadata.
-
extended: holds lock for the full snapshot duration.
-
none: does not acquire any global locks (safe only if schema is stable). Note: MyISAM tables will still be locked.
Starting position after snapshot
Starting binlog filename for Snapshot: optional binlog filename to start reading from after snapshot. Used only with initial and schema_only snapshot modes.
Starting binlog position for Snapshot: optional position in binlog file to start reading from. Used only with initial and schema_only snapshot modes.
Starting GTID: optional GTID to start reading from. Used with initial, when_needed, and schema_only.
Debezium MySQL properties
Read about all properties available for Debezium MySQL connector. The properties can be set as key-value pairs in Other Parameters.
PostgreSQL
Postgres CDC connector parameters
Host: the host or IP address of the source database.
Port: the port the source database is listening to.
Database: the name of the database to connect on startup.
Include Schemas(s): an optional, comma-separated list of regular expressions that match names of schemas for which you want to capture changes. Any schema name not included in schema.include.list is excluded from having its changes captured. By default, all non-system schemas have their changes captured. Do not also set the schema.exclude.list property.
Include Table(s): a comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be monitored; any table not included in the whitelist will be excluded from monitoring. Each identifier is of the form databaseName.tableName. By default, the connector will monitor every non-system table in each monitored database. This parameter will be overridden in the TO field of the CDC Flow.
Excluded Columns(s): an optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form databaseName.tableName.columnName or databaseName.schemaName.tableName.columnName.
Server: logical name that identifies and provides a namespace for the particular PostgreSQL database server/cluster being monitored. The logical name should be unique across all other connectors, since it is used as a prefix for all Kafka topic names coming from this connector. Defaults to host:port/dbname, though we recommend using an explicit and meaningful logical name.
Output Plugin: PostgreSQL’s logical decoding feature was first introduced in version 9.4 and is a mechanism that allows the extraction of the changes which were committed to the transaction log and the processing of these changes in a user-friendly manner via the help of an output plugin. This output plugin must be installed prior to running the PostgreSQL server and enabled together with a replication slot in order for clients to be able to consume the changes.
Replication Slot Name: the name of the PostgreSQL logical decoding slot that was created for streaming changes from a particular plug-in for a particular database/schema. The server uses this slot to stream events to the connector that you are configuring.
Append Included Tables to Replication Slot Name: if this option is enabled the system will append the ‘included tables’ to the replication slot name. This will create a new replication slot if the list has changed, therefore triggering the new snapshot for all included tables.
Propagate default values: if this option is enabled and a default value is specified for a column in the database schema, the PostgreSQL connector will attempt to propagate this value to the stored schema whenever possible.
CDC Key: a template for CDC key. The CDC key is used as a file name when streaming CDC events and creating files and as a topic name when streaming CDC events to a message queue such as Kafka. Example of the CDC key: postgres_public_customers_cdc_stream.
Available tokens:
-
[db]: database name
-
[schema]: schema name
-
[table]: table name
-
[op]: the CDC operation: c (create), d (delete), u (update)
-
[timestamp]: the timestamp of the event in ms
-
[version]: the connector version
-
[connector]: the connector class name
-
[name]: the connector name
-
[ts_ms]: the timestamp of the event in ms
-
[snapshot]: true for a snapshot
-
[txId]: the transaction ID
-
[lsn]: the lsn
-
[xmin]: the xmin
JDBC Fetch Size: the JDBC Fetch Size field can be used to set the maximum number of records which will be retrieved in one database call when fetching the data in a snapshot mode, thus limiting the memory consumption within the JVM. The default is 10000.
Max Rows to Read: the maximum number of rows to read when fetching the data in a snapshot mode. The default is empty, hence there is no limit.
Max Field Size: sets the limit for the maximum number of bytes that can be returned for character and binary column values in a ResultSet object produced by this Statement object. The default is empty, hence there is no limit.
Postgres snapshot
Snapshot Mode: specifies the criteria for running a snapshot upon startup of the connector.
-
ad-hoc initial: the connector performs an initial snapshot only when no offsets have been recorded for the logical server name. This mode allows adding new tables and triggering incremental snapshots at runtime.
-
ad-hoc: the connector never runs a snapshot and upon the first startup with a logical server name the connector reads data either from where it last left off (last LSN position) or starts from the beginning from the point of the view of the logical replication slot. This mode allows adding new tables and triggering incremental snapshots at runtime.
-
initial: the connector performs a snapshot only when no offsets have been recorded for the logical server name.
-
initial_only: the connector only takes an initial snapshot and then stops, without processing any subsequent changes.
-
never: the connector never runs a snapshot and upon the first startup with a logical server name the connector reads data either from where it last left off (last LSN position) or starts from the beginning from the point of the view of the logical replication slot.
-
always: the connector runs a snapshot each time on startup.
-
exported: deprecated and equals to initial.
-
custom: this snapshot mode lets you inject your own implementation of the io.debezium.connector.postgresql.spi.Snapshotter interface.
Debezium PostgreSQL properties
Read about all properties available for Debezium PostgreSQL connector. The properties can be set as key-value pairs in Other Parameters.
Oracle
Oracle Connection Parameters
-
Host: The host or IP address of the source database.
-
Port: The port that the source database is listening to.
-
Database: The name of the database to connect on startup. Must be the CDB name when working with the CDB + PDB model.
-
Name of the PDB: Name of the PDB to connect to, when working with the CDB + PDB model.
-
Database URL: The raw database JDBC URL. This property can be used when more flexibility is needed and can support raw TNS names or RAC connection strings.
Oracle Table and Column Filtering
-
Include Table(s): A comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be monitored. Any table not included in the whitelist will be excluded from monitoring. Each identifier is of the form databaseName.tableName. By default, the connector monitors every non-system table in each monitored database. This parameter will be overridden in the TO field of the CDC Flow.
-
Excluded Column(s): An optional comma-separated list of regular expressions that match fully-qualified column names to be excluded from change event message values. Format: databaseName.tableName.columnName or databaseName.schemaName.tableName.columnName.
Oracle CDC Key
-
CDC Key: Template for generating the CDC key used as a file name or topic name when streaming change events.
Example: server1_DEBEZIUM_CUSTOMERS_Envelope_cdc_stream.
Available tokens:
-
[source]: Fully qualified event source.
-
[op]: Operation (c = create, d = delete, u = update).
-
[timestamp]: Event timestamp in milliseconds.
-
[version]: Connector version.
-
[connector]: Connector class name.
-
[name]: Connector name.
-
[ts_ms]: Timestamp in milliseconds.
-
[snapshot]: true for snapshot events.
-
[txId]: Transaction ID.
-
[scn]: SCN (System Change Number).
-
Adapter and Mode Settings
-
CDC Adapter: The adapter implementation to use:
-
xstream: Uses Oracle XStreams API.
-
logminer (default): Uses native Oracle LogMiner API.
-
olr: Uses OpenLogReplicator.
-
-
Run connector in read-only mode: If enabled (and CDC Adapter is set to logminer), the connector will not flush the LGWR buffer to disk, allowing connections to read-only databases.
-
Name of the XStream outbound server: Required only when CDC Adapter is set to xstream. Ignored otherwise.
-
OpenLogReplicator source: Logical name of the source.name element in the OpenLogReplicator JSON configuration.
-
OpenLogReplicator host: Hostname or IP address of the OpenLogReplicator network service.
-
OpenLogReplicator port: Port number of the OpenLogReplicator network service.
Oracle JDBC Settings
-
JDBC Fetch Size: Maximum number of records retrieved in a single database call during snapshot mode. Default is 10000.
-
Max Rows to Read: Maximum number of rows to read during snapshot mode. Leave empty for no limit.
-
Max Field Size: Maximum number of bytes returned for character or binary columns in snapshot mode. Leave empty for no limit.
Oracle Snapshot
-
Snapshot Mode: Defines how the connector performs the initial snapshot. Supported values:
-
ad-hoc initial: Snapshot structure + data, then stream from redo logs. Allows ad-hoc snapshots at runtime.
-
ad-hoc schema only: Snapshot structure only, then stream from redo logs. Allows ad-hoc snapshots.
-
initial: Snapshot structure + data once, then stream changes.
-
schema_only: Snapshot structure only, then stream changes.
-
schema_only_recovery: Rebuilds database history if lost/corrupted.
-
always_recover_schema: Like initial but always refreshes schema on startup.
-
schema_only_always_recover_schema: Like schema_only but always refreshes schema on startup.
-
-
Disable Flashback Query (AS OF) when creating snapshot:
By default, Oracle CDC uses AS OF SCN queries (Flashback Query) to read table data as of a consistent point in time. This can be slow in some environments. Enabling this option disables flashback queries to speed up snapshot processing.
Embedded Infinispan Cache for Log Mining Buffer
By default, Oracle CDC stores change events temporarily in memory while determining transaction boundaries. This can cause OutOfMemoryErrors for large or long-running transactions.
Enable the Embedded Infinispan Cache for Log Mining Buffer option to use a persistent, embedded cache store instead of heap memory.
Benefits:
-
Lower memory footprint.
-
Higher stability in high-throughput or long-transaction scenarios.
-
Works in embedded mode or with external Infinispan servers.
Debezium Oracle Properties
Read more about all supported Debezium Oracle connector properties. These can be set as key-value pairs under Other Parameters in the connector configuration.
DB2
DB2 Connection Parameters
-
Host: The host or IP address of the source database.
-
Port: The port the source database is listening to.
-
Database: The name of the database to connect on startup.
DB2 Table and Column Filtering
-
Include Table(s): A comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be monitored. Any table not included in the whitelist will be excluded from monitoring. Each identifier is of the form databaseName.tableName.
By default, the connector monitors every non-system table in each monitored database. This parameter is overridden in the TO field of the CDC Flow.
-
Excluded Column(s): An optional comma-separated list of regular expressions that match fully-qualified column names to exclude from change event message values. Format: databaseName.tableName.columnName or databaseName.schemaName.tableName.columnName.
DC2 CDC Key
-
CDC Key: Template for generating the CDC key used as a file name or topic name when streaming CDC events.
Example: testDB_testSchema_customers_cdc_stream.
Available tokens:
-
[db]: Database name
-
[schema]: Schema name
-
[table]: Table name
-
[op]: Operation type (create/update/delete)
-
[timestamp]: Event timestamp in milliseconds
-
[version]: Connector version
-
[connector]: Connector class name
-
[name]: Connector name
-
[ts_ms]: Event timestamp in milliseconds
-
[snapshot]: true if the event is from a snapshot
-
[change_lsn]: Change LSN
-
[commit_lsn]: Commit LSN
-
DB2 JDBC Settings
-
JDBC Fetch Size: Maximum number of records retrieved in one database call during snapshot mode. Default: 10000.
-
Max Rows to Read: Maximum number of rows to read during snapshot mode. Default: empty (no limit).
-
Max Field Size: Maximum number of bytes returned for character and binary columns. Default: empty (no limit).
DB2 Snapshot
-
Snapshot Mode: Defines how the connector performs the initial snapshot. Supported values:
-
ad-hoc initial: Takes a snapshot of schema and data for tables in capture mode, then starts streaming changes. Supports adding new tables and triggering snapshots at runtime.
-
ad-hoc schema only: Takes a snapshot of schema only, then starts streaming changes. Also supports adding new tables and triggering snapshots at runtime.
-
initial: Takes a snapshot of schema and data for tables in capture mode.
-
schema_only: Takes a snapshot of schema only, then begins reading changes. Useful if only new changes should be emitted.
-
-
Snapshot Isolation Mode: Controls the isolation level and lock behavior during the snapshot:
-
read_uncommitted: No locking. No consistency guarantee—data may be lost or corrupted.
-
read_committed: Allows updates during snapshot. New records may appear twice. Suitable for mirroring.
-
repeatable_read: Prevents updates during snapshot. New records may still appear twice. Suitable for mirroring.
-
exclusive: Uses repeatable read plus exclusive locks. Prevents any updates during snapshot. Only mode that guarantees full consistency.
-
Debezium DB2 Properties
Read about all properties available for Debezium DB2 connector. The properties can be set as key-value pairs in Other Parameters.
MongoDB
MongoDB Connection Parameters
-
Cluster URL: The required MongoDB connection string.
Refer to the MongoDB Connection String documentation for details.
Capture Mode
-
Capture Modes: Defines how the connector captures changes from MongoDB. Supported values:
-
change_streams_update_full: Uses MongoDB Change Streams. Update messages contain the full document. This is the default.
-
change_streams: Uses MongoDB Change Streams. Update messages do not contain the full document.
-
oplog: Captures changes from MongoDB oplog. Deprecated.
-
Filtering
-
Include Database(s): Optional comma-separated list of regular expressions matching database names to monitor.
Any database not matched is excluded. By default, all databases are monitored.
-
Include Collection(s): Optional comma-separated list of regular expressions that match collection names to be excluded.
Any collection not matched is included. This acts as a blacklist.
MongoDB CDC Key
-
CDC Key: Template for generating the CDC key used as a file name or topic name when streaming events.
Example: inventory_customers_cdc_stream
Available tokens:
-
[db]: Database name
-
[collection]: Collection name
-
[op]: CDC operation: c (create), d (delete), u (update)
-
[timestamp]: Event timestamp in milliseconds
-
[version]: Connector version
-
[connector]: Connector class name
-
[name]: Connector name
-
[ts_ms]: Event timestamp in milliseconds
-
[snapshot]: true for a snapshot
-
[order]: Order of the event
-
[h]: h value from the event
-
MongoDB Snapshot Mode
-
Snapshot Mode: Defines how and when the connector takes a snapshot of MongoDB collections on startup. Supported values:
-
ad-hoc schema only: Snapshots only the schema for collections in capture mode, then starts streaming changes. Supports runtime ad-hoc snapshots of new collections.
-
initial: Runs a snapshot if no offset is found, then streams change events.
-
ad-hoc initial: Same as initial, but supports ad-hoc snapshots of new collections at startup or on demand.
-
never: Skips snapshot and immediately starts streaming changes.
-
ad-hoc schema only: Same as never, but supports adding collections and triggering snapshots at runtime.
-
initial_only: Runs a snapshot if no offset exists, then stops. Does not stream changes after snapshot.
-
Additional Options
-
New name of the Object ID (_id) field:
MongoDB documents have an _id field, often a nested object like ObjectId("...").
The CDC Flow flattens this into a unique string and renames the field.
-
Number of Documents in Explorer:
Maximum number of MongoDB documents displayed in Etlworks Explorer.
Default: 10
Maximum: 9999
MongoDB collections may contain millions of documents—rendering too many can overload the server.
Debezium MongoDB Properties
Read about all available Debezium MongoDB connector properties. These can be set as key-value pairs in Other Parameters.
AS/400
AS/400 Connection Parameters
-
Host: The host or IP address of the source database.
-
Port: The port the source database is listening on.
-
Database: The name of the database to connect to on startup.
AS/400 Table and Column Filters
-
Include Table(s): Comma-separated list of regular expressions matching fully qualified table identifiers (databaseName.tableName) for tables to be monitored.
Any table not included will be excluded.
This parameter is overridden by the TO field in the CDC Flow.
-
Excluded Column(s): Optional comma-separated list of regular expressions matching fully qualified column names to exclude from change event messages.
Fully qualified names should follow databaseName.tableName.columnName or databaseName.schemaName.tableName.columnName.
AS/400 CDC Key
-
CDC Key: Template for CDC key, used as a file name or topic name for streaming events.
Example: testDB_testSchema_customers_cdc_stream
Available tokens:
-
[db]: Database name
-
[schema]: Schema name
-
[table]: Table name
-
[op], [timestamp], [version], [connector], [name], [ts_ms], [snapshot], [change_lsn], [commit_lsn]
-
AS/400 Snapshot Parameters
-
Snapshot Mode: Defines when and how a snapshot is taken at connector startup:
-
initial: Snapshots schema and data for all captured tables, then streams changes.
-
schema_only: Snapshots only the schema for captured tables, then streams changes.
-
never: Skips snapshot entirely and begins streaming immediately.
-
-
JDBC Fetch Size: Maximum number of rows fetched in a single query during snapshot mode.
Helps limit JVM memory usage. Default: 10000.
-
Max Rows to Read: Maximum number of rows to read during snapshot.
Default is empty (no limit).
-
Max Field Size: Sets limit for number of bytes returned for character and binary column values.
Default is empty (no limit).
AS/400-Specific Parameters
-
Size of Buffer: Size of the buffer used to fetch journal entries.
Default: 131072. Should not be smaller.
-
Maximum Number of Journal Entries to Process Server Side:
Limits the number of journal entries processed during filtering.
Default: 1000000.
-
AS400 Date Format: Default format uses 2-digit years in range 1940–2039.
Set this to iso for proper date parsing.
If dates fall outside the default range and format is not adjusted, performance will degrade and data may be missing.
-
Error Reporting: Controls JDBC-level error detail in the logs.
Other Parameters
These properties can be set as key-value pairs under Other Parameters.
-
socket timeout: Timeout in milliseconds for the socket connection.
Default: 300000.
-
keep alive: Enables TCP keep-alive for the connection.
Default: true.
-
thread used: Controls whether a separate processing thread is used.
Default: false.
-
time.precision.mode: Controls how timestamp precision is handled.
Default: adaptive (milliseconds).
Recommended: adaptive_time_microseconds (microseconds).
-
alternative.temporal.decoder: If set to true, the connector stops using the AS400 JDBC driver for decoding dates and timestamps and switches to internal decoding.
Enable this option if you expect to store ancient dates such as 0001-01-01.
Default: false.
-
dedup.by.sequence: Enables built-in deduplication by sequence number + receiver + receiver library when processing journal entries.
Built-in deduplication works only while the flow is running. It prevents processing journal entries with the same sequence number + receiver + receiver library more than once during execution.
Default: false.
-
dedup.cache.size: Defines the in-memory deduplication cache size.
Default: 1000. This is sufficient for most workloads.
Recommend parameters for working with date/time columns
This ensures date/time parsing and precision is preserved end to end:
Step 1. Set time.precision.mode in Other Parameters to adaptive_time_microseconds
Step 2. Set alternative.temporal.decoder in Other Parameters to true.
Step 2. Set the following date/time formats under Formats and Templates
- Date and Time Format: yyyy-MM-dd HH:mm:ss.SSSSSS
- Date Format: yyyy-MM-dd
- Time Format: HH:mm:ss.SSSSSS
- Timezone: UTC. Setting timezone to UTC prevents timezone drift based on timezone of JVM running Etlworks.
Where to go next
- Reload tables, add new ones, and manage incremental or partial snapshots.
- Setup tips, edge cases, and snapshot behavior for supported databases.
- Learn how to configure and monitor CDC flows.
- Real-world guidance on encoding, NULL handling, schemas, and troubleshooting.