What can you do with message queues in Etlworks Integrator
- Read and process messages from the queue: the messages are extracted from the queue in micro-batches, transformed, and loaded into any destination.
- Write messages to the queue: the data is extracted from any source, transformed, and loaded into a message queue using a configurable data exchange format.
- Stream messages from the queue: the messages are streamed from the queue and loaded into the destination in real time.
- Stream CDC events from the queue into any destination.
- Stream CDC events to a message queue.
- Real-time change replication with Kafka, Debezium, and Etlworks.
- Work with messages in JSON, XML, and other text Formats.
- Read messages in the Avro Format, created either by Etlworks Integrator or by a third-party application.
- Write messages in the Avro Format.
Related resources
- Avro Format: Etlworks Integrator can read and write Avro files, including nested Avro files.
- Kafka connector
- Flows optimized for Kafka: Etlworks Integrator includes several flow types optimized for Kafka.
- Azure Event Hubs connector
- Amazon Kinesis connector
- Amazon SQS connector
- RabbitMQ connector
- ActiveMQ connector
Create a real-time data pipeline
Message queues, such as Kafka, can be used to build (almost) real-time data pipelines to connect various applications.
For example, a relational database can publish its CDC log to a Kafka topic. A data integration Flow can subscribe to that topic, so each time a new CDC message is published, it is read from the queue, parsed, transformed, and applied to the data warehouse as the corresponding transaction (INSERT/UPDATE/DELETE).
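Outside of the Integrator, the consume-and-apply pattern looks roughly like the sketch below, written with the kafka-python client against Debezium-style change events. The topic name, broker address, and apply_* helpers are all hypothetical; Etlworks performs the equivalent steps internally.

```python
# Minimal sketch of the consume-and-apply pattern, assuming Debezium-style
# CDC messages in JSON. Topic, broker, and helper functions are hypothetical.
import json
from kafka import KafkaConsumer  # pip install kafka-python

def apply_insert(row):            # hypothetical warehouse INSERT
    print("INSERT", row)

def apply_update(before, after):  # hypothetical warehouse UPDATE
    print("UPDATE", before, "->", after)

def apply_delete(row):            # hypothetical warehouse DELETE
    print("DELETE", row)

consumer = KafkaConsumer(
    "inventory.public.orders",    # hypothetical CDC topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")) if v is not None else None,
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    if event is None:             # skip tombstone messages
        continue
    op = event.get("op")          # Debezium op codes: c=create, u=update, d=delete
    if op == "c":
        apply_insert(event["after"])
    elif op == "u":
        apply_update(event["before"], event["after"])
    elif op == "d":
        apply_delete(event["before"])
```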
Available connectors
Etlworks Integrator supports the following connectors for message queues:

- Kafka
- Azure Event Hubs
- Amazon Kinesis
- Amazon SQS
- RabbitMQ
- ActiveMQ
Use cases
- Reading and processing messages from the queue
- Writing messages to the queue
- Log-based CDC with a message queue
- Stream CDC events from a message queue into any destination
- Real-time change replication with Kafka and Debezium
Read and process messages from the queue
In this scenario, the messages are Extracted from the queue in micro-batches, Transformed, and Loaded into any destination.
Etlworks also supports streaming messages from a message queue into various destinations in real time.
Advantages of traditional ETL vs. streaming
- Ability to transform data using drag-and-drop transformations.
- Currently, only the Kafka and Azure Event Hubs connectors support streaming data from a message queue.
- Some destinations, for example, HTTP endpoints, do not support streaming and require the ETL approach to load data from a message queue.
Here is a complete list of destinations that do not support streaming from a message queue:
- HTTP endpoints.
- Google Sheets.
- Outbound email.
- Redis.
- NoSQL databases.
- Other message queues.
Advantages of streaming vs. traditional ETL
- Streaming is faster.
- Streaming consumes fewer system resources.
- Streaming is the only way to implement a real-time pipeline.
Here is a complete list of destinations that support streaming from a message queue:
- Server storage, Amazon S3, Azure Blob, Google Cloud Storage, FTP, FTPS, SFTP, Box, Dropbox, Google Drive, OneDrive for Business, SharePoint, WebDAV, SMB Share.
- Relational databases.
- Snowflake, Redshift, Synapse Analytics, Google BigQuery, Vertica, Greenplum, Databricks, Clickhouse.
Process
Here are the steps to create an ETL flow to read and process messages from the queue.
Step 1. Create a source Connection for the message queue to read the messages. For example, Kafka Connection.
Step 2. Create a source Format. Messages can be read in JSON, XML, and other text Formats, as well as in the Avro Format.
Step 3. Create a destination Connection, for example, a Connection to the relational database.
Step 4. Optionally, create a destination Format.
Step 5. Create a Flow where the source is a message queue by typing in queue to in the Flow Selector popup.
Step 6. Continue by adding source-to-destination transformations where the source is a message queue Connection created in step 1, source Format created in step 2, and the destination Connection and (optionally) Format created in steps 3 and 4.
Step 7. When configuring the source-to-destination transformation, enter the message queue topic name in the FROM field. Connectors for some message queues, for example, Kafka, support wildcard and comma-separated topic names.
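As a point of reference, this is what a wildcard subscription looks like at the Kafka client level (the pattern and group are hypothetical); in the Integrator you only type the value into the FROM field.

```python
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(bootstrap_servers="localhost:9092", group_id="etl-demo")

# A regex pattern matches every topic with the given prefix, e.g. orders.us, orders.eu.
consumer.subscribe(pattern="^orders\\..*")

# A comma-separated FROM value corresponds to a plain list of topics:
# consumer.subscribe(topics=["orders", "returns"])
```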
Step 8. Schedule the Flow to stream data in real time or to be executed periodically.
When scheduling the Flow to be executed periodically, it is recommended to use short intervals for micro-batching.
The alternative to scheduling is configuring the Flow to run as a daemon. A daemon ETL Flow, where the source is a message queue, reads a fixed number of messages from the queue, sleeps for a few moments, and restarts from the last known position. Read more about daemon flows.
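Conceptually, a daemon Flow whose source is a message queue behaves like the loop sketched below: read up to a fixed number of messages, load them, commit the position, sleep, and repeat. The topic, batch size, and sleep interval are hypothetical.

```python
import time
from kafka import KafkaConsumer  # pip install kafka-python

def process(payload):
    """Hypothetical transform-and-load step."""
    print(payload)

consumer = KafkaConsumer(
    "events",                      # hypothetical topic
    bootstrap_servers="localhost:9092",
    group_id="daemon-etl",         # the consumer group keeps the last known position
    enable_auto_commit=False,
)

while True:
    # Read up to a fixed number of messages (one micro-batch).
    batch = consumer.poll(timeout_ms=5000, max_records=500)
    for records in batch.values():
        for record in records:
            process(record.value)
    consumer.commit()              # persist the position before sleeping
    time.sleep(5)                  # sleep for a few moments, then continue
```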
Write messages to the queue
In this scenario, the data is Extracted from any source, Transformed, and Loaded into a message queue using a configurable data exchange format.
Here are the steps to create an ETL flow to write messages to the queue.
Step 1. Create a source Connection.
Step 2. Optionally create a source Format.
Step 3. Create a destination Connection for the message queue to write the messages to. For example, Kafka Connection.
Step 4. Create a destination Format. Messages can be written in JSON, XML, and other text Formats, as well as in the Avro Format.
Step 5. Create a Flow where the destination is a message queue by typing in to queue in the Flow Selector popup.
Step 6. Continue by adding source-to-destination transformations where the source Connection and Format are the ones created in steps 1 and 2, and the destination Connection and Format are the ones created in steps 3 and 4.
Step 7. Schedule the Flow to be executed periodically.
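Outside of the Integrator, the same write path amounts to serializing each extracted record in the chosen exchange format and producing it to a topic, as in this kafka-python sketch (the topic and sample rows are hypothetical):

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    # The value serializer plays the role of the destination Format (JSON here).
    value_serializer=lambda row: json.dumps(row).encode("utf-8"),
)

rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]  # extracted from any source
for row in rows:
    producer.send("customers", value=row)  # hypothetical destination topic
producer.flush()  # block until every message is acknowledged by the broker
```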
Work with messages in JSON, XML, and other text Formats
To work with messages in these Formats, when configuring the Kafka Connection, select String for the Value serializer and Value deserializer.
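With the String serializer and deserializer, the message value travels as plain text, and the selected Format decides how that text is parsed. The raw mechanics look like this (the topic name is hypothetical):

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "text-messages",              # hypothetical topic
    bootstrap_servers="localhost:9092",
    # String deserializer: the message value is just text...
    value_deserializer=lambda v: v.decode("utf-8"),
)

for message in consumer:
    record = json.loads(message.value)  # ...which the Format (here JSON) then parses
    print(record)
```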
Read messages in the Avro Format
Read messages in Avro Format created by the Etlworks Integrator
When configuring the Kafka Connection, select Avro for the Value deserializer field.
Read messages in Avro Format created by a third-party application
When configuring the Kafka Connection, select Avro Record for the Value deserializer field. Copy the Avro schema in the JSON Format into the Schema field.
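For reference, decoding a raw Avro message body against a pasted JSON schema is what the fastavro library calls a schemaless read; the Customer schema below is hypothetical.

```python
import io
from fastavro import parse_schema, schemaless_reader  # pip install fastavro

# Hypothetical Avro schema in the JSON Format, as pasted into the Schema field.
schema = parse_schema({
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
})

def decode(value: bytes) -> dict:
    """Decode one raw Avro message body against the schema."""
    return schemaless_reader(io.BytesIO(value), schema)
```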
Write messages in Avro Format
When configuring the Kafka Connection, select Avro for the Value serializer field.
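The write side is symmetric: each record is encoded against the schema before being produced, roughly as in this fastavro sketch (the schema is again hypothetical).

```python
import io
from fastavro import parse_schema, schemaless_writer  # pip install fastavro

schema = parse_schema({
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
})

def encode(record: dict) -> bytes:
    """Encode one record as a raw Avro message body."""
    buf = io.BytesIO()
    schemaless_writer(buf, schema, record)
    return buf.getvalue()

print(encode({"id": 1, "name": "Alice"}))
```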