What can you do with message queues in Etlworks
- Read and process messages from the queue: the messages are extracted from the queue in micro-batches, transformed, and loaded into any destination.
- Write messages to the queue: the data is extracted from any source, transformed, and loaded into a message queue using a configurable data exchange format.
- Stream messages from the queue: the messages are streamed from the queue and loaded into the destination in real time.
- Stream CDC events from the queue: the CDC events are streamed from the queue and loaded into any destination.
- Stream CDC events to a message queue: the CDC events captured from the source are loaded into a message queue.
- Real-time change replication with Kafka, Debezium, and Etlworks: the CDC events captured by Debezium are streamed from Kafka and applied to the destination.
- Work with messages in JSON, XML, and other text Formats: select String for the Value serializer and Value deserializer.
- Read messages in the Avro Format: you can read messages in Avro Format created by Etlworks or by third-party applications.
- Write messages in Avro Format: select Avro for the Value serializer.
- Stream unmodified messages from one queue to another: the messages are forwarded between queues without modification.
Related resources
- Kafka connector
- Azure Event Hubs connector
- Google PubSub connector
- Amazon Kinesis connector
- RabbitMQ connector
- Amazon SQS connector
- ActiveMQ connector
- MQTT connector
- Byte Array Format: this format can be used for sending and receiving messages without any transformations.
- Avro Format: Etlworks can read and write Avro files, including nested Avro files.
Create a real-time data pipeline
Message queues, such as Kafka, can be used to build (almost) real-time data pipelines to connect various applications.
For example, a relational database can publish the CDC log to a Kafka topic. A data integration Flow can subscribe to the topic, so once a new CDC message is published, it can be read from the queue, parsed, transformed, and sent to the data warehouse as a particular transaction (INSERT / UPDATE / DELETE).
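To make the flow of events concrete, here is a minimal sketch of what such a pipeline does conceptually, written outside of Etlworks with the kafka-python library. The topic name and the apply_* helpers are hypothetical, and the payload shape loosely follows the Debezium CDC envelope.

```python
import json
from kafka import KafkaConsumer

def apply_insert(row):
    print("INSERT", row)            # placeholder for the warehouse INSERT

def apply_update(before, after):
    print("UPDATE", before, after)  # placeholder for the warehouse UPDATE

def apply_delete(row):
    print("DELETE", row)            # placeholder for the warehouse DELETE

consumer = KafkaConsumer(
    'inventory.public.orders',      # hypothetical CDC topic name
    bootstrap_servers='localhost:9092',
    value_deserializer=lambda b: json.loads(b.decode('utf-8')),
)

for message in consumer:
    event = message.value['payload']
    op = event['op']                # 'c' = insert, 'u' = update, 'd' = delete
    if op == 'c':
        apply_insert(event['after'])
    elif op == 'u':
        apply_update(event['before'], event['after'])
    elif op == 'd':
        apply_delete(event['before'])
```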
Available connectors
Etlworks supports the following connectors for message queues:
Use cases
- Reading and processing messages from the queue
- Writing messages to the queue
- Streaming unmodified messages from one queue to another
- Log-based CDC with a message queue
- Streaming CDC events from a message queue into any destination
- Real-time change replication with Kafka and Debezium
Read and process messages from the queue
In this scenario, the messages are Extracted from the queue in micro-batches, Transformed, and Loaded into any destination.
Etlworks also supports streaming messages from the message queue into various destinations in real time.
Advantages of traditional ETL vs. streaming
- Ability to transform data using drag and drop transformations.
- Some destinations, for example, HTTP endpoints, do not support streaming and require the ETL approach to load data from a message queue.
Here is a complete list of destinations that do not support streaming from a message queue:
- HTTP endpoints.
- Google Sheets.
- Outbound email.
- Redis.
- NoSQL databases.
Advantages of streaming vs. traditional ETL
- Streaming is faster.
- Streaming consumes fewer system resources.
- Streaming is the only way to implement a real-time pipeline.
Here is a complete list of destinations that support streaming from a message queue:
- Server storage, Amazon S3, Azure Blob, Google Cloud Storage, FTP, FTPS, SFTP, Box, Dropbox, Google Drive, OneDrive for Business, SharePoint, WebDAV, SMB Share.
- Relational databases.
- Message queues.
- Snowflake, Redshift, Synapse Analytics, Google BigQuery, Vertica, Greenplum, Databricks, Clickhouse.
Process
Here are the steps to create an ETL flow to read and process messages from the queue.
Step 1. Create a source Connection for the message queue to read the messages from. For example, a Kafka Connection.
Step 2. Create a source Format. The following Formats are supported when reading messages from the queue:
Step 3. Create a destination Connection, for example, a Connection to the relational database.
Step 4. Optionally, create a destination Format.
Step 5. Create a Flow where the source is a message queue by typing queue to in the Flow Selector popup.
Step 6. Continue by adding source-to-destination transformations where the source is a message queue Connection created in step 1, source Format created in step 2, and the destination Connection and (optionally) Format created in steps 3 and 4.
Step 7. When configuring the source-to-destination transformation, enter the message queue topic name in the FROM field. Connectors for some message queues, for example, Kafka, support wildcard and comma-separated topic names.
Step 8. Schedule the Flow to stream data in real time or to be executed periodically.
When scheduling the Flow to be executed periodically, it is recommended to use short intervals for micro-batching.
The alternative to scheduling is configuring the flow to run as a daemon. A daemon ETL flow, where the source is a message queue, reads a fixed number of messages from the queue, sleeps for a few moments, and restarts from the last known position. Read more about daemon flows.
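Conceptually, a daemon flow behaves like the loop below, sketched here with the kafka-python library. The topic name, batch size, and the process step are hypothetical; the consumer group is what remembers the last known position.

```python
import time
from kafka import KafkaConsumer

def process(value):
    print(value)                        # placeholder for transform-and-load

consumer = KafkaConsumer(
    'events',                           # hypothetical topic name
    bootstrap_servers='localhost:9092',
    group_id='etl-daemon',              # the consumer group stores the position
    enable_auto_commit=False,
    auto_offset_reset='earliest',
)

while True:
    # Read a fixed number of messages (one micro-batch)
    batch = consumer.poll(timeout_ms=5000, max_records=500)
    for records in batch.values():
        for record in records:
            process(record.value)
    consumer.commit()                   # remember the last known position
    time.sleep(5)                       # sleep for a few moments, then repeat
```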
Write messages to the queue
In this scenario, the data is Extracted from any source, Transformed, and Loaded into a message queue using a configurable data exchange format.
Here are the steps to create an ETL flow to write messages to the queue.
Step 1. Create a source Connection.
Step 2. Optionally, create a source Format.
Step 3. Create a destination Connection for the message queue to write the messages to. For example, a Kafka Connection.
Step 4. Create a destination Format. The following Formats are supported when writing messages to the queue:
Step 5. Create a Flow where the destination is a message queue by typing to queue in the Flow selector popup.
Step 6. Continue by adding source-to-destination transformations where the source Connection and Format are the ones created in steps 1 and 2, and the destination Connection and Format are the ones created in steps 3 and 4.
Enter the destination topic or queue name in the TO field. You can configure multiple source-to-destination transformations within the same flow, each with a different topic or queue as the destination.
Step 7. Schedule the Flow to be executed periodically.
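As a rough illustration of steps 3 to 6, the sketch below writes JSON-formatted messages to a Kafka topic with the kafka-python library. The topic name and the extracted rows are hypothetical.

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    # The data exchange format: each row is serialized as a JSON string
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
)

rows = [{'id': 1, 'status': 'new'}, {'id': 2, 'status': 'shipped'}]  # extracted rows
for row in rows:
    producer.send('orders', value=row)   # 'orders' is the hypothetical TO topic
producer.flush()                         # make sure everything is delivered
```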
Work with messages in CSV, JSON, XML, and other text Formats
To work with messages in these Formats, when configuring the Kafka Connection or Azure Event Hubs connection, select String for the Value serializer and Value deserializer.
No configuration is required for other message queue connectors.
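The String serializer simply carries the message payload as UTF-8 text; whether that text is CSV, JSON, or XML is determined by the Format, not by the connection. A minimal kafka-python sketch of the same idea (the topic name is hypothetical):

```python
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda s: s.encode('utf-8'),    # String serializer
)
producer.send('invoices', value='{"id": 42, "total": 19.99}')  # JSON carried as text
producer.flush()

consumer = KafkaConsumer(
    'invoices',
    bootstrap_servers='localhost:9092',
    value_deserializer=lambda b: b.decode('utf-8'),  # String deserializer
    auto_offset_reset='earliest',
)
for message in consumer:
    print(message.value)                             # the original JSON string
```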
Stream unmodified messages from one queue to another
In certain scenarios, an organization may need to transfer messages directly between various message queues, such as from one MQTT broker to another, without modifications. This approach is useful when:
- Migrating from one broker or queue (e.g., RabbitMQ, AWS SQS, ActiveMQ, Kafka) to another while maintaining message continuity.
- Distributing messages across different networks or regions with distinct brokers or queues.
- Integrating applications relying on different messaging systems but requiring shared data.
This setup ensures seamless message forwarding across supported queues without modification.
Step 1. Create a connection to the source queue.
Step 2. To stream data without stopping the flow, clear these three parameters.
Step 3. If available, set the Value serializer to Array of Bytes.
Step 4. Create a connection to the destination queue. If available, set the Value serializer to Array of Bytes.
Step 5. Create a Byte Array Format.
Step 6. Create a new Queue to Queue flow.
Step 7. Add a source-to-destination transformation where the source and destination connections are the connections created in steps 1-3 and step 4, and the source and destination Format is the Byte Array Format created in step 5.
When configuring the source-to-destination transformation, enter the topic name in the FROM field. Etlworks supports wildcard topic names, such as *test. To load data into the same topics as in the source, enter * in the TO field.
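The sketch below shows the same pass-through pattern with the kafka-python library: no serializer or deserializer is configured, so payloads stay raw bytes end to end, which is what the Byte Array Format does. The broker addresses and topic pattern are hypothetical.

```python
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    bootstrap_servers='source-broker:9092',
    group_id='queue-to-queue',
    auto_offset_reset='earliest',
)
consumer.subscribe(pattern='.*test')    # wildcard subscription, like *test

producer = KafkaProducer(bootstrap_servers='destination-broker:9092')

for record in consumer:
    # Forward each message unmodified to the same topic name (TO = *)
    producer.send(record.topic, value=record.value, key=record.key)
```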
Read messages in the Avro Format
Read messages in Avro Format created by Etlworks
Kafka Connection and Azure Event Hubs only.
When configuring the connection, select Avro for the Value deserializer field.
Read messages in Avro Format created by third-party application
When configuring the connection, select Avro Record for the Value deserializer field.
Copy the Avro schema in the JSON Format into the Schema field.
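For reference, an Avro schema in JSON Format looks like the hypothetical example below; the fastavro sketch shows roughly what a deserializer does with such a schema.

```python
import io
import fastavro

# A hypothetical Avro schema in JSON form, as it would be pasted
# into the Schema field
schema = fastavro.parse_schema({
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "status", "type": "string"},
    ],
})

def decode(payload: bytes) -> dict:
    # Decode a schemaless Avro payload using the schema above
    return fastavro.schemaless_reader(io.BytesIO(payload), schema)
```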
Write messages in Avro Format
Kafka Connection and Azure Event Hubs only.
When configuring the connection, select Avro for the Value serializer field.
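The reverse operation, again sketched with fastavro under the same hypothetical schema, produces the Avro bytes that an Avro value serializer would put on the queue:

```python
import io
import fastavro

schema = fastavro.parse_schema({
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "status", "type": "string"},
    ],
})

buffer = io.BytesIO()
fastavro.schemaless_writer(buffer, schema, {"id": 1, "status": "new"})
avro_bytes = buffer.getvalue()   # ready to be sent as the message value
```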