What can you do with message queues in Etlworks Integrator
- Read and process messages from the queue: the messages are extracted from the queue in micro-batches, transformed, and loaded into any destination.
- Write messages to the queue: the data is extracted from any source, transformed, and loaded into a message queue using a configurable data exchange format.
- Stream messages from the queue: the messages are streamed from the queue and loaded into the destination in real time.
- Stream CDC events from the queue into any destination.
- Stream CDC events to a message queue.
- Real-time change replication with Kafka, Debezium, and Etlworks.
- Work with messages in JSON, XML, and other text Formats.
- Read messages in the Avro Format, created either by Etlworks Integrator or by a third-party application.
- Write messages in the Avro Format.
Related resources
- Avro Format: Etlworks Integrator can read and write Avro files, including nested Avro files.
- Kafka connector
- Flows optimized for Kafka: Etlworks Integrator includes several flow types optimized for Kafka.
- Azure Event Hubs connector
- Amazon Kinesis connector
- Amazon SQS connector
- RabbitMQ connector
- ActiveMQ connector
Create a real-time data pipeline
Message queues, such as Kafka, can be used to build (almost) real-time data pipelines to connect various applications.
For example, a relational database can publish its CDC log to a Kafka topic. A data integration Flow can subscribe to that topic, so each time a new CDC message is published, it is read from the queue, parsed, transformed, and applied to the data warehouse as the corresponding transaction (INSERT/UPDATE/DELETE).
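Outside of the Integrator, the consume-and-apply pattern looks roughly like the sketch below, written with the kafka-python client against Debezium-style change events. The topic name, broker address, and apply_* helpers are all hypothetical; Etlworks performs the equivalent steps internally.

```python
# Minimal sketch of the consume-and-apply pattern, assuming Debezium-style
# CDC messages in JSON. Topic, broker, and helper functions are hypothetical.
import json
from kafka import KafkaConsumer  # pip install kafka-python

def apply_insert(row):            # hypothetical warehouse INSERT
    print("INSERT", row)

def apply_update(before, after):  # hypothetical warehouse UPDATE
    print("UPDATE", before, "->", after)

def apply_delete(row):            # hypothetical warehouse DELETE
    print("DELETE", row)

consumer = KafkaConsumer(
    "inventory.public.orders",    # hypothetical CDC topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")) if v is not None else None,
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    if event is None:             # skip tombstone messages
        continue
    op = event.get("op")          # Debezium op codes: c=create, u=update, d=delete
    if op == "c":
        apply_insert(event["after"])
    elif op == "u":
        apply_update(event["before"], event["after"])
    elif op == "d":
        apply_delete(event["before"])
```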
Available connectors
Etlworks Integrator supports the following connectors for message queues:

- Kafka
- Azure Event Hubs
- Amazon Kinesis
- Amazon SQS
- RabbitMQ
- ActiveMQ
Use cases
- Reading and processing messages from the queue
- Writing messages to the queue
- Log-based CDC with a message queue
- Stream CDC events from a message queue into any destination
- Real-time change replication with Kafka and Debezium
Read and process messages from the queue
In this scenario, the messages are Extracted from the queue in micro-batches, Transformed, and Loaded into any destination.
Etlworks also supports streaming messages from a message queue into various destinations in real time.
Advantages of traditional ETL vs. streaming
- Ability to transform data using drag-and-drop transformations.
- Currently, only the Kafka and Azure Event Hubs connectors support streaming data from a message queue.
- Some destinations, for example, HTTP endpoints, do not support streaming and require the ETL approach to load data from a message queue.
Here is a complete list of destinations that do not support streaming from a message queue:
- HTTP endpoints.
- Google Sheets.
- Outbound email.
- Redis.
- NoSQL databases.
- Other message queues.
Advantages of streaming vs. traditional ETL
- Streaming is faster.
- Streaming consumes fewer system resources.
- Streaming is the only way to implement a real-time pipeline.
Here is a complete list of destinations that support streaming from a message queue:
- Server storage, Amazon S3, Azure Blob, Google Cloud Storage, FTP, FTPS, SFTP, Box, Dropbox, Google Drive, OneDrive for Business, SharePoint, WebDAV, SMB Share.
- Relational databases.
- Snowflake, Redshift, Synapse Analytics, Google BigQuery, Vertica, Greenplum, Databricks, Clickhouse.
Process
Here are the steps to create an ETL flow to read and process messages from the queue.
Step 1. Create a source Connection for the message queue to read the messages. For example, Kafka Connection.
Step 2. Create a source Format. Messages can be read in JSON, XML, and other text Formats, as well as in the Avro Format.
Step 3. Create a destination Connection, for example, a Connection to the relational database.
Step 4. Optionally, create a destination Format.
Step 5. Create a Flow where the source is a message queue by typing in queue to in the Flow Selector popup.
Step 6. Continue by adding source-to-destination transformations where the source is a message queue Connection created in step 1, source Format created in step 2, and the destination Connection and (optionally) Format created in steps 3 and 4.
Step 7. When configuring the source-to-destination transformation, enter the message queue topic name in the FROM field. Connectors for some message queues, for example, Kafka, support wildcard and comma-separated topic names.
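As a point of reference, this is what a wildcard subscription looks like at the Kafka client level (the pattern and group are hypothetical); in the Integrator you only type the value into the FROM field.

```python
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(bootstrap_servers="localhost:9092", group_id="etl-demo")

# A regex pattern matches every topic with the given prefix, e.g. orders.us, orders.eu.
consumer.subscribe(pattern="^orders\\..*")

# A comma-separated FROM value corresponds to a plain list of topics:
# consumer.subscribe(topics=["orders", "returns"])
```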
Step 8. Schedule the Flow to stream data in real time or to be executed periodically.
When scheduling the Flow to be executed periodically, it is recommended to use short intervals for micro-batching.
The alternative to scheduling is configuring the Flow to run as a daemon. A daemon ETL Flow, where the source is a message queue, reads a fixed number of messages from the queue, sleeps for a few moments, and restarts from the last known position. Read more about daemon flows.
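Conceptually, a daemon Flow whose source is a message queue behaves like the loop sketched below: read up to a fixed number of messages, load them, commit the position, sleep, and repeat. The topic, batch size, and sleep interval are hypothetical.

```python
import time
from kafka import KafkaConsumer  # pip install kafka-python

def process(payload):
    """Hypothetical transform-and-load step."""
    print(payload)

consumer = KafkaConsumer(
    "events",                      # hypothetical topic
    bootstrap_servers="localhost:9092",
    group_id="daemon-etl",         # the consumer group keeps the last known position
    enable_auto_commit=False,
)

while True:
    # Read up to a fixed number of messages (one micro-batch).
    batch = consumer.poll(timeout_ms=5000, max_records=500)
    for records in batch.values():
        for record in records:
            process(record.value)
    consumer.commit()              # persist the position before sleeping
    time.sleep(5)                  # sleep for a few moments, then continue
```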
Write messages to the queue
In this scenario, the data is Extracted from any source, Transformed, and Loaded into a message queue using a configurable data exchange format.
Here are the steps to create an ETL flow to write messages to the queue.
Step 1. Create a source Connection.
Step 2. Optionally create a source Format.
Step 3. Create a destination Connection for the message queue to write the messages to. For example, Kafka Connection.
Step 4. Create a destination Format. Messages can be written in JSON, XML, and other text Formats, as well as in the Avro Format.
Step 5. Create a Flow where the destination is a message queue by typing in to queue in the Flow Selector popup.
Step 6. Continue by adding source-to-destination transformations where the source Connection and Format are the ones created in steps 1 and 2, and the destination Connection and Format are the ones created in steps 3 and 4.
Step 7. Schedule the Flow to be executed periodically.
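Outside of the Integrator, the same write path amounts to serializing each extracted record in the chosen exchange format and producing it to a topic, as in this kafka-python sketch (the topic and sample rows are hypothetical):

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    # The value serializer plays the role of the destination Format (JSON here).
    value_serializer=lambda row: json.dumps(row).encode("utf-8"),
)

rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]  # extracted from any source
for row in rows:
    producer.send("customers", value=row)  # hypothetical destination topic
producer.flush()  # block until every message is acknowledged by the broker
```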
Work with messages in JSON, XML, and other text Formats
To work with messages in these Formats, when configuring the Kafka Connection, select String for the Value serializer and Value deserializer.
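With the String serializer and deserializer, the message value travels as plain text, and the selected Format decides how that text is parsed. The raw mechanics look like this (the topic name is hypothetical):

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "text-messages",              # hypothetical topic
    bootstrap_servers="localhost:9092",
    # String deserializer: the message value is just text...
    value_deserializer=lambda v: v.decode("utf-8"),
)

for message in consumer:
    record = json.loads(message.value)  # ...which the Format (here JSON) then parses
    print(record)
```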
Read messages in the Avro Format
Read messages in Avro Format created by the Etlworks Integrator
When configuring the Kafka Connection, select Avro for the Value deserializer field.
Read messages in Avro Format created by a third-party application
When configuring the Kafka Connection, select Avro Record for the Value deserializer field. Copy the Avro schema in the JSON Format into the Schema field.
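For reference, decoding a raw Avro message body against a pasted JSON schema is what the fastavro library calls a schemaless read; the Customer schema below is hypothetical.

```python
import io
from fastavro import parse_schema, schemaless_reader  # pip install fastavro

# Hypothetical Avro schema in the JSON Format, as pasted into the Schema field.
schema = parse_schema({
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
})

def decode(value: bytes) -> dict:
    """Decode one raw Avro message body against the schema."""
    return schemaless_reader(io.BytesIO(value), schema)
```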
Write messages in Avro Format
When configuring the Kafka Connection, select Avro for the Value serializer field.
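The write side is symmetric: each record is encoded against the schema before being produced, roughly as in this fastavro sketch (the schema is again hypothetical).

```python
import io
from fastavro import parse_schema, schemaless_writer  # pip install fastavro

schema = parse_schema({
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
})

def encode(record: dict) -> bytes:
    """Encode one record as a raw Avro message body."""
    buf = io.BytesIO()
    schemaless_writer(buf, schema, record)
    return buf.getvalue()

print(encode({"id": 1, "name": "Alice"}))
```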