Kafka connector – Etlworks Support

Startup
Business
Enterprise
On-Premise
Add-on

About Kafka

Apache Kafka is a distributed streaming platform. Read more about Kafka.

When to use this connector

to read messages from and write messages to a given Kafka topic(s).
to stream messages from the queue to various destinations.
to implement a log-based CDC with a message queue.
- Stream CDC events to the message queue.
- Stream CDC events from the message queue to any destination.
to implement a real-time change replication with Kafka and Demezium.

Create a Connection

Step 1. In the Connections window, click +, and type in kafka.

Step 2. Select Kafka.

Step 3. Enter the Connection parameters.

Connection parameters

Bootstrap server(s): a Connection string in a form: host1:port1,host2:port2,. Since these servers are just used for the initial Connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).
Topic(s): a topic to read messages from or write messages to. For reading, the wildcard topic names, for example, inbound.*, or comma-separated topic names, for example, topic1,topic2 topics are supported.
Host and Port: alternatively, the list of the bootstrap server(s) you can enter a Kafka server host and a port.
Simple Authentication and Security Layer Mechanisms: choose between scram (default) and plain.
User: a user name used for PLAIN authentication. Read about Kafka security.
Password: a password used for PLAIN authentication.
Properties: the additional properties for the Kafka consumer, Kafka producer, and Kafka security. The properties must be in a Format key1=value1;key1=value1.
Auto Commit: if enabled, the Kafka consumer will periodically commit the offset when reading the messages from the queue. It is recommended to keep it disabled so the system can commit the offset right after the messages have been processed.
Starting Offset: a starting offset at which to begin the fetch.
Key Deserializer: the deserializer for the key.
Value Deserializer: the deserializer for the value. When the value is a document in Avro Format, use either Avro (when processing messages enqueued by Etlworks) or Avro Record (when processing messages enqueued by the third-party application). The latter requires an Avro Schema.
Max number of records to read: the total maximum number of records to read in one micro-batch.
Poll duration: how long (in milliseconds) the consumer should wait while fetching the data from the queue.
Max number of records to poll: the maximum number of records that can be fetched from the queue in a single poll call.
Number of retries before stop polling: the number of retries before stop polling if the poll returns no records. The default is 5.
Consumer Preprocessor: a program in JavaScript that can be used to modify the source message or the destination name. Read more.
Integration with CDC providers: the CDC provider. Select either Etlworks or Debezium if you are planning to use this connection for processing CDC events created by ETL CDC connectors or Debezium.
Key Serializer: the serializer for the key.
Value Serializer: the serializer for the value. Use Avro when writing messages in Avro Format.
Compression: the compression algorithm used when writing messages.
Record headers: record headers are key-value pairs that allow you to add some metadata about the Kafka record without adding any extra information to the key/value pair of the record itself.
Producer Preprocessor: a program in JavaScript that can be used to modify the message which will be added to the topic. Read more.

Authentication

The Etlworks Kafka connector supports authentication using SASL (Simple Authentication and Security Layer) with the following mechanisms:

PLAIN: Simple username/password authentication.

Username and password are optional.

SCRAM (Salted Challenge Response Authentication Mechanism): Secure username/password authentication.

Username and password are required.

AWS IAM: Authentication using AWS Identity and Access Management (IAM) for Amazon MSK (Managed Streaming for Kafka).

No username or password needed.
Uses AWS credentials from:

• Attached IAM role (if running on AWS).

• Environment variables.

• AWS credentials file.
Only works with Amazon Managed Streaming for Apache Kafka (MSK).

Consumer preprocessor

Use Consumer Preprocessor to modify the source message and/or destination name.

Available scripting languages

JavaScript

Available variables

event: the message serialized as JsonNode.
fields: the source fields serialized as JsonNode.
topic: the Kafka topic.
destination: the destination name.
db - the source database name.
schema - the source schema name.
other - the reference to the array of record headers. Use the following code to access the headers:

var headers = other.toArray();

for (var i = 0; id < headers.length; i++) {
    var header = headers[i];

    var key = header.key();

    var value = new java.lang.String(header.value());
}

Get the values of the column

var val = event.get('columns_name').asText();

Modify the value of the column

Note: Using this technique you can modify the values of the existing columns and add new columns to the stream.

event.put('columns_name', 'string');

Possible return values

java.lang.Boolean.TRUE/ FALSE: if FALSE, the message will be skipped.
TypedKeyValue<String, Boolean>: the key/value pair,w where the key contains the destination name and the value is java.lang.Boolean.TRUE/FALSE.
Anything else is ignored.

Set the destination name

By default, the destination table name or a filename is calculated using a template set in TO where any part of the TO can contain any of the following [tokens]:

[table] - the source table name.
[db] - the source database name.
[schema] - the source schema.
* - the source topic name.

The flow substitutes [tokens] on the values of the [tokens].

This example demonstrates how to configure a transformation to load data into the destination table in the public schema and with the same name as a source table: public.[table].

The developer can also set the destination name using JavaScript. Simply return (assign it to variable valuein the last line of the Consumer Preprocessor script) the instance of TypedKeyValue class, where the key = the new destination name.

Here is an example:

var topic.split('\\.', -1);
// assuming that the topic name includes database.schema.table and we only need table name
value = new TypedKeyValue('public.' + topic[2], java.lang.Boolean.TRUE);

Producer Preprocessor

Use Producer Preprocessor to modify the message which will be added to the topic.

Scripting languages

JavaScript

Variables

message: the original message, serialized as String or ByteArrayStream.
producerPackage: the instance of BaseProducerPackage.
topic: the topic name.

Return values

value = modified_message - the modified message
value = null - the messages will not be added to the topic.

Articles in this section