About Amazon Kinesis
Amazon Kinesis is an Amazon Web Services (AWS) service for processing big data in real time. Read more about Amazon Kinesis.
When to use this connector
- To read messages from and write messages to a given Kinesis stream.
- To implement log-based CDC (change data capture) with a message queue.
Create a Connection
Step 1. In the Connections window, click + and type in kinesis.
Step 2. Select Amazon Kinesis.
Step 3. Enter the Connection parameters.
AWS Region: the AWS cloud region.
Stream name: the Kinesis stream name (optional).
Access Key or IAM Role: the AWS access key or IAM role name.
Secret Access Key: the AWS secret access key. The secret access key must be empty if authenticating with the IAM role.
Starting Position Type: a starting position in the data stream from which to start streaming:
- TRIM_HORIZON: start streaming at the last untrimmed record in the shard, which is the oldest data record in the shard.
- LATEST: start streaming just after the most recent record in the shard, so that you always read the most recent data in the shard.
- RECORDED_SEQUENCE_NUMBER: start streaming right after the last sequence number recorded by the connector, so only records that have not yet been processed are read.
- AT_TIMESTAMP: start streaming from the position denoted by the timestamp specified in the Starting Position field.
- AT_SEQUENCE_NUMBER: start streaming from the position denoted by the sequence number specified in the Starting Position field.
- AFTER_SEQUENCE_NUMBER: start streaming right after the position denoted by the sequence number specified in the Starting Position field.
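The starting position types above correspond closely to the `ShardIteratorType` values of the Kinesis `GetShardIterator` API. A minimal sketch of how a connector might translate them, assuming RECORDED_SEQUENCE_NUMBER is a connector-level concept implemented as AFTER_SEQUENCE_NUMBER with a previously stored sequence number (the function name and mapping are illustrative, not the connector's actual code):

```python
def shard_iterator_params(stream, shard_id, position_type, starting_position=None):
    """Build the keyword arguments for the Kinesis GetShardIterator call
    (parameter names as in boto3's kinesis client)."""
    params = {"StreamName": stream, "ShardId": shard_id}
    if position_type in ("TRIM_HORIZON", "LATEST"):
        # no extra position value needed
        params["ShardIteratorType"] = position_type
    elif position_type == "AT_TIMESTAMP":
        params["ShardIteratorType"] = "AT_TIMESTAMP"
        params["Timestamp"] = starting_position  # a datetime value
    elif position_type in ("AT_SEQUENCE_NUMBER", "AFTER_SEQUENCE_NUMBER"):
        params["ShardIteratorType"] = position_type
        params["StartingSequenceNumber"] = starting_position
    elif position_type == "RECORDED_SEQUENCE_NUMBER":
        # resume right after the last sequence number the connector recorded
        params["ShardIteratorType"] = "AFTER_SEQUENCE_NUMBER"
        params["StartingSequenceNumber"] = starting_position
    else:
        raise ValueError(f"unknown starting position type: {position_type}")
    return params
```

The resulting dictionary could be passed to `boto3.client("kinesis").get_shard_iterator(**params)`.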
Starting Position: the value of this field is used as the starting sequence number or the starting timestamp. It applies only when Starting Position Type is set to AT_TIMESTAMP, AT_SEQUENCE_NUMBER, or AFTER_SEQUENCE_NUMBER.
Max number of records to read: the maximum total number of records to read from the queue. Set it to a reasonable number so that the system processes records in micro-batches. If left empty, the system reads from the queue until no more records are available.
Max number of records to poll: the maximum number of records to poll in one call. The default is 1000.
Number of retries before stop polling: how many times to retry when a poll returns no records before polling stops. An empty poll means no data records are currently available in the current shard; the connector then continues reading from the next available shard.
Retry N minutes before stop polling: the number of minutes to keep retrying when a poll returns no records before polling stops. An empty poll means no data records are currently available in the current shard; the connector then continues reading from the next available shard.
Delay between retries (ms): the delay between retries, in milliseconds. Set it to at least 1 second (1,000 milliseconds) to avoid exceeding the limit on call frequency.
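The four parameters above together define a micro-batch polling loop. A minimal sketch of that loop, assuming a `poll(limit)` callable that returns the next batch of records (the function and parameter names are illustrative, not the connector's actual implementation):

```python
import time

def read_micro_batch(poll, max_records=None, max_per_poll=1000,
                     max_empty_retries=3, retry_delay_ms=1000):
    """Read up to max_records via repeated polls of at most max_per_poll
    records each; stop after max_empty_retries consecutive empty polls,
    which indicates no more data is currently available."""
    records, empty_polls = [], 0
    while max_records is None or len(records) < max_records:
        # never request more than the remaining budget in one call
        limit = (max_per_poll if max_records is None
                 else min(max_per_poll, max_records - len(records)))
        batch = poll(limit)
        if not batch:
            empty_polls += 1
            if empty_polls >= max_empty_retries:
                break
            time.sleep(retry_delay_ms / 1000.0)  # back off between retries
            continue
        empty_polls = 0
        records.extend(batch)
    return records
```

With `max_records` unset, the loop drains the queue; with it set, processing happens in bounded micro-batches, matching the behavior described above.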
How to generate partition key: the algorithm used to generate a partition key. A partition key is used to group data by shard within a stream. Kinesis Data Streams segregates the data records belonging to a stream into multiple shards. It uses the partition key that is associated with each data record to determine which shard a given data record belongs to. Partition keys are Unicode strings, with a maximum length limit of 256 characters for each key. An MD5 hash function is used to map partition keys to 128-bit integer values and to map associated data records to shards using the hash key ranges of the shards.
Number of shards: the number of shards when creating a new stream. A single shard can ingest up to 1 MB of data per second (including partition keys) or 1,000 records per second for writes. Similarly, if you scale your stream to 5,000 shards, the stream can ingest up to 5 GB per second or 5 million records per second. If you need more ingest capacity, you can easily scale up the number of shards in the stream using the AWS Management Console or the UpdateShardCount API.
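The per-shard write limits quoted above (1 MB/s and 1,000 records/s) make the minimum shard count for a workload a simple calculation; a hypothetical helper:

```python
import math

def shards_needed(write_mb_per_sec: float, write_records_per_sec: float) -> int:
    """Minimum shard count for a write workload, given the per-shard
    limits of 1 MB/s and 1,000 records/s; the binding constraint wins."""
    return max(math.ceil(write_mb_per_sec / 1.0),
               math.ceil(write_records_per_sec / 1000.0),
               1)
```

For example, 5 GB/s and 5 million records/s both resolve to the 5,000 shards mentioned above.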
Amazon Kinesis as a message queue
By default, the Kinesis connector processes data from the oldest record in the shard (Starting Position Type set to TRIM_HORIZON).
If you wish to process only the records that haven't been processed yet, set Starting Position Type to RECORDED_SEQUENCE_NUMBER.