About Amazon Kinesis
Amazon Kinesis is an Amazon Web Services (AWS) service for processing big data in real time. Read more about Amazon Kinesis.
When to use this connector
- To read messages from and write messages to a given Kinesis stream.
- To implement log-based CDC (change data capture) with a message queue.
Create a Connection
Step 1. In the Connections window, click + and type in kinesis.
Step 2. Select Amazon Kinesis.
Step 3. Enter the Connection parameters.
AWS Region: the AWS cloud region.
Stream name: the Kinesis stream name (optional).
Access Key or IAM Role: the AWS access key or IAM role name.
Secret Access Key: the AWS secret access key. The secret access key must be empty if authenticating with the IAM role.
Starting Position Type: a starting position in the data stream from which to start streaming:
- TRIM_HORIZON: start streaming at the last untrimmed record in the shard, which is the oldest data record in the shard.
- LATEST: start streaming just after the most recent record in the shard, so that you always read the most recent data in the shard.
- RECORDED_SEQUENCE_NUMBER: start streaming right after the last sequence number recorded by the connector, so only records that have not yet been processed are read.
- AT_TIMESTAMP: start streaming from the position denoted by the timestamp specified in the Starting Position field.
- AT_SEQUENCE_NUMBER: start streaming from the position denoted by the sequence number specified in the Starting Position field.
- AFTER_SEQUENCE_NUMBER: start streaming right after the position denoted by the sequence number specified in the Starting Position field.
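The starting position types above correspond closely to the `ShardIteratorType` values of the Kinesis `GetShardIterator` API. A minimal sketch of how a connector might translate them, assuming RECORDED_SEQUENCE_NUMBER is a connector-level concept implemented as AFTER_SEQUENCE_NUMBER with a previously stored sequence number (the function name and mapping are illustrative, not the connector's actual code):

```python
def shard_iterator_params(stream, shard_id, position_type, starting_position=None):
    """Build the keyword arguments for the Kinesis GetShardIterator call
    (parameter names as in boto3's kinesis client)."""
    params = {"StreamName": stream, "ShardId": shard_id}
    if position_type in ("TRIM_HORIZON", "LATEST"):
        # no extra position value needed
        params["ShardIteratorType"] = position_type
    elif position_type == "AT_TIMESTAMP":
        params["ShardIteratorType"] = "AT_TIMESTAMP"
        params["Timestamp"] = starting_position  # a datetime value
    elif position_type in ("AT_SEQUENCE_NUMBER", "AFTER_SEQUENCE_NUMBER"):
        params["ShardIteratorType"] = position_type
        params["StartingSequenceNumber"] = starting_position
    elif position_type == "RECORDED_SEQUENCE_NUMBER":
        # resume right after the last sequence number the connector recorded
        params["ShardIteratorType"] = "AFTER_SEQUENCE_NUMBER"
        params["StartingSequenceNumber"] = starting_position
    else:
        raise ValueError(f"unknown starting position type: {position_type}")
    return params
```

The resulting dictionary could be passed to `boto3.client("kinesis").get_shard_iterator(**params)`.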
Starting Position: the value of this field is used as the starting sequence number or the starting timestamp. It applies only when Starting Position Type is set to AT_TIMESTAMP, AT_SEQUENCE_NUMBER, or AFTER_SEQUENCE_NUMBER.
Max number of records to read: the maximum total number of records to read from the queue. Set it to a reasonable number so that the system processes records in micro-batches. If left empty, the system reads from the queue until no more records are available.
Max number of records to poll: the maximum number of records to poll in one call. The default is 1000.
Number of retries before stop polling: how many times to retry when a poll returns no records before polling stops. An empty poll means no data records are currently available in the current shard; the connector then continues reading from the next available shard.
Retry N minutes before stop polling: the number of minutes to keep retrying when a poll returns no records before polling stops. An empty poll means no data records are currently available in the current shard; the connector then continues reading from the next available shard.
Delay between retries (ms): the delay between retries, in milliseconds. Set it to at least 1 second (1,000 milliseconds) to avoid exceeding the limit on call frequency.
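The four parameters above together define a micro-batch polling loop. A minimal sketch of that loop, assuming a `poll(limit)` callable that returns the next batch of records (the function and parameter names are illustrative, not the connector's actual implementation):

```python
import time

def read_micro_batch(poll, max_records=None, max_per_poll=1000,
                     max_empty_retries=3, retry_delay_ms=1000):
    """Read up to max_records via repeated polls of at most max_per_poll
    records each; stop after max_empty_retries consecutive empty polls,
    which indicates no more data is currently available."""
    records, empty_polls = [], 0
    while max_records is None or len(records) < max_records:
        # never request more than the remaining budget in one call
        limit = (max_per_poll if max_records is None
                 else min(max_per_poll, max_records - len(records)))
        batch = poll(limit)
        if not batch:
            empty_polls += 1
            if empty_polls >= max_empty_retries:
                break
            time.sleep(retry_delay_ms / 1000.0)  # back off between retries
            continue
        empty_polls = 0
        records.extend(batch)
    return records
```

With `max_records` unset, the loop drains the queue; with it set, processing happens in bounded micro-batches, matching the behavior described above.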
How to generate partition key: the algorithm used to generate a partition key. A partition key is used to group data by shard within a stream. Kinesis Data Streams segregates the data records belonging to a stream into multiple shards. It uses the partition key that is associated with each data record to determine which shard a given data record belongs to. Partition keys are Unicode strings, with a maximum length limit of 256 characters for each key. An MD5 hash function is used to map partition keys to 128-bit integer values and to map associated data records to shards using the hash key ranges of the shards.
Number of shards: the number of shards when creating a new stream. A single shard can ingest up to 1 MB of data per second (including partition keys) or 1,000 records per second for writes. Similarly, if you scale your stream to 5,000 shards, the stream can ingest up to 5 GB per second or 5 million records per second. If you need more ingest capacity, you can easily scale up the number of shards in the stream using the AWS Management Console or the UpdateShardCount API.
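The per-shard write limits quoted above (1 MB/s and 1,000 records/s) make the minimum shard count for a workload a simple calculation; a hypothetical helper:

```python
import math

def shards_needed(write_mb_per_sec: float, write_records_per_sec: float) -> int:
    """Minimum shard count for a write workload, given the per-shard
    limits of 1 MB/s and 1,000 records/s; the binding constraint wins."""
    return max(math.ceil(write_mb_per_sec / 1.0),
               math.ceil(write_records_per_sec / 1000.0),
               1)
```

For example, 5 GB/s and 5 million records/s both resolve to the 5,000 shards mentioned above.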
Amazon Kinesis as a message queue
By default, the Kinesis connector processes data from the oldest record in the shard (Starting Position Type set to TRIM_HORIZON).
If you wish to process only the records that haven't been processed yet, set Starting Position Type to RECORDED_SEQUENCE_NUMBER.