Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.
Etlworks S3 connector supports reading, creating, updating, renaming, and deleting objects in S3.
When to use this connector
- When working with files where the source or destination is S3.
- When loading data into Snowflake or Amazon Redshift.
- When streaming CDC events to cloud storage.
Creating a connection
Step 1. In the Connections window, click + and start typing the connector name.
Step 2. Select Amazon S3 (SDK) (preferred) or Amazon S3 (legacy).
The Amazon S3 (SDK) connector is built using the latest AWS SDK. It is faster and supports more authentication options.
The Amazon S3 (legacy) connector is built on top of the jets3t library. It is kept indefinitely for backward compatibility.
Step 3. Enter Connection parameters.
AWS Region: the AWS region. This parameter is only available for the SDK connector.
Endpoint: the web service host. It defaults to s3.amazonaws.com. This parameter is only available for the Legacy connector.
Bucket: the bucket name.
Directory: the directory under the bucket. This parameter is optional.
Files: the actual file name or a wildcard file name, for example, *.csv.
Etlworks S3 connector supports authentication with access/secret key and IAM role.
Access Key or IAM Role: the AWS access key or IAM role name.
Secret Access Key: the AWS secret access key. Note: the secret access key must be empty when authenticating with an IAM role.
If both authentication parameters are empty, the connector attempts to authenticate using the default profile configured for the EC2 instance running Etlworks.
Parameters for Legacy connector
Other parameters: additional configuration options for the Legacy S3 connection.
Requester Pays: a bucket in S3 is normally configured so that the bucket's owner pays all the service fees for accessing, sharing, and storing objects. The Requester Pays feature of S3 allows a bucket to be configured so that the individual who sends requests to the bucket is charged the S3 request and data-transfer fees instead of the bucket's owner. This parameter is only available for the Legacy connector.
Request Signature Version: identifies which API signature version is used to sign requests. Signature Version 4 is supported in all AWS Regions, and it is the only version supported in new Regions.
Metadata: S3 object metadata, a set of name-value pairs that you can set in Amazon S3 at the time you upload the object. After you upload the object, you cannot modify its metadata; the only way to change it is to make a copy of the object and set the metadata on the copy. Metadata is ignored when multipart upload is enabled.
Part Size For Multipart Upload (bytes): entering a number of 5242880 (5 MB) or greater enables multipart upload to S3. If nothing is entered (the default), multipart upload is disabled. The minimum part size is 5242880 bytes; the maximum is 5368709120 bytes (5 GB). Multipart uploads send an object's data in parts instead of all at once, which gives the following advantages:
- objects larger than 5 GB can be stored.
- large files can be uploaded in smaller pieces to reduce the impact of transient uploading/networking errors.
- objects can be constructed from data that is uploaded over a period of time, when it may not all be available in advance.
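The part-size bounds above can be sketched as a small helper that splits an object into (offset, length) parts. The constants mirror the limits quoted in this section; the function is illustrative, not part of the Etlworks API.

```python
MIN_PART = 5_242_880        # 5 MB, minimum multipart part size
MAX_PART = 5_368_709_120    # 5 GB, maximum multipart part size

def plan_parts(object_size: int, part_size: int) -> list:
    """Return (offset, length) pairs covering object_size bytes."""
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError(
            "part size must be between 5242880 and 5368709120 bytes"
        )
    parts = []
    offset = 0
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((offset, length))
        offset += length
    return parts
```

For example, a 12 MiB object uploaded with the minimum 5 MB part size splits into three parts, the last one smaller than the rest.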
Add Suffix When Creating Files In Transformation: you can select one of the predefined suffixes for the files created using this Connection. For example, if you select uuid as a suffix and the original file name is dest.csv, Etlworks Integrator will create files with names like dest_uuid.csv, where uuid is a globally unique identifier such as 123e4567-e89b-12d3-a456-426614174000.
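The uuid suffix behavior can be approximated with the standard library. The helper name is hypothetical; it only shows how a unique identifier slots in before the file extension.

```python
import os
import uuid

def add_uuid_suffix(file_name: str) -> str:
    """Append a random UUID before the extension: dest.csv -> dest_<uuid>.csv."""
    base, ext = os.path.splitext(file_name)
    return f"{base}_{uuid.uuid4()}{ext}"
```

Each call produces a different name, so repeated transformations never overwrite each other's output.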
Enable Wildcard File Name: if there are multiple source files in a folder, you can specify an algorithm that will be used to select the actual source file to process. For example, if oldest is selected, Etlworks Integrator will always select the oldest file in the folder that matches the wildcard.
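A pure-Python sketch of the oldest-matching-file selection: it takes (name, modified-time) pairs instead of reading a real folder, so the logic is easy to see. The function name is illustrative, not an Etlworks API.

```python
from fnmatch import fnmatch
from typing import List, Optional, Tuple

def pick_oldest(files: List[Tuple[str, float]], pattern: str) -> Optional[str]:
    """Return the name of the oldest file whose name matches the wildcard."""
    matches = [(name, mtime) for name, mtime in files if fnmatch(name, pattern)]
    if not matches:
        return None
    # Oldest file = smallest modification timestamp among the matches.
    return min(matches, key=lambda f: f[1])[0]
```

With a real folder, the (name, mtime) pairs would come from os.scandir and entry.stat().st_mtime.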
Override Wildcard File Name set for Connection: if Wildcard File Names are allowed, the Override Wildcard File Name set for Connection option is enabled, and the file name entered in the FROM field of the transformation is a wildcard file name, the system will override the file name entered at the Connection level. The default behavior is to use the wildcard file name that was entered when the Connection was configured.
Archive file before copying to: Etlworks Integrator can archive files using one of the supported algorithms (zip or gzip) before copying them to cloud storage. Since cloud storage is typically a paid service, archiving files can save money and transfer time.
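The gzip option can be sketched with the standard library: compress the file locally, then copy the resulting .gz archive to cloud storage (the upload step is out of scope here). The function name is illustrative.

```python
import gzip
import shutil

def gzip_file(src: str) -> str:
    """Compress src to src + '.gz' and return the archive path."""
    dst = src + ".gz"
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

For text-heavy formats such as CSV, gzip typically shrinks the payload substantially, which is where the storage and transfer savings come from.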