Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.
The Etlworks S3 connector supports reading, creating, updating, renaming, and deleting objects in S3.
When to use this connector
- When working with files where the source or destination is S3.
- When loading data into Snowflake or Redshift.
- When streaming CDC events to cloud storage.
Creating a connection
Step 1. In the Connections window, click +, then type in S3.
Step 2. Select Amazon S3 (SDK).
Step 3. Enter Connection parameters.
AWS Region: the AWS region. This parameter is only available for the SDK connector.
Bucket: the bucket name.
Directory: the directory under the bucket. This parameter is optional.
Files: the actual file name or a wildcard file name, for example, *.csv.
The Etlworks S3 connector supports authentication with an access/secret key pair or an IAM role.
Access Key or IAM Role: the AWS access key or IAM role name.
Secret Access Key: the AWS secret access key. Note: the secret access key must be empty when authenticating with an IAM role.
External ID: In abstract terms, the external ID allows the user that is assuming the role to assert the circumstances in which they are operating. It also provides a way for the account owner to permit the role to be assumed only under specific circumstances. The primary function of the external ID is to address and prevent the confused deputy problem.
If both authentication parameters are empty, the connector will attempt to authenticate using the default profile configured for the EC2 instance running Etlworks.
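The credential resolution order described above can be sketched in Python. The function and its name are illustrative, not part of the Etlworks API; note that Access Key or IAM Role is a single field, so the value it holds is interpreted depending on whether a secret key is also supplied:

```python
def resolve_auth_mode(key_or_role: str = "", secret_key: str = "") -> str:
    """Return which authentication mode the connector would use.

    key_or_role is the value of the "Access Key or IAM Role" field;
    secret_key is the "Secret Access Key" field.
    """
    if key_or_role and secret_key:
        return "access/secret key"
    if key_or_role:
        # Secret Access Key must be empty when using an IAM role.
        return "IAM role"
    # Both parameters empty: fall back to the EC2 instance's default profile.
    return "default EC2 profile"
```

For example, leaving both fields blank on an EC2-hosted instance selects the default profile.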
Metadata: S3 object metadata. You can set object metadata in Amazon S3 at the time you upload the object. Object metadata is a set of name-value pairs. After you upload the object, you cannot modify object metadata. The only way to modify object metadata is to make a copy of the object and set the metadata on the copy. Metadata is ignored when multipart upload is enabled.
Part Size For Multipart Upload (bytes): entering a part size enables multipart upload to S3. If nothing is entered (default), multipart upload is disabled. The minimum part size is 5242880 bytes (5 MB); the maximum is 5368709120 bytes (5 GB). Multipart uploads send an object's data in parts instead of all at once, which gives the following advantages:
- objects larger than 5 GB can be stored.
- large files can be uploaded in smaller pieces to reduce the impact of transient uploading/networking errors.
- objects can be constructed from data that is uploaded over a period of time, when it may not all be available in advance.
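As a quick sanity check on part sizing, the sketch below (not Etlworks code) validates the documented bounds and computes how many parts an upload of a given size would need:

```python
import math

MIN_PART_SIZE = 5_242_880        # 5 MB, the documented minimum part size
MAX_PART_SIZE = 5_368_709_120    # 5 GB, the documented maximum part size

def part_count(object_size: int, part_size: int) -> int:
    """Number of parts a multipart upload of object_size bytes would use."""
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part size out of range")
    return math.ceil(object_size / part_size)
```

With the minimum 5 MB part size, a 1 GB object would be uploaded in 205 parts, so a single transient network error costs at most one 5 MB retry rather than the whole file.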
Add Suffix When Creating Files In Transformation: you can select one of the predefined suffixes for the files created using this Connection. For example, if you select uuid as a suffix and the original file name is dest.csv, Etlworks Integrator will create files with the name dest_uuid.csv, where uuid is a globally unique identifier (UUID).
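The suffix behavior can be illustrated with a short Python sketch; the helper function is hypothetical, since Etlworks applies the suffix internally:

```python
import uuid
from pathlib import PurePosixPath

def add_suffix(file_name: str, suffix: str) -> str:
    """Insert a suffix before the extension: dest.csv -> dest_<suffix>.csv."""
    p = PurePosixPath(file_name)
    return f"{p.stem}_{suffix}{p.suffix}"

# With the uuid option, the suffix would be a generated identifier, e.g.:
# add_suffix("dest.csv", uuid.uuid4().hex)
```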
Enable Wildcard File Name: if there are multiple source files in a folder, you can specify an algorithm that will be used to select the actual source file to process. For example, if Files is set to a wildcard file name and oldest is selected, Etlworks Integrator will always select the oldest file in the folder that matches the wildcard.
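The oldest selection strategy can be sketched as follows, using last-modified timestamps. This is an illustration of the documented behavior, not Etlworks source code:

```python
import fnmatch

def pick_oldest(files: dict[str, float], pattern: str) -> str:
    """files maps object name -> last-modified timestamp; return the oldest match."""
    matches = [name for name in files if fnmatch.fnmatch(name, pattern)]
    if not matches:
        raise FileNotFoundError(f"no file matches {pattern!r}")
    return min(matches, key=files.__getitem__)
```

For example, with `{"a.csv": 300.0, "b.csv": 100.0, "c.txt": 50.0}` and the pattern `*.csv`, the oldest match is `b.csv` (the older `c.txt` does not match the wildcard).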
Override Wildcard File Name set for Connection: if wildcard file names are allowed, the Override Wildcard File Name set for Connection option is enabled, and the file name entered in the FROM field of the transformation is a wildcard file name, the system will override the file name entered at the Connection level. The default behavior is to use the wildcard file name that was entered when the Connection was configured.
Archive file before copying to: Etlworks Integrator can archive files, using one of the supported algorithms (zip or gzip), before copying them to cloud storage. Since cloud storage is typically a paid service, archiving files first can save both money and transfer time.
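For a sense of what the gzip option does before the copy, here is a minimal, self-contained sketch; the function name and paths are illustrative:

```python
import gzip
import shutil
from pathlib import Path

def gzip_file(source: Path) -> Path:
    """Compress source with gzip and return the path of the .gz archive."""
    target = source.parent / (source.name + ".gz")
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        # Stream the file in chunks so large files don't need to fit in memory.
        shutil.copyfileobj(src, dst)
    return target
```

The resulting `dest.csv.gz` is what would then be copied to the bucket, typically much smaller than the original for text formats such as CSV.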