When to use this connector
- When the source or destination of the files is S3
- When loading data into Snowflake or Redshift
Creating a connection
Step 1. In the Connections window, click +, and select
Step 2. Select
Step 3. Enter Connection parameters.
Endpoint: the web service host. It defaults to:
Bucket: the bucket name.
Directory: the directory under the bucket. This parameter is optional.
Files: the actual file name or a wildcard file name, for example,
The Etlworks S3 connector supports authentication with an access/secret key pair and with an IAM role.
Access Key or IAM Role: the AWS access key or IAM role name.
Secret Access Key: the AWS secret access key. The secret access key must be empty if authenticating with the IAM role.
If both authentication parameters are empty, the connector will attempt to authenticate using the default profile configured for the EC2 instance running Etlworks.
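The three authentication cases described above can be summarized as a small decision function. This is only a sketch of the rules as stated in this section, not the connector's actual code: the function name `resolve_auth_mode` and the returned labels are hypothetical, and treating a secret key without an access key as an error is an assumption.

```python
def resolve_auth_mode(access_key_or_role: str = "", secret_access_key: str = "") -> str:
    """Return the authentication mode implied by the two connection parameters."""
    identity = access_key_or_role.strip()
    secret = secret_access_key.strip()

    if not identity and not secret:
        # Both parameters empty: fall back to the default profile of the EC2 instance.
        return "default_profile"
    if identity and not secret:
        # The secret must be empty when authenticating with an IAM role.
        return "iam_role"
    if identity and secret:
        return "access_key"
    # Assumption: a secret without an access key is a configuration error.
    raise ValueError("Secret Access Key is set but Access Key or IAM Role is empty")
```

For example, `resolve_auth_mode("my-role")` returns `"iam_role"`, while `resolve_auth_mode()` returns `"default_profile"`.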
Headers: optional HTTP headers.
All Amazon S3 headers have the prefix x-amz-, even if you didn't set them.
Other parameters: additional configuration options for the S3 connection.
Requester Pays: normally, the owner of an S3 bucket pays all the service fees for accessing, sharing, and storing its objects. The Requester Pays feature of S3 allows a bucket to be configured so that whoever sends a request to the bucket is charged for the S3 request and data transfer fees instead of the bucket's owner.
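At the S3 REST API level, a requester acknowledges the charge by sending the `x-amz-request-payer: requester` header with each request. The sketch below shows how such a header could be merged into a set of request headers. The helper name `with_requester_pays` is hypothetical, and whether the connector builds its requests this way is an assumption.

```python
from typing import Dict, Optional


def with_requester_pays(headers: Optional[Dict[str, str]] = None,
                        enabled: bool = True) -> Dict[str, str]:
    """Return a copy of the headers, adding the Requester Pays header if enabled."""
    merged = dict(headers or {})
    if enabled:
        # Standard S3 header that confirms the requester accepts the charges.
        merged["x-amz-request-payer"] = "requester"
    return merged
```

The input dictionary is copied rather than modified, so the same base headers can be reused for buckets that are not configured as Requester Pays.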
Request Signature Version: Amazon S3 offers you the ability to identify what API signature version was used to sign a request. Signature Version 4 is supported in all AWS Regions, and it is the only version that is supported for new Regions.
Maximum Size For Multipart Upload (bytes): entering a number greater than 5242880 enables multipart upload to S3. If nothing is entered (default), multipart upload is disabled. The minimum part size is 5242880 bytes (5 MiB) and the maximum is 5368709120 bytes (5 GiB). Multipart upload sends an object's data in parts instead of all at once, which has the following advantages:
- objects larger than 5 GB can be stored.
- large files can be uploaded in smaller pieces to reduce the impact of transient uploading/networking errors.
- objects can be constructed from data that is uploaded over a period of time, when it may not all be available in advance.
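To illustrate how an object is divided for a multipart upload, here is a minimal sketch that splits an object of a given size into parts. It assumes the configured value is used as the part size, which this section does not state explicitly. The function name `plan_parts` is hypothetical.

```python
from typing import List, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024          # 5242880 bytes
MAX_PART_SIZE = 5 * 1024 * 1024 * 1024   # 5368709120 bytes


def plan_parts(object_size: int, part_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that together cover object_size bytes."""
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part size must be between 5242880 and 5368709120 bytes")

    parts = []
    offset = 0
    while offset < object_size:
        # Only the last part may be smaller than the configured part size.
        length = min(part_size, object_size - offset)
        parts.append((offset, length))
        offset += length
    return parts
```

With a 12 MiB object and a 5 MiB part size, this produces two full parts and one final part of 2 MiB. If a single part fails because of a transient network error, only that part has to be sent again.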
Add Suffix When Creating Files In Transformation: you can select one of the predefined suffixes for the files created using this Connection. For example, if you select uuid as a suffix and the original file name is dest.csv, Etlworks Integrator will create files with the name dest_uuid.csv, where uuid is a globally unique identifier such as
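The naming scheme can be sketched as follows. This is an illustration, not the Integrator's implementation: the helper name `add_uuid_suffix` is hypothetical, and the use of a random version 4 UUID in 32-character hexadecimal form is an assumption about the identifier format.

```python
import os
import uuid


def add_uuid_suffix(file_name: str) -> str:
    """Insert a unique identifier between the base name and the extension."""
    stem, extension = os.path.splitext(file_name)
    # Assumption: the identifier is a random UUID rendered as 32 hex characters.
    return f"{stem}_{uuid.uuid4().hex}{extension}"
```

Calling `add_uuid_suffix("dest.csv")` returns a name that starts with `dest_` and ends with `.csv`, and each call produces a different name, so repeated runs do not overwrite earlier files.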
Enable Wildcard File Name: if there are multiple source files in a folder, you can specify an algorithm that will be used to select the actual source file to process. For example, if Files is set to a wildcard file name and oldest is selected, Etlworks Integrator will always select the oldest file in the folder that matches the wildcard.
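The oldest selection rule can be sketched with Python's standard `fnmatch` module. The function name `select_oldest`, the `(name, timestamp)` input shape, and the `*.csv` pattern in the usage note are hypothetical choices for illustration, and the sketch does not claim to reproduce the Integrator's own matching rules.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Optional, Tuple


def select_oldest(files: Iterable[Tuple[str, float]], pattern: str) -> Optional[str]:
    """Pick the matching file with the smallest last-modified timestamp.

    files is an iterable of (file name, last-modified timestamp) pairs.
    Returns None when no file matches the wildcard pattern.
    """
    matches = [(modified, name) for name, modified in files
               if fnmatchcase(name, pattern)]
    if not matches:
        return None
    # min() compares timestamps first, so the oldest match wins.
    return min(matches)[1]
```

Given `[("a.csv", 300.0), ("b.csv", 100.0), ("c.txt", 50.0)]` and the pattern `*.csv`, the function returns `b.csv`: `c.txt` is older but does not match the wildcard.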
Override Wildcard File Name set for Connection: if Wildcard File Names are allowed, the option to Override Wildcard File Name set for Connection is enabled, and the file name entered in the FROM field of the transformation is a wildcard file name, the system will override the file name entered at the Connection level. The default behavior is to use the wildcard file name that was entered when the Connection was configured.
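The override rule combines three conditions, which can be written out as a short sketch. The function and parameter names are hypothetical, and detecting a wildcard by the presence of `*` or `?` is an assumption. The sketch models only the case described in this section.

```python
def effective_file_name(connection_files: str, from_field: str,
                        wildcards_enabled: bool, override_enabled: bool) -> str:
    """Decide which wildcard file name is used for a transformation."""
    # Assumption: a name counts as a wildcard if it contains * or ?.
    from_is_wildcard = any(char in from_field for char in "*?")

    if wildcards_enabled and override_enabled and from_is_wildcard:
        # All three conditions hold: the FROM field wins.
        return from_field
    # Default: keep the name configured at the Connection level.
    return connection_files
```

If any one of the three conditions is not met, the name configured for the Connection is kept.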
Archive file before copying to: Etlworks Integrator can archive files, using one of the supported algorithms (zip or gzip), before copying them to cloud storage. Since cloud storage is typically a paid service, archiving files can save both money and time.
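As an illustration of the two supported algorithms, the following sketch compresses a file's contents in memory using Python's standard library. The function name `archive_bytes` is hypothetical, and appending `.gz` or `.zip` to the original file name is an assumption about how the archived file is named.

```python
import gzip
import io
import zipfile
from typing import Tuple


def archive_bytes(file_name: str, data: bytes, algorithm: str = "gzip") -> Tuple[str, bytes]:
    """Return (archive name, archive contents) for the given file data."""
    if algorithm == "gzip":
        return file_name + ".gz", gzip.compress(data)
    if algorithm == "zip":
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
            # Store the data under its original name inside the zip archive.
            archive.writestr(file_name, data)
        return file_name + ".zip", buffer.getvalue()
    raise ValueError("algorithm must be 'gzip' or 'zip'")
```

Repetitive text data such as CSV usually compresses well, so fewer bytes are stored and transferred.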