Amazon S3 connector – Etlworks Support

About S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers scalability, data availability, security, and performance.

The Etlworks S3 connector supports reading, creating, updating, renaming, and deleting objects in S3.

When to use this connector

When working with files whose source or destination is S3.
When loading data into Snowflake or Redshift.
When streaming CDC events to cloud storage.

Creating a connection

Step 1. In the Connections window, click + and type s3.

Step 2. Select Amazon S3 (SDK).

Step 3. Enter Connection parameters.

Connection parameters

Common parameters

AWS Region: the AWS region. This parameter is only available for the SDK connector.
Bucket or MRPA: the name of the S3 bucket or Multi-Region Access Point (for MRAPs, do not include the ARN prefix).
AWS Account ID: Required only when using a Multi-Region Access Point. Leave empty for all other cases. You can find the 12-digit AWS Account ID in the AWS Console under My Account.
Directory: the directory under the bucket. This parameter is optional.
Files: the actual file name or a wildcard file name, for example, *.csv.
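
As a rough illustration of how a wildcard file name such as *.csv selects objects, here is a minimal Python sketch using shell-style matching from the standard library. The function name select_files and the sample object names are hypothetical, and the exact wildcard rules Etlworks applies may differ from Python's fnmatch.

```python
from fnmatch import fnmatchcase


def select_files(names, pattern):
    """Return the names that match a shell-style wildcard pattern.

    fnmatchcase is case-sensitive on every platform, so "*.csv"
    does not match "customers.CSV".
    """
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical object names under the connection's directory.
objects = ["orders.csv", "orders.json", "customers.CSV", "readme.txt"]
matched = select_files(objects, "*.csv")
print(matched)
```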

Authentication

The S3 connector supports three authentication options. Pick based on where Etlworks runs and how your AWS account is set up.

Option	When to use	Access Key / Role ARN field	Secret Access Key field
Access Key / Secret	The simplest option. Works from any Etlworks deployment (cloud, on-premise, your machine).	AWS access key ID	AWS secret access key
IAM Role (AssumeRole)	Etlworks runs on AWS and you want the S3 access to flow through an IAM role — recommended for production, especially for cross-account access.	Role ARN	Empty
Instance profile (no credentials)	Etlworks runs on an AWS host (EC2, ECS, EKS, Fargate) with an attached instance profile that already has S3 permissions.	Empty	Empty

Option 1: Access Key / Secret

Standard IAM-user credentials. Etlworks signs every S3 request with the access key and secret you provide. Works regardless of where Etlworks is deployed.

Set on the connection

Access Key or IAM Role — the AWS access key ID.
Secret Access Key — the matching AWS secret access key.

Set up on the AWS side

In the AWS console, open IAM → Users.
Create an IAM user (or pick an existing one) that will represent Etlworks.

Attach an inline or managed policy granting the S3 permissions Etlworks needs. For typical read/write access to a single bucket, the policy below is a reasonable starting point — replace your-bucket-name with your bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}

Open the user's Security credentials tab and create an access key pair. Copy the Access key ID and the Secret access key.
Paste them into the Etlworks connection's Access Key or IAM Role and Secret Access Key fields.
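
If you manage several buckets, it can help to generate the policy document instead of editing the bucket name by hand. The sketch below builds the same read/write policy shown above for a given bucket name; build_s3_policy is a hypothetical helper, not part of Etlworks or the AWS SDK.

```python
import json


def build_s3_policy(bucket: str) -> dict:
    """Build the read/write policy for one bucket as a Python dict."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to the keys under the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


policy = build_s3_policy("your-bucket-name")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The printed JSON can be pasted into the policy editor in the IAM console.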

Note: AWS recommends rotating access keys regularly as a best practice. When you rotate the key, update the Etlworks connection, or use one of the other two options below to avoid the rotation overhead entirely.

Option 2: IAM Role (AssumeRole)

Etlworks calls STS AssumeRole to obtain temporary credentials scoped to the role you specify. Credentials are refreshed automatically, so there are no static keys to rotate. This is the recommended option for production deployments on AWS and the usual choice for cross-account S3 access.

Requires Etlworks to be running on an AWS host (EC2, ECS, EKS, Fargate) with an instance profile that has permission to call sts:AssumeRole on the target role.

Set on the connection

Access Key or IAM Role — the full role ARN, in the form arn:aws:iam::<account-id>:role/<role-name>.
Secret Access Key — leave empty. This is what tells the connector to use the AssumeRole path.
External ID — optional. Set when the target role's trust policy requires an external ID. Used in cross-account scenarios to defend against the confused deputy problem.
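
Because the same field accepts either an access key ID or a role ARN, a quick format check before saving the connection can catch copy-and-paste mistakes. The following is a minimal sketch, assuming the standard aws partition only (ARNs for other partitions such as aws-cn would need a wider pattern); parse_role_arn is a hypothetical helper.

```python
import re

# arn:aws:iam::<12-digit account id>:role/<role name, optionally with a path>
ROLE_ARN = re.compile(
    r"^arn:aws:iam::(?P<account>\d{12}):role/(?P<name>[\w+=,.@/-]+)$"
)


def parse_role_arn(value):
    """Return (account_id, role_name) for a role ARN, or None if it does not match."""
    match = ROLE_ARN.match(value.strip())
    if match is None:
        return None
    return match.group("account"), match.group("name")


print(parse_role_arn("arn:aws:iam::123456789012:role/etlworks-s3-role"))
```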

Set up on the AWS side

Two roles are involved: the instance role attached to the Etlworks host (the principal that calls AssumeRole) and the target role that has the actual S3 permissions. They can be in the same AWS account or in different accounts.

Step 1. Create the S3 access policy.

In the AWS console, open IAM → Policies and click Create policy.

Use the JSON editor. Paste a policy that grants the S3 permissions Etlworks needs — for example, read/write access to one bucket (replace your-bucket-name):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}

Name the policy (e.g., etlworks-s3-access) and create it.

Step 2. Create the target role.

In the AWS console, open IAM → Roles and click Create role.
Choose Custom trust policy as the trusted entity type.

Paste a trust policy that allows the Etlworks instance role to assume this role. Replace <etlworks-account-id> and <etlworks-instance-role-name> with the account ID and the role name of the Etlworks host's instance profile:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<etlworks-account-id>:role/<etlworks-instance-role-name>"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

For cross-account access with an external ID, add a condition to the trust policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<etlworks-account-id>:role/<etlworks-instance-role-name>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<your-external-id>"
        }
      }
    }
  ]
}

Attach the etlworks-s3-access policy from Step 1.
Name the role (e.g., etlworks-s3-role) and create it.
Copy the role's ARN. You'll paste this into the Etlworks connection.
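
The two trust policy variants in this step differ only in the Condition block. A small sketch of generating either one is shown below; build_trust_policy is a hypothetical helper, and the account ID, role name, and external ID in the example call are placeholders.

```python
import json


def build_trust_policy(principal_role_arn, external_id=None):
    """Build a trust policy that lets one role assume the target role.

    When external_id is given, the policy only allows AssumeRole calls
    that pass the same sts:ExternalId value.
    """
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": principal_role_arn},
        "Action": "sts:AssumeRole",
    }
    if external_id:
        statement["Condition"] = {
            "StringEquals": {"sts:ExternalId": external_id}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


# Placeholder values for illustration only.
instance_role = "arn:aws:iam::111122223333:role/etlworks-instance-role"
same_account = build_trust_policy(instance_role)
cross_account = build_trust_policy(instance_role, external_id="my-external-id")
print(json.dumps(cross_account, indent=2))
```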

Step 3. Grant the Etlworks instance role permission to assume the target role.

In the AWS console, open IAM → Roles and select the Etlworks instance role.

Attach an inline policy that allows sts:AssumeRole on the target role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::<target-account-id>:role/etlworks-s3-role"
    }
  ]
}

Step 4. Configure the Etlworks connection.

Paste the target role ARN into Access Key or IAM Role.
Leave Secret Access Key empty.
If the trust policy uses an external ID, set External ID to the same value.
Test the connection.
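
If the connection test fails, it can be useful to check from the Etlworks host whether the role can be assumed at all. The sketch below is one way to do that with boto3, which is a third-party package and an assumption here (it is not part of Etlworks). RoleArn, RoleSessionName, and ExternalId are the STS AssumeRole request parameters; the session name and helper names are hypothetical.

```python
def build_assume_role_request(role_arn, external_id=None,
                              session_name="etlworks-s3-check"):
    """Build the keyword arguments for an STS AssumeRole call."""
    request = {"RoleArn": role_arn, "RoleSessionName": session_name}
    if external_id:
        # Only send ExternalId when the trust policy expects one.
        request["ExternalId"] = external_id
    return request


def assume_role(role_arn, external_id=None):
    """Call STS AssumeRole using the host's own credentials.

    Returns the temporary credentials (AccessKeyId, SecretAccessKey,
    SessionToken, Expiration). Requires boto3 and AWS credentials.
    """
    import boto3  # imported lazily so the builder works without boto3

    sts = boto3.client("sts")
    response = sts.assume_role(**build_assume_role_request(role_arn, external_id))
    return response["Credentials"]


request = build_assume_role_request(
    "arn:aws:iam::123456789012:role/etlworks-s3-role",
    external_id="my-external-id",
)
print(request)
```

An AccessDenied error from assume_role usually points at the trust policy, the inline sts:AssumeRole policy on the instance role, or a mismatched external ID.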

Cross-account setup: Etlworks-hosted instance, your AWS account

The most common AssumeRole scenario in practice: Etlworks hosts your instance (the shared cloud at app.etlworks.com or a dedicated Etlworks-managed instance), and your S3 buckets live in your own AWS account. The trust policy on your target role needs to reference Etlworks's instance role as the principal; Etlworks just needs to be told what your role ARN is.

Values Etlworks will share with you (request these from Etlworks support before starting):

The Etlworks AWS account ID (12-digit).
The Etlworks instance role name — the IAM role attached to the Etlworks host that will call sts:AssumeRole.

Combined into a single ARN string for your trust policy:

arn:aws:iam::<ETLWORKS-AWS-ACCOUNT-ID>:role/<ETLWORKS-INSTANCE-ROLE-NAME>

Values you use to configure your S3 connection:

The role ARN of the target role you create in your account — arn:aws:iam::<your-account-id>:role/<your-role-name>.
The external ID you chose (any stable string that you pick; required when the trust policy enforces it).

Steps on your AWS side

Create the S3 access policy. Replace your-bucket-name:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::your-bucket-name"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}

Create the target IAM role with the trust policy below. Substitute the two Etlworks values you got from support, and pick any stable string for the external ID:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::<ETLWORKS-AWS-ACCOUNT-ID>:role/<ETLWORKS-INSTANCE-ROLE-NAME>"
    },
    "Action": "sts:AssumeRole",
    "Condition": {
      "StringEquals": { "sts:ExternalId": "<YOUR-EXTERNAL-ID>" }
    }
  }]
}

Attach the S3 access policy you created in the first step to the role.
Send Etlworks back the role ARN and the external ID you chose.

Etlworks connection configuration (Etlworks support or anyone with admin access to the connection):

Paste the customer's role ARN into Access Key or IAM Role.
Leave Secret Access Key empty.
Paste the external ID into External ID.
Test the connection.

Etlworks does not need access to your AWS account, your S3 keys, or your AWS console. The only inbound capability is your target role's trust policy explicitly allowing Etlworks's instance role to call sts:AssumeRole — and only when scoped to the external ID you set.

Option 3: Instance profile (no credentials)

The simplest production option when Etlworks is deployed on AWS. The connector uses the credentials AWS already supplies to the host through its attached instance profile — no keys in Etlworks, no AssumeRole indirection. The instance profile must have the S3 permissions Etlworks needs.

Set on the connection

Access Key or IAM Role — leave empty.
Secret Access Key — leave empty.

With both fields empty, the connector falls back to the AWS instance metadata service (IMDS) and uses the role attached to the host.

Set up on the AWS side

Create the S3 access policy as described in Step 1 of Option 2.
Attach the policy directly to the instance role used by the Etlworks host (the EC2 instance profile, the ECS task role, the EKS service account's IAM role, or the Fargate task role — whichever applies to your deployment).
No external ID, no AssumeRole, no role ARN to paste — the host's instance profile is the credential.

Notes:

Doesn't work for Etlworks deployments outside AWS — on-premise or other clouds — because there's no AWS instance metadata service to query.
If you're getting intermittent "Failed to load credentials from IMDS" errors, configure auto-retry with at least 5 retry attempts. IMDS occasionally throttles or times out briefly and a retry resolves it.

Quick reference

Option	Access Key field	Secret field	External ID	Requires AWS host
Access Key / Secret	Access key ID	Secret access key	—	No
IAM Role (AssumeRole)	Role ARN	Empty	Optional (required for cross-account if the trust policy demands it)	Yes
Instance profile	Empty	Empty	—	Yes
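
The table above can be read as a simple decision rule on the two credential fields. The sketch below expresses that rule in Python; it only mirrors the documented field combinations, and the function name and the returned mode labels are hypothetical rather than the connector's actual internals.

```python
def auth_mode(access_key_or_role, secret_key):
    """Map the two connection fields to the authentication option they select."""
    key = (access_key_or_role or "").strip()
    secret = (secret_key or "").strip()
    if key and secret:
        return "access-key"        # Option 1: Access Key / Secret
    if key and not secret:
        return "assume-role"       # Option 2: role ARN, empty secret
    if not key and not secret:
        return "instance-profile"  # Option 3: both fields empty
    raise ValueError("Secret Access Key is set but Access Key or IAM Role is empty")


print(auth_mode("arn:aws:iam::123456789012:role/etlworks-s3-role", ""))
```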

Other parameters

Metadata: S3 object metadata. You can set object metadata in Amazon S3 at the time you upload the object. Object metadata is a set of name-value pairs. After you upload the object, you cannot modify object metadata. The only way to modify object metadata is to make a copy of the object and set the metadata. Metadata is ignored when the multipart upload is configured.
Download chunk size (bytes): This setting allows you to specify the size of each data chunk, in bytes, that will be downloaded from the server during a file transfer. Using chunked downloads can improve reliability and performance, especially when working with large files. If a download is interrupted, only the current chunk needs to be retried, rather than starting from the beginning. Adjust this value based on your network speed and the size of the files being downloaded.
Part Size For Multipart Upload (bytes): entering a number greater than 5242880 enables multipart upload to S3. If nothing is entered (the default), multipart upload is disabled. The minimum part size is 5242880 bytes (5 MiB) and the maximum is 5368709120 bytes (5 GiB). A multipart upload sends an object's data in parts instead of all at once, which has the following advantages:
- objects larger than 5 GB can be stored.
- large files can be uploaded in smaller pieces to reduce the impact of transient uploading/networking errors.
- objects can be constructed from data that is uploaded over a period of time, when it may not all be available in advance.
Add Suffix When Creating Files In Transformation: you can select one of the predefined suffixes for the files created using this Connection. For example, if you select uuid as a suffix and the original file name is dest.csv, Etlworks will create files with the name dest_uuid.csv, where uuid is a globally unique identifier such as 21EC2020-3AEA-4069-A2DD-08002B30309D.

Note: This parameter works only when the file is created using source-to-destination-transformation. Read how to add a suffix to the files created when copying, moving, renaming, and zipping files.
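
To make the naming pattern for the uuid suffix concrete, here is a minimal sketch that inserts an underscore and a UUID before the file extension. The helper add_uuid_suffix is hypothetical; Python's uuid4 prints lowercase hex, so the letter case may differ from what Etlworks produces.

```python
import os
import uuid


def add_uuid_suffix(file_name):
    """Turn dest.csv into dest_<uuid>.csv, keeping the extension."""
    stem, ext = os.path.splitext(file_name)
    return f"{stem}_{uuid.uuid4()}{ext}"


name = add_uuid_suffix("dest.csv")
print(name)
```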

File Processing Order: specifies the order in which source files are processed when using wildcard patterns in ETL and file-based flows (e.g., copy, move, delete). The default setting is Oldest, meaning files are processed starting with the oldest by creation or modification time. The available options are:
- Disabled: wildcard processing is disabled.
- Oldest/Newest: process files based on their creation or modification time.
- Ascending/Descending: process files in alphabetical order.
- Largest/Smallest: process files based on their size.
Archive file before copying to: Etlworks can archive files, using one of the supported algorithms (zip or gzip), before copying them to cloud storage. Since cloud storage is typically a paid service, it can save money and time if you choose to archive files.
Contains CDC events: when this parameter is enabled, Etlworks adds standard wildcard templates for CDC files to the list of available sources in the FROM selector.
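
The File Processing Order options described above amount to sorting the matched files by one attribute, either ascending or descending. The following sketch shows that idea in Python; the FileInfo type, the sample files, and the mapping of option names to sort keys are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    size: int          # bytes
    modified: float    # last-modified time, epoch seconds


# Option name -> (sort key, reverse order?)
ORDERINGS = {
    "Oldest": (lambda f: f.modified, False),
    "Newest": (lambda f: f.modified, True),
    "Ascending": (lambda f: f.name, False),
    "Descending": (lambda f: f.name, True),
    "Smallest": (lambda f: f.size, False),
    "Largest": (lambda f: f.size, True),
}


def order_files(files, mode="Oldest"):
    """Return the files in the order implied by the selected option."""
    key, reverse = ORDERINGS[mode]
    return sorted(files, key=key, reverse=reverse)


files = [
    FileInfo("b.csv", 300, 1000.0),
    FileInfo("a.csv", 100, 3000.0),
    FileInfo("c.csv", 200, 2000.0),
]
print([f.name for f in order_files(files, "Oldest")])
```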

Working with Multi-Region S3 Buckets (MRAPs)

Etlworks fully supports Amazon S3 Multi-Region Access Points (MRAPs), allowing you to route S3 operations through a single global endpoint with automatic region-aware access.

To use a Multi-Region Access Point in the S3 connector:

Create the MRAP in AWS:
- In the S3 Console, go to Multi-Region Access Points
- Click Create Multi-Region Access Point
- Choose two or more existing buckets in different regions
- Give your MRAP a name (e.g., global-access)
- AWS will assign it a unique alias (e.g., myxyzx7bn8rdn.mrap)
Configure the connector in Etlworks:
- In the Bucket or MRPA field, enter the MRAP alias (e.g., myxyzx7bn8rdn.mrap)
- In the AWS Account ID field, enter the 12-digit AWS Account ID (e.g., 123456789559)
- Etlworks automatically reconstructs the full ARN:
  
  arn:aws:s3::[account-id]:accesspoint/[alias]
Region setting:
- You can keep the AWS Region set to a standard region (e.g., us-east-1)
- The SDK will automatically route traffic using the MRAP via Global Accelerator

This configuration allows you to build region-agnostic, failover-resilient S3 workflows across multiple AWS regions with no manual routing logic.
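
The ARN reconstruction described above is plain string assembly from the account ID and the alias. A minimal sketch, using the example values from this section (mrap_arn is a hypothetical helper, not an Etlworks or AWS SDK function):

```python
import re


def mrap_arn(account_id, alias):
    """Build a Multi-Region Access Point ARN from the account ID and alias.

    MRAP ARNs have an empty region segment, hence the double colon.
    """
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS Account ID must be exactly 12 digits")
    return f"arn:aws:s3::{account_id}:accesspoint/{alias}"


arn = mrap_arn("123456789559", "myxyzx7bn8rdn.mrap")
print(arn)
```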

Auto-retry

To configure auto-retry for each individual request to the AWS API, set the following parameters:

Number of Retries: the maximum number of times that a single request should be retried, assuming it fails for a retryable error.
Initial wait time (ms): the initial wait time in milliseconds before making the first retry attempt. This delay increases exponentially with each subsequent retry, often combined with jitter to avoid collisions from simultaneous retries. The default is 500 milliseconds.
Maximum delay (seconds): the upper limit on the wait time between retries. Without a maximum, the exponentially growing wait time can become excessively long after multiple retries and significantly delay processing. The default is 10 seconds.
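
To see how the three parameters interact, the sketch below computes the wait before each retry. It assumes the delay doubles on every attempt and that jitter, when enabled, picks a random value between zero and the capped delay; the actual multiplier and jitter strategy used by the connector are not stated here.

```python
import random


def retry_delays(retries, initial_ms=500, max_delay_s=10, jitter=False, rng=None):
    """Return the wait time in seconds before each retry attempt."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        # Exponential growth (assumed factor of 2), capped at the maximum delay.
        delay = min(initial_ms / 1000 * (2 ** attempt), float(max_delay_s))
        if jitter:
            # "Full jitter": spread retries between 0 and the capped delay.
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays


schedule = retry_delays(6)
print(schedule)
```

With the defaults, the cap takes effect on the sixth retry, where the uncapped delay would be 16 seconds.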

Chunked download and upload

When working with large files in Amazon S3, downloading or uploading the entire file in one go can be inefficient and prone to errors, especially for applications with limited memory or unstable network connections. A common approach to handling this is by using chunked downloads and uploads, which breaks the file into smaller parts. Here are the key advantages of using chunked transfers for both download and upload in an S3 connection:

Improved Memory Efficiency
Increased Reliability and Resilience
Parallel Processing for Faster Transfers
Resume Interrupted Transfers
Lower Latency for Large Files
Scalability for Large File Transfers

It is essential to optimize both the upload and download processes to ensure efficiency, reliability, and performance. Two critical settings that impact this are the Part Size for Multipart Upload and the Download Chunk Size. These parameters allow fine-tuning of how files are split into manageable parts for transfer, improving memory usage, reducing the chance of network-related errors, and allowing for parallel operations.

Below is a detailed explanation of each setting:

1. Part Size for Multipart Upload (bytes)

This parameter controls the size of each part during a multipart upload to Amazon S3. In a multipart upload, a large file is divided into smaller parts, each of which is uploaded independently. The minimum size for each part is 5 MB (5,242,880 bytes), as per AWS S3 requirements.

Purpose: Setting this value determines the size of each part when uploading large files using multipart upload.

Recommendations: For optimal performance, choose a part size that balances between the number of parts and the upload speed. A smaller part size results in more parts, which can impact the efficiency of the upload, especially with very large files.

The default minimum part size for S3 multipart uploads is 5 MB, but depending on your network conditions or file sizes, you can increase this to reduce the number of parts, which may enhance performance.

Example: if you upload a 1 GB file with a part size of 10 MB (10,485,760 bytes), the file is split into roughly 100 parts (103 parts if 1 GB is taken as 1,073,741,824 bytes).
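
The number of parts is the file size divided by the part size, rounded up. The sketch below computes it and checks the part size against the 5,242,880 and 5,368,709,120 byte limits given for this parameter, plus the S3 limit of 10,000 parts per upload. The function name part_count is hypothetical.

```python
MIN_PART_SIZE = 5 * 1024 * 1024   # 5,242,880 bytes
MAX_PART_SIZE = 5 * 1024 ** 3     # 5,368,709,120 bytes
MAX_PARTS = 10_000                # S3 limit on parts per multipart upload


def part_count(file_size, part_size):
    """Return how many parts a multipart upload of file_size bytes needs."""
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part size is outside the allowed range")
    parts = -(-file_size // part_size)  # integer ceiling division
    if parts > MAX_PARTS:
        raise ValueError("too many parts; increase the part size")
    return parts


ten_mib = 10 * 1024 * 1024
print(part_count(1024 ** 3, ten_mib))   # 1 GiB file
print(part_count(10 ** 9, ten_mib))     # 1,000,000,000-byte file
```

The last part is usually smaller than the others, which is why the count rounds up.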

2. Download Chunk Size (bytes)

This parameter specifies the size of each chunk to be used during a chunked download from Amazon S3. When downloading a large file, it can be broken into smaller pieces (chunks), each of which is downloaded separately.

Purpose: Defines the size of each chunk during the download process. This is useful for handling large files efficiently and for providing a consistent experience even with unstable network conditions.

Recommendations: Choose a chunk size based on your available memory and network stability. Larger chunks may improve download speed but require more memory, while smaller chunks provide better fault tolerance and reduce memory usage.

The appropriate chunk size can vary based on file sizes and the network’s reliability. Larger chunks may be beneficial in high-bandwidth environments, while smaller chunks help in environments with frequent network disruptions.

Example:

If you set the chunk size to 1 MB (1,048,576 bytes), a 500 MB file will be downloaded in 500 separate chunks.
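
One common way to download an object in chunks is to request consecutive byte ranges with the HTTP Range header, which S3 GET requests support. Whether Etlworks implements chunked download exactly this way is an assumption; the sketch only shows how a file size and a chunk size translate into ranges.

```python
def chunk_ranges(file_size, chunk_size):
    """Return the HTTP Range header values that cover a file of file_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk size must be positive")
    ranges = []
    for start in range(0, file_size, chunk_size):
        # Range end offsets are inclusive, hence the -1.
        end = min(start + chunk_size, file_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


one_mib = 1024 * 1024
ranges = chunk_ranges(500 * one_mib, one_mib)
print(len(ranges), ranges[0], ranges[-1])
```

If a single range request fails, only that range needs to be requested again.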

These settings allow for more control over how files are uploaded to and downloaded from Amazon S3, ensuring that large files can be handled efficiently and reliably.

Decryption

When S3 Connection is used as a source (FROM) in the source-to-destination transformation, it is possible to configure the automatic decryption of the encrypted source files using the PGP algorithm and private key uploaded to the secure key storage.

If the private key is available, all source files processed by the transformation will be automatically decrypted using the PGP algorithm and given key. Note that the private key requires a password.

Read how to generate a pair of public/private keys.

Expected Compression

The Expected Compression setting allows you to specify the compression format expected when reading individual files from a connection. Supported options include No Compression, Zip, and GZip.

If set to Zip or GZip, Etlworks will automatically decompress each resource as it’s read. This setting assumes that each compressed file contains a single resource; it does not support archives with multiple files. Use this setting with caution, as the system will always attempt to decompress the resource based on the selected format, regardless of its file extension or actual content.
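
The behavior described above can be pictured with the following sketch, which always decompresses according to the selected option and rejects Zip archives that contain more than one entry. It uses Python's standard gzip and zipfile modules; the decompress function and its error handling are illustrative assumptions, not the Etlworks implementation.

```python
import gzip
import io
import zipfile


def decompress(data, expected):
    """Decompress bytes according to the Expected Compression option."""
    if expected == "No Compression":
        return data
    if expected == "GZip":
        # Raises an error if the data is not actually gzip-compressed.
        return gzip.decompress(data)
    if expected == "Zip":
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
            if len(names) != 1:
                raise ValueError("expected exactly one file in the archive")
            return archive.read(names[0])
    raise ValueError(f"unsupported option: {expected}")


payload = b"id,name\n1,alpha\n"
print(decompress(gzip.compress(payload), "GZip"))
```

Passing uncompressed data with the GZip option fails, which is why the setting should match what the files really contain.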

Work with region-specific S3 buckets

When creating an S3 connection, select the correct region. Read about available AWS regions.

Add metadata when creating files in S3 (and Google Cloud Storage)

Overview

You can use metadata to enable server-side encryption or to attach user-defined fields to objects.

To add metadata to the files created in S3, use HTTP headers when configuring your S3 Connection.

Note: all Amazon S3 headers have the prefix x-amz-, even if you don't set them.
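
As an illustration of the header naming, the sketch below turns user-defined fields into S3 request headers. In the S3 API, user-defined metadata travels in headers prefixed with x-amz-meta-, and server-side encryption is requested with the x-amz-server-side-encryption header. How Etlworks maps the names you enter onto these headers is an assumption here, and metadata_headers is a hypothetical helper.

```python
def metadata_headers(user_metadata, sse=None):
    """Build S3 request headers from user-defined metadata fields."""
    headers = {}
    for name, value in user_metadata.items():
        key = name.lower()
        if not key.startswith("x-amz-"):
            # Plain names are treated as user-defined metadata.
            key = "x-amz-meta-" + key
        headers[key] = str(value)
    if sse:
        # For example "AES256" for S3-managed keys.
        headers["x-amz-server-side-encryption"] = sse
    return headers


headers = metadata_headers({"Department": "finance"}, sse="AES256")
print(headers)
```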
