Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.
Etlworks S3 connector supports reading, creating, updating, renaming, and deleting objects in S3.
When to use this connector
- When working with files where the source or destination is S3.
- When loading data into Snowflake or Amazon Redshift.
- When streaming CDC events to cloud storage.
Creating a connection
Step 1. In the Connections window, click + and start typing the connector name.
Step 2. Select Amazon S3 (SDK) (preferred) or Amazon S3 (legacy).
The Amazon S3 (SDK) connector is built using the latest AWS SDK. It is faster and supports more authentication options.
The Amazon S3 (legacy) connector is built on top of the jets3t library. It is kept indefinitely for backward compatibility.
Step 3. Enter Connection parameters.
AWS Region: the AWS region. This parameter is only available for the SDK connector.
Endpoint: the web service host. It defaults to s3.amazonaws.com. This parameter is only available for the Legacy connector.
Bucket: the bucket name.
Directory: the directory under the bucket. This parameter is optional.
Files: the actual file name or a wildcard file name, for example, *.csv.
Etlworks S3 connector supports authentication with access/secret key and IAM role.
Access Key or IAM Role: the AWS access key or IAM role name.
Secret Access Key: the AWS secret access key. Note: the secret access key must be empty when authenticating with an IAM role.
If both authentication parameters are empty, the connector attempts to authenticate using the default profile configured for the EC2 instance running Etlworks.
Parameters for Legacy connector
Other parameters: additional configuration options for the Legacy S3 connection.
Requester Pays: a bucket in S3 is normally configured so that the bucket's owner pays all the service fees for accessing, sharing, and storing objects. The Requester Pays feature of S3 allows a bucket to be configured so that the individual who sends requests to the bucket is charged the S3 request and data-transfer fees instead of the bucket's owner. This parameter is only available for the Legacy connector.
Request Signature Version: identifies which API signature version is used to sign requests. Signature Version 4 is supported in all AWS Regions, and it is the only version supported in new Regions.
Metadata: S3 object metadata, a set of name-value pairs that you can set in Amazon S3 at the time you upload the object. After you upload the object, you cannot modify its metadata; the only way to change it is to make a copy of the object and set the metadata on the copy. Metadata is ignored when multipart upload is enabled.
Part Size For Multipart Upload (bytes): entering a number of 5242880 (5 MB) or greater enables multipart upload to S3. If nothing is entered (the default), multipart upload is disabled. The minimum part size is 5242880 bytes; the maximum is 5368709120 bytes (5 GB). Multipart uploads send an object's data in parts instead of all at once, which gives the following advantages:
- objects larger than 5 GB can be stored.
- large files can be uploaded in smaller pieces to reduce the impact of transient uploading/networking errors.
- objects can be constructed from data that is uploaded over a period of time, when it may not all be available in advance.
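The part-size bounds above can be sketched as a small helper that splits an object into (offset, length) parts. The constants mirror the limits quoted in this section; the function is illustrative, not part of the Etlworks API.

```python
MIN_PART = 5_242_880        # 5 MB, minimum multipart part size
MAX_PART = 5_368_709_120    # 5 GB, maximum multipart part size

def plan_parts(object_size: int, part_size: int) -> list:
    """Return (offset, length) pairs covering object_size bytes."""
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError(
            "part size must be between 5242880 and 5368709120 bytes"
        )
    parts = []
    offset = 0
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((offset, length))
        offset += length
    return parts
```

For example, a 12 MiB object uploaded with the minimum 5 MB part size splits into three parts, the last one smaller than the rest.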
Add Suffix When Creating Files In Transformation: you can select one of the predefined suffixes for the files created using this Connection. For example, if you select uuid as a suffix and the original file name is dest.csv, Etlworks Integrator will create files with names like dest_uuid.csv, where uuid is a globally unique identifier such as 123e4567-e89b-12d3-a456-426614174000.
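The uuid suffix behavior can be approximated with the standard library. The helper name is hypothetical; it only shows how a unique identifier slots in before the file extension.

```python
import os
import uuid

def add_uuid_suffix(file_name: str) -> str:
    """Append a random UUID before the extension: dest.csv -> dest_<uuid>.csv."""
    base, ext = os.path.splitext(file_name)
    return f"{base}_{uuid.uuid4()}{ext}"
```

Each call produces a different name, so repeated transformations never overwrite each other's output.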
Enable Wildcard File Name: if there are multiple source files in a folder, you can specify an algorithm that will be used to select the actual source file to process. For example, if oldest is selected, Etlworks Integrator will always select the oldest file in the folder that matches the wildcard.
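A pure-Python sketch of the oldest-matching-file selection: it takes (name, modified-time) pairs instead of reading a real folder, so the logic is easy to see. The function name is illustrative, not an Etlworks API.

```python
from fnmatch import fnmatch
from typing import List, Optional, Tuple

def pick_oldest(files: List[Tuple[str, float]], pattern: str) -> Optional[str]:
    """Return the name of the oldest file whose name matches the wildcard."""
    matches = [(name, mtime) for name, mtime in files if fnmatch(name, pattern)]
    if not matches:
        return None
    # Oldest file = smallest modification timestamp among the matches.
    return min(matches, key=lambda f: f[1])[0]
```

With a real folder, the (name, mtime) pairs would come from os.scandir and entry.stat().st_mtime.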
Override Wildcard File Name set for Connection: if Wildcard File Names are allowed, the Override Wildcard File Name set for Connection option is enabled, and the file name entered in the FROM field of the transformation is a wildcard file name, the system will override the file name entered at the Connection level. The default behavior is to use the wildcard file name that was entered when the Connection was configured.
Archive file before copying to: Etlworks Integrator can archive files using one of the supported algorithms (zip or gzip) before copying them to cloud storage. Since cloud storage is typically a paid service, archiving files can save money and transfer time.
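The gzip option can be sketched with the standard library: compress the file locally, then copy the resulting .gz archive to cloud storage (the upload step is out of scope here). The function name is illustrative.

```python
import gzip
import shutil

def gzip_file(src: str) -> str:
    """Compress src to src + '.gz' and return the archive path."""
    dst = src + ".gz"
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

For text-heavy formats such as CSV, gzip typically shrinks the payload substantially, which is where the storage and transfer savings come from.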