This flow bulk-loads data into Amazon Redshift using a COPY command you write yourself. It copies files from the source to S3 and then runs your COPY. Sources can be local or cloud storage, API responses, email attachments, or objects in a NoSQL database.
Compared to the auto-generated alternative: use the Bulk load files in S3 into Redshift flow when you want Etlworks to generate the COPY for you (and you want MERGE support). Use this flow when you need full control over the SQL.
How does this flow work?
- Copies files from the source to S3.
- Executes your user-defined COPY command against Redshift.
- Optionally cleans up staged files.
Note: Unlike the auto-generated bulk-load flow, this flow does not support automatic MERGE.
What do I need before I start?
- Redshift is up and reachable from your Etlworks instance.
- The Redshift user has INSERT on the target tables.
- An S3 bucket exists and Redshift can read it.
Step-by-step setup
- Create a source connection.
- Create and test a Redshift connection.
- Create an S3 connection. Set Archive file before copying to GZip.
- Open Flows, click Add flow, type Redshift, and select Copy files into Redshift.
- Set FROM to the source connection + a filename or wildcard (e.g. *.csv).
- Set TO to the S3 connection + the base destination table name (no database or schema).
- Open the Connections tab and pick the Redshift connection.
- Click MAPPING, open Parameters, and fill in Copy SQL. See the Redshift COPY command reference for the full syntax.
Optional parameters
| Parameter | What it does |
|---|---|
| Action | Move (default): copy to S3 and delete from source. Copy: copy to S3 and keep the source file. |
| Add Suffix to the Destination File Name | Adds a suffix to the staged filename. Default uuid appends a globally unique identifier so dest.csv becomes dest_uuid.csv. |
| Do not process files that have been already processed | Skips files Etlworks has already loaded. |
| Maximum Simultaneous Operations | How many file copy/move operations run in parallel. |
| Purge File(s) if Error | On by default. Deletes staged files if COPY fails. |
| Name(s) of the File(s) to Purge | Wildcard for files to delete on error. Defaults to the pattern used in COPY. |
| Before COPY SQL / Ignore errors | SQL to run on Redshift before COPY, with optional error suppression. |
| After COPY SQL / Ignore errors | Same, but after COPY. |