When to use this Flow type
This Flow loads CSV files directly into Redshift. The files can be anywhere, so long as Etlworks Integrator can read them: server storage, FTP, FTPS, SFTP, cloud storage, etc.
This is just a bulk load. Files must already exist.
Step 1. Create a source Connection which can either be file storage, cloud storage, or HTTP Listener.
Step 2. Create and test the Redshift Connection as explained here.
Step 3. Create a Connection to the Amazon S3 bucket that will be used as a staging area, as explained here. When configuring the S3 Connection, select
GZip as the value for the
Archive file before copying to S3 field.
Step 4. Start creating the Redshift Flow as explained here.
Step 5. Select
Load files into Redshift.
Step 6. Continue by defining the source and destination.
Step 7. For the source
FROM, select the Connection created in Step 1 and enter a file name or a wildcard file name, for example,
Step 8. For the destination
TO, select the Connection created in Step 3.
Step 9. Select the Redshift Connection created in step 2, for the final destination, as explained here.
Step 10. Continue by defining the parameters. The only required parameter is
Step 11. Define the other parameters (the rest are optional):
- Action: if
Moveis selected (default), the file(s) will be copied to the S3 bucket and removed from the original source. If
Copyis selected, the file(s) will be copied to the S3 bucket and retain in the original source.
- Add Suffix to the Destination File Name: you can select one of the predefined suffixes for the files created using this file operation. For example, if you select
uuidas a suffix, and the original filename is
dest.csv, Etlworks Integrator will create files with the name
uuidis a globally unique identifier such as
21EC2020-3AEA-4069-A2DD-08002B30309D. The default value for this field is
- Do not process files that have been already processed: if this option is enabled, the system will skip files that have already been loaded into Redshift.
- Maximum Simultaneous Operations: Etlworks Integrator can copy or move each file in its own thread. Use this property to set the maximum number of simultaneous file operations.
- Purge File(s) if Error: if this option is enabled (by default), the staging files will be automatically deleted if the
COPYcommand was executed with an error.
- Name(s) of the File(s) to Purge: having
Purge File(s) if Errorenabled makes it possible to configure which files you want the system to delete if an error occurs while executing the
COPYcommand. Wildcard file names are allowed. If not specified, the system will use the same wildcard file name it used for the
- Before COPY INTO SQL: this SQL will be executed on the Redshift Connection
Before COPY INTO SQL.
- Ignore errors when executing Before COPY SQL: if this option is enabled, and there is an error when
Before COPY SQLis executed, the error will be ignored.
- After COPY SQL: this SQL will be executed on the Redshift Connection
After COPY SQL.
- Ignore errors when executing After COPY SQL: if this option is enabled, and there is an error when
After COPY SQLis executed, the error will be ignored.
Step 12. Save and run the Flow.