Amazon Redshift is a fully managed cloud data warehouse. Etlworks ships several flow types optimized for loading and reading Redshift at high performance.
Which Redshift flow should I use?
| Flow | Use when |
|---|---|
| Any to Redshift (Database / File / Queue / Web service / Well-known API) | You need to extract from any source, optionally transform, and load into Redshift. |
| Bulk load files in S3 into Redshift | The files already exist in S3. No transformation needed. Auto-generates COPY; supports MERGE. |
| Stream CDC events into Redshift | You need real-time replication from a CDC-enabled source database. |
| Streaming with message queues | You need real-time ingestion from a message queue that supports streaming. |
| Load files in Redshift using user-defined COPY command | You have a user-defined COPY command and want Etlworks to run it. Does not support automatic MERGE. |
What do I need before I start?
- A running Redshift cluster reachable from your Etlworks instance.
- A Redshift user with INSERT privilege on the target tables.
- An S3 bucket that Redshift can read from (used as the stage). If loading from S3, the Redshift cluster needs permission to read the bucket — either an IAM role or access keys.
Configure Redshift
Open the firewall
Redshift listens on TCP port 5439 by default. If your Etlworks instance and Redshift run on different networks, enable inbound traffic on that port. See Authorize cluster access in the AWS docs.
Grant permissions
- The Redshift user used by Etlworks must have INSERT on the target tables.
- For loading from S3, grant Redshift access to the S3 bucket. See Authorizing Redshift to access other AWS services.
- Attach AmazonS3ReadOnlyAccess (or AmazonS3FullAccess if you also need UNLOAD) to the Redshift IAM role.
Connect to Redshift
- Open the Connections window and click +.
- Type redshift in the search field.
- Select the Amazon Redshift connection and fill in the connection parameters. Full reference: configuring the Redshift connection.
Also create an Amazon S3 connection to use as the Redshift stage.
Where to go next
| Topic | Article |
|---|---|
| Extract, transform, and load data into Redshift | Extract, transform, and load data in Amazon Redshift |
| Bulk-load existing S3 files | Bulk load files in S3 into Redshift |
| ELT — run transformation SQL directly in Redshift | ELT with Amazon Redshift |
| Reverse ETL — extract from Redshift into any destination | Reverse ETL with Amazon Redshift |
| Data type mapping (JDBC ↔ Redshift) | Data type mapping for Redshift |
| Load many tables at once | Load multiple tables with a wildcard |
| Incremental load (HWM) | Incremental change replication using high watermark |
| Troubleshooting | Common issues when loading data into cloud data warehouses |