Overview
Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. Read more about Amazon Redshift. Etlworks includes several flows optimized for Amazon Redshift.
Flows optimized for Redshift
Flow type | When to use |
ETL data into Redshift | When you need to extract data from any source, transform it, and load it into Redshift. |
Bulk load files in S3 into Redshift | When you need to bulk-load files that already exist in S3 without applying any transformations. The flow automatically generates the COPY command and MERGEs the data into the destination. |
Stream CDC events into Redshift | When you need to stream updates from a database that supports Change Data Capture (CDC) into Redshift in real time. |
Stream messages from queue into Redshift | When you need to stream messages from a queue that supports streaming into Redshift in real time. |
COPY files into Redshift | When you need to bulk-load data from file-based or cloud storage, an API, or a NoSQL database into Redshift without applying any transformations. This flow requires a user-defined COPY command (see the example after this table). Unlike Bulk load files in S3 into Redshift, this flow does not support automatic MERGE. |
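For reference, a user-defined COPY command for the last flow could look like the sketch below. The table name, S3 bucket, IAM role, region, and file format are placeholders, not values from this article; adjust them to match your cluster and data.

```sql
-- Minimal sketch of a user-defined COPY command (all names are hypothetical)
COPY public.orders
FROM 's3://example-bucket/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole'
FORMAT AS CSV
IGNOREHEADER 1
REGION 'us-east-1';
```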
Videos
ETL and CDC data into Redshift: watch how to create flows to ETL and CDC data into Amazon Redshift.
Related resources
Resource | Description |
Reverse ETL with Amazon Redshift | Use Redshift as a source and load its data into other destinations. |
ELT with Amazon Redshift | Etlworks supports executing complex ELT scripts directly in Redshift, which greatly improves the performance and reliability of data ingestion. |
Data type mapping for Redshift | It is important to understand how Etlworks maps the various JDBC data types to Redshift data types. |
Configure Redshift | Configure permissions and the firewall. |
Connect to Redshift | Create a connection for the Redshift cluster. |
Load multiple tables by a wildcard name | You can ETL data from multiple database objects (tables and views) into Redshift by a wildcard name without creating individual source-to-destination transformations. |
Set up change replication using a high watermark (HWM) | Using HWM replication, you can load only new and updated records into Redshift (see the sketch after this table). |
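As an illustration of the HWM and ELT approaches mentioned above, the sketch below shows the general shape of a watermark-based extract followed by an ELT-style upsert executed inside Redshift. The table names, columns, and watermark value are hypothetical, not part of the product documentation.

```sql
-- Hypothetical HWM extract: pull only rows changed since the saved watermark
SELECT id, customer_id, amount, updated_at
FROM sales.orders
WHERE updated_at > '2024-01-01 00:00:00'   -- last high watermark from the previous run
ORDER BY updated_at;

-- Hypothetical ELT step executed in Redshift: upsert the staged batch into the target
BEGIN;
DELETE FROM public.orders
USING public.orders_staging s
WHERE public.orders.id = s.id;
INSERT INTO public.orders
SELECT * FROM public.orders_staging;
COMMIT;
-- TRUNCATE commits implicitly in Redshift, so it runs outside the transaction
TRUNCATE public.orders_staging;
```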
Related case study
Professional social network: load data into Amazon Redshift from multiple sources.
"A typical CDC Flow can extract data from multiple tables in multiple databases, but having a single Flow pulling data from 55000+ tables would be a major bottleneck as it would be limited to a single blocking queue with a limited capacity. It would also create a single point of failure."
Configure Redshift
Configure the Firewall
Typically, TCP port 5439 is used to access Amazon Redshift. If Amazon Redshift and Etlworks are running on different networks, it is important to enable inbound traffic on port 5439.
Learn how to open an Amazon Redshift port for inbound traffic.
Configure permissions for Redshift
- The user used to access Redshift must have the INSERT privilege on the Amazon Redshift table (see the example grants below).
- If you are loading data into Redshift from an Amazon S3 bucket, you need to grant the Redshift user access to S3.
- For access to Amazon S3 with the ability to use COPY and UNLOAD, choose either the AmazonS3ReadOnlyAccess or the AmazonS3FullAccess policy.
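As a minimal sketch of the first requirement, assuming a hypothetical etl_user account and a public.orders target table (neither comes from this article), the grants could look like this:

```sql
-- Hypothetical user, schema, and table names
GRANT USAGE ON SCHEMA public TO etl_user;   -- lets the user reference objects in the schema
GRANT INSERT ON public.orders TO etl_user;  -- required to load data into the target table
```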
Connect to Redshift
To work with Redshift, you will need to create a Connection to Amazon Redshift.
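The connection typically points at the cluster endpoint on TCP port 5439 (see the firewall note above). The host and database in the sketch below are placeholders:

```
jdbc:redshift://examplecluster.abc123xyz789.us-east-1.redshift.amazonaws.com:5439/dev
```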