Amazon Redshift is a columnar database optimized for data retrieval, so loading data into Redshift with row-by-row DML statements (INSERT/UPDATE/DELETE) can be quite slow, especially for larger datasets.
While it is possible to load data into Redshift using regular flows, such as database-to-database, it is highly recommended to use a Redshift-optimized flow instead.
A typical Redshift flow performs the following operations (sketched in code after the list):
- Extracts data from the source.
- Creates CSV files.
- Compresses files using the gzip algorithm.
- Copies the files into an Amazon S3 bucket.
- Checks whether the destination Redshift table exists and, if it does not, creates it using metadata from the source.
- Dynamically generates and executes the Redshift COPY command.
- Cleans up the remaining files, if needed.
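For illustration, here is a minimal Python sketch of the same pipeline, assuming the `boto3` and `psycopg2` libraries. The bucket name, IAM role ARN, table definition, and connection settings are hypothetical placeholders, and the flow engine generates all of this for you; the sketch only shows why the staged COPY approach is fast: Redshift ingests the compressed file in parallel instead of processing one INSERT per row.

```python
import csv
import gzip

import boto3     # AWS SDK for Python
import psycopg2  # PostgreSQL driver, works with Redshift

# Placeholder names -- replace with your own bucket, role, and cluster.
S3_BUCKET = "my-etl-staging-bucket"
S3_KEY = "staging/orders.csv.gz"
IAM_ROLE = "arn:aws:iam::123456789012:role/RedshiftCopyRole"
REDSHIFT_DSN = "host=example.redshift.amazonaws.com port=5439 dbname=dw user=etl password=secret"


def extract_rows():
    """Stand-in for the extract step; yields rows from the source."""
    yield ["id", "amount"]  # header row
    yield [1, 9.99]
    yield [2, 19.99]


# Steps 1-3: extract the data and write it to a gzip-compressed CSV file.
local_file = "/tmp/orders.csv.gz"
with gzip.open(local_file, "wt", newline="") as f:
    csv.writer(f).writerows(extract_rows())

# Step 4: copy the file into the S3 bucket.
s3 = boto3.client("s3")
s3.upload_file(local_file, S3_BUCKET, S3_KEY)

# Steps 5-6: create the table if it does not exist, then run COPY so
# Redshift loads the staged file in parallel.
with psycopg2.connect(REDSHIFT_DSN) as conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS public.orders (
            id     BIGINT,
            amount DECIMAL(10, 2)
        )
    """)
    cur.execute(f"""
        COPY public.orders
        FROM 's3://{S3_BUCKET}/{S3_KEY}'
        IAM_ROLE '{IAM_ROLE}'
        CSV GZIP IGNOREHEADER 1
    """)

# Step 7: clean up the staged file.
s3.delete_object(Bucket=S3_BUCKET, Key=S3_KEY)
```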
Read how to efficiently load large datasets into Redshift with Etlworks.