Redshift is a columnar database, optimized for data retrieval operations, so loading data into it using DML statements (INSERT/UPDATE/DELETE) can be quite slow, especially for larger datasets.
While it is possible to load data into Redshift using regular Flows, such as Database to database, it is highly recommended to use a Redshift-optimized Flow.
A typical Redshift Flow performs the following operations:
- Extracts data from the source.
- Creates CSV files.
- Compresses files using the gzip algorithm.
- Copies the files into an Amazon S3 bucket.
- Checks whether the destination Redshift table exists and, if it does not, creates the table using metadata from the source.
- Dynamically generates and executes the Redshift COPY command.
- Cleans up the remaining files, if needed.
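The file-preparation and COPY steps above can be illustrated with a minimal Python sketch. It is not the Flow's actual implementation: the function names, the table name, the S3 URI, and the IAM role below are all hypothetical, and the upload to S3 and the execution of the statement against Redshift are left out. It only shows how rows might be written as CSV, compressed with gzip, and matched with a COPY statement that tells Redshift to expect a gzipped CSV file with a header row.

```python
import csv
import gzip
import io


def rows_to_gzipped_csv(header, rows):
    """Serialize a header and rows to CSV text, then gzip-compress the bytes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return gzip.compress(buffer.getvalue().encode("utf-8"))


def build_copy_statement(table, s3_uri, iam_role):
    """Build a Redshift COPY statement for a gzipped CSV file with one header row.

    All three arguments are placeholders supplied by the caller; in a real
    Flow they would come from the connection and the Flow configuration.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1\n"
        "GZIP;"
    )


if __name__ == "__main__":
    # Hypothetical sample data and destination names, for illustration only.
    payload = rows_to_gzipped_csv(["id", "name"], [[1, "alpha"], [2, "beta"]])
    print(f"compressed payload: {len(payload)} bytes")
    print(build_copy_statement(
        "public.example_table",
        "s3://example-bucket/staging/example_table.csv.gz",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ))
```

Loading one compressed file with a single COPY is generally much faster than issuing row-by-row INSERT statements, which is why the Flow stages files in S3 first.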
Read how to efficiently load large datasets into Redshift.