Snowflake is a columnar database optimized for data retrieval operations, so loading data into Snowflake using DML statements (INSERT/UPDATE/DELETE) can be quite slow, especially for larger datasets.
While it is possible to load data into Snowflake using regular flows, such as database-to-database, it is highly recommended to use a Snowflake-optimized flow.
A typical Snowflake flow performs the following operations (see the sketch after this list):
- Extracts data from the source.
- Creates CSV, JSON, Avro, or Parquet files.
- Compresses files using the gzip algorithm.
- Copies the files into the Snowflake staging area (local file system, Azure Blob, or Amazon S3).
- Checks whether the destination Snowflake table exists and, if it does not, creates the table using metadata from the source.
- Dynamically generates and executes the Snowflake COPY INTO command.
- Cleans up the remaining files, if needed.
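The Python sketch below illustrates roughly the same sequence of operations performed manually with the snowflake-connector-python library: compressing an extracted CSV file with gzip, staging it with PUT, creating the target table if it does not exist, and bulk-loading it with COPY INTO. The connection parameters, file paths, table name, and column definitions are placeholders for illustration only; an actual Etlworks flow generates and executes these steps automatically.

```python
import gzip
import shutil

import snowflake.connector  # pip install snowflake-connector-python

# --- Placeholder values; replace with your own ---
SOURCE_CSV = "/tmp/orders.csv"
STAGED_GZ = "/tmp/orders.csv.gz"
TABLE = "ORDERS"

# 1. Compress the extracted CSV file using gzip.
with open(SOURCE_CSV, "rb") as src, gzip.open(STAGED_GZ, "wb") as dst:
    shutil.copyfileobj(src, dst)

# 2. Connect to Snowflake (credentials are placeholders).
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="my_wh",
    database="my_db",
    schema="public",
)
cur = conn.cursor()

# 3. Create the destination table if it does not exist
#    (in a real flow the columns come from source metadata).
cur.execute(f"""
    CREATE TABLE IF NOT EXISTS {TABLE} (
        id INTEGER,
        customer VARCHAR,
        amount NUMBER(10, 2)
    )
""")

# 4. Upload the compressed file to the table's internal stage.
#    AUTO_COMPRESS=FALSE because the file is already gzipped.
cur.execute(f"PUT file://{STAGED_GZ} @%{TABLE} AUTO_COMPRESS=FALSE")

# 5. Bulk-load the staged file with COPY INTO.
cur.execute(f"""
    COPY INTO {TABLE}
    FROM @%{TABLE}
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"')
    PURGE = TRUE
""")

cur.close()
conn.close()
```

PURGE = TRUE removes the staged file after a successful load, which corresponds to the cleanup step at the end of the list above.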
Read how to efficiently load large datasets into Snowflake with Etlworks.