Overview
Etlworks uses the Snowflake COPY INTO
command to load data into Snowflake tables. COPY INTO
requires a named internal or external stage. Stage refers to the location where your data files are stored for loading into Snowflake.
Etlworks flows optimized for Snowflake can automatically create an internal named stage
and external named stage
in AWS S3 and Azure Blob.
The flow cannot automatically create a stage in Google Cloud storage. You will need to create a stage in Google Cloud storage manually.
Process
Step 1. Set the Stage name
.
or in the source-to-destination Transformation when configuring the Flow which loads data in Snowflake:
The Stage name set at the transformation level overrides the Stage name set for the connection. Snowflake flow cannot be created if the Stage name is not set in either of these two places.
Step 2. Make sure that the parameterCheck if stage exists
set for the Snowflake connection is enabled. Note that it is enabled by default.
How it works
If Check if stage exists
is enabled the flow performs two extra steps before loading data into Snowflake:
- The flow checks if the stage with the given name and parameters exists. When checking parameters the flow verifies the type of the stage (internal, external), the location of the stage (bucket or blob), and the credentials used to create the stage.
- If the stage with the given name and parameters does not exist the flow automatically creates the stage.
We recommend keeping the Check if stage exists
enabled. The Snowflake COPY INTO
command will not fail if the stage exists but is not configured properly, for example not pointed to the right bucket. In this case, the flow will not load any data.
Handling errors and edge cases
Create a stage manually and slightly improve the performance of the load
You can slightly improve the performance of the load by creating the stage manually and disabling Check if stage exists
.
Working with stages in Google Cloud Storage
Etlworks flows, optimized for Snowflake cannot automatically create a stage in Google Cloud storage. You will need to create it manually.
Ignore errors when the Flow cannot create the stage
If Check if stage exists
is enabled and the stage with the given name and parameters does not exist the flow will attempt to create it. If an attempt to create the stage fails the flow fails as well. In some rare cases when the SQL for loading data is provided as a part of the flow (instead of using the automatically generated COPY INTO
command) you can configure the Snowflake connection to ignore errors generated when the flow cannot create the stage by disabling Fail load if flow cannot create stage
.
Create stage manually
Read how to manually create a named Snowflake stage:
Comments
0 comments
Please sign in to leave a comment.