This flow bulk-loads data into Snowflake using a COPY INTO command you write yourself. It copies files from the source to a Snowflake internal or external stage and then runs your COPY INTO. Sources can be local or cloud storage, API responses, email attachments, or objects in a NoSQL database.
Compared to the auto-generated alternative: use the Bulk load files into Snowflake flow when you want Etlworks to generate the COPY INTO for you (and you want MERGE support). Use this flow when you need full control over the SQL.
How does this flow work?
- Copies files from the source to the Snowflake stage (internal or external).
- Executes your user-defined COPY INTO command against Snowflake.
- Optionally cleans up staged files.
Note: Unlike the auto-generated bulk-load flow, this flow does not support automatic MERGE.
What do I need before I start?
- An active Snowflake data warehouse.
- A stage name set on the Snowflake connection or on the transformation (transformation overrides the connection). Etlworks can create the named stage automatically.
- If you use an external stage (S3, Azure Blob, GCS), the bucket / container must already exist. Etlworks doesn't create cloud storage locations for you.
Step-by-step setup
Step 1. Create a source connection
Pick whichever connector matches the location of the files you'll load.
Step 2. Create and test the Snowflake connection
See configuring the Snowflake connection. Set the Stage name. For loading files from cloud storage, the named external stage must point at the bucket / container configured for the cloud storage connection in Step 3.
Step 3. Create the stage destination connection
Pick the connection type that matches where Snowflake will read files from:
- Server storage — for loading from the internal Snowflake stage.
- Amazon S3 — for loading from an S3 external stage.
- Azure Storage — for loading from an Azure external stage.
- Google Cloud Storage — for loading from a GCS external stage.
For cloud-storage stages, set Archive file before copying to GZip.
Step 4. Create the flow
Open Flows, click Add flow, type Snowflake in the gallery search, and select COPY files into Snowflake.
Step 5. Set source (FROM) and destination (TO)
FROM: pick the source connection from Step 1 and enter a file name or wildcard pattern (e.g. *.csv).
TO: pick the stage destination connection from Step 3 and enter the base destination table name (without database or schema).
Step 6. Set the Snowflake connection on the flow
Open the Connections tab and pick the Snowflake connection from Step 2.
Step 7. Configure the COPY INTO SQL
Click MAPPING on the transformation row, go to the Parameters tab, and fill in Copy INTO SQL. See the Snowflake COPY INTO reference for the full syntax.
CSV source example
COPY INTO TABLE_NAME FROM @stage_name PATTERN = 'destination_name.*'
FILE_FORMAT = (type = 'CSV', FIELD_DELIMITER = ',' FIELD_OPTIONALLY_ENCLOSED_BY = '"'
SKIP_HEADER = 1 NULL_IF = ('NULL','null',''))
PURGE = true FORCE = true TRUNCATECOLUMNS = false ON_ERROR = ABORT_STATEMENT
- TABLE_NAME — the destination Snowflake table. Must be created manually.
- stage_name — the Snowflake stage name (matches the one on the Snowflake connection).
- destination_name — the name entered in TO.
JSON source example
COPY INTO TABLE_NAME FROM @stage_name PATTERN = 'destination_name.*' FILE_FORMAT = (type = 'JSON', strip_outer_array = true) PURGE = true FORCE = true ON_ERROR = ABORT_STATEMENT
- TABLE_NAME — the destination Snowflake table. Must be created manually with a single VARIANT column: create table table_name (data_column_name variant).
- stage_name — the Snowflake stage name.
- destination_name — the name entered in TO.
Step 8. Set optional parameters
| Parameter | What it does |
|---|---|
| Action | Move (default): copy to the stage and delete from the source. Copy: copy to the stage and keep the source file. |
| Add Suffix to the Destination File Name | Adds a suffix to the staged filename. Default is uuid — a globally unique identifier (e.g. 21EC2020-3AEA-4069-A2DD-08002B30309D) is appended, so dest.csv becomes dest_uuid.csv. |
| Do not process files that have been already processed | Skips files Etlworks has already loaded. Snowflake also tracks this internally, but with this option on, the file isn't even copied to the stage — saving bandwidth and storage cost. |
| Maximum Simultaneous Operations | How many file copy/move operations run in parallel. |
| Purge File(s) if Error | On by default. Deletes staged files if COPY INTO fails. |
| Name(s) of the File(s) to Purge | Wildcard pattern for files to delete on error. Defaults to the pattern used in COPY INTO. |
| Before COPY INTO SQL / Ignore errors | SQL to run on the Snowflake connection before COPY INTO, with optional error suppression. |
| After COPY INTO SQL / Ignore errors | Same, but after COPY INTO. |
Step 9. Save and run the flow
Wildcard filenames in the COPY INTO command
Snowflake's COPY INTO PATTERN uses a small wildcard syntax:
- .*string.* — matches files containing string. E.g. abc_string, abc1_string23, string_abc.
- .* — matches zero or more of any character.
- ? — matches exactly one of any character.
- [...] bracketed characters — matches any one character inside the brackets. Mix letters, digits, and other characters as needed.
Full reference: pattern matching in COPY INTO.