When to use this Flow type
This Flow bulk-loads data into Snowflake using the user-defined COPY INTO command. The data can be files in file-based or cloud storage, responses from APIs, email attachments, or objects stored in a NoSQL database.
Flows optimized for Snowflake
| Flow type | When to use |
| --- | --- |
| Extract, transform, and load data into Snowflake | When you need to extract data from any source, transform it, and load it into Snowflake. |
| Bulk load files into Snowflake | When you need to bulk-load files that already exist in the external Snowflake stage (S3, Azure Blob, Google Cloud Storage) or in server storage without applying any transformations. The Flow automatically generates the COPY INTO command and MERGEs data into the destination. |
| Stream CDC events into Snowflake | When you need to stream updates from a database that supports Change Data Capture (CDC) into Snowflake in real time. |
| Stream messages from a queue into Snowflake | When you need to stream messages from a message queue that supports streaming into Snowflake in real time. |
| COPY files into Snowflake (you are here) | When you need to bulk-load data from file-based or cloud storage, an API, or a NoSQL database into Snowflake without applying any transformations. This Flow requires a user-defined COPY INTO command. Unlike Bulk load files into Snowflake, this Flow does not support automatic MERGE. |
How it works
This flow copies files from the source to the Snowflake internal or external stage and then executes the user-defined COPY INTO command.
Prerequisites
1. The Snowflake data warehouse is active.
2. The Stage name is set for the Snowflake Connection or the Transformation (the latter overrides the stage set for the Connection). Etlworks uses the Snowflake COPY INTO command to load data into Snowflake tables, and COPY INTO requires a named internal or external stage. The stage is the location where your data files are stored for loading into Snowflake. Read how Etlworks flow automatically creates the named Snowflake stage; a minimal example of creating a stage manually is shown after this list.
3. For loading data from an external stage in AWS S3, Azure Blob, or Google Cloud Storage, the Amazon S3 bucket, Google Cloud Storage bucket, or Azure blob container must already exist. Note that the Etlworks Flow does not create the bucket or blob.
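For reference, here is a minimal sketch of creating a named stage directly in Snowflake. The stage names, bucket URL, and credentials below are placeholders, not values required by Etlworks (which can also create the stage automatically, as noted above):
-- Internal named stage (files are uploaded to Snowflake-managed storage)
CREATE STAGE IF NOT EXISTS etl_stage;
-- External named stage on top of an existing S3 bucket (placeholder URL and credentials)
CREATE STAGE IF NOT EXISTS etl_s3_stage
  URL = 's3://my-bucket/snowflake/'
  CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...');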
Step-by-step instructions
Step 1. Create a source Connection.
Step 2. Create and test the Snowflake Connection.
Read how to create a Snowflake Connection.
When creating the Connection, set the Stage name. For loading files in cloud storage, the named external stage must be configured to read data from the bucket or blob configured for the cloud storage Connection created in Step 1.
Step 3. Create the destination Connection
Depending on how you prefer to stage files for loading data into Snowflake, create one of the following Connections:
- The Server Storage connection for loading data from the internal Snowflake stage.
- S3 connection for loading data from the S3 external stage.
- Azure storage connection for loading data from the Azure external stage.
- Google Cloud Storage connection for loading data from the Google Cloud external stage.
It is recommended to select GZip as the value for the Archive file before copying field when creating a Connection for cloud storage.
Step 4. Create a new Flow
In Flows, click the Add flow button and enter Snowflake in Select Flow Type. Select COPY files into Snowflake.
Step 5. Set the source and the destination
For the source (FROM), select the Connection created in Step 1 and enter a file name or a wildcard file name, for example, *.csv.
For the destination (TO), select the Connection created in Step 3 and enter the base destination table name (without the database and schema names).
Step 6. Set the Snowflake connection
Select the Connections tab and select the Snowflake Connection created in Step 2.
Step 7. Configure COPY INTO SQL
Click MAPPING and select Parameters.
Enter a valid SQL statement in Copy INTO SQL. Read about COPY INTO.
If the source file is CSV
COPY INTO TABLE_NAME FROM @stage_name PATTERN = 'destination_name.*'
FILE_FORMAT = (type = 'CSV' FIELD_DELIMITER = ',' FIELD_OPTIONALLY_ENCLOSED_BY = '"'
SKIP_HEADER = 1 NULL_IF = ('NULL','null',''))
PURGE = true FORCE = true TRUNCATECOLUMNS = false ON_ERROR = ABORT_STATEMENT
Where:
- TABLE_NAME - the name of the destination Snowflake table. The table must be created manually (a minimal sketch is shown after this list).
- stage_name - the Snowflake stage name (the same as in the Snowflake Connection).
- destination_name - the name set in TO.
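As a minimal sketch only, assuming the CSV files contain three columns (the table and column names below are placeholders, not values required by the Flow), the destination table could be created as:
-- Placeholder destination table; adjust the columns to match the actual CSV files
CREATE TABLE IF NOT EXISTS TABLE_NAME (
  id NUMBER,
  name VARCHAR,
  updated_at TIMESTAMP_NTZ
);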
If the source file is JSON
COPY INTO TABLE_NAME FROM @stage_name PATTERN = 'destination_name.*'
FILE_FORMAT = (type = 'JSON' strip_outer_array = true)
PURGE = true FORCE = true ON_ERROR = ABORT_STATEMENT
Where:
- TABLE_NAME - the name of the destination Snowflake table. The table must be created manually using the following syntax: create table table_name (data_column_name variant). A sketch of querying the loaded data is shown after this list.
- stage_name - the Snowflake stage name (the same as in the Snowflake Connection).
- destination_name - the name set in TO.
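After the load, each JSON document lands in the single VARIANT column. Assuming, purely for illustration, that the documents contain id and name attributes (adjust the paths to match the actual JSON data), the rows can be queried with Snowflake's path notation:
-- Hypothetical attribute names; replace with the attributes of the actual JSON documents
SELECT data_column_name:id::number   AS id,
       data_column_name:name::string AS name
FROM   table_name;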
Step 8. Define the other optional parameters:
- Action: if Move is selected (default), the file(s) will be copied to the S3 bucket and removed from the original source. If Copy is selected, the file(s) will be copied to the S3 bucket and retained in the original source.
- Add Suffix to the Destination File Name: you can select one of the predefined suffixes for the files created using this file operation. For example, if you select uuid as a suffix, and the original filename is dest.csv, Etlworks will create files with the name dest_uuid.csv, where uuid is a globally unique identifier such as 21EC2020-3AEA-4069-A2DD-08002B30309D. The default value for this field is uuid.
- Do not process files that have been already processed: if this option is enabled, the system will skip files that have already been loaded into Snowflake. Snowflake also keeps track of which files are loaded, but when this option is enabled, the files will not even be copied to S3, which can significantly reduce traffic costs.
- Maximum Simultaneous Operations: Etlworks can copy or move each file in its own thread. Use this property to set the maximum number of simultaneous file operations.
- Purge File(s) if Error: if this option is enabled (the default), the staging files will be automatically deleted if the COPY INTO command finished with an error.
- Name(s) of the File(s) to Purge: if Purge File(s) if Error is enabled, this option makes it possible to configure which files the system should delete if an error occurs while executing the COPY INTO command. Wildcard file names are allowed. If not specified, the system will use the same wildcard file name as it used for the COPY INTO command.
- Before COPY INTO SQL: this SQL will be executed on the Snowflake Connection before the COPY INTO command (see the sketch after this list).
- Ignore errors when executing Before COPY INTO SQL: if this option is enabled and there is an error when Before COPY INTO SQL is executed, the error will be ignored.
- After COPY INTO SQL: this SQL will be executed on the Snowflake Connection after the COPY INTO command (see the sketch after this list).
- Ignore errors when executing After COPY INTO SQL: if this option is enabled and there is an error when After COPY INTO SQL is executed, the error will be ignored.
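For illustration only, here is what Before COPY INTO SQL and After COPY INTO SQL might contain. TABLE_NAME and AUDIT_TABLE are placeholders, not objects created by the Flow:
-- Hypothetical Before COPY INTO SQL: empty the destination table before a full reload
TRUNCATE TABLE TABLE_NAME;
-- Hypothetical After COPY INTO SQL: record that the load finished in a placeholder audit table
INSERT INTO AUDIT_TABLE (table_name, loaded_at) VALUES ('TABLE_NAME', CURRENT_TIMESTAMP());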
Step 9. Save and run the Flow.
Use wildcard filenames in Snowflake COPY INTO command
Pattern matching
- .*string.*: matches all files that contain the given string, for example, abc_string, abc1_string23, string_abc.
- .*: matches one or more occurrences of any character, including no character.
- ?: represents or matches a single occurrence of any character.
- Bracketed characters [ ]: match any occurrence of the characters enclosed in the square brackets. It is possible to use different types of characters (alphanumeric): numbers, letters, other special characters, etc.
Read more about pattern matching in COPY INTO command.
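For example, assuming the staged files follow the default naming (the destination name plus a generated suffix and a .csv extension; the stage and table names are the same placeholders used above), a pattern that loads only those files could look like this:
COPY INTO TABLE_NAME FROM @stage_name
PATTERN = '.*destination_name.*[.]csv'
FILE_FORMAT = (type = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1)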