Bulk load files into Snowflake – Etlworks Support

If your files are already in a stage that Snowflake can read, and you don't need any transformation, use this flow. Etlworks reads the file list, optionally creates the destination table, generates the COPY INTO command, and (optionally) MERGEs the data.

When to use this flow

You want to load files (CSV, JSON, Avro, Parquet) that already exist in server storage or a Snowflake external stage (S3, Azure Blob, GCS) — no transformation needed, no over-the-network transfer.
You want to load files produced by a CDC flow into Snowflake.

For a flow that also extracts and transforms data on the way in, use Extract, transform, and load data in Snowflake instead.

How does this flow work?

Reads the names of all files matching the wildcard in the stage. Can recurse into subfolders.
Derives destination table names from the source file names.
Spawns one thread per destination table (up to a configurable limit) and loads them in parallel.
For each table: loads the staged files into a temporary table via COPY INTO, then merges into the actual destination table.

Alternatively, the flow can skip the temp-table step and load directly into the actual destination table — MERGE and CDC MERGE aren't available in direct-load mode.

Features

Supports COPY INTO (INSERT), MERGE (UPSERT), and CDC MERGE. CDC MERGE applies INSERT / UPDATE / DELETE events in the exact order they were captured from the source database.

Feature	What it does
Monitor source schema changes	Auto-CREATE and ALTER destination tables as source schemas evolve.
Load semi-structured files into a VARIANT column	For JSON, Avro, and Parquet sources, the flow can load the entire document into a single VARIANT column on the destination table.
Load directly into the destination table	Skip the temporary-table step. Faster, but MERGE / CDC MERGE aren't available.
Delete loaded source files	Remove successfully loaded source files after the load.
Load data in parallel	Multiple destination tables load in parallel threads.
Process files in subfolders	Walk nested subdirectories during the file scan.

What do I need before I start?

An active Snowflake data warehouse.
A stage name set on the Snowflake connection or on the transformation. Etlworks can create the named stage automatically (except for GCS, which must be created manually).
For external stages, the bucket / container already exists. Etlworks doesn't create cloud storage locations.

Step-by-step setup

Step 1. Create the stage connection

Pick the stage that matches your Snowflake setup:

Server storage — for the internal Snowflake stage.
Amazon S3 — for an S3 external stage.
Azure Storage — for an Azure Blob external stage.
Google Cloud Storage — for a GCS external stage.

Step 2. Create the Snowflake connection

See the connector reference for full setup.

Step 3. Create the format

For most cases a CSV format is fine. For semi-structured data, use a JSON, Avro, or Parquet format. Etlworks can load semi-structured files into a single VARIANT column on the Snowflake side — see Load semi-structured files into the VARIANT column below.

Step 4. Create the flow

Open Flows, click Add flow, type bulk load snowflake in the gallery search, and select the bulk-load flow type.

Step 5. Configure the load transformation

Add a source-to-destination transformation. Source connection = the stage from Step 1; destination connection = the Snowflake connection from Step 2.

FROM: the wildcard pattern for the staged files (e.g. folder/*.csv or folder/**/*.csv for nested folders).

TO: the destination table pattern. Use a wildcard like schema.* to let Etlworks derive the table name from each file.

Step 6. Set parameters

Source files and destination tables

Parameter	What it does
Calculate Snowflake table name	How the destination table name is derived from each source filename (none / strip extension / strip extension and timestamp / custom JavaScript).
Include files in subfolders	Recurse into nested directories during the file scan.
Maximum number of files to process	Hard cap on how many files are processed per flow run.
Maximum number of parallel threads	How many destination tables load concurrently. Default 5; max 99.

Load and MERGE

Parameter	What it does
Action	COPY INTO inserts new rows; MERGE upserts on lookup fields; CDC MERGE applies CDC change events.
Lookup Fields	Comma-separated list of columns that uniquely identify a record. Required for MERGE and CDC MERGE. Can also be specified per-table as semicolon-separated fully_qualified_table=field_list pairs.
Predict Lookup Fields	Auto-detect the unique-key columns when Lookup Fields is empty.
Load directly into destination table	Skips the temp-table step. MERGE / CDC MERGE not available in this mode.

ELT — SQL before and after the load

Parameter	What it does
Before COPY INTO SQL / Ignore errors / is a script	SQL to run on the Snowflake connection before the load.
After COPY INTO SQL / Ignore errors / is a script	Same, but after the load.

Handling source schema changes

Option	What it does
Alter target table if source has extra columns	Adds missing columns to the destination table.
Recreate target table if source has extra columns	Drops and recreates the destination. Destructive.
Insert NULL into target fields not in source	Inserts NULL for any target column the source file doesn't have.
Ignore fields in source not in target	Drops source columns the destination doesn't have.

Debug and error recovery

Parameter	What it does
Log each executed SQL statement	Logs every generated and executed SQL statement.
On Error	Behavior on a per-file or per-statement error: stop, continue, or skip the file (warehouse-specific options).
Purge File(s) if Error	On by default. Deletes staged files on load failure.

Step 7. Optionally configure mapping

If you need explicit field-to-field mapping, set it up in Mapping. Auto-mapping by field name works for most flows.

Step 8. Optionally add more transformations

You can add multiple source-to-destination transformations to a single flow to load different file patterns into different table sets.

Handling processed files

Three options for what to do with source files once they've loaded successfully.

Delete processed files

Enable Delete loaded source files. The flow removes each source file as soon as its load completes successfully. Useful for one-shot loads.

Skip already-processed files

Enable Do not process files that have been already processed. Etlworks tracks loaded filenames internally and skips files that have already been processed. Works well for incremental ingest where source files are appended over time.

Move processed files

Set Move files after load and pick a target folder. After a successful load, the source files are moved (rather than deleted), keeping an archive.

Troubleshooting

See Common issues when loading data into cloud data warehouses for common errors and fixes.

Articles in this section