Overview
Snowflake is a columnar relational database, so you can use the same techniques you would normally use to work with relational databases in Etlworks Integrator. It is, however, important to understand that inserting data into Snowflake row by row can be painfully slow.
It is recommended that you use a Snowflake-optimized flow to load data into Snowflake.
Using Snowflake-optimized flows, you can extract data from any of the supported sources, transform it, and load it directly into Snowflake.
A typical Snowflake flow performs the following operations:
- Extracts data from the source.
- Creates CSV, JSON, Avro, or Parquet files.
- Compresses files using the gzip algorithm.
- Copies files into the Snowflake stage (local file system, Azure Blob, or Amazon S3).
- Checks to see if the destination Snowflake table exists, and if it does not, creates the table using metadata from the source.
- Dynamically generates and executes the Snowflake COPY INTO command (see the example after this list).
- Cleans up the remaining files, if needed.
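For reference, the auto-generated COPY INTO command typically looks like the example below. The table, stage, and file names here are hypothetical:
COPY INTO MY_DB.PUBLIC.ORDERS FROM @ETL_STAGE PATTERN = 'orders.*csv.gz' FILE_FORMAT = (FORMAT_NAME = CSVFORMAT) PURGE = true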
Prerequisites
- The Snowflake instance is up and running.
- The named Snowflake stage is created in Snowflake.
- An Amazon S3 bucket or Azure Blob container is created. Alternatively, you can use a folder in server storage as a local stage.
Step-by-step instructions
Step 1. Create all required connections.
You will need a source connection, a connection for the stage (Amazon S3, Azure Blob, or Server Storage), and a Snowflake connection. When creating a Snowflake connection, set the Stage name. Alternatively, you can configure the Stage name at the transformation level.
When configuring a connection for Amazon S3 or Azure Storage that will be used as a stage for the Snowflake flows, it is recommended that you select GZip as the value for the Archive file before copying field.
Step 2. Create a data exchange format
Snowflake can load data from the CSV, JSON, and Avro formats, so you will need to create one of these formats and set it as the destination format.
If you are using the CSV format to load large datasets into Snowflake, consider configuring the format to split the document into smaller files: Snowflake can load files in parallel, and transferring smaller files over the network can be faster.
Step 3. Create a flow to load data in Snowflake
Start creating Snowflake flows by opening the Flows window, clicking the + button, and typing snowflake into the search field.
Continue by selecting the flow type, adding source-to-destination transformations, and entering the transformation parameters.
Step 4. Set Snowflake connection
For all Snowflake flows, the final destination is Snowflake.
To configure the final destination, click the Connections tab and select the connection created in Step 1.
Step 5. Set the source and the destination
Depending upon the flow type, you can select one of the following sources (FROM) for the Snowflake flow:
- Database - use the database table or view name as the source (FROM) name
- CDC - use the fully qualified table name as the source (FROM) name
- API - use any appropriate string as the source (FROM) name
- Web Service - use any appropriate string as the source (FROM) name
- File - use the source file name or a wildcard file name as the source (FROM) name
- Queue - use the queue topic name as the source (FROM) name
For all Snowflake flows, the destination connection is going to be either an Amazon S3 connection, an Azure Storage connection, or a Server Storage connection. Integrator uses the destination connection as a Snowflake stage.
Select or enter the fully-qualified Snowflake table name as a destination (TO).
Step 6. Set the data exchange format configured in Step 2
Step 7. Set optional parameters
Click the MAPPING button in the transformation row and select the Parameters tab.
COPY INTO parameters and Action
- Snowflake Table Name - the name of the Snowflake table to load data into. This field is optional and overrides the Snowflake table name set at the transformation level.
- Stage Name - the name of the Snowflake stage. The stage refers to the location where your data files are stored for loading into Snowflake. This field is optional and overrides the Stage name set at the connection level. The stage name is a part of the COPY INTO command. For example: COPY INTO table FROM @{STAGING_AREA_NAME} PATTERN = 'file' FILE_FORMAT = (FORMAT_NAME = CSVFORMAT) PURGE = true.
- Action - can be either COPY INTO, which inserts records from the file(s) into the Snowflake table, or MERGE, which merges records from the file with the records in the Snowflake table. MERGE requires configuring Lookup Fields.
- Lookup Fields - the comma-separated list of fields that uniquely identify the record in the target Snowflake table. This field is required if the action is set to MERGE.
- Predict Lookup Fields - if this option is enabled and Lookup Fields is empty, the system will try to predict the columns that uniquely identify the record in the target Snowflake table.
- Use INSERT/DELETE instead of MERGE for CDC MERGE action - if this parameter is enabled (it is disabled by default) and the CDC MERGE action is enabled, the system will generate and execute three SQL statements (see the sketch after this list):
- 1) DELETE all rows from the main table which are in the temp CDC stream table;
- 2) INSERT all latest INSERTs and UPDATEs from the temp CDC stream table into the main table;
- 3) DELETE all rows in the main table which are marked for deletion in the temp CDC stream table.
- Load SQL - this is a user-defined COPY INTO SQL. By default, Integrator creates the COPY INTO SQL automatically, based on the input and output parameters. The auto-generated SQL will look like the following: COPY INTO {TABLE} FROM @STAGING PATTERN = '{FILE}' FILE_FORMAT = (FORMAT_NAME = {FORMAT}) PURGE = {PURGE}. You can override it by using this field. Read more about the Snowflake COPY INTO command.
- MERGE SQL - this is a user-defined SQL that will be used instead of the default when the action is set to MERGE. If nothing is entered in this field, the default MERGE SQL will be used. The following parameters are automatically populated and can be referenced as {TOKEN} in the SQL:
- {TABLE} - the table to MERGE data into
- {TEMP_TABLE} - the table to MERGE data from
- {KEY_FIELDS} - the fields uniquely identifying the record in both tables
- {FIELDS} - the fields to INSERT/UPDATE in the table to MERGE data into
- CDC MERGE SQL - this is a user-defined SQL that will be used instead of the default when the action is set to CDC MERGE. If nothing is entered in this field, the default CDC MERGE SQL will be used. The following parameters are automatically populated and can be referenced as {TOKEN} in the SQL:
- {TABLE} - the table to MERGE data into
- {TEMP_TABLE} - the table to MERGE data from
- {MERGE_CONDITION} - the conditions to MATCH records in the table to MERGE data into and the table to MERGE data from, in the format table.key=temp_table.key
- {KEY_FIELDS} - the fields uniquely identifying the record in both tables
- {FIELDS} - the fields to INSERT
- {INSERT_FIELDS} - the values of the fields to INSERT
- {UPDATE_FIELDS} - the fields and values to UPDATE, in the format field=value,field=value
- Force Loading File(s) - if this option is enabled (it is disabled by default), the COPY INTO command will be generated with the FORCE=true option, which causes the files to be reloaded regardless of whether they have been changed. This option is only used when the COPY INTO command is generated automatically.
- Strip Outer Array or Element - for JSON: a boolean that instructs the JSON parser to remove the outer brackets. For XML: a boolean that specifies whether the XML parser strips out the outer XML element, exposing 2nd-level elements as separate documents.
- Match By Column Name - String that specifies whether to load semi-structured data (JSON, XML, AVRO) into columns in the target table that match corresponding columns represented in the data. For a column to match, the following criteria must be true:
- 1) The column represented in the data must have the exact same name as the column in the table. The copy option supports case sensitivity for column names. Column order does not matter.
- 2) The column in the table must have a data type that is compatible with the values in the column represented in the data. For example, string, number, and Boolean values can all be loaded into a variant column.
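Below is a hedged sketch of the three statements described for the Use INSERT/DELETE instead of MERGE for CDC MERGE action option. The table and column names (ORDERS, ORDERS_CDC_TEMP, ORDER_ID, STATUS, AMOUNT) and the CDC_OP operation flag are assumptions for illustration only; the actual SQL is generated by Integrator:
-- 1) Delete rows from the main table that are present in the temp CDC stream table
DELETE FROM ORDERS USING ORDERS_CDC_TEMP s WHERE ORDERS.ORDER_ID = s.ORDER_ID;
-- 2) Insert the latest INSERTs and UPDATEs from the temp CDC stream table
INSERT INTO ORDERS (ORDER_ID, STATUS, AMOUNT)
SELECT ORDER_ID, STATUS, AMOUNT FROM ORDERS_CDC_TEMP WHERE CDC_OP <> 'delete';
-- 3) Delete rows in the main table that are marked for deletion in the temp CDC stream table
DELETE FROM ORDERS USING ORDERS_CDC_TEMP s WHERE ORDERS.ORDER_ID = s.ORDER_ID AND s.CDC_OP = 'delete';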
Format
- File Format - the optional format name or definition. Example of an ad-hoc format: (type = 'CSV' field_delimiter = ',' skip_header = 1). Read about Snowflake file formats. If the format name or definition is not provided, the system will automatically create an ad-hoc definition based on the destination format configured for the transformation. A named format can also be created directly in Snowflake (see the sketch after this list).
- Replace Invalid Characters - if enabled, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character.
- Error on Column Mismatch - this option specifies whether to generate a parsing error if the number of delimited columns (i.e. fields) in an input data file does not match the number of columns in the corresponding table. If disabled, an error is not generated and the load continues. In that case, if the input file contains records with more fields than columns in the table, the matching fields are loaded in order of occurrence in the file and the remaining fields are not loaded; if the input file contains records with fewer fields than columns in the table, the non-matching columns in the table are loaded with NULL values.
- Trim Space - this option specifies whether to remove white space from fields.
- String used to convert to and from SQL NULL - When loading data, Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value. For example:
NULL_IF = ('N', 'NULL', 'NUL', '')
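If you prefer to reference a named format in the File Format field, the format can be created in Snowflake beforehand. A minimal sketch, assuming a hypothetical format name CSVFORMAT and typical options for gzipped CSV files:
CREATE OR REPLACE FILE FORMAT CSVFORMAT
TYPE = 'CSV'
FIELD_DELIMITER = ','
SKIP_HEADER = 1
COMPRESSION = 'GZIP'
NULL_IF = ('NULL', '');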
Parameters for copying files into the local stage
- Number of threads to use for uploading files to local stage - specifies the number of threads to use for uploading files to the local stage. The upload process separates data files into batches by size: small files (< 16 MB compressed or uncompressed) are staged in parallel as individual files, while larger files are automatically split into chunks, staged concurrently, and reassembled in the target stage. A single thread can upload multiple chunks. Increasing the number of threads can improve performance when uploading large files. Supported values: any integer from 1 (no parallelism) to 99 (use 99 threads for uploading files).
Other parameters
- Truncate Columns - if enabled (disabled by default), the COPY INTO command will be generated with the TRUNCATECOLUMNS = true option. Strings are automatically truncated to the target column length.
- Purge File if Success - if this option is enabled (it is by default) and user-defined SQL is not used, Integrator will run the COPY INTO command with the PURGE option enabled. With PURGE enabled, the staging file will be automatically deleted after the COPY INTO command is successfully executed.
- Purge File(s) if Error - if this option is enabled (it is by default), the staging files will be automatically deleted if the COPY INTO command was executed with an error.
- Name(s) of the File(s) to Purge - if "Purge File(s) if Error" is enabled, it is possible to configure which files you want the system to delete if an error occurs while executing the COPY INTO command. Wildcard file names are allowed. If not specified, the system will use the same wildcard file name as it used for the COPY INTO command.
- On Error - what to do in case of any error. Available options are: ABORT_STATEMENT (default), CONTINUE, SKIP_FILE.
- Before COPY INTO SQL - this SQL will be executed on the Snowflake connection before running the COPY INTO SQL (see the example after this list).
- Ignore errors when executing Before COPY INTO SQL - if this option is enabled and there is an error when the Before COPY INTO SQL is executed, the error will be ignored.
- After COPY INTO SQL - this SQL will be executed on the Snowflake connection after running the COPY INTO SQL.
- Ignore errors when executing After COPY INTO SQL - if this option is enabled and there is an error when the After COPY INTO SQL is executed, the error will be ignored.
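As an illustration, the Before COPY INTO SQL and After COPY INTO SQL fields take regular Snowflake SQL. The statements below are hypothetical examples; the ORDERS and LOAD_AUDIT tables are assumptions:
-- Before COPY INTO SQL: empty the target table before a full reload
TRUNCATE TABLE ORDERS;
-- After COPY INTO SQL: record the load in an audit table (assumed to exist)
INSERT INTO LOAD_AUDIT (TABLE_NAME, LOADED_AT) VALUES ('ORDERS', CURRENT_TIMESTAMP());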
Parameters specific to different source types
Depending on the flow type, other flow parameters can be added, as explained below:
- The source is a database
- The source is a file or web service
- - If this parameter is enabled, the system will copy files directly into the Snowflake stage before executing the COPY INTO command. This greatly improves the performance of the load but automatically disables any transformations.
- - If this parameter is enabled, the system will automatically delete loaded files from the source.
- - If this parameter is enabled and there is an error during the load, the system will automatically delete loaded files from the source.
- The source is a well-known API
Step 8. Optionally Configure MERGE (UPSERT)
To merge (upsert) existing data in the Snowflake table with new data:
- Set the Action to MERGE.
- Define the Lookup Fields - the comma-separated list of fields that uniquely identify the record in the Snowflake table.
Alternatively, you can enable the Predict Lookup Fields option, which forces the flow to use various algorithms to automatically predict the fields that uniquely identify the record. Note that it is not always possible to correctly detect the unique fields.
If enabling the Predict Lookup Fields option (which is not always accurate) is not acceptable, you can specify a list of table=fields pairs in the Lookup Fields. Use fully-qualified table names and ';' as a separator between table=fields pairs.
Example:
test1.inventory=inventory_id,database_name;
test1.payment=payment_id,database_name;
test1.rental=rental_id,database_name;
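For reference, the MERGE action roughly corresponds to a Snowflake MERGE statement along the lines shown below. This is only a hedged sketch: the table names INVENTORY and INVENTORY_TEMP, the lookup field INVENTORY_ID, and the QUANTITY and LAST_UPDATE columns are assumptions for illustration:
MERGE INTO INVENTORY t
USING INVENTORY_TEMP s
ON t.INVENTORY_ID = s.INVENTORY_ID
WHEN MATCHED THEN UPDATE SET t.QUANTITY = s.QUANTITY, t.LAST_UPDATE = s.LAST_UPDATE
WHEN NOT MATCHED THEN INSERT (INVENTORY_ID, QUANTITY, LAST_UPDATE) VALUES (s.INVENTORY_ID, s.QUANTITY, s.LAST_UPDATE);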
Step 9. Configure how to handle source and destination schema changes
It is quite typical for the source (for example, a table in an OLTP database) and the destination (the Snowflake table) to have a different number of columns. It is also entirely possible that the order of columns in the source is different from the order of columns in the destination. In either of these cases, by default, the flow will fail, since the Snowflake COPY INTO command cannot load files that have more or fewer columns than the target table or whose columns are in a different order.
When setting the source-to-destination transformation, it is possible to configure it to automatically handle schema changes. The following configuration options are available:
1. Reorder CSV Columns During Extract - if this option is enabled (it is disabled by default), the flow reorders columns in the data file to match the order of columns in the target Snowflake table. This option is ignored if the target table does not exist.
2. Reorder CSV Columns During Load (recommended) - if this option is enabled (it is disabled by default) and the COPY INTO action is selected, the flow reorders the column data from a staged CSV file before loading it into a table. Enabling this option allows loading files that have a different set of columns, or columns in a different order, than the destination table in Snowflake. In most cases, it is recommended to use this option instead of Reorder CSV Columns During Extract.
3. Ignore fields in the source that do not exist in the target table - if this option is enabled (it is disabled by default) and the source has more fields than the target (Snowflake) table, the system will automatically exclude the extra fields. Options 3, 4, and 5 are mutually exclusive.
4. Alter target table if the source has columns that the target table doesn't have - if this option is enabled (it is disabled by default) and the source has different fields than the target (Snowflake) table, the system will add the extra fields to the target table (see the example after this list). Options 3, 4, and 5 are mutually exclusive.
5. Recreate target table if the source has columns that the target table doesn't have - if this option is enabled (it is disabled by default) and the source has different fields than the target (Snowflake) table, the system will automatically drop and recreate the target table. Options 3, 4, and 5 are mutually exclusive.
6. Insert null into the fields in the target table that do not exist in the source - if this option is enabled (it is disabled by default) and the target (Snowflake) table has different fields than the source, the system will automatically insert NULL values into these fields in the target.
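For example, when Alter target table if the source has columns that the target table doesn't have is enabled and the source gains a column that the target lacks, the flow would issue DDL along these lines (the table and column names are hypothetical):
ALTER TABLE ORDERS ADD COLUMN NOTES VARCHAR;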
Configuring notifications if source and destination have different columns
It is possible to send an email notification if the source and destination have different columns. To configure flow to send a notification when either source has more columns than the destination or the destination has more columns than the source, use the technique explained in this article.
Step 10. Optionally configure Mapping
If necessary, you can create a mapping between the source and destination (Snowflake) fields.
You can create mapping between the source and destination just like you usually do for any other flow type.
Mapping is not required, but please remember that if a source field name is not supported by Snowflake, it will return an error and the data will not be loaded into the database. For example, if you are loading data from Google Analytics, the output (source) is going to include fields with the prefix ga: (ga:user, ga:browser, etc.). Unfortunately, Snowflake does not support fields with a :, so the data will be rejected. If that happens, you can use mapping to rename the destination fields.
Set up incremental change replication using high watermark
As with any other flow type, it is possible to configure change replication using a high watermark.
When change replication is enabled, only the changed records will be loaded into Snowflake.
Basically, all you need to do is set the high watermark field and enable change replication for the transformation. Additionally, for better readability, you can set the calculated high watermark field value.
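Conceptually (this is an illustration only, not Etlworks syntax), with UPDATED_AT as the high watermark field, each run extracts only the rows whose watermark value is greater than the value recorded by the previous run, for example:
SELECT * FROM ORDERS WHERE UPDATED_AT > '2024-05-01 00:00:00';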
Set up incremental change replication using change data capture (CDC)
Change data capture (CDC) is an approach to data integration that is based on the identification, capture, and delivery of the changes made to the source database and stored in the database redo log (also called transaction log).
Etlworks supports replicating data using CDC from MySQL, SQL Server, Oracle, Postgres, and MongoDB.
Read about configuring CDC for the source databases:
Once CDC is configured for the source database, you can create a CDC pipeline where the source is one of these databases and the destination is Snowflake.
When creating a point-to-point CDC pipeline for extracting data from a CDC-enabled database and loading it into Snowflake, you have two options:
- Stream CDC events, create CSV files + load data from CSV files into Snowflake.
- Stream CDC events into a message queue + load data from a message queue into Snowflake.
Loading multiple tables by a wildcard name
To load multiple database objects (tables and views) by a wildcard name (without creating individual source-to-destination transformations), follow the same steps as above, except:
FROM
Create a single source-to-destination transformation and set the FROM to a fully qualified source database object name with wildcard characters (* and ?), for example public.*
By default, all tables and all views that match the wildcard name in FROM will be included. If you want to exclude specific database objects enter a comma-separated list of the database objects to exclude in the Exclude objects field. If you want to include only specific database objects enter a comma-separated list of the database objects to include in the Include objects field.
TO
Set the TO to SNOWFLAKE_DW.SCHEMA.*, for example UTIL_DB.PUBLIC.*.
Read how to programmatically change the destination (TO) name.
EXCLUDE
Optionally configure the list of the database objects to exclude.
To exclude all views, enter all views in the Exclude objects field.
To exclude all tables, enter all tables in the Exclude objects field.
INCLUDE
Optionally configure the list of the database objects to include.
SOURCE QUERY
Optionally configure the Source query. You can use the {TABLE} token in the source query.
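A hypothetical source query using the {TABLE} token (the WHERE clause is only an example); the flow substitutes the actual object name for each table matched by the wildcard:
SELECT * FROM {TABLE} WHERE UPDATED_AT >= '2024-01-01'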
BEFORE/AFTER SQL
If you want to execute any code before and/or after the COPY INTO command, use the {TABLE} token in the Before COPY INTO SQL and After COPY INTO SQL.
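For example, a hypothetical Before COPY INTO SQL that empties each matched table before loading:
TRUNCATE TABLE {TABLE};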
MERGE
If you are configuring the MERGE action, do not enter anything in the Lookup Fields. Enable Predict Lookup Fields instead.
As an alternative to enabling the Predict Lookup Fields option (which is not always accurate), you can specify a list of table=fields pairs in the Lookup Fields. Use fully-qualified table names and ';' as a separator between table=fields pairs.
Example:
test1.inventory=inventory_id,database_name;
test1.payment=payment_id,database_name;
test1.rental=rental_id,database_name;
HWM CHANGE REPLICATION
If you are setting up high watermark change replication with a calculated HWM field, use the {TABLE} token in the High Watermark Field Value.
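As a purely hypothetical illustration (the field name and the exact form expected by Integrator are assumptions), a calculated High Watermark Field Value could reference the {TABLE} token like this:
SELECT MAX(UPDATED_AT) FROM {TABLE}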
Common issues when loading data in Snowflake
Read how to troubleshoot and fix common issues when loading data in Snowflake.