Use this flow to extract data from any supported source, transform it, and load it into Snowflake at high performance. Etlworks generates and runs the Snowflake COPY INTO command for you, optionally followed by a MERGE.
Which Snowflake flow should I use?
Etlworks ships four Snowflake-optimized flow types. Pick the one that matches your source.
| Flow | Use when |
|---|---|
|
Any to Snowflake (this article) Database / File / Queue / Web service / Well-known API → Snowflake |
You need to extract from any source, optionally transform, and load into Snowflake. |
| Bulk load files into Snowflake | The files already exist in an external Snowflake stage (S3, Azure Blob, GCS) or server storage. No transformation needed. Auto-generates COPY INTO; supports MERGE. |
| Stream CDC events into Snowflake | You need real-time replication from a CDC-enabled source database. |
| Stream messages from a queue into Snowflake | You need real-time ingestion from a message queue that supports streaming. |
| COPY files into Snowflake | You have a user-defined COPY INTO command and want Etlworks to run it. Does not support automatic MERGE. |
Snowflake is a column-based relational database. Row-by-row inserts are very slow. The flows above use COPY INTO via a stage, which is the supported high-performance load path.
How does this flow work?
An Any-to-Snowflake flow executes these steps for each transformation:
- Creates a named Snowflake stage if it doesn't exist.
- Extracts data from the source.
- Writes the data to CSV, JSON, Avro, or Parquet files.
- Compresses the files with gzip.
- Uploads the files into the Snowflake stage (server storage, S3, Azure Blob, or GCS).
- Creates the destination Snowflake table from the source metadata if it doesn't exist.
- Generates and runs the Snowflake COPY INTO command.
- If MERGE is configured, merges the staged data with the existing target table.
- Cleans up staged files.
What do I need before I start?
- An active Snowflake data warehouse.
- A stage name, configured either on the Snowflake connection or on the transformation (the transformation setting wins). Etlworks loads via COPY INTO, which requires a named internal or external stage. Etlworks can create the stage automatically.
- If you use an external stage (S3, Azure Blob, GCS), the bucket or blob container must already exist. Etlworks does not create the storage location for you.
Step-by-step setup
Step 1. Create connections
You need three connections:
- A source connection (database, file storage, web service, queue, or well-known API).
- A stage connection: Amazon S3, Azure Storage, Google Cloud Storage, or server storage. Set Archive file before copying to GZip for S3 / Azure / GCS stages.
- A Snowflake connection. Set the Stage name here (or on the transformation later).
Step 2. Create a data exchange format
Snowflake can load data from CSV, JSON, or Avro. Create one of these and use it as the destination format.
For large CSV datasets, split the document into smaller files in the format settings. Snowflake loads files in parallel, and smaller files transfer faster.
Step 3. Create the flow
- Open Flows, click +, type Snowflake in the gallery search.
- Pick the flow type that matches your source (for example, Database to Snowflake).
Step 4. Set the Snowflake connection on the flow
- Open the Connections tab.
- Pick the Snowflake connection from Step 1.
Step 5. Set source (FROM) and destination (TO)
FROM (varies by flow type):
- Database: table or view name. Wildcards supported, e.g. public.*.
- Well-known API: endpoint or file name.
- Web service: endpoint name.
- File: file name or wildcard pattern.
- Queue: queue or topic name.
TO: the fully qualified Snowflake table name. If FROM uses a wildcard, use a matching wildcard in TO.
The intermediate stage connection is the destination connection on the transformation. Etlworks treats it as the Snowflake stage.
Step 6. Set the data exchange format
Select the format you created in Step 2 as the destination format. Etlworks uses it to write staging files.
Step 7. Set parameters (optional)
Click Configure on the transformation row, then go to the Parameters tab. The parameters are grouped below by purpose.
COPY INTO behavior
| Parameter | What it does |
|---|---|
| Snowflake Table Name | Overrides the destination table name set at the transformation level. |
| Stage Name | Overrides the stage name set on the Snowflake connection. Becomes the @STAGE in the generated COPY INTO. Example: COPY INTO table FROM @{STAGING_AREA_NAME} PATTERN = 'file' FILE_FORMAT = (FORMAT_NAME = CSVFORMAT) PURGE = true. |
| Action | COPY INTO inserts new rows; MERGE upserts based on lookup fields. |
| How to MERGE | Snowflake MERGE (default) uses native Snowflake MERGE SQL. DELETE/INSERT deletes matching rows in the target then inserts everything from the temp table. |
| Lookup Fields | Comma-separated list of columns that uniquely identify a record. Required when Action is MERGE. |
| Predict Lookup Fields | Auto-detect the unique-key columns when Lookup Fields is empty. Detection may not always be accurate. |
| Use INSERT/DELETE instead of MERGE for CDC MERGE | Off by default. When on, the CDC MERGE step issues three SQL statements: (1) delete matching rows from the main table, (2) insert updated and new rows from the temp table, (3) delete rows marked for deletion in the temp table. |
| Load SQL | User-defined COPY INTO SQL. Etlworks generates one automatically by default. Example: COPY INTO {TABLE} FROM @STAGING PATTERN = '{FILE}' FILE_FORMAT = (FORMAT_NAME = {FORMAT}) PURGE = {PURGE}. |
| MERGE SQL | User-defined MERGE SQL. Available tokens: {TABLE}, {TEMP_TABLE}, {KEY_FIELDS}, {FIELDS}. |
| CDC MERGE SQL | User-defined CDC-MERGE SQL. Available tokens: {TABLE}, {TEMP_TABLE}, {MERGE_CONDITION}, {KEY_FIELDS}, {FIELDS}. |
| Force Loading Files | Adds FORCE=true to COPY INTO, reloading files even if they haven't changed. |
File format and parsing
| Parameter | What it does |
|---|---|
| File Format | Format name or inline definition, e.g. (type = 'CSV' field_delimiter = ',' skip_header = 1). If empty, Etlworks generates one from the destination format. |
| Replace Invalid Characters | Replaces invalid UTF-8 bytes with the Unicode replacement character instead of failing. |
| Error on Column Mismatch | If off, the load continues even when source and target column counts differ: extra source fields are ignored; missing source fields become NULL. |
| Trim Space | Strips leading and trailing whitespace from string fields. |
| String Used for SQL NULL | Maps specific source strings to SQL NULL on load. Comma-separated, parenthesized, e.g. NULL_IF = ('N', 'NULL', 'NUL', ''). |
Upload performance (local stage only)
| Parameter | What it does |
|---|---|
| Number of Threads for Uploading Files | 1 (no parallelism) to 99 (max). Files under 16 MB are staged in parallel as-is; larger files are split, uploaded concurrently, and reassembled in the stage. Higher thread counts help on large files. |
ELT — SQL before and after COPY INTO
| Parameter | What it does |
|---|---|
| Before COPY INTO SQL | SQL to run on the Snowflake connection before the COPY INTO. |
| Ignore Errors for Before COPY INTO SQL | Lets the flow continue if the before-SQL fails. |
| Before COPY INTO SQL is a Script | Treats the field as a full Snowflake SQL script, not a single statement. |
| After COPY INTO SQL / Ignore Errors / is a Script | Same three options, but run after the COPY INTO. |
Error and file cleanup
| Parameter | What it does |
|---|---|
| Truncate Columns | Adds TRUNCATECOLUMNS = true to COPY INTO, truncating strings to fit target column length. Off by default. |
| Purge File if Success | On by default (when not using user-defined SQL). Deletes staged files after a successful COPY INTO. |
| Purge File(s) if Error | On by default. Deletes staged files on COPY INTO failure. |
| Name(s) of the File(s) to Purge | Files to delete on error. Wildcards supported. Defaults to the same wildcard used in COPY INTO. |
| On Error | ABORT_STATEMENT (default) stops on error. CONTINUE ignores the error and continues. SKIP_FILE skips the failing file and continues with the next. |
Debugging
| Parameter | What it does |
|---|---|
| Log Each Executed SQL Statement | Logs every auto-generated and executed SQL statement, including before-/after-SQL. |
Source-specific parameters
Some parameters only apply to certain source types:
- Database source: see Extract, Transform, and Load (ETL) data for source-database options.
-
File or web-service source:
- Ignore Transformations: copies files directly to the stage without applying transformations. Much faster, but you lose any transformation steps.
- Delete Loaded Source Files: deletes the source file after a successful load.
- Delete Source Files on Error: deletes the source file if the load fails.
- Well-known API source: see the well-known API options.
Step 8. Configure MERGE (UPSERT) (optional)
To upsert instead of insert:
- Set Action to MERGE.
- Provide Lookup Fields — a comma-separated list of columns that uniquely identify a record. Or enable Predict Lookup Fields for auto-detection (less reliable).
For wildcard sources (Step 9 below) that target multiple tables, specify per-table lookup fields with fully qualified names, separated by semicolons:
test1.inventory=inventory_id,database_name; test1.payment=payment_id,database_name; test1.rental=rental_id,database_name;
Step 9. Handle source / destination schema differences
The source and the Snowflake target often have a different column count or order. Snowflake's COPY INTO fails when columns don't line up. Etlworks gives you several ways to deal with this:
| Option | What it does |
|---|---|
| Reorder CSV Columns During Extract | Reorders columns in the extracted file to match the target. Ignored if the target table doesn't exist. |
| Reorder CSV Columns During Load (recommended) | Reorders columns inside the COPY INTO step. Works even when the file has different columns or order than the target. Prefer this over the during-extract option. |
| Ignore Fields in Source Not in Target | Drops source columns the target doesn't have. Mutually exclusive with the next two. |
| Alter Target Table If Source Has Extra Columns | Adds the extra columns to the Snowflake table. Mutually exclusive with the previous and next. |
| Recreate Target Table If Source Has Extra Columns | Drops and recreates the target table. Destructive. Mutually exclusive with the previous two. |
| Insert NULL Into Target Fields Not in Source | Inserts NULL for any target column the source doesn't have. |
| Create Table SQL | Custom SQL or JavaScript template for creating the target table. See overriding the create-table SQL. |
Notifications on schema mismatch
If the source and target columns don't match, the flow logs an exception. You can wire that exception into an email notification — see sending notifications about failed transformations.
Step 10. Configure mapping (optional)
If you need explicit field-to-field mapping, set it up in Mapping. Mapping is not required; auto-mapping by field name works for most flows.
Snowflake has restrictions on field names — if the source contains characters Snowflake doesn't accept, COPY INTO fails. Map or rename those fields before loading.
Incremental change replication (high watermark)
Like any other ETL flow, this flow supports change replication using a high watermark (HWM). Only records that changed since the last run are loaded.
To enable it:
- Set the High Watermark Field — the column that tracks changes.
- Enable Change Replication on the transformation.
- Optional: set a calculated HWM value for clearer tracking.
Load multiple tables with a wildcard
You can load many database objects in a single transformation by using a wildcard in FROM.
FROM
- Create a single source-to-destination transformation.
- Set FROM to a fully qualified source object name with wildcards, e.g. public.*.
By default every table and view matching the pattern is included.
TO
Set TO to a Snowflake schema with a wildcard, e.g. SNOWFLAKE_DW.SCHEMA.* (or UTIL_DB.PUBLIC.*).
Include / exclude objects
- Exclude Objects: comma-separated list of objects to skip. Use all views to skip every view; all tables to skip every table.
- Include Objects: comma-separated list of objects to keep (everything else is skipped).
Source query, before/after SQL
Use the {TABLE} token in Source Query, Before COPY INTO SQL, and After COPY INTO SQL to reference each table dynamically.
MERGE with wildcards
When MERGE is the action:
- Leave Lookup Field empty and enable Predict Lookup Fields, or
- Specify per-table lookup fields as semicolon-separated fully_qualified_table=field_list pairs:
test1.inventory=inventory_id,database_name; test1.payment=payment_id,database_name; test1.rental=rental_id,database_name;
HWM with wildcards
For incremental replication with a calculated HWM field, use the {TABLE} token in the High Watermark Field Value.
Troubleshooting
See Common issues when loading data into cloud data warehouses for common errors and fixes.