If your files already exist in cloud or local storage, and you don't need any transformation, use this flow. Etlworks reads the file list, optionally creates destination tables, runs your user-defined bulk-load SQL (BULK INSERT, COPY, LOAD DATA INFILE, etc.), and optionally MERGEs the data.
When to use this flow
Etlworks has dedicated bulk-load flows for the major cloud data warehouses that auto-generate the load SQL. Use those first when they apply:
| Target | Use this dedicated flow instead |
|---|---|
| Snowflake | Bulk load files into Snowflake |
| Amazon Redshift | Bulk load files in S3 into Redshift |
| Google BigQuery | Bulk load files in Google Cloud Storage into BigQuery |
| Azure Synapse Analytics / Microsoft Fabric Warehouse | Bulk load files in Azure Storage into Synapse / Fabric |
| Vertica | Bulk load files into Vertica |
| Greenplum | Bulk load files in server storage into Greenplum |
| Oracle (SQL*Loader) | Bulk load files into Oracle without transformation using SQL*Loader |
Use the generic bulk-load flow when your destination is something else — SQL Server (BULK INSERT), PostgreSQL (COPY), MySQL (LOAD DATA INFILE), and so on. You'll write the load SQL yourself.
If you also need to extract from a non-file source and transform on the way in, use ETL into databases using bulk load instead.
How does this flow work?
- Reads the names of all files matching the wildcard in the stage. Can recurse into subfolders.
- Derives destination table names from the source file names.
- Spawns one thread per destination table (up to a configurable limit) and loads them in parallel.
- Runs your user-defined load SQL to bulk-load each file into a temporary or staging table.
- If MERGE is configured, merges the staged data into the destination table.
What do I need before I start?
The destination database must support bulk-loading from a file via SQL. Common reference docs:
- SQL Server BULK INSERT
- RDS SQL Server BULK INSERT (S3 integration)
- PostgreSQL COPY
- RDS PostgreSQL COPY (S3 integration)
- MySQL LOAD DATA INFILE
- Aurora MySQL LOAD DATA INFILE (S3 integration)
- Oracle inline external tables
Step-by-step setup
Step 1. Create the stage connection
Use whichever stage the destination database can read:
For cloud-storage stages, set Archive file before copying to GZip if the destination supports gzipped loads.
Step 2. Create the destination database connection
See the database connector reference. Enable Auto Commit.
Step 3. Create a CSV format
Create a CSV format. For large datasets, set Maximum number of rows in file to split into chunks — 100,000 is a good starting value.
Step 4. Create the flow
Open Flows, click Add flow, type bulk in the gallery search, and pick the bulk-load flow type that matches your source / destination combination.
Step 5. Configure the load transformation
Add a source-to-destination transformation. Source = the stage from Step 1; destination = the database connection from Step 2.
FROM: the wildcard pattern for staged files (e.g. folder/*.csv).
TO: the destination table pattern. Use a wildcard like schema.* to derive the table name from each file.
Step 6. Set parameters
Load SQL
Click MAPPING, open Parameters, and fill in Copy SQL. The SQL is database-specific. Available tokens:
- {TABLE} — destination table.
- {TEMP_TABLE} — staging table used for MERGE.
- {FILE_TO_LOAD} — file name without path or extension.
- {PATH} — path to the file without filename (e.g. {app.data}/test/).
- {EXT} — file extension without the leading dot (e.g. csv).
Example for on-premise SQL Server:
BULK INSERT {TABLE}
FROM '/data/{FILE_TO_LOAD}.dat'
WITH (
FIELDTERMINATOR = ',',
FIRSTROW = 2,
ROWTERMINATOR = '\n'
)
In Exception when table not found, enter the substring of the error message your database returns when the destination table doesn't exist. Etlworks uses it to detect when to auto-create the table from the source metadata. The text varies per database.
Source files and destination tables
| Parameter | What it does |
|---|---|
| Calculate destination table name | How the destination table name is derived from each filename. |
| Include files in subfolders | Recurse into nested directories. |
| Maximum number of files to process | Cap on files processed per run. |
| Maximum number of parallel threads | Default 5; max 99. |
Load and MERGE
| Parameter | What it does |
|---|---|
| Action | COPY inserts new rows; MERGE upserts on lookup fields. |
| Lookup Fields | Columns that uniquely identify a record. Required for MERGE. Per-table form: fully_qualified_table=field_list, separated by semicolons. |
| Predict Lookup Fields | Auto-detect the unique-key columns when empty. |
ELT — SQL before and after load
| Parameter | What it does |
|---|---|
| Before COPY SQL / Ignore errors | SQL to run on the database before the load. |
| After COPY SQL / Ignore errors | Same, but after. |
Handling source schema changes
| Option | What it does |
|---|---|
| Alter target table if source has extra columns | Adds missing columns to the destination. |
| Recreate target table if source has extra columns | Drops and recreates. Destructive. |
| Insert NULL into target fields not in source | NULL for any target column the source doesn't have. |
| Ignore fields in source not in target | Drops source columns the destination doesn't have. |
Debug and error recovery
| Parameter | What it does |
|---|---|
| Log each executed SQL statement | Logs every generated and executed SQL statement. |
| Purge File(s) if Error | On by default. Deletes staged files on failure. |
Step 7. Configure mapping (optional)
If you need explicit field-to-field mapping, set it up in Mapping. Auto-mapping by field name works for most flows.
Step 8. Add more transformations (optional)
You can add multiple source-to-destination transformations to a single flow to load different file patterns into different table sets.
Handling processed files
Delete processed files
Enable Delete loaded source files. Source files are removed as soon as each load completes successfully.
Skip already-processed files
Enable Do not process files that have been already processed. Etlworks tracks loaded filenames internally and skips ones it's already processed — useful for incremental ingest.
Move processed files
Set Move files after load and pick a target folder to archive the source files instead of deleting them.
Troubleshooting
See Common issues when loading data into cloud data warehouses for common errors and fixes that also apply to general database bulk loads.