Overview
A Bulk Load is a method provided by a database management system to load multiple rows of data from a source file into a database table. Etlworks includes several Flow types optimized for bulk load.
When to use this Flow type
Usually, bulk operations are not logged, and transactional integrity is not enforced. Often, bulk operations also bypass triggers and integrity checks. This significantly improves performance when loading large amounts of data.
Use this Flow type when:
- You simply want to load CSV files that already exist in the cloud or local storage into the destination database without applying any transformations or transferring files over the network.
- You want to load CSV files created by the CDC Flow.
This Flow does not transform the data. Read how to ETL data into databases using bulk load.
Prerequisites
The destination database must support bulk load operations.
Below are examples for the most commonly used databases:
- SQL Server BULK INSERT statement
- RDS SQL Server BULK INSERT statement using Amazon S3 integration
- Postgres COPY command
- RDS Postgres COPY command using Amazon S3 integration
- MySQL LOAD DATA INFILE statement
- Amazon Aurora MySQL LOAD DATA INFILE statement using Amazon S3 integration
- Oracle Inline External Tables
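For reference, here is a minimal sketch of a bulk load command of the kind linked above, using PostgreSQL COPY syntax; the table and file path are hypothetical:

```sql
-- Minimal PostgreSQL COPY example (hypothetical table and file path).
-- The file must be readable by the database server process.
COPY public.orders
FROM '/data/orders.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',');
```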
How it works
- The Flow reads the names of all files matching the wildcard in the specified location in the cloud or file storage. It can also traverse subfolders.
- The Flow calculates the destination table names based on the source file names.
- The Flow creates multiple threads, one per destination table, but not more than the configurable threshold.
- The Flow generates and executes the BULK LOAD command to load files into the temporary or staging table.
- The Flow merges data in the temporary or staging table with data in the actual table, as sketched below.
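Conceptually, the generated sequence for one destination table resembles the sketch below (PostgreSQL syntax, hypothetical table and file names; the actual SQL is generated by the Flow and depends on the destination database and the configured Action):

```sql
-- 1. Create a temporary (or staging) table shaped like the destination table.
CREATE TEMP TABLE orders_stage (LIKE public.orders INCLUDING DEFAULTS);

-- 2. Bulk load the CSV file into the temporary table.
COPY orders_stage FROM '/data/orders_2024_01.csv' WITH (FORMAT csv, HEADER true);

-- 3. Merge using the default DELETE/INSERT strategy: delete matching rows, then insert.
DELETE FROM public.orders o USING orders_stage s WHERE o.order_id = s.order_id;
INSERT INTO public.orders SELECT * FROM orders_stage;

-- 4. The temporary table is dropped automatically at the end of the session.
```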
Features
The Flow supports `INSERT`, `MERGE` (`UPSERT`), and `CDC MERGE`. With `CDC MERGE`, the Flow updates the actual destination table by applying `INSERT`/`UPDATE`/`DELETE` events in the same order as they originated in the source database and were recorded in the CSV files. It does this in the most efficient way possible.
| Other Features | Description |
| --- | --- |
| Monitor source schema changes | The Flow is able to CREATE and ALTER destination tables to keep up with schema changes in the source. |
| Delete loaded source files | The Flow can be configured to delete successfully loaded source files. |
| Load data in parallel | The Flow is able to load data into multiple destination tables in parallel. |
| Process files in subfolders | The Flow is able to process files in multiple nested subfolders. |
| Create all or selected columns as TEXT | The Flow is able to create all or selected columns as TEXT, which mitigates issues related to source schema drift. |
Process
Step 1. Create a new storage Connection
This Connection will be used as a source to access files stored in the cloud storage or file storage.
The Connection can be one of the following:
Step 2. Create a new Database Connection
This Connection will be used as a destination.
It is recommended to enable auto-commit for the destination Connection.
Step 3. Create a new CSV Format
Keep the default values for all parameters.
Step 4. Create a new Flow
This Flow will be used to load files in the file or cloud storage into the destination database.
In `Flows`, click `[+]`, type in `bulk load files`, and select the Flow.
Step 5. Configure load transformation
Select or enter the following attributes of the transformation (left to right):
- File or Cloud storage Connection created in step 1.
- CSV Format created in step 3.
- A wildcard filename that matches the filenames in the cloud storage. Use the `.csv` extension for uncompressed files and `.gz` for compressed files.
- Destination database Connection created in step 2.
- The fully qualified wildcard destination table name, for example `DB.SCHEMA.*`.
Step 6. Set required and optional parameters
Click `MAPPING`.
Select the `Parameters` tab.
Select or enter the following optional and required parameters:
Source Files and Destination Tables
- `Include files in subfolders`: if this option is enabled, the Flow will load all files that match the wildcard filename (set in `FROM`) in all subfolders under the main bucket or blob.
- `Exclude files` and `Include files`: optional comma-separated lists of files to exclude and/or include. You can use wildcard file names. These options are used together with the wildcard filename set in `FROM`:
  - The Flow populates the list of files matching the wildcard filename in `FROM`.
  - The Flow excludes files based on the list of exclusions set in `Exclude files`.
  - The Flow includes files based on the list of inclusions set in `Include files`.
- `Calculate destination table name`: a required parameter used to calculate (using JavaScript) the destination table name based on the source file name. The original file name, without the extension, is stored in the variable `name`, and the actual table name must be assigned to the variable `value`. For example, if the source file name is `db_schema-table_cdc_stream_uuid.csv`, the JavaScript code to calculate the destination table name would be `value = name.substring(name.indexOf('-') + 1, name.indexOf('_cdc_stream'));`.
- `Maximum Number of Files to Process`: an optional parameter that controls how many files will be loaded in one batch (one run of the Flow). If you expect that at any point in time there could be hundreds of thousands or millions (10^6 to 10^7) of files in the source folder(s), it is recommended to limit the number of files to process in one run to a more manageable range (10^3 to 10^5).
- `Maximum Number of Parallel Loads`: if the value of this optional parameter is greater than 1, the Flow will create multiple threads to load data in parallel: one thread per table, but not more than the threshold set in this parameter.
- `Parallel`: enable this flag if you have more than one source-to-destination transformation that you want to execute in parallel.
- `Create new destination connection for each table`: if this parameter is enabled, a new destination connection will be created to load data into each destination table. The connections will be closed when the Flow finishes loading data into all tables.
- `Purge Files if Success`: if this parameter is enabled (default), the source files will be automatically deleted after they have been successfully loaded.
Debug
- `Log each executed SQL statement`: enable this flag if you want to log each automatically generated and executed SQL statement, including the Before/After Load SQL.
Error recovery
- `Continue loading data into other tables if Error`: if this parameter is enabled (it is disabled by default) and there is an error when loading data into one of the tables, the Flow will continue loading data into the other tables. If configured, a notification will be sent to the webhook.
Load and Merge
- `Bulk Load SQL`: the Bulk Load SQL is used to load files in the cloud or file storage into the staging or temporary tables in the destination database. This is a required parameter. The following {tokens} can be used in the Bulk Load SQL:
  - {TABLE}: the table to load data into.
  - {FILE_TO_LOAD}: the name of the file to load, with path and extension.
  - {FILE}: the name of the file to load, without path.
  - {FULL_FILE_NAME}: same as {FILE_TO_LOAD}.
  - {FILE_NO_EXT}: the name of the file to load, without path and extension.
  - {EXT}: the extension of the file to load, without the '.'.

  Example for Azure SQL Server:

      BULK INSERT {TABLE}
      FROM '{FILE_TO_LOAD}'
      WITH (
          DATA_SOURCE = 'BulkLoadDataSource',
          FIELDTERMINATOR = ',',
          FORMAT = 'CSV',
          FIRSTROW = 2,
          MAXERRORS = 10
      )
- `Action`: can be `INSERT`, which inserts records from the file(s) into the destination table; `MERGE`, which merges records from the file with the records in the destination table; or `CDC MERGE`, which loads CDC events (`INSERT`/`UPDATE`/`DELETE`) in the same order as they originated in the source database. `MERGE` and `CDC MERGE` require configuring `Lookup Fields` and/or enabling the prediction option described under `Lookup Fields` below.
- `How to MERGE`: defines how the Flow merges data in the temp or staging table with the data in the actual table. The default is DELETE/INSERT: DELETE all records in the actual table that also exist in the temp table, then INSERT all records from the temp table into the actual table. If this parameter is set to MERGE, the Flow will execute native MERGE SQL if it is supported by the destination database. Note that not all databases support MERGE (UPSERT); for example, PostgreSQL and MySQL do not. See the sketch after this list.
- `Load into staging table`: by default, the Flow will attempt to create and load data into a temporary table. Not all databases support bulk load into a temp table. When this parameter is enabled (it is disabled by default), the Flow will create a staging table instead of a temporary one and will automatically drop the staging table on exit.
- `User-defined MERGE SQL`: by default, the MERGE SQL is generated automatically. Use this parameter to override the automatically generated SQL. The following {tokens} are available (see the example after this list):
  - {TABLE}: the destination table.
  - {TEMP_TABLE}: the staging or temp table.
  - {FIELDS}: the fields to INSERT/UPDATE in the destination table.
  - {INSERT_FIELDS}: the values of the fields to INSERT into the destination table.
  - {UPDATE_FIELDS}: the fields and values to UPDATE in the format field=value,field=value.
  - {UPDATE_CONDITIONS}: the condition used to match records in the actual destination and temp/staging tables.
- `Lookup Fields`: the comma-separated list of fields that uniquely identify the record in the destination table.
  - If this option is enabled and `Lookup Fields` is empty, the system will try to predict the columns that uniquely identify the record in the destination table.
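For reference, here is a sketch of what the two merge strategies can produce for a hypothetical orders table keyed by order_id. The first pair of statements shows the default DELETE/INSERT strategy; the second shows a possible user-defined MERGE SQL built from the documented {tokens} (only for destinations that support native MERGE). Table and column names are illustrative; the Flow generates the actual statements:

```sql
-- Default DELETE/INSERT strategy (illustrative PostgreSQL syntax, hypothetical names).
DELETE FROM public.orders o
USING orders_stage s
WHERE o.order_id = s.order_id;
INSERT INTO public.orders SELECT * FROM orders_stage;

-- A possible user-defined MERGE SQL using the documented {tokens}.
MERGE INTO {TABLE} t
USING {TEMP_TABLE} s
ON {UPDATE_CONDITIONS}
WHEN MATCHED THEN UPDATE SET {UPDATE_FIELDS}
WHEN NOT MATCHED THEN INSERT ({FIELDS}) VALUES ({INSERT_FIELDS});
```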
ELT
- `Set variables for Before and After SQL`: code in JavaScript to set the variables that can be referenced as {varname} in the Before and/or After SQL. To set a variable, add it as a key-value pair to the script variable `vars`.
  - Example: `vars.put('schema', 'abc');`
  - Example of the SQL: `insert into test (schema_name) values ('{schema}')`
- `Before load SQL`: this SQL will be executed on the destination Connection before loading data.
- `Ignore errors when executing Before load SQL`: if this parameter is enabled and there is an error when `Before load SQL` is executed, the error will be ignored.
- `Before Load SQL is a script`: if this parameter is enabled, `Before Load SQL` will be executed as a SQL script.
- `After load SQL`: this SQL will be executed on the destination Connection after loading data into the destination table.
- `Ignore errors when executing After load SQL`: if this parameter is enabled and there is an error when `After load SQL` is executed, the error will be ignored.
- `After Load SQL is a script`: if this parameter is enabled, `After Load SQL` will be executed as a SQL script (see the example after this list).
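As an illustration, an `After load SQL` script (executed as a script when `After Load SQL is a script` is enabled) could refresh statistics and record an audit row. The table names are hypothetical, and the `{schema}` variable follows the JavaScript example above:

```sql
-- Hypothetical After load SQL script; {schema} is set in
-- 'Set variables for Before and After SQL' (see the example above).
ANALYZE {schema}.orders;
INSERT INTO {schema}.load_audit (table_name, loaded_at)
VALUES ('orders', CURRENT_TIMESTAMP);
```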
Handling source schema changes
- `Alter target table if the source has columns that the target table doesn't have`: if this parameter is enabled (it is disabled by default) and the source has fields that the destination table does not have, the system will add the missing fields to the destination table (see the sketch after this list).
- `Create all actual tables with only TEXT columns`: if this parameter is enabled (it is disabled by default), the Flow will create all destination tables with only TEXT columns. Enable this option to make the load more resilient to source schema drift. By default, the Flow creates the actual table using the first file as a pattern. It then alters the actual table by adding new columns, but it never modifies the data types of the existing columns, which could cause issues if there are multiple source files to load into the same destination table and some of these files have columns with different data types (compared to each other).
- `Actual tables with all TEXT columns`: a comma-separated list of regular expressions matching the table names for which the Flow should create all columns as `TEXT`. Here are the rules for the regexes:
  - Regexes are case insensitive.
  - If a regex includes a comma (`,`), it must be escaped (`/,`).
  - Java regex syntax must be used.
  - If the destination table name is configured using fully qualified (DB.SCHEMA.TABLE) or partially qualified (SCHEMA.TABLE) notation, the regex should reflect it. Example (include tables `country` and `city`): `text.*` -> `.*country,.*city`.
- `TEXT columns`: a comma-separated list of regular expressions matching the column names which should be created as `TEXT`. Here are the rules for the three parameters above:
  - If `Create all actual tables with only TEXT columns` is enabled, all columns in all new tables and all new columns in the existing tables will be created as `TEXT`.
  - If the table name matches a regex in `Actual tables with all TEXT columns` and `TEXT columns` is empty, all columns in that table will be created as `TEXT`.
  - If `TEXT columns` is not empty and the column name in any table matches any regex configured in this field, the column will be created as `TEXT`.
- `Create temp tables with only TEXT columns`: if this parameter is enabled (it is disabled by default), the Flow will create temporary tables with only TEXT columns. Enable this option to make the load more resilient to source schema drift. By default, the Flow creates the temp table using the first file as a pattern. It then alters the temp table by adding new columns, but it never modifies the data types of the existing columns, which could cause issues if there are multiple source files to load into the same destination table and some of these files have columns with different data types (compared to each other). This option is automatically enabled if `Create all actual tables with only TEXT columns` is enabled or if the table name matches a regex configured in `Actual tables with all TEXT columns`.
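To illustrate the schema-drift handling described above, the statements the Flow issues are conceptually similar to the following sketch (hypothetical names; the exact DDL is generated by the Flow and depends on the destination database):

```sql
-- A new column found in the source file is added to the destination table when
-- 'Alter target table if the source has columns that the target table doesn't have' is enabled.
ALTER TABLE public.orders ADD COLUMN coupon_code VARCHAR(255);

-- A table matching 'Actual tables with all TEXT columns' (or any new table when
-- 'Create all actual tables with only TEXT columns' is enabled) is created with TEXT columns only.
CREATE TABLE public.country (
    country_code TEXT,
    country_name TEXT,
    population   TEXT
);
```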
Step 7. Optionally, add a mapping
The mapping can be used to:
- Rename the column.
- Exclude the column.
- Change the column data type for the tables created by the flow.
Configure mapping
Select the transformation and click `MAPPING`.
Add a mapping for the column `CURRENT_DATE`.
Renaming or excluding a column in the mapping will work for all tables that have this column. If a table does not have the column, the mapping will be ignored.
Step 8. Optionally, add more source-to-destination transformations
Click `[+]` and repeat steps 5 to 6 for more locations and/or file name templates.