Use this flow to extract data from any supported source, transform it, and bulk-load it into any database that supports loading from a file. You provide the database-specific COPY / BULK INSERT SQL; Etlworks generates the file, stages it, runs the SQL, and (optionally) runs a MERGE.
When to use this flow type
Etlworks has dedicated flow types for major cloud data warehouses — use those when they apply, because they auto-generate the load SQL.
| Target | Use this dedicated flow instead |
|---|---|
| Snowflake | Extract, transform, and load data in Snowflake |
| Amazon Redshift | Extract, transform, and load data in Amazon Redshift |
| Google BigQuery | Extract, transform, and load data in Google BigQuery |
| Azure Synapse Analytics / Microsoft Fabric Warehouse | Extract, transform, and load data in Synapse / Fabric |
| Vertica | Extract, transform, and load data in Vertica |
| Greenplum | Extract, transform, and load data in Greenplum |
Use the generic bulk-load flow when your destination is something else — SQL Server (BULK INSERT), PostgreSQL (COPY), MySQL (LOAD DATA INFILE), Oracle (SQL\*Loader), and so on — or when the destination is a database supported by a dedicated flow but you need the lower-level control of writing your own COPY SQL.
How does this flow work?
- Extracts data from the source.
- Writes the data to CSV (or another supported format) in a staging area: server storage, S3, Azure Blob, GCS, FTP, or SFTP.
- Optionally compresses files with gzip.
- Creates the destination table from the source metadata if it doesn't exist.
- Runs the user-defined COPY SQL to bulk-load the file into the destination database.
- Optionally runs a MERGE statement (auto-generated or user-defined).
- Cleans up staged files.
What do I need before I start?
The destination database supports bulk-loading from a file via SQL (BULK INSERT, COPY, LOAD DATA INFILE, etc.) and can read the staging area you choose.
Step-by-step setup
Step 1. Create connections
You need three connections:
- A source connection.
- A stage connection: server storage, S3, Azure Blob, GCS, SFTP, or FTP.
- A destination database connection. Enable Auto Commit.
Which stage connection to use
| Stage type | Use when |
|---|---|
| Server storage | The destination database is on the same machine or network as Etlworks and can directly read files produced by the flow. |
| Amazon S3 | The destination is hosted on AWS (RDS, Aurora) and supports loading from the mapped S3 bucket. |
| Azure Blob | The destination is hosted on Azure and supports loading from the mapped Azure Blob container. |
| Google Cloud Storage | The destination is hosted on Google Cloud and supports loading from the mapped GCS bucket. |
| FTP or SFTP | The destination can't access flow-produced files directly, but can read a folder that's also exposed as an FTP / SFTP directory. |
For cloud-storage stages, set Archive file before copying to GZip. The destination database must support loading gzipped files.
Step 2. Create a data exchange format
Most databases load from CSV. Some also support JSON, XML, Avro, or Parquet. Create one of these and use it as the destination format.
Important: For CSV, set Maximum number of rows in file so the flow splits large datasets into chunks. Smaller files load much faster, especially with parallel threads.
Step 3. Create the flow
Open Flows, click +, type bulk in the gallery search, and pick the flow type that matches your source.
Step 4. Set the destination connection on the flow
Open the Connections tab and pick the destination database connection from Step 1.
Step 5. Set source (FROM) and destination (TO)
FROM (varies by flow type):
- Database: table or view name. Wildcards supported, e.g. public.*.
- Well-known API: endpoint or file name.
- Web service: endpoint name.
- File: file name or wildcard pattern.
- Queue: queue name.
TO: the fully qualified destination table name. If FROM uses a wildcard, use a matching wildcard in TO.
The stage connection is the destination connection on the transformation. Etlworks treats it as the staging area.
Step 6. Configure the COPY SQL
Click MAPPING on the transformation row, go to the Parameters tab, and edit the Copy SQL field.
The SQL is database-specific. Available tokens:
- {TABLE} — destination table.
- {TEMP_TABLE} — staging table used for MERGE.
- {FILE_TO_LOAD} — file name without path or extension.
- {PATH} — path to the file without filename (e.g. {app.data}/test/).
- {EXT} — file extension without the leading dot (e.g. csv).
Example for on-premise SQL Server:
BULK INSERT {TABLE}
FROM '/data/{FILE_TO_LOAD}.dat'
WITH (
FIELDTERMINATOR = ',',
FIRSTROW = 2,
ROWTERMINATOR = '\n'
)
Notes on the example:
- {TABLE}: fully qualified destination table.
- /data: location of the staging area on the SQL Server host.
- {FILE_TO_LOAD}: file name without directory or extension.
- .dat: extension Etlworks uses by default for CSV files.
In Exception when table not found, enter the substring of the error message your database returns when the destination table doesn't exist. Etlworks uses it to detect when to auto-create the table from source metadata. The text varies per database; check your database documentation. See the examples below.
Step 7. Set parameters (optional)
COPY SQL parameters and Action
| Parameter | What it does |
|---|---|
| Action | COPY inserts rows from the file into the destination; MERGE upserts on lookup fields. |
| Lookup Fields | Comma-separated list of columns that uniquely identify a record. Required for MERGE. |
| Predict Lookup Fields | Auto-detect the unique-key columns when Lookup Fields is empty. |
| COPY SQL | The user-defined SQL from Step 6. |
| MERGE SQL | User-defined MERGE SQL. Tokens: {TABLE}, {TEMP_TABLE}, {KEY_FIELDS}, {FIELDS}. |
File cleanup and ELT SQL
| Parameter | What it does |
|---|---|
| Purge File if Success | Deletes the staged file after a successful COPY. |
| Purge File if Error | On by default. Deletes staged files on COPY failure. |
| Before COPY SQL / Ignore errors | SQL to run on the destination database before COPY, with optional error suppression. |
| After COPY SQL / Ignore errors | Same, but after COPY. |
Source-specific parameters
- Database source: see Extract, Transform, and Load (ETL) data.
-
File or web-service source:
- Ignore Transformations: copies files directly to the stage. Faster; transformations are skipped.
- Delete loaded source files: deletes the source file after a successful load.
- Delete source files on error: deletes the source file if the load fails.
- Well-known API source: see the well-known API options.
Step 8. Configure MERGE (UPSERT) (optional)
- Set Action to MERGE.
- Provide Lookup Fields, or enable Predict Lookup Fields.
For wildcard sources targeting multiple tables, use semicolon-separated per-table lookup fields:
test1.inventory=inventory_id,database_name; test1.payment=payment_id,database_name; test1.rental=rental_id,database_name;
Step 9. Handle source / destination schema differences
| Option | What it does |
|---|---|
| Reorder columns to match target (and update types) | Reorders columns in the data file to match the destination and updates column types to match the target. |
| Ignore Fields in Source Not in Target | Drops source columns the target doesn't have. |
| Alter Target Table If Source Has Extra Columns | Adds the extra columns to the destination table. |
| Recreate Target Table If Source Has Extra Columns | Drops and recreates the target table. Destructive. |
| Insert NULL Into Target Fields Not in Source | Inserts NULL for any target column the source doesn't have. |
| Create Table SQL | Custom SQL or JavaScript template for creating the target table. See overriding the create-table SQL. |
Notifications on schema mismatch
If columns don't match, the flow logs an exception. See sending notifications about failed transformations.
Step 10. Configure mapping (optional)
If you need explicit field-to-field mapping, set it up in Mapping. Auto-mapping by field name works for most flows.
Incremental change replication (high watermark)
Change replication using HWM is supported. Set the High Watermark Field, enable Change Replication, and optionally configure a calculated HWM value.
Load multiple tables with a wildcard
FROM
Create a single source-to-destination transformation and set FROM to a wildcard pattern, e.g. public.*.
TO
Set TO to a destination schema with a wildcard, e.g. schema.*.
Include / exclude objects
- Exclude Objects: comma-separated list of objects to skip.
- Include Objects: comma-separated list of objects to keep.
Source query, before/after SQL
Use the {TABLE} token in Source Query, Before COPY SQL, and After COPY SQL.
MERGE with wildcards
Leave Lookup Fields empty and enable Predict Lookup Fields, or specify per-table lookup fields as semicolon-separated pairs.
HWM with wildcards
Use the {TABLE} token in the High Watermark Field Value.
Examples
Azure SQL BULK INSERT from Azure Blob
This example loads a CSV file in Azure Blob into Azure SQL using a SAS key. Azure Blob is configured as an external data source via a database-scoped credential.
Step 1. Create a master encryption key (optional — not required if the database-scoped credential isn't required, i.e. the blob allows anonymous access):
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'YourStrongPassword1';
Step 2. Create a database-scoped credential (optional — not required for anonymous-access blobs). Make sure the SAS token has no leading ?, includes at least read permissions on the object (srt=o&sp=r), and isn't expired (all dates are UTC):
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential WITH IDENTITY = 'SHARED ACCESS SIGNATURE', SECRET = 'srt=sco&sp=rwac&se=2027-02-01T00:55:34Z&st=2026-12-29T16:55:34Z***';
Step 3. Create the external data source (credential is optional for anonymous access):
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH (TYPE = BLOB_STORAGE,
LOCATION = 'https://example.blob.core.windows.net/invoices',
CREDENTIAL = MyAzureBlobStorageCredential);
Step 4. Create a bulk-load flow with an Azure Blob destination connection pointed at the same container.
Step 5. Enter the COPY SQL, referencing the external data source from Step 3:
BULK INSERT {TABLE}
FROM '{FILE_TO_LOAD}.csv'
WITH (
DATA_SOURCE = 'MyAzureBlobStorage',
FIELDTERMINATOR = ',',
CODEPAGE = '65001',
FORMAT = 'CSV',
ROWTERMINATOR = '0x0a',
FIRSTROW = 2,
FIELDQUOTE = '"',
MAXERRORS = 10
)
Load large datasets
For billions of rows, three techniques help:
Split CSV files into chunks
Set Maximum number of rows in file on the CSV format. 100,000 is a good starting value.
Enable parallel load
When CSV chunking is enabled, the transformation can load chunked files in parallel. Enable Use Parallel Threads when processing sources by a wildcard under MAPPING → Parameters → Common Parameters.
Enable HWM change replication
With HWM change replication enabled, only changed records are extracted and loaded.