Overview
A bulk load is a method provided by a database management system to load multiple rows of data from a source file into a database table. Etlworks includes several Flow types optimized for bulk load.
When to use this Flow type
Usually, bulk operations are not logged, and transactional integrity is not enforced. Often, bulk operations bypass triggers and integrity checks. This improves performance for loading large amounts of data quite significantly.
Use ETL with a bulk load when you need to extract data from any source, transform it, and load it into a database that supports bulk load.
This Flow is intended to transform the data before loading. Read how to bulk load files stored in local and cloud storage into databases.
In many cases, using a bulk load is overkill, and setting it up is more complex than the alternative methods available in Etlworks. Read how to load data into databases without bulk operations.
Prerequisites
The destination database must support bulk load operations.
Below are examples for the most commonly used databases (a minimal PostgreSQL COPY statement is sketched after this list):
- SQL Server BULK INSERT statement
- RDS SQL Server BULK INSERT statement using Amazon S3 integration
- Azure SQL BULK INSERT from Azure blob
- Postgres COPY command
- RDS Postgres using aws_s3.table_import_from_s3 with Amazon S3 integration
- MySQL LOAD DATA INFILE statement
- Amazon Aurora MySQL LOAD DATA INFILE statement using Amazon S3 integration
- Oracle Inline External Tables
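For illustration only, here is a minimal PostgreSQL COPY statement; the table name public.orders and the file path are placeholders, not values produced by the Flow:
-- Minimal PostgreSQL COPY; table name and path are illustrative.
COPY public.orders
FROM '/var/lib/postgresql/staging/orders.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',');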
How it works
A typical bulk load Flow performs the following operations:
- It extracts data from the source.
- It then creates CSV, JSON, XML, Avro, or Parquet files in the staging area, which can be one of the following: local file system, remote FTP or SFTP server, Azure Blob, Google Cloud Storage, or Amazon S3. The actual location of the staging area depends on the specific implementation of the bulk load command for the destination database.
- If needed, it compresses files using the gzip algorithm.
- It checks to see if the destination table exists, and if it does not, creates the table using metadata from the source.
- It executes the user-defined SQL statements for the bulk load.
- It then optionally executes user-defined or automatically generated MERGE statements.
- In the end, it cleans up the remaining files in the staging area, if needed.
Flows optimized for bulk load support wildcard processing and change replication using high watermark (HWM).
Step-by-step instruction
Here are the detailed steps on how to ETL data into a database using a bulk load Flow:
Step 1. Create all required Connections
You will need a source Connection, a Connection for the staging area (Server Storage, Amazon S3, Azure Blob, Google Cloud Storage, SFTP, FTP), and a destination database Connection. When creating the destination database Connection, it is recommended to enable the Auto Commit option.
When to use the specific type of the destination Connection
Connection type | Use this Connection type if |
---|---|
Server Storage | the database is located on the same machine or on the same network as Etlworks. The database server must be able to directly access files created by the Flow. |
Amazon S3 | the destination database is an RDS or Amazon Aurora database and supports loading files from the mapped S3 bucket. |
Azure Blob | the destination database is hosted on Azure and supports loading files from the mapped Azure Blob container. |
Google Cloud Storage | the destination database is hosted on Google Cloud and supports loading files from the mapped Google Cloud Storage bucket. |
FTP or SFTP | the destination database cannot directly access the files created by the Flow but can access a folder on the network that is also exposed as a directory on the FTP or SFTP server. |
When configuring a Connection for cloud storage such as Amazon S3 that will be used as the staging area for the bulk load Flow, it is recommended to select GZip as the value for the Archive file before copying field. The destination database must support loading gzipped files.
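As a hedged illustration, PostgreSQL can load a gzipped file by decompressing it on the fly with COPY ... FROM PROGRAM (this requires the appropriate server-side privileges); the table name and path below are placeholders:
-- Load a gzipped CSV by piping it through gzip; table name and path are illustrative.
COPY public.orders
FROM PROGRAM 'gzip -dc /var/lib/postgresql/staging/orders.csv.gz'
WITH (FORMAT csv, HEADER true);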
Step 2. Create a data exchange Format
Most databases can load files in the CSV Format. Some can also load JSON, XML, Avro, or Parquet files, so you will need to create one of these and set it as a destination Format.
For the CSV Format, it is recommended to set the Maximum number of rows in file property so the Flow can split large datasets extracted from the source into chunks. Loading smaller files will be much faster, specifically if you enable Use Parallel Threads when processing sources by a wildcard. Read more about performance optimizations when loading large datasets.
Step 3. Create a Flow to bulk load data into the destination database
Start creating the Flow by opening the Flows window, clicking +, and typing bulk into the search field:
Continue by selecting the Flow type, adding source-to-destination transformations, and entering the transformation parameters.
Step 4. Set destination Connection
To configure the actual destination, click the Connections
tab, and select the Connection created in Step 1.
Step 5. Set the source (FROM) and the destination (TO)
Depending upon the Flow type, you can select one of the following sources (FROM) for the bulk load Flow:
- Database: use the database table or view name as the source (FROM) name. Wildcard names, for example, public.*, are supported.
- Well-known API: use the endpoint name or the file name as the source (FROM) name.
- Web Service: use the endpoint name as the source (FROM) name.
- File: use the source file name or a wildcard file name as the source (FROM) name.
- Queue: use the queue name as the source (FROM) name.
For all bulk load Flows, the destination Connection is going to be either Server Storage, Amazon S3, Azure Blob, Google Cloud Storage, SFTP, or FTP. Etlworks uses the destination Connection as a staging area.
In TO, select or enter the fully-qualified destination table name or a wildcard table name. If the source (FROM) is entered using a wildcard template, set the destination (TO) to a wildcard as well.
Step 6. Configure SQL for bulk load
Click MAPPING in the transformation row, select the Parameters tab, and select the Copy SQL field.
Enter the SQL statement that will be used to bulk load data into the destination table. The following {tokens} can be used as part of the SQL statement:
- {TABLE}: the table to load data into.
- {TEMP_TABLE}: the staging table name used for MERGE.
- {FILE_TO_LOAD}: the name of the file to load, without path and extension.
- {PATH}: the path of the file to load, without the file name, for example, {app.data}/test/.
- {EXT}: the file extension of the file to load, without the ., for example, csv.
Example for SQL Server installed on-premises:
BULK INSERT {TABLE}
FROM '/data/{FILE_TO_LOAD}.dat'
WITH
(
FIELDTERMINATOR = ',',
FIRSTROW = 2,
ROWTERMINATOR = '\n'
)
Notice:
- {TABLE}: the fully qualified destination table name.
- /data: the location of the staging area.
- {FILE_TO_LOAD}: the file to load, without directory and extension.
- .dat: the file extension. By default, CSV files are created with the .dat extension.
This SQL is different for each database that supports bulk loading from the file. Take a look at examples for other databases.
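For comparison, a Copy SQL for PostgreSQL could look like the sketch below; the /data staging directory is an assumption, and the same {tokens} apply:
-- PostgreSQL COPY using the same tokens; /data is the assumed staging location.
COPY {TABLE}
FROM '/data/{FILE_TO_LOAD}.dat'
WITH (FORMAT csv, DELIMITER ',');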
Additionally, in the Exception when table not found field, enter the part of the error message that is generated when the destination table does not exist, so the Flow can create the table automatically.
This error message is different for each database that supports bulk loading from a file.
Step 7. Set optional parameters
Click MAPPING in the transformation row and select the Parameters tab.
7.1 COPY SQL parameters and Action
- Action: can be either COPY, which inserts records from the file(s) into the destination table, or MERGE, which merges records from the file with the records in the destination table. MERGE requires configuring Lookup Fields.
- Lookup Fields: the comma-separated list of fields that uniquely identify the record in the destination table. This field is required if the action is set to MERGE.
- Predict Lookup Fields: if this option is enabled and Lookup Fields is empty, the system will try to predict the columns that uniquely identify the record in the destination table.
- COPY SQL: a user-defined SQL statement for bulk loading data into the destination table.
- MERGE SQL: a user-defined SQL statement that will be used instead of the default when the action is set to MERGE (see the sketch after this list). If nothing is entered in this field, the default MERGE SQL will be used. The following parameters are automatically populated and can be referenced as {TOKEN} in the SQL:
  - {TABLE}: the table to MERGE data into.
  - {TEMP_TABLE}: the table to merge data from.
  - {KEY_FIELDS}: the fields that uniquely identify the record in both tables.
  - {FIELDS}: the fields to INSERT into the table to MERGE data into.
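To illustrate the shape of a custom MERGE SQL, below is a sketch for SQL Server; the columns id, name, and amount are hypothetical, and in practice the {KEY_FIELDS} and {FIELDS} tokens supply the actual column lists:
-- Sketch of a custom MERGE SQL for SQL Server; column names are illustrative.
MERGE {TABLE} AS target
USING {TEMP_TABLE} AS source
ON target.id = source.id
WHEN MATCHED THEN
  UPDATE SET target.name = source.name, target.amount = source.amount
WHEN NOT MATCHED THEN
  INSERT (id, name, amount)
  VALUES (source.id, source.name, source.amount);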
7.2 Other parameters
- Purge File if Success: with purge enabled, the staging file will be automatically deleted after the COPY command is successfully executed.
- Purge File if Error: if this option is enabled (it is by default), the staging files will be automatically deleted if the COPY command was executed with an error.
- Before COPY SQL: this SQL will be executed on the destination database before running the COPY SQL (see the example after this list).
- Ignore errors when executing Before COPY SQL: if this option is enabled and there is an error when the Before COPY SQL is executed, the error will be ignored.
- After COPY SQL: this SQL will be executed on the destination database after running the COPY SQL.
- Ignore errors when executing After COPY SQL: if this option is enabled and there is an error when the After COPY SQL is executed, the error will be ignored.
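For example, a Before COPY SQL that empties the destination table before a full reload could be as simple as the statement below (a sketch, assuming the {TABLE} token is available in this field, as it is for wildcard Flows):
-- Clear the destination table before a full reload; {TABLE} is assumed to resolve to the destination table.
TRUNCATE TABLE {TABLE};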
7.3 Parameters specific for different source types
Depending on the Flow type, other Flow parameters can be added, as explained below:
- The source is a database.
- The source is a file or web service:
  - Ignore Transformations: if this parameter is enabled, the system will copy files directly into the staging area before executing the COPY command. This greatly improves the performance of the load but automatically disables any transformations.
  - Delete loaded source files: if this parameter is enabled, the system will automatically delete loaded files from the source.
  - Delete source files on error: if this parameter is enabled and there is an error during the load, the system will automatically delete loaded files from the source.
- The source is a well-known API.
Step 8. Optionally Configure MERGE (UPSERT)
To merge (upsert) existing data in the destination table with new data:
- Set the Action to MERGE.
- Define the Lookup Fields: the comma-separated list of fields that uniquely identify the record in the destination table.
Alternatively, you can enable the Predict Lookup Fields option, which forces the Flow to use various algorithms to automatically predict the fields that uniquely identify the record. Note that it is not always possible to correctly detect the unique fields.
When enabling the Predict Lookup Fields option (which is not always accurate) is not an option, you can specify the list of table=field pairs in the Lookup Fields. Use fully-qualified table names and ; as a separator between table=field pairs.
Example:
test1.inventory=inventory_id,database_name;
test1.payment=payment_id,database_name;
test1.rental=rental_id,database_name;
Step 9. Configure how to handle source and destination schema changes
It is quite typical for the source and the destination to have a different number of columns. It is also entirely possible that the order of columns in the source is different from the order of columns in the destination. In either of these cases, by default, the Flow will fail, since the COPY command typically cannot load files that have more or fewer columns than the target table, or whose columns are in a different order.
When setting the source-to-destination transformation, it is possible to configure it to automatically handle schema changes. The following configuration options are available:
- Reorder columns to load to match order of columns in the target table. Also update column types to match the target: if this option is enabled (it is disabled by default), the system will reorder columns in the data file to match the order of columns in the destination table. When this option is enabled, the system also updates the column data types to match the existing columns in the destination table.
- Ignore fields in the source that do not exist in the target table: if this option is enabled (it is disabled by default) and the source has more fields than the destination table, the system will automatically exclude the extra fields. Options 2, 3, and 4 are mutually exclusive.
- Alter target table if the source has columns that the target table doesn't have: if this option is enabled (it is disabled by default) and the source has different fields than the destination table, the system will add the extra fields to the target table. Options 2, 3, and 4 are mutually exclusive.
- Recreate target table if the source has columns that the target table doesn't have: if this option is enabled (it is disabled by default) and the source has different fields than the destination table, the system will automatically drop and recreate the target table. Options 2, 3, and 4 are mutually exclusive.
- Insert null into the fields in the target table that do not exist in the source: if this option is enabled (it is disabled by default) and the destination table has fields that the source does not, the system will automatically insert NULL values into these fields in the target.
- Create table SQL: by default, the system automatically generates the SQL for creating a new table if it does not exist. You can override the default SQL by providing a template or JavaScript code. Read more about how to override the create table SQL.
9.1 Configuring notifications if source and destination have different columns
It is possible to send an email notification if the source and destination have different columns. To configure Flow to send a notification when either source has more columns than the destination or the destination has more columns than the source, use the technique explained in this article.
The only difference is that you don't need to configure the exception handler. If the system detects that the source and the destination have a different number of columns, it will add an exception to the log, which then can be retrieved and sent in the email body as explained above.
Step 10. Optionally configure Mapping
If necessary, you can create a Mapping between the source and destination fields.
Setup incremental Change Replication using high watermark
As in any other Flow type, it is possible to configure a change replication using a high watermark.
When change replication is enabled, only the changed records will be loaded into the destination table.
Basically, all you need to do is set the High Watermark Field and Enable Change Replication for the transformation. Additionally, for better readability, you can set the calculated High Watermark Field Value.
Loading multiple tables by a wildcard name
To load multiple database objects (tables and views) by a wildcard name (without creating individual source-to-destination transformations), follow the same steps as above, except:
FROM
Create a single source-to-destination transformation and set the FROM to a fully qualified source database object name with wildcard characters (* and ?), for example, public.*.
By default, all tables and all views that match the wildcard name in FROM will be included. If you want to exclude specific database objects, enter a comma-separated list of the database objects to exclude in the Exclude Objects field. If you want to include only specific database objects, enter a comma-separated list of the database objects to include in the Include Objects field.
TO
Set the TO to a fully qualified destination name with a wildcard, in the format database.schema.*, for example, UTIL_DB.PUBLIC.*.
Read how to programmatically change the destination (TO) name.
EXCLUDE
Optionally configure the list of database objects to exclude.
To exclude all views, enter all views in the Exclude Objects field.
To exclude all tables, enter all tables in the Exclude Objects field.
INCLUDE
Optionally configure the list of database objects to include.
SOURCE QUERY
Optionally configure the Source query. You can use the token {TABLE} in the Source query.
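For example, a Source query that filters the extracted rows might look like this; the status column is hypothetical:
-- Illustrative Source query; the status column is a placeholder.
SELECT * FROM {TABLE} WHERE status = 'active'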
BEFORE/AFTER SQL
If you want to execute any code before and/or after the COPY command, use the token {TABLE} in the Before COPY SQL and After COPY SQL fields.
MERGE
If you are configuring the MERGE action, do not enter anything in the Lookup Fields. Enable Predict Lookup Fields instead.
Alternatively, when enabling the Predict Lookup Fields option (which is not always accurate) is not an option, you can specify the list of table=field pairs in the Lookup Fields. Use fully-qualified table names and ; as a separator between table=field pairs.
Example:
test1.inventory=inventory_id,database_name;
test1.payment=payment_id,database_name;
test1.rental=rental_id,database_name;
HWM CHANGE REPLICATION
If you are setting up high watermark change replication with a calculated HWM field, use the token {TABLE} in the High Watermark Field Value.
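As a sketch, assuming the calculated value is an SQL expression and updated_at is a hypothetical high watermark column in the source tables:
-- Hypothetical calculated high watermark value: the latest modification timestamp in each source table.
SELECT MAX(updated_at) FROM {TABLE}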
Examples
Azure SQL BULK INSERT from Azure blob
The following example shows how to load data from a CSV file in an Azure Blob storage location on which you have created a SAS key. The Azure Blob storage location is configured as an external data source. This requires a database scoped credential using a shared access signature that is encrypted using a master key in the user database.
Step 1. Create an optional master encryption key. A master key is not required if a database scoped credential is not required because the blob is configured for public (anonymous) access.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'YourStrongPassword1';
Step 2. Create an optional database scoped credential. A database scoped credential is not required if the blob is configured for public (anonymous) access.
Make sure that the SAS token does not have a leading ?, that you have at least read permission on the object that should be loaded (srt=o&sp=r), and that the expiration period is valid (all dates are in UTC time).
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '***srt=sco&sp=rwac&se=2017-02-01T00:55:34Z&st=2016-12-29T16:55:34Z************';
Step 3. Create an external data source.
CREDENTIAL is not required if a blob is configured for public (anonymous) access.
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://****************.blob.core.windows.net/invoices',
CREDENTIAL = MyAzureBlobStorageCredential);
Step 4. Create a bulk load Flow where the destination is an Azure Blob Connection pointing to the container configured for the external data source in Step 3.
Step 5. Enter the COPY SQL. Use the data source created in Step 3.
Example:
BULK INSERT {TABLE}
FROM '{FILE_TO_LOAD}.csv'
WITH (
DATA_SOURCE = 'MyAzureBlobStorage',
FIELDTERMINATOR = ',',
CODEPAGE = '65001',
FORMAT = 'CSV',
ROWTERMINATOR = '0x0a',
FIRSTROW = 2,
FIELDQUOTE = '"',
MAXERRORS = 10)
Load large datasets
Flows optimized for the bulk load can move extremely large (billions of rows) datasets from the source to the destination. Below are some of the techniques recommended for achieving better performance.
Configure flow to split CSV files in chunks
If the Flow is configured to load CSV files, set the Maximum number of rows in file so the Flow can split the large datasets extracted from the source into chunks. Loading smaller files will be much faster. In most cases, we recommend setting this parameter to 100000.
Enable parallel load
When the destination CSV Format is configured to split the large dataset into chunks (see above), the source-to-destination transformation in the bulk load Flow can be configured to load the chunked files in parallel threads.
To enable the parallel load, enable the parameter Use Parallel Threads when processing sources by a wildcard under MAPPING > Parameters > Common Parameters.
Enable high watermark change replication (HWM)
When high watermark change replication is enabled, only the changed records will be extracted and loaded into the destination table.