When to use this Flow type
If you already have CSV or Parquet files in Azure Storage and don't need to transform the data, the most efficient way to load these files into Synapse Analytics is to use the Flow type Bulk load files in Azure Storage into Synapse Analytics described in this article.
Use this Flow type:
- When you simply want to load CSV or Parquet files that already exist in Azure Storage into Synapse Analytics, without applying any transformations or transferring files over the network.
- When you want to load CSV files created by the CDC Flow.
Flows optimized for Synapse Analytics
| Flow type | When to use |
| --- | --- |
|  | When you need to extract data from any source, transform it, and load it into Synapse Analytics. |
| Bulk load files in Azure Blob into Synapse Analytics (you are here) | When you need to bulk-load files that already exist in Azure Blob without applying any transformations. The flow automatically generates the COPY command and MERGEs data into the destination. |
| Stream CDC events into Synapse Analytics | When you need to stream updates from a database that supports Change Data Capture (CDC) into Synapse Analytics in real time. |
| Stream messages from a queue into Synapse Analytics | When you need to stream messages from a message queue that supports streaming into Synapse Analytics in real time. |
How it works
- The Flow reads the names of all files matching the wildcard in the specific Azure Storage blob. It can traverse the subfolders as well.
- The Flow calculates the destination table names based on the source file names.
- The Flow creates multiple threads, one per destination table, but not more than the configurable threshold.
- The Flow generates and executes the COPY INTO command to load files into a temporary table, one temporary table per actual destination table.
- The Flow merges data in the temporary tables with data in the actual tables.
Alternatively, the Flow can load files directly into the actual destination table by generating and executing the COPY INTO command for files matching the wildcard.
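For illustration, the sketch below shows the kind of SQL the Flow generates for a single destination table. It is a minimal, hypothetical example: the storage account, container, table, and column names are made up, and the actual statements produced by the Flow may differ.

```sql
-- Hypothetical example only; the Flow generates the real statements at run time.

-- 1. Bulk-load the matching files into a temporary (staging) table.
COPY INTO dbo.orders_stage
FROM 'https://myaccount.blob.core.windows.net/mycontainer/orders*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW  = 2
);

-- 2. Merge the staging table into the actual destination table
--    (the default DELETE/INSERT strategy is shown; see How to MERGE below).
DELETE FROM dbo.orders
WHERE order_id IN (SELECT order_id FROM dbo.orders_stage);

INSERT INTO dbo.orders (order_id, customer_id, amount)
SELECT order_id, customer_id, amount
FROM dbo.orders_stage;
```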
Features
The Flow supports COPY INTO (INSERT), MERGE (UPSERT), and CDC MERGE. With CDC MERGE, the Flow updates the actual destination table by applying the INSERT/UPDATE/DELETE events in the same order in which they originated in the source database, and it does so in the most efficient way possible.
| Other Features | Description |
| --- | --- |
| Monitor source schema changes | The Flow is able to CREATE and ALTER destination tables to keep up with schema changes in the source. |
| Load CSV and Parquet files | The Flow can load CSV and Parquet files in Azure Storage into Synapse Analytics. |
| Delete loaded source files | The Flow can be configured to delete successfully loaded source files. |
| Load data in parallel | The Flow is able to load data into multiple destination tables in parallel. |
| Process files in subfolders | The Flow is able to process files in multiple nested subfolders. |
Prerequisites
- The Azure Synapse Analytics dedicated SQL pool is up and running.
- The Azure Storage blob is created.
- Files to load exist in the blob.
Process
Step 1. Create a new Azure Storage Connection
This Connection will be used as a source to access files stored in the Azure Storage blob. Read more about the Azure Storage connector.
Step 2. Create a new Azure Synapse Analytics Connection
This Azure Synapse Analytics Connection will be used as a destination.
Step 3. Create a new CSV Format or Parquet Format
This Format will be used to generate a valid COPY INTO command.
Create a CSV Format to load CSV files and a Parquet Format to load Parquet files in Azure Storage.
When loading Parquet files, dates and timestamps must be serialized using the following Parquet schema:
"type": ["null", {"type": "long", "logicalType": "timestamp-millis"}]
Step 4. Create a new Flow
This Flow will be used to load files in Azure Storage into Azure Synapse Analytics.
In Flows, click [+], type in bulk load files in azure storage, and select the Flow Bulk load files in Azure Storage into Synapse Analytics.
Step 5. Configure load transformation
Select or enter the following attributes of the transformation (left to right):
- Azure Storage Connection created in step 1.
- CSV or Parquet Format created in step 3.
- A wildcard filename that matches the filenames in the cloud storage. Use the .csv extension for uncompressed files and .gz for compressed files.
- Synapse Analytics Connection created in step 2.
- The wildcard destination table name in the following format: database.schema.prefix_*_suffix, where database.schema is the Synapse Analytics database and schema to load data into. Example: test.dbo.*. You don't need to use wildcards when loading into a specific table.
Step 6. Set optional and required parameters
Click MAPPING.
Select the Parameters tab.
Select or enter the following optional and required parameters:
Source Files and Destination Tables
- Include files in subfolders: if this option is enabled, the Flow will load all files that match the wildcard filename (set in FROM) in all subfolders under the main bucket or blob.
- Exclude files and Include files: the optional comma-separated lists of files to exclude and/or include. You can use wildcard file names. These options are used together with the wildcard filename set in FROM:
  - The Flow populates the list of files matching the wildcard filename in FROM.
  - The Flow excludes files based on the list of exclusions set in Exclude files.
  - The Flow includes files based on the list of inclusions set in Include files.
- Calculate Synapse Analytics table name: an optional parameter used to calculate (using JavaScript) the destination table name based on the source file name. The original file name, without the extension, is stored in the variable name, and the actual table name must be assigned to the variable value. For example, let's assume that the source file name is db_schema-table_cdc_stream_uuid.csv. The JavaScript code to calculate the destination table name would be value = name.substring(name.indexOf('-') + 1, name.indexOf('_cdc_stream'));. IMPORTANT: the Flow will automatically recognize and transform filenames matching the following template: *_cdc_stream_*.*. If filenames match this template, there is no need to provide a transformation script.
- Maximum Number of Files to Process: an optional parameter that controls how many files will be loaded in one batch (one run of the Flow). If you expect that at any point in time there will be hundreds of thousands or millions (10^6 - 10^7) of files in the source folder(s), it is recommended to limit the number of files to process to a more manageable number: hundreds to thousands (10^3 - 10^5).
- Maximum Number of Parallel Loads: if the value of this optional parameter is greater than 1, the Flow will create multiple threads to load data in parallel, one thread per table but not more than the threshold set in this parameter.
- Parallel: enable this flag if you have more than one source-to-destination transformation that you want to execute in parallel.
- Purge File if Success: if this option is enabled (it is enabled by default), the source files will be automatically deleted after the COPY INTO command is successfully executed.
Debug
- Log each executed SQL statement: enable this flag if you want to log each automatically generated and executed SQL statement, including before/after Load SQL.
Error recovery
- Continue loading data into other tables if Error: if this parameter is enabled (it is disabled by default) and there is an error when loading data into one of the tables, the Flow will continue loading data into the other tables. If configured, a notification will be sent to the webhook.
Load parameters
- Direct load: if this option is enabled (it is disabled by default), the Flow loads data into the Synapse Analytics table directly, without an intermediate temporary table. Enabling Direct load is the fastest way to load data into Synapse Analytics, but it does not support MERGE and CDC MERGE.
- Use COPY INTO with a wildcard pattern: if this option is enabled (it is disabled by default), the Flow loads all files into a specific destination table in one call using COPY INTO with a wildcard pattern. The advantage of this approach is the speed of the load. The disadvantage is the inability to handle files with a different number of columns.
- Action: can be COPY INTO, which inserts records from the file(s) into the Synapse Analytics table; MERGE, which merges records from the file with the records in the Synapse Analytics table; or CDC MERGE, which loads CDC events (INSERT/UPDATE/DELETE) in the same order as they originated in the source database. MERGE and CDC MERGE require configuring the Lookup Fields and/or enabling the option described in the last bullet of this list.
- How to MERGE: determines how to perform the MERGE into the Synapse Analytics table. The default is DELETE/INSERT: DELETE all records in the actual table that also exist in the temp table, then INSERT all records from the temp table into the actual table. If this parameter is set to Native MERGE, the Flow will execute the native Synapse Analytics MERGE SQL. Note that native MERGE requires the table to be created with DISTRIBUTION = HASH. See the sketch after this list.
- Create tables with DISTRIBUTION = HASH: if this option is enabled, the actual table is created as CREATE TABLE (columns) WITH (DISTRIBUTION = HASH(PK)). A hash-distributed table has a distribution column that is the hash key. IMPORTANT: Native MERGE requires enabling this option.
- Handle explicit CDC updates: the typical stream produced by the CDC Flow includes all columns in the monitored table. In some cases, the CDC Flow captures changes in the explicitly changed columns only, for example, when the supplemental logging for the Oracle table is configured as 'LOG DATA (PRIMARY KEY) COLUMNS' or 'LOG DATA (UNIQUE) COLUMNS'. Enable this option so the CDC MERGE statement can preserve the original values of the columns NOT captured by the CDC Flow.
- Lookup Fields: the comma-separated list of fields that uniquely identify the record in the target Synapse Analytics table.
- If this option is enabled and Lookup Fields is empty, the system will try to predict the columns that uniquely identify the record in the target Synapse Analytics table.
COPY INTO Parameters
- Line Separator: the default is \n on Windows and 0x0a on Linux. You can override it using this property. It is useful if CSV files are generated in one operating system but will be loaded in another.
- Max # of Errors to Ignore: specifies the maximum number of reject rows allowed in the load before the COPY operation is canceled. Each row that cannot be imported by the COPY operation is ignored and counted as one error. If max_errors is not specified, the default is 0.
- Error File Directory: specifies the directory, relative to the container within the COPY statement, where the rejected rows and the corresponding error file should be written. If the specified path doesn't exist, one will be created on your behalf. A child directory is created with the name rejectedrows.
- Identity Insert: specifies whether the identity value or values in the imported data file are to be used for the identity column. If IDENTITY_INSERT is OFF (default), the identity values for this column are verified but not imported; Azure Synapse Analytics will automatically assign unique values based on the seed and increment values specified during table creation. Note the following behavior with the COPY command: if IDENTITY_INSERT is OFF and the table has an identity column, a column list must be specified that does not map an input field to the identity column. If IDENTITY_INSERT is ON and the table has an identity column, and a column list is passed, it must map an input field to the identity column. The default value is not supported for the identity column in the column list. IDENTITY_INSERT can only be set for one table at a time.
- Date Format: possible values: mdy | dmy | ymd | ydm | myd | dym. Specifies the date format for mapping dates to SQL Server date formats. See the sketch after this list for how these parameters appear in a generated COPY INTO statement.
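For illustration only, the sketch below shows how some of these parameters map to options of a generated COPY INTO statement; the table name, storage path, and values are hypothetical, and the Flow builds the actual statement for you.

```sql
-- Hypothetical example only; the Flow generates the actual statement.
COPY INTO dbo.orders
FROM 'https://myaccount.blob.core.windows.net/mycontainer/in/orders*.csv'
WITH (
    FILE_TYPE       = 'CSV',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '0x0A',      -- Line Separator
    MAXERRORS       = 10,          -- Max # of Errors to Ignore
    ERRORFILE       = '/errors',   -- Error File Directory (relative to the container)
    DATEFORMAT      = 'ymd',       -- Date Format
    IDENTITY_INSERT = 'OFF'        -- Identity Insert
);
```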
COPY INTO parameters
- Truncate Columns: if enabled (disabled by default), the COPY INTO command will be generated with a TRUNCATECOLUMNS = true option. Strings are automatically truncated to the target column length.
- On Error: what to do in case of any error. Available options are: ABORT_STATEMENT (default), CONTINUE, SKIP_FILE.
- Change wildcard pattern for COPY INTO: by default, the Flow sets the wildcard pattern to the longest common string for the files that will be loaded into a specific table. When Use COPY INTO with a wildcard pattern is enabled, it is possible to change the wildcard pattern using JavaScript. Available variables:
  - etlConfig: the config.
  - table: the table name.
  - commonFileName: the default wildcard pattern.
  - files: the list of files.
  - Example: value = table + '_cdc_stream_';
ELT
- Set variables for Before and After SQL: code in JavaScript to set the variables that can be referenced as {varname} in the Before and/or After SQL. To set a variable, add it as a key-value pair to the script variable 'vars'.
  - Example: vars.put('schema', 'abc');
  - Example of the SQL: insert into test (schema_name) values ('{schema}')
- Before COPY SQL: this SQL will be executed before running the COPY INTO SQL.
- Ignore errors when executing Before COPY SQL: if this option is enabled, and there is an error when Before COPY SQL is executed, the error will be ignored.
- Before COPY SQL is a script: if this option is enabled, Before COPY SQL will be executed as a TSQL script.
- After COPY SQL: this SQL will be executed after running the COPY INTO SQL. See the sketch after this list.
- Ignore errors when executing After COPY SQL: if this option is enabled, and there is an error when After COPY SQL is executed, the error will be ignored.
- After COPY SQL is a script: if this option is enabled, After COPY SQL will be executed as a TSQL script.
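As an example of how a variable set in the JavaScript above can be referenced in After COPY SQL, the following minimal sketch inserts a row into a hypothetical audit table; the table name and variable are assumptions for illustration only.

```sql
-- Hypothetical After COPY SQL; {schema} is set by the "Set variables" script, e.g. vars.put('schema', 'abc');
INSERT INTO dbo.load_audit (schema_name, loaded_at)
VALUES ('{schema}', GETDATE());
```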
Handling source schema changes
- Alter target table if the source has columns that the target table doesn't have: if this option is enabled (it is disabled by default) and the source has fields that the target table doesn't have, the system will add the extra fields to the target table, as sketched below.
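For example, if the source gains a column that the target table doesn't have, the statement generated by the Flow is conceptually similar to this minimal sketch (the table and column names are hypothetical):

```sql
-- Hypothetical example of the ALTER generated when a new source column appears.
ALTER TABLE dbo.orders ADD discount_code varchar(50) NULL;
```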
Create a new Synapse connection for each table
Enable this option if you want to execute ELT scripts as a part of the load. Synapse Analytics allows only one stored procedure or TSQL code fragment to be executed from a single connection at any given time, so when the same connection is shared by all loaders (each loading into its own Synapse table) and ELT scripts are used, the queries get queued, which creates a performance bottleneck.
To enable it, select Create new destination connection for each table.
Step 7. Optionally, add a mapping.
The mapping can be used to:
- Rename the column.
- Exclude the column.
- Change the column data type for tables automatically created by the flow.
Configure mapping
Select the transformation and click MAPPING.
Add a mapping for the column CURRENT_DATE.
Renaming or excluding a column in the mapping will work for all tables that have this column. If a table does not have the column, the mapping will be ignored.
Step 8. Optionally, add more source-to-destination transformations
Click [+] and repeat steps 5 to 6 for more locations and/or file name templates.