If your files are already in a stage that Synapse Analytics or Microsoft Fabric Warehouse can read, and you don't need any transformation, use this flow. Etlworks reads the file list, optionally creates the destination table, generates the COPY INTO command, and (optionally) MERGEs the data.
When to use this flow
- You want to load files (CSV, Parquet) that already exist in an Azure Blob container — no transformation needed, no over-the-network transfer.
- You want to load files produced by a CDC flow into Synapse Analytics or Microsoft Fabric Warehouse.
For a flow that also extracts and transforms data on the way in, use Extract, transform, and load data in Synapse Analytics or Microsoft Fabric Warehouse instead.
How does this flow work?
- Reads the names of all files matching the wildcard in the stage. Can recurse into subfolders.
- Derives destination table names from the source file names.
- Spawns one thread per destination table (up to a configurable limit) and loads them in parallel.
- For each table: loads the staged files into a temporary table via COPY INTO, then merges into the actual destination table.
Alternatively, the flow can skip the temp-table step and load directly into the actual destination table — MERGE and CDC MERGE aren't available in direct-load mode.
Features
Supports COPY INTO (INSERT), MERGE (UPSERT), and CDC MERGE. CDC MERGE applies INSERT / UPDATE / DELETE events in the exact order they were captured from the source database.
| Feature | What it does |
|---|---|
| Monitor source schema changes | Auto-CREATE and ALTER destination tables as source schemas evolve. |
| Load directly into the destination table | Skip the temporary-table step. Faster, but MERGE / CDC MERGE aren't available. |
| Delete loaded source files | Remove successfully loaded source files after the load. |
| Load data in parallel | Multiple destination tables load in parallel threads. |
| Process files in subfolders | Walk nested subdirectories during the file scan. |
What do I need before I start?
For Synapse Analytics:
- Synapse is available from your Etlworks instance.
- The Synapse user has ADMINISTER DATABASE BULK OPERATIONS and INSERT.
- An Azure Storage account exists and the user has read/write on the target container.
For Microsoft Fabric Warehouse:
- The Fabric Warehouse exists and is reachable.
- An Azure Data Lake Storage Gen2 account exists, with the user having read/write on the container, linked to the Power BI / Fabric workspace (see Get started with Synapse / Fabric, steps 8–11).
Step-by-step setup
Step 1. Create the stage connection
Create an Azure Storage connection — Synapse / Fabric loads from Azure Blob.
Important: disable gzip compression on the Azure Storage connection if you're loading Parquet files.
Step 2. Create the Synapse Analytics or Microsoft Fabric Warehouse connection
See the connector reference for full setup.
Step 3. Create the format
For most cases a CSV format is fine. Synapse / Fabric also loads Parquet; create a Parquet format with Serialize dates as = Milliseconds from epoch and optionally a Compression Codec.
Step 4. Create the flow
Open Flows, click Add flow, type bulk load synapse in the gallery search, and select the bulk-load flow type.
Step 5. Configure the load transformation
Add a source-to-destination transformation. Source connection = the stage from Step 1; destination connection = the Synapse Analytics or Microsoft Fabric Warehouse connection from Step 2.
FROM: the wildcard pattern for the staged files (e.g. folder/*.csv or folder/**/*.csv for nested folders).
TO: the destination table pattern. Use a wildcard like schema.* to let Etlworks derive the table name from each file.
Step 6. Set parameters
Source files and destination tables
| Parameter | What it does |
|---|---|
| Calculate Synapse Analytics or Microsoft Fabric Warehouse table name | How the destination table name is derived from each source filename (none / strip extension / strip extension and timestamp / custom JavaScript). |
| Include files in subfolders | Recurse into nested directories during the file scan. |
| Maximum number of files to process | Hard cap on how many files are processed per flow run. |
| Maximum number of parallel threads | How many destination tables load concurrently. Default 5; max 99. |
Load and MERGE
| Parameter | What it does |
|---|---|
| Action | COPY INTO inserts new rows; MERGE upserts on lookup fields; CDC MERGE applies CDC change events. |
| Lookup Fields | Comma-separated list of columns that uniquely identify a record. Required for MERGE and CDC MERGE. Can also be specified per-table as semicolon-separated fully_qualified_table=field_list pairs. |
| Predict Lookup Fields | Auto-detect the unique-key columns when Lookup Fields is empty. |
| Load directly into destination table | Skips the temp-table step. MERGE / CDC MERGE not available in this mode. |
ELT — SQL before and after the load
| Parameter | What it does |
|---|---|
| Before COPY INTO SQL / Ignore errors / is a script | SQL to run on the Synapse Analytics or Microsoft Fabric Warehouse connection before the load. |
| After COPY INTO SQL / Ignore errors / is a script | Same, but after the load. |
Handling source schema changes
| Option | What it does |
|---|---|
| Alter target table if source has extra columns | Adds missing columns to the destination table. |
| Recreate target table if source has extra columns | Drops and recreates the destination. Destructive. |
| Insert NULL into target fields not in source | Inserts NULL for any target column the source file doesn't have. |
| Ignore fields in source not in target | Drops source columns the destination doesn't have. |
Debug and error recovery
| Parameter | What it does |
|---|---|
| Log each executed SQL statement | Logs every generated and executed SQL statement. |
| On Error | Behavior on a per-file or per-statement error: stop, continue, or skip the file (warehouse-specific options). |
| Purge File(s) if Error | On by default. Deletes staged files on load failure. |
Step 7. Optionally configure mapping
If you need explicit field-to-field mapping, set it up in Mapping. Auto-mapping by field name works for most flows.
Step 8. Optionally add more transformations
You can add multiple source-to-destination transformations to a single flow to load different file patterns into different table sets.
Handling processed files
Three options for what to do with source files once they've loaded successfully.
Delete processed files
Enable Delete loaded source files. The flow removes each source file as soon as its load completes successfully. Useful for one-shot loads.
Skip already-processed files
Enable Do not process files that have been already processed. Etlworks tracks loaded filenames internally and skips files that have already been processed. Works well for incremental ingest where source files are appended over time.
Move processed files
Set Move files after load and pick a target folder. After a successful load, the source files are moved (rather than deleted), keeping an archive.
Troubleshooting
See Common issues when loading data into cloud data warehouses for common errors and fixes.