Overview
Azure Synapse Analytics is built on top of Microsoft SQL Server. Therefore, you can use the same techniques you would normally use to work with relational databases in Etlworks Integrator. It is, however, important to understand that inserting data into Synapse Analytics row by row can be painfully slow.
It is recommended that you use flows optimized to ingest data into Synapse Analytics.
Using Synapse Analytics-optimized flows you can extract data from any of the supported sources and load it directly into Synapse Analytics.
A typical flow optimized for Synapse Analytics performs the following operations:
- Extracts data from the source.
- Creates CSV files.
- Optionally compresses files using the gzip algorithm.
- Copies the files into an Azure Storage container.
- Checks whether the destination table exists and, if it does not, creates the table using metadata from the source.
- Dynamically generates and executes the COPY INTO command, which bulk-loads the CSV files from Azure Storage into the Synapse Analytics database (see the example after this list).
- Cleans up the remaining files, if needed.
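For reference, the dynamically generated COPY INTO command typically looks similar to the sketch below. The table name, container URL, file name, and credential are hypothetical placeholders; the actual command is built by Integrator from the transformation parameters and the Azure Storage connection (FIRSTROW = 2 assumes the CSV file has a header row):
COPY INTO dbo.orders
FROM 'https://mystorageaccount.blob.core.windows.net/stage/orders.csv.gz'
WITH (
    FILE_TYPE = 'CSV',
    COMPRESSION = 'GZIP',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0A',
    FIRSTROW = 2,
    MAXERRORS = 0,
    CREDENTIAL = (IDENTITY = 'Storage Account Key', SECRET = '<storage-account-key>')
);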
Prerequisites
- The Synapse Analytics instance must be available from the Internet.
- The user executing the COPY INTO command must have the permissions required to run it (typically INSERT and ADMINISTER DATABASE BULK OPERATIONS on the target database).
- The Azure Storage account must exist, and the user must have read/write permissions for the specific container.
Step-by-step instruction
Step 1. Create all required connections.
You will need a source connection, an Azure Storage connection used as a stage for the file to load, and an Azure Synapse Analytics connection.
When configuring a connection for Azure Storage, which will be used as a stage for the flows, it is recommended that you select GZip as the value for the Archive file before copying field.
Step 2. Create a data exchange format
Synapse Analytics can load CSV and Parquet files but Etlworks only supports loading from CSV, so you will need to create a CSV format.
Step 3. Create a flow to load data in Synapse Analytics
Start creating a flow by opening the Flows window, clicking the + button, and typing synapse into the search field.
Continue by selecting the flow type, adding source-to-destination transformations, and entering the transformation parameters.
Step 4. Set the Synapse Analytics connection
To configure the final destination, click the Connections tab, and select the available Synapse Analytics connection.
Step 5. Set the source and the destination
Depending upon the flow type, you can select one of the following sources (FROM) for the Synapse Analytics flow:
- Database - use the table name as the source (FROM) name
- API - use any appropriate string as the source (FROM) name
- Web Service - use any appropriate string as the source (FROM) name
- File - use the source file name or a wildcard filename as the source (FROM) name
- Queue - use the queue topic name as the source (FROM) name
For all Synapse Analytics flows, the destination connection is going to be an Azure storage connection. Integrator uses the destination connection as a stage.
Select or enter the fully-qualified or wildcard Synapse Analytics table name as the destination (TO).
Step 6. Set the Data Exchange Format configured in Step 2
Step 7. Set the optional parameters
Click the MAPPING button in the transformation row and select the Parameters tab.
Optional parameters and transformations
- Column List - you can specify a comma-separated list of column names to load source data fields into specific target columns. The columns can be in any order in the COPY INTO statement, but when loading from flat files, such as in an Azure Storage container, their order must match the order of the source data.
- Line Separator - the default is the operating-system-specific line separator: '\r\n' on Windows and '0x0a' ('\n') on Linux. You can override it using this property. It is useful if CSV files are generated on one operating system but loaded on another.
- Max # of Errors to Ignore - specifies the maximum number of reject rows allowed in the load before the COPY operation is canceled. Each row that cannot be imported by the COPY operation is ignored and counted as one error. If max_errors is not specified, the default is 0.
- Error File Directory - specifies the directory relative to the container within the COPY statement where the rejected rows and the corresponding error file should be written. If the specified path doesn't exist, one will be created on your behalf. A child directory is created with the name 'rejectedrows'.
- Identity Insert - specifies whether the identity value or values in the imported data file are to be used for the identity column. If IDENTITY_INSERT is OFF (default), the identity values for this column are verified but not imported; Azure Synapse Analytics automatically assigns unique values based on the seed and increment values specified during table creation. Note the following behavior of the COPY command: if IDENTITY_INSERT is OFF and the table has an identity column, a column list must be specified that does not map an input field to the identity column; if IDENTITY_INSERT is ON and the table has an identity column, any column list that is passed must map an input field to the identity column. A default value is not supported for the identity column in the column list. IDENTITY_INSERT can only be set for one table at a time.
- Date Format - possible values: mdy | dmy | ymd | ydm | myd | dym. Specifies the date format of the date mapping to SQL Server date formats.
- Action - can be either COPY INTO - inserts records from the file(s) into the Synapse Analytics table, or MERGE - merges records from the source with the records in the Synapse Analytics table, or CDC MERGE - merges records from the CDC-enabled source with the records in the Synapse Analytics table. MERGE and CDC MERGE require configuring Lookup Fields.
- Lookup Fields - the comma-separated list of fields that uniquely identify the record in the Synapse Analytics table. This field is required if the action is set to MERGE or CDC MERGE.
- COPY INTO SQL - this is a user-defined COPY SQL. By default, Integrator creates COPY SQL automatically, based on the input and output parameters. You can override it by using this field. Read more about the Synapse Analytics COPY INTO command.
- MERGE SQL - this is a user-defined SQL that will be used instead of the default when the action is set to MERGE (a sketch of a custom MERGE SQL is shown after this list). If nothing is entered in this field, the default MERGE SQL will be used. The following parameters are automatically populated and can be referenced as {TOKEN} in the SQL:
  - {TABLE} - the table to MERGE data into,
  - {TEMP_TABLE} - the table to merge data from,
  - {KEY_FIELDS} - the fields uniquely identifying the record in both tables,
  - {FIELDS} - the fields to INSERT/UPDATE in the table to MERGE data into.
- CDC MERGE SQL - this is a user-defined SQL that will be used instead of the default when the action is set to CDC MERGE. If nothing is entered in this field, the default CDC MERGE SQL will be used. The following parameters are automatically populated and can be referenced as {TOKEN} in the SQL:
  - {TABLE} - the table to MERGE data into,
  - {TEMP_TABLE} - the table to merge data from,
  - {KEY_FIELDS} - the fields uniquely identifying the record in both tables,
  - {FIELDS} - the fields to INSERT,
  - {UPDATE_FIELDS} - the fields and values to UPDATE in the format field=value,field=value,
  - {UPDATE_CONDITIONS} - the WHERE clause to update existing records.
- Purge File if Success - if this option is enabled (it is by default) and a user-defined SQL is not used, the staging file will be automatically deleted after the COPY INTO command is successfully executed.
- Purge File(s) if Error - if this option is enabled (it is by default), the staging files will be automatically deleted if the COPY INTO command fails with an error.
- Before COPY SQL - this SQL will be executed on the Synapse Analytics connection before running the COPY SQL.
- Ignore errors when executing Before COPY SQL - if this option is enabled, any error that occurs while executing the Before COPY SQL will be ignored.
- After COPY SQL - this SQL will be executed on the Synapse Analytics connection after running the COPY SQL.
- Ignore errors when executing After COPY SQL - if this option is enabled, any error that occurs while executing the After COPY SQL will be ignored.
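For illustration, a user-defined SQL entered in the MERGE SQL field could look like the sketch below. The {TABLE} and {TEMP_TABLE} tokens are populated by Integrator at run time; the key column (order_id) and data columns (status, amount) are hypothetical and are normally derived from the {KEY_FIELDS} and {FIELDS} tokens by the default MERGE SQL, so a custom statement is only needed for non-standard merge logic:
MERGE INTO {TABLE} AS target
USING {TEMP_TABLE} AS source
ON target.order_id = source.order_id
WHEN MATCHED THEN
    UPDATE SET target.status = source.status, target.amount = source.amount
WHEN NOT MATCHED THEN
    INSERT (order_id, status, amount)
    VALUES (source.order_id, source.status, source.amount);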
Parameters specific for different source types
Depending on the flow type, other flow parameters can be added, as explained below:
- The source is a database
- The source is a file or web service
  - if this parameter is enabled, the system will copy files directly into the Synapse Analytics stage before executing the COPY INTO command. This greatly improves the performance of the load but automatically disables any transformations.
  - if this parameter is enabled, the system will automatically delete loaded files from the source.
  - if this parameter is enabled and there is an error during the load, the system will automatically delete loaded files from the source.
- The source is a well-known API
Step 8. Optionally configure MERGE (UPSERT)
To merge (upsert) existing data in the Synapse Analytics table with new data:
- Set the Action to MERGE.
- Define the Lookup Fields - the comma-separated list of fields that uniquely identify the record in the Synapse Analytics table.
Alternatively, you can enable the Predict Lookup Fields option, which will force the flow to use various algorithms to automatically predict the fields that uniquely identify the record. Note that it is not always possible to correctly detect the unique fields.
When enabling the Predict Lookup Fields option (which is not always accurate) is not an option, you can specify the list of table=fields pairs in the Lookup Fields instead. Use fully-qualified table names and ';' as a separator between table=fields pairs.
Example:
test1.inventory=inventory_id,database_name;
test1.payment=payment_id,database_name;
test1.rental=rental_id,database_name;
Step 9. Configure how to handle source and destination schema changes
It is quite common for the source (for example, a table in the OLTP database) and the destination (the Synapse Analytics table) to have a different number of columns. It is also entirely possible that the order of columns in the source is different from the order of columns in the destination. In either of these cases, by default, the flow will fail, since the Synapse Analytics COPY INTO command cannot load files that have more or fewer columns than the target table or whose column order is different.
When setting the source-to-destination transformation, it is possible to configure it to automatically handle schema changes. The following configuration options are available:
- Reorder columns to load to match order of columns in the target table. Also update column types to match the target - if this option is enabled (it is disabled by default), the system will reorder the columns in the file being loaded to match the order of columns in the target Synapse Analytics table. When this option is enabled, the system also updates the column data types to match the existing columns in the target table. Enable it to fix the "Column type 'VALUE' is not recognized" error. This option is ignored if the target table does not exist.
- Ignore fields in the source that do not exist in the target table - if this option is enabled (it is disabled by default) and the source has more fields than the target (Synapse Analytics) table, the system will automatically exclude extra fields. Options 2, 3 and 4 are mutually exclusive.
- Alter target table if the source has columns that the target table doesn't have - if this option is enabled (it is disabled by default) and the source has different fields than the target (Synapse Analytics) table, the system will add extra fields to the target table. Options 2, 3 and 4 are mutually exclusive.
- Recreate target table if the source has columns that the target table doesn't have - if this option is enabled (it is disabled by default) and the source has different fields than the target (Synapse Analytics) table, the system will automatically drop and recreate the target table. Options 2, 3 and 4 are mutually exclusive.
- Insert null into the fields in the target table that do not exist in the source - if this option is enabled (it is disabled by default) and the target (Synapse Analytics) table has different fields than the source, the system will automatically insert NULL values into these fields in the target.
Configuring notifications if source and destination have different columns
It is possible to send an email notification if the source and destination have different columns. To configure flow to send a notification when either source has more columns than the destination or the destination has more columns than the source, use the technique explained in this article.
Step 10. Optionally configure Mapping
If necessary, you can create a mapping between the source and destination (Synapse Analytics) fields.
You create the mapping between the source and destination just like you usually do for any other flow type.
Set up change replication using high watermark
As in any other flow type, it is possible to configure a change replication using a high watermark.
When change replication is enabled, only the changed records will be loaded into Synapse Analytics.
Basically, all you need to do is set the high watermark field and enable change replication for the transformation. Additionally, for better reliability, you can set the calculated high watermark field value.
Set up change replication using change data capture (CDC)
Change data capture (CDC) is an approach to data integration that is based on the identification, capture, and delivery of the changes made to the source database and stored in the database redo log (also called transaction log).
Etlworks supports replicating data using CDC from MySQL, SQL Server, Oracle, Postgres, and MongoDB.
Read about configuring CDC for the source databases.
Once CDC is configured for the source database, you can create a CDC pipeline where the source is one of these databases and the destination is Synapse Analytics.
When creating a point-to-point CDC pipeline for extracting data from a CDC-enabled database and loading it into Synapse Analytics, you have two options:
- Stream CDC events, create CSV files + load data from CSV files into Synapse Analytics.
- Stream CDC events into a message queue + load data from a message queue into Synapse Analytics.
Loading multiple tables by a wildcard name
To load multiple database objects (tables and views) by a wildcard name (without creating individual source-to-destination transformations) follow the same steps as above, except:
FROM
Create a single source-to-destination transformation and set the FROM to a fully qualified source database object name with wildcard characters (* and ?), for example public.*
By default, all tables and all views that match the wildcard name in FROM will be included. If you want to exclude specific database objects, enter a comma-separated list of the database objects to exclude in the Exclude objects field. If you want to include only specific database objects, enter a comma-separated list of the database objects to include in the Include objects field.
TO
Set the TO to db.schema.*, for example dbo.history.*.
Read how to programmatically change the destination (TO) name.
EXCLUDE
Optionally configure the list of the database objects to exclude.
To exclude all views, enter all views in the Exclude objects field.
To exclude all tables, enter all tables in the Exclude objects field.
INCLUDE
Optionally configure the list of the database objects to include.
SOURCE QUERY
Optionally configure the Source query. You can use the token {TABLE} in the source query.
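For example, a source query similar to the one below (the filter column is hypothetical) will be executed for each matched table, with {TABLE} replaced by the actual table name at run time:
SELECT * FROM {TABLE} WHERE status = 'ACTIVE'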
BEFORE/AFTER SQL
If you want to execute any SQL before and/or after the COPY, use the token {TABLE} in the Before COPY SQL and After COPY SQL.
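For example, to remove previously loaded rows from each matched target table before the load, you could enter something like the following in the Before COPY SQL (whether a full reload fits your use case is an assumption):
DELETE FROM {TABLE}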
MERGE
If you are configuring the MERGE action, do not enter anything in the Lookup Fields. Enable Predict Lookup Fields instead.
As an alternative to enabling the Predict Lookup Fields option (which is not always accurate), you can specify the list of table=fields pairs in the Lookup Fields. Use fully-qualified table names and ';' as a separator between table=fields pairs.
Example:
test1.inventory=inventory_id,database_name;
test1.payment=payment_id,database_name;
test1.rental=rental_id,database_name;
HWM CHANGE REPLICATION
If you are setting up high watermark change replication with a calculated HWM field, use the token {TABLE} in the High Watermark Field Value.
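For example, assuming the high watermark column is named updated_at (a hypothetical name), the calculated High Watermark Field Value could be set to something like:
SELECT MAX(updated_at) FROM {TABLE}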