Use this flow to extract data from any supported source, transform it, and load it into Azure Synapse Analytics or Microsoft Fabric Warehouse. Etlworks generates and runs the COPY INTO command for you, optionally followed by a MERGE.
Note: One article, two products. Etlworks connections and flows optimized for Azure Synapse Analytics are fully compatible with Microsoft Fabric Warehouse. A few configuration steps differ — called out below.
Which Synapse / Fabric flow should I use?
| Flow | Use when |
|---|---|
|
Any to Synapse Analytics / Fabric Warehouse (this article) Database / File / Queue / Web service / Well-known API → Synapse or Fabric |
You need to extract from any source, optionally transform, and load. |
| Bulk load files in Azure Storage into Synapse / Fabric | The files already exist in Azure Blob. No transformation needed. Auto-generates COPY INTO; supports MERGE. |
| Stream CDC events into Synapse / Fabric | You need real-time replication from a CDC-enabled source database. |
| Streaming with message queues | You need real-time ingestion from a message queue that supports streaming. |
Azure Synapse Analytics is built on Microsoft SQL Server. Row-by-row inserts are very slow; the flows above use COPY INTO from an Azure Storage stage, which is the supported high-performance load path.
How does this flow work?
- Extracts data from the source.
- Writes the data to CSV or Parquet files.
- Optionally compresses files with gzip (CSV only).
- Uploads files into an Azure storage container that serves as the Synapse / Fabric stage.
- Creates the destination table from the source metadata if it doesn't exist.
- Generates and runs the COPY INTO command.
- If MERGE is configured, merges the staged data with the existing target table.
- Cleans up staged files.
What do I need for Synapse Analytics?
- Synapse Analytics is available from your Etlworks instance.
- The Synapse user executing COPY INTO has both ADMINISTER DATABASE BULK OPERATIONS and INSERT permissions.
- An Azure Storage account exists and the user has read/write permissions on the container.
What do I need for Microsoft Fabric Warehouse?
- The Microsoft Fabric Warehouse exists and is reachable.
- An Azure Storage Gen2 account exists, the user has read/write permissions on the container, and the account is linked to PowerBI and the Fabric workspace (see steps 8-11 of the get-started guide).
Step-by-step setup
Step 1. Create connections
- A source connection.
- An Azure Storage connection for the stage. Set Archive file before copying to GZip. Important: disable gzip compression if you're loading with the Parquet format.
- An Azure Synapse Analytics connection or a Microsoft Fabric Warehouse connection.
Step 2. Create a CSV or Parquet format
Synapse / Fabric can load CSV or Parquet.
For CSV:
- Enable Convert empty string to null.
- Leave Value for null empty.
For Parquet:
- Set Serialize dates as to Milliseconds from epoch.
- Optionally set Compression Codec to GZIP or SNAPPY.
Step 3. Create the flow
Open Flows, click +, type synapse in the gallery search, and pick the flow type that matches your source.
Add the source-to-destination transformation and continue with the steps below.
Step 4. Set the Synapse / Fabric connection on the flow
Open the Connections tab and pick the Synapse Analytics or Fabric Warehouse connection from Step 1.
Step 5. Set source (FROM) and destination (TO)
FROM (varies by flow type):
- Database: table or view name. Wildcards supported, e.g. public.*.
- Well-known API: endpoint or file name.
- Web service: endpoint name.
- File: file name or wildcard pattern.
- Queue: queue name.
TO: the fully qualified Synapse / Fabric table name. If FROM uses a wildcard, use a matching wildcard in TO.
The Azure Storage connection is the destination connection on the transformation. Etlworks treats it as the stage.
Step 6. Set the data exchange format
Select the CSV or Parquet format you created in Step 2 as the destination format.
Step 7. Set parameters (optional)
Click MAPPING on the transformation row, then go to the Parameters tab.
COPY INTO behavior
| Parameter | What it does |
|---|---|
| Column List | Comma-separated list of target columns. Lets you load source fields into specific target columns. Columns can be in any order in the COPY statement, but file order must match. |
| Line Separator | Defaults to \n on Windows or 0x0a on Linux. Override here if CSV files were generated on a different OS than the load runs on. |
| Max # of Errors to Ignore | MAXERRORS. Maximum reject rows before COPY is canceled. Default 0. |
| Error File Directory | Directory (relative to the container) where rejected rows and error files are written. If it doesn't exist, Etlworks creates it; a rejectedrows child directory is created automatically. |
| Identity Insert | Whether identity values in the source file are used for the identity column. Default OFF: Synapse auto-assigns values; if a column list is passed it must not map an input to the identity column. ON: source values are used; if a column list is passed it must map an input to the identity column. Default values aren't supported for identity columns in the column list. |
| Date Format | One of mdy, dmy, ymd, ydm, myd, dym. Maps to SQL Server date formats. |
| Action | COPY INTO inserts new rows; MERGE upserts; CDC MERGE applies CDC change events. |
| How to MERGE | DELETE/INSERT (default) deletes matching rows then inserts everything from the temp table. Native MERGE executes Synapse native MERGE SQL — requires the table to be created with DISTRIBUTION HASH. |
| Create tables with DISTRIBUTION = HASH | Creates target tables as CREATE TABLE (...) WITH DISTRIBUTION = HASH(PK). Required for Native MERGE. |
| Lookup Fields | Comma-separated list of columns that uniquely identify a record. Required when Action is MERGE or CDC MERGE. |
| COPY INTO SQL | User-defined COPY INTO SQL. Etlworks generates one automatically by default. |
| MERGE SQL | User-defined MERGE SQL. Tokens: {TABLE}, {TEMP_TABLE}, {KEY_FIELDS}, {FIELDS}. |
| CDC MERGE SQL | User-defined CDC-MERGE SQL. Tokens: {TABLE}, {TEMP_TABLE}, {KEY_FIELDS}, {FIELDS}, {UPDATE_FIELDS}, {UPDATE_CONDITIONS}. |
| Purge File if Success | On by default (when not using user-defined SQL). Deletes the staged file after a successful COPY. |
| Purge File(s) if Error | On by default. Deletes staged files on COPY failure. |
ELT — SQL before and after COPY
| Parameter | What it does |
|---|---|
| Before COPY SQL / Ignore errors / is a script | SQL to run on the Synapse / Fabric connection before the COPY. Optional error suppression and full-script mode. |
| After COPY SQL / Ignore errors / is a script | Same three options, but run after the COPY. |
Debugging
| Parameter | What it does |
|---|---|
| Log each executed SQL statement | Logs every auto-generated and executed SQL statement. |
Source-specific parameters
- Database source: see Extract, Transform, and Load (ETL) data for source-database options.
-
File or web-service source:
- Ignore Transformations: copies files directly to the Azure stage. Much faster, but you lose any transformation steps.
- Delete loaded source files: deletes the source file after a successful load.
- Delete source files on error: deletes the source file if the load fails.
- Well-known API source: see the well-known API options.
Step 8. Configure MERGE (UPSERT) (optional)
- Set Action to MERGE.
- Provide Lookup Fields, or enable Predict Lookup Fields for auto-detection (less reliable).
For wildcard sources that target multiple tables, specify per-table lookup fields with fully qualified names, separated by semicolons:
test1.inventory=inventory_id,database_name; test1.payment=payment_id,database_name; test1.rental=rental_id,database_name;
Step 9. Handle source / destination schema differences
| Option | What it does |
|---|---|
| Reorder CSV Columns During Extract | Reorders columns in the extracted file to match the target. Ignored if the target table doesn't exist. |
| Ignore Fields in Source Not in Target | Drops source columns the target doesn't have. |
| Alter Target Table If Source Has Extra Columns | Adds the extra columns to the Synapse / Fabric table. |
| Recreate Target Table If Source Has Extra Columns | Drops and recreates the target table. Destructive. |
| Insert NULL Into Target Fields Not in Source | Inserts NULL for any target column the source doesn't have. |
Notifications on schema mismatch
If columns don't match, the flow logs an exception. See sending notifications about failed transformations.
Step 10. Configure mapping (optional)
If you need explicit field-to-field mapping, set it up in Mapping. Auto-mapping by field name works for most flows.
Incremental change replication (high watermark)
Change replication using HWM is supported. Set the High Watermark Field, enable Change Replication, and optionally configure a calculated HWM value.
Load multiple tables with a wildcard
FROM
Create a single source-to-destination transformation and set FROM to a wildcard pattern, e.g. public.*.
TO
Set TO to db.schema.*, e.g. dbo.history.*.
Include / exclude objects
- Exclude Objects: comma-separated list of objects to skip.
- Include Objects: comma-separated list of objects to keep.
Source query, before/after SQL
Use the {TABLE} token in Source Query, Before COPY SQL, and After COPY SQL.
MERGE with wildcards
Leave Lookup Fields empty and enable Predict Lookup Fields, or specify per-table lookup fields as semicolon-separated fully_qualified_table=field_list pairs.
HWM with wildcards
Use the {TABLE} token in the High Watermark Field Value.
Flow variables
Open the Parameters tab and add user-defined flow variables as key-value pairs. Use uppercase names without spaces.
Maximum number of parallel threads
Multiple source-to-destination transformations (extract-load) can run in parallel threads. The default limit is 5; the maximum is 99. Change it via Maximum Number of Parallel Threads on the Parameters tab.
Troubleshooting
See Common issues when loading data into cloud data warehouses for common errors and fixes.