This is the most common transformation. Integrator extracts data from the source and loads it into the destination.
To add a new source-to-destination transformation, select a flow type of the form source to destination (for example, web service to database) and click the add button.
A typical transformation includes a source (FROM) and a destination (TO). The source and destination can be any data source supported by Integrator: tables in a relational database, SQL statements, files, web services, API endpoints, or social media websites.
Read more about mapping a source to a destination.
The other required attributes are the source name and the destination name.
- if the source or destination is a relational database, the name is the name of a table, view, or synonym, or the name of an SQL statement.
- if the source or destination is in file storage (server or cloud), the name is an actual file name or a wildcard file name.
- if the source or destination is a web service, the name is any appropriate name. For example, patient.json.
- if the source or destination is an email connection, the name is either the name of the attached file or any appropriate name.
- if the source or destination is a key-value storage, such as Redis, the name is a key name.
- if the source or destination is a NoSQL database, such as MongoDB, the name is a document id.
One flow can include multiple transformations. Transformations are typically executed in order, but they can also run out of order if parallel execution is enabled.
Selecting FROM and TO objects from the list.
If the source and/or destination uses a live connection, the system automatically populates the available objects (tables, files, endpoints) from that connection. Use the metadata selector to select an object from the list.
You can add new transformations; existing transformations can be reordered, disabled, and deleted. Disabled transformations will not be executed.
Most of the transformations include per-field mapping and additional parameters.
In Integrator, you can customize your source-to-destination transformation by entering parameters.
To open the Parameters section while still on the MAPPING screen, select one of the available tabs (for example, On Exception).
The actual list of available parameters and high-level transformations is dynamic and depends on the type of flow and on the source and destination connections.
The Transformation name
Each source-to-destination transformation has either an automatically generated or a user-assigned name. In most cases, the name is generated automatically using the following algorithm: FROM + " TO " + TO + " " + INDEX. For example, if the FROM (source) is employee.json, the TO (destination) is employee.csv, and the index of the transformation is 1, the transformation name will be employee.json TO employee.csv 1. If you override the transformation name under mapping->parameters, that name will be used instead.
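For illustration, the default name is built roughly as follows (a sketch, not Integrator's actual code):
<pre><code>
// Default transformation name: FROM + " TO " + TO + " " + INDEX,
// e.g. defaultName("employee.json", "employee.csv", 1)
// returns "employee.json TO employee.csv 1".
static String defaultName(String from, String to, int index) {
    return from + " TO " + to + " " + index;
}
</code></pre>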
Linked Source Name
By default, Integrator creates pairs of sources and destinations (From-To); each source is linked to a single destination. It is possible to link a single source to multiple destinations by specifying this property. For example, let's assume that there is a source (From) employee.json. You can link it to multiple destinations (To) by creating several From-To pairs and setting the Linked Source Name to employee.json for each of these pairs. The From part of each pair will be ignored and replaced with the linked source, employee.json.
Parameters used to improve performance
When this option is enabled (by default), Integrator will stream data from the source to the destination. When streaming is enabled, Integrator never reads the whole dataset (table, view, etc.) into memory. Instead it reads data from the source in chunks and immediately writes data to the destination. In general, streaming is much faster and saves a lot of memory, so it is enabled by default. However, streaming is not always possible. Integrator will make every attempt to detect situations where streaming won't work and will automatically disable it. In rare cases, when Integrator cannot automatically disable streaming, you might need to do this manually, hence the streaming option.
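The difference can be illustrated with a minimal sketch (file names are made up; this is not Integrator's actual code):
<pre><code>
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Paths;

// Streaming in a nutshell: read one record, write it immediately, move on.
// The whole dataset is never materialized in memory.
static void streamCopy() throws Exception {
    try (BufferedReader in = Files.newBufferedReader(Paths.get("employee.csv"));
         BufferedWriter out = Files.newBufferedWriter(Paths.get("employee_copy.csv"))) {
        String line;
        while ((line = in.readLine()) != null) {
            out.write(line);   // each record is written as soon as it is read
            out.newLine();
        }
    }
}
</code></pre>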
Use Bind Variables
The fastest way to execute SQL queries is to use bind variables. When multiple identical SQL statements are executed in a batch and the only difference between them is the actual field values, a lot of runtime is saved because the SQL statement is prepared (compiled) only once. Using bind variables also automatically enables streaming and batch processing; it is much faster when SQL statements are processed in batches. Integrator automatically sets the batch window to 100, but it is also possible to set the batch window manually using the Batch Window field. Bind variables (enabled by default) are subject to the same restrictions as streaming, so the same rules apply.
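At the JDBC level, bind variables with a batch window of 100 look roughly like this (a sketch with made-up table and column names, not Integrator's internal code):
<pre><code>
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

// The INSERT is prepared (compiled) once; only the bind values change.
// Rows are sent to the database in batches of 100 (the batch window).
static void loadWithBindVariables(Connection src, Connection dest) throws Exception {
    String insert = "INSERT INTO employee_copy (id, name) VALUES (?, ?)";
    try (Statement s = src.createStatement();
         ResultSet rs = s.executeQuery("SELECT id, name FROM employee");
         PreparedStatement ps = dest.prepareStatement(insert)) {
        int pending = 0;
        while (rs.next()) {
            ps.setLong(1, rs.getLong("id"));
            ps.setString(2, rs.getString("name"));
            ps.addBatch();
            if (++pending == 100) {   // batch window
                ps.executeBatch();
                pending = 0;
            }
        }
        if (pending != 0) {
            ps.executeBatch();        // flush the last, partial batch
        }
    }
}
</code></pre>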
Our ETL engine is written in Java and uses the JDBC protocol to connect to databases. In JDBC, the setFetchSize(int) method is very important for performance and memory management within the JVM, as it controls the number of network calls from the JVM to the database and, correspondingly, the amount of RAM used for ResultSet processing. Many JDBC drivers (for example, Oracle) enforce the fetch size by default; others (for example, MySQL and Redshift) do not. The Fetch Size field can be used to set the maximum number of records retrieved in one database call, thus limiting the memory consumption within the JVM.
Setting Fetch Size at the transformation level overrides Fetch Size set at the connection level.
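For reference, the Fetch Size field corresponds to the standard JDBC call shown in this sketch (table name is made up):
<pre><code>
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

// Limit each network round trip to 500 rows, which also caps the RAM
// needed to hold the ResultSet on the client side.
static void readWithFetchSize(Connection src) throws Exception {
    try (Statement s = src.createStatement()) {
        s.setFetchSize(500);
        try (ResultSet rs = s.executeQuery("SELECT * FROM employee")) {
            while (rs.next()) {
                // process the row
            }
        }
    }
}
</code></pre>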
By default, our ETL engine extracts all the records from the source, limited only by the source query or filter conditions. You can use this field to set the upper limit on the total number of records which can be extracted from the source.
Setting Max Records at the transformation level overrides Max Records set at the connection level.
When this option is enabled (the default), the particular transformation will be executed in a parallel thread. From a practical standpoint, this means that Integrator can run multiple transformations in parallel, which is generally much faster. However, when there are dependencies between transformations (for example, when the destination table "items" depends on the destination table "order") you might want to disable parallel execution.
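Conceptually, it works like the sketch below (only an illustration of the idea, not Integrator's scheduler; the transformation names are made up): independent transformations run in a thread pool, while dependent ones are kept in order.
<pre><code>
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// "customers" has no dependencies, so it can run in parallel.
// "items" depends on "order", so those two run sequentially in one thread.
static void runTransformations(Runnable loadCustomers, Runnable loadOrders,
                               Runnable loadItems) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    pool.submit(loadCustomers);
    pool.submit(() -> {
        loadOrders.run();
        loadItems.run();
    });
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
}
</code></pre>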
In many cases, when the source is a file or a response from a web service and the destination is a database table, streaming (and the ability to use bind variables) is automatically disabled by the engine, even when it is enabled for the specific transformation. For example, this happens if the source is an XML or JSON document, or if SQL queries are used to filter out records in the response from the web service call.
In Integrator it is possible to force streaming, even when it would normally be disabled, by enabling the Force Streaming option. When streaming is forced, the system automatically creates a streamable view from the non-streamable source. It still keeps all the records in memory, but it does not keep all the SQL statements in memory, and it is possible to use bind variables with a streamable view.
Use Native SQL
If this option is enabled (disabled by default), the native SQL dialect of the destination connection (for example, PL/SQL for Oracle) will be used. Using native SQL is generally slower because it automatically disables the ability to use bind variables.
Parameters which control whether the transformation will be executed
Ignore when there are no records
If this parameter is enabled and the source dataset has no records, the entire transformation will be ignored: for example, the file will not be loaded into the database, the API call will not be made, etc.
Ignore when there is no file
If this parameter is enabled and there are no source files, the entire transformation will be ignored: for example, the file will not be loaded into the database, the API call will not be made, etc.
Parameters available only when the source is a file
Process all files
If this parameter is enabled and there are multiple files matching a wildcard file name, all matching files will be processed (extracted and loaded).
Parameters available only when the destination is a database
If this parameter is enabled (the default) and the destination table does not exist, Integrator will automatically create it. Also, when this option is enabled, the insert, update and merge statements will only include fields which exist in the destination table, so it is recommended that you keep it enabled.
If this parameter is enabled (disabled by default) and the source table has indexes, the same indexes will be created for the destination table.
The SQL to be executed on the destination connection before the extract.
The SQL to be executed on the destination connection after the load.
SQL action (available only if the destination is a database)
By default, Integrator INSERTS records into the destination table. You can change this to a different SQL action. MERGE is natively supported for the Oracle, MS SQL Server, PostgreSQL, MySQL, DB2, Informix and Sybase databases. The alternative to MERGE is the IfExist action: Integrator will check to see if the record already exists and, if it does, will execute UPDATE; otherwise it executes INSERT. The action can also be conditional.
Lookup Fields
Allows you to enter a comma-separated list of the fields which will be used to uniquely identify a record for the MERGE and IfExist actions. If the action is MERGE and the database is PostgreSQL or MySQL, there must be a unique index which includes all lookup fields. For other databases, such an index is optional but nice to have, since it enhances performance.
With a conditional action, a condition defines which action (for example, INSERT, UPDATE or MERGE) should be used. Read how to configure a conditional action.
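As a rough illustration of what the IfExist action does under the hood (a simplified sketch with made-up table, column, and lookup field names, not Integrator's internal code):
<pre><code>
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// IfExist: look the record up by the lookup field(s); UPDATE when found,
// INSERT otherwise. Here "id" plays the role of the lookup field.
static void upsertEmployee(Connection dest, long id, String name) throws Exception {
    try (PreparedStatement check = dest.prepareStatement(
            "SELECT 1 FROM employee_copy WHERE id = ?")) {
        check.setLong(1, id);
        try (ResultSet rs = check.executeQuery()) {
            String sql = rs.next()
                ? "UPDATE employee_copy SET name = ? WHERE id = ?"
                : "INSERT INTO employee_copy (name, id) VALUES (?, ?)";
            try (PreparedStatement write = dest.prepareStatement(sql)) {
                write.setString(1, name);
                write.setLong(2, id);
                write.executeUpdate();
            }
        }
    }
}
</code></pre>
MERGE collapses the lookup and the write into a single statement executed by the database itself, which is why it is preferred when the database supports it natively.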
In Integrator, it is possible to create a flow that tracks changes in a data source (for example, a database or an API) and loads the changed records into the destination (for example, another database or API). Another name for change replication is data synchronization.
Integrator supports unidirectional change replication: from the source to the destination. To replicate changes in the opposite direction--from the destination to the source--a separate flow must be created, where the source and the destination swap places.
Read more about configuring change replication in Integrator.
High Watermark Field
A field used as a high watermark for the change replication.
Enable Change Replication
Enable or disable change replication. This works together with High Watermark Field.
High Watermark Field Value on First Run
The optional value of the high watermark field when running the change replication scenario for the first time. If the high watermark field is a TIMESTAMP, use the <code>YYYY-MM-DD HH:mm:SS.mil</code> format, for example <code>2018-08-01 19:38:31.377</code>. The value set for this field will be used only when the change replication flow runs for the first time and will be ignored on subsequent runs.
High Watermark Field Value
To make a change replication flow bulletproof, you can specify a SQL query that calculates the high watermark field value on the fly, based on the current state of the destination table. The flow will then pull only the records in the source which do not exist in the destination or have changed in the source since the last run of the flow.
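For example (a simplified sketch with made-up table and column names, not Integrator's actual code): the high watermark can be calculated as the maximum watermark value already present in the destination, and only newer source records are then extracted.
<pre><code>
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.Timestamp;

// 1. Calculate the high watermark from the current state of the destination.
// 2. Extract only the source records that are newer than that watermark.
static void incrementalExtract(Connection src, Connection dest) throws Exception {
    Timestamp watermark;
    try (Statement s = dest.createStatement();
         ResultSet rs = s.executeQuery("SELECT MAX(updated_at) FROM employee_copy")) {
        rs.next();
        watermark = rs.getTimestamp(1); // null on the very first run
    }
    String query = (watermark == null)
        ? "SELECT * FROM employee"
        : "SELECT * FROM employee WHERE updated_at > ?";
    try (PreparedStatement ps = src.prepareStatement(query)) {
        if (watermark != null) {
            ps.setTimestamp(1, watermark);
        }
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // load the new or changed record into the destination
            }
        }
    }
}
</code></pre>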
The parameters in this group can be found under the On Exception tab.
By default, any error causes a halt of execution. When Ignore is selected, errors will be ignored and execution will continue. Read how to configure the flow to ignore all or specific exceptions.
You can specify which errors should be ignored while still halting the execution for all other errors. Enter part or all of the exception string. This field works only when the Ignore On Exception option is selected.
Rollback on Exception
Some databases (for example, PostgreSQL) require transaction rollback for all exceptions.