Overview

High Watermark (HWM) replication is the most common way to implement incremental data loading in Etlworks.

It allows a Flow to load only records that have changed since the last successful run, instead of reloading the entire dataset every time.

HWM replication is unidirectional:

Data always flows from source to destination
Reverse replication requires a separate Flow with source and destination swapped

This technique works with:

Databases
Files
APIs
Any supported source and destination combination

The core idea is simple:

Etlworks remembers the highest value of a specific field from the last successful run and uses it to filter out older records in the next run.

That field is called the High Watermark Field.

If you are looking for other change replication methods available in Etlworks, see Change Replication Techniques.

What Problem HWM Replication Solves

Without HWM replication:

Every run reloads all data
Large datasets become slow and expensive
Reprocessing unchanged records increases risk and cost

With HWM replication:

Only new or updated records are extracted
Runs are faster and predictable
The process is safe to run repeatedly

Key Concepts You Must Understand

Before configuring HWM replication, it is important to understand these rules. Most user confusion comes from missing one of them.

1. HWM Is Tracked Per Transformation

Each source to destination transformation has its own HWM value.

HWM is not global to the Flow
HWM is not shared between transformations
HWM is stored using the transformation name as a key

This is why transformation naming is critical and discussed later.

2. A Valid High Watermark Field Is Required

HWM replication is possible only if the source contains a suitable High Watermark Field.

The field must:

Exist in the extracted data
Be of type DATE, TIMESTAMP, or NUMERIC (or convertible to one)
Represent record creation or modification order

If no suitable field exists in the source, HWM replication cannot be used.

3. HWM Is Updated Only After a Successful Run

Etlworks updates the stored HWM value only when the transformation finishes successfully.

If a run fails:

The HWM value is not updated
The next run will retry the same records

This guarantees no data loss.

The High Watermark Field

What Is a High Watermark Field

A High Watermark Field is the field Etlworks uses to determine whether a record is new or updated.

It represents a value that:

Always increases over time
Reflects record creation or modification
Can be compared across runs

Supported Data Types

The High Watermark Field must be one of the following:

TIMESTAMP
DATE
NUMERIC

If the value is not stored in one of these formats, it must be converted.

Recommended Characteristics

For reliable replication, the field should:

Be updated on every insert or update
Have an index if the source is a database
Never decrease in value

Common examples:

last_updated
modified_at
updated_on
sequence_id
audit_id

Using Fully Qualified Field Names

When the source query includes joins, subqueries, or aliases, field name collisions may occur.

In such cases, use a fully qualified field name for HWM field:

schema.table.column

Example:

public.audit_trail.last_modified

This avoids ambiguity and ensures correct filtering.

Using Multiple Fields as a High Watermark

Only one field can be configured as the High Watermark Field.

If multiple fields are required, combine them in the Source Query.

Example:

SELECT *,
COALESCE(updated_at, created_at) AS hwm_field
FROM orders

Then configure:

High Watermark Field = hwm_field

How HWM Replication Works Internally

This section explains exactly what Etlworks does.

First Successful Run

Etlworks extracts records from the source
It loads them into the destination
It calculates the maximum value of the High Watermark Field
That value is stored in metadata storage

At this point, the Flow has a baseline HWM.

Subsequent Runs

Etlworks retrieves the previously stored HWM value
It filters out records that are not newer than that value
Only new or updated records are processed
A new HWM value is calculated and stored

How Filtering Is Applied

Database source

Etlworks adds (or modifies) the WHERE clause dynamically:

SELECT *
FROM audit_trail
WHERE last_modified > ?

File or API source

Etlworks applies a JavaScript filter that skips older records.

Configuring HWM Change Replication

Step 1. Create the Flow

Create a Flow and add a source to destination transformation.

Step 2. Enable Change Replication

Open the transformation
Click CONFIGURE
Open the Change Replication tab
Enter the High Watermark Field

Step 3. Optional but Recommended: Calculate HWM on the Fly

This step is critical in environments where:

Destination tables already contain data
Destination data may be modified outside the Flow
You want protection against manual changes

Details are explained here.

Step 4. Handle First-Run Edge Case

If you calculate HWM using a destination query and the destination table does not exist yet:

Enable ignoring HWM query exceptions
Otherwise the first run will fail

Step 5. Configure Destination Action and Lookup Fields (Relational Databases Only)

If the destination is a relational database, configure how changed records are applied to the destination table.

Go to Transformation->Configure > Parameters, select the appropriate Action (for example, MERGE), and define the required Lookup Field(s).

For details read:

Applying Changes to a Database Destination (Insert and Update)

When the destination is a database, HWM replication controls what records are extracted from the source. It does not automatically decide how to apply those records to an existing destination table.

You must configure the destination write behavior using the Action parameter.

What HWM Replication Supports

HWM replication supports:

Inserts (new records)
Updates (existing records)

HWM replication does not support:

Deletes

If you must replicate deletes, use a change-log based technique:

Choosing the Correct SQL Action

Default Behavior (Insert Only)

By default, Etlworks inserts records into the destination table.

This works only when one of the following is true:

The destination table is empty on the first run and never receives duplicates later, or
The destination table can safely accept duplicates, or
You are loading into a staging table and merging later outside Etlworks

If the destination already contains data and you want to keep it in sync, you typically need an upsert strategy.

Where to Configure the Action

Go to:

Configure > Parameters > Action

Available actions include:

INSERT (default)
UPDATE
MERGE (UPSERT)
BULK MERGE
BULK DELETE/INSERT
IfExist
CUSTOM
Conditional

Most of these require configuring lookup fields.

Recommended Actions for HWM Replication

Option 1 (Preferred): MERGE (Upsert)

MERGE updates existing rows and inserts missing rows.

MERGE is natively supported for these databases:

Oracle
Microsoft SQL Server
PostgreSQL
MySQL
DB2
Informix
Sybase

When to use MERGE:

You want one action that handles inserts and updates
The destination is expected to already contain some data
You want idempotent behavior across reruns

What MERGE does not do:

It does not delete rows that disappeared from the source
It does not detect conflicts where destination was modified independently. Source wins.

Option 2: BULK MERGE and BULK DELETE/INSERT

These are bulk strategies used primarily for performance at scale.

General behavior:

BULK MERGE: a bulk upsert strategy
BULK DELETE/INSERT: delete matching rows first, then insert the batch

Important notes for HWM replication:

These options still do not replicate deletes from the source.
BULK DELETE/INSERT can cause temporary gaps, extra logging, or lock amplification depending on the database and transaction settings.
Use them when performance matters more than per-row granularity.

Option 3: IfExist (Insert or Update)

IfExist is an alternative to MERGE.

Behavior:

Etlworks checks if the record exists using lookup fields
If it exists, Etlworks executes UPDATE
Otherwise, it executes INSERT

When to use IfExist:

MERGE is not supported by the destination database or driver
You want explicit insert/update logic
You want simpler SQL generated by the connector

Tradeoff:

It is much slower than MERGE for large batches because it requires existence checks

Option 4: CUSTOM (Custom Upsert Template)

Some connectors support database-specific upsert clauses, but syntax differs by driver and database.

With CUSTOM, you provide an upsert template, and Etlworks generates the runtime SQL.

When to use CUSTOM:

Destination supports upsert but not via standard MERGE
You need vendor-specific syntax (for example, special UPSERT or INSERT ON CONFLICT behavior)
You want full control over the statement

Option 5: UPDATE Only

UPDATE updates existing rows only.

When to use UPDATE:

You are certain all rows already exist in the destination
Inserts are handled separately
You are applying corrections only

If rows do not exist, they will not be created.

Conditional Action

Conditional action allows executing an action only when a JavaScript condition is true.

Use this when:

You want to route certain rows to different behavior
You need conditional update logic
You want to skip writes for certain records

This still requires lookup fields if any update-like action is used.

What This Means for HWM Replication

HWM controls extraction like this:

“Get records newer than the last stored HWM value”

Action controls how those extracted records are applied:

“Insert them”
“Update them”
“Insert or update them”

For typical incremental synchronization into a database table:

Use MERGE (preferred) or IfExist
Configure Lookup Field(s) to match the destination unique key
Do not expect delete replication. Use CDC or CT if deletes matter.

Lookup Fields (How Etlworks Identifies Existing Records)

Lookup fields define how Etlworks finds an existing row in the destination table when performing updates or upserts.

They are required for all actions that modify existing data, including:

UPDATE
MERGE (UPSERT)
BULK MERGE
BULK DELETE/INSERT
IfExist
CUSTOM
Conditional actions that perform updates

Without lookup fields, Etlworks has no reliable way to determine whether a record already exists.

What Lookup Fields Represent

Lookup fields must uniquely identify a single row in the destination table.

In most cases, lookup fields correspond to:

A primary key, or
A unique constraint, or
A natural business key (for example, order_id + source_system)

If lookup fields do not uniquely identify a row, Etlworks may:

Update multiple rows
Update the wrong row
Appear to behave inconsistently across runs

Rule of thumb:

Lookup fields should match a uniqueness guarantee in the destination database.

Simple Lookup Field Configuration

For a single destination table, lookup fields are usually specified as a comma-separated list of column names:

order_id

Or composite keys:

order_id, source_system

These column names must exist in the destination table and be present in the data being loaded.

Predict Lookup Fields (Automatic Detection)

Etlworks provides an option to Predict Lookup Fields.

When enabled, Etlworks attempts to automatically determine which columns uniquely identify a row using heuristics (first) and AI (if heuristics fail).

Important notes:

This is a convenience feature, not a guarantee
Automatic detection may be incorrect
It should be reviewed carefully before running production Flows

Note: Predict Lookup Fields and manually configured lookup fields can be enabled at the same time. Etlworks always applies manually defined lookup fields first and uses the predictive algorithm only as a fallback when no manual rules are provided.

Advanced Lookup Fields Configuration (Multiple Tables or Wildcards)

When working with:

Wildcard sources
Multiple destination tables
Dynamically generated transformations

You can explicitly map lookup fields per table.

Format:

table1=field1,field2;
table2=field1,field2;

Example:

public.inventory=inventory_id,database_name;
public.payment=payment_id,database_name;
public.rental=rental_id,database_name;

Rules:

Use fully qualified table names
Separate table mappings with semicolons
Fields listed after = form the composite lookup key for that table

This approach is strongly recommended for wildcard-based HWM replication when correct lookup fields cannot be reliable predicted.

Lookup Fields and HWM Replication

HWM replication controls which records are extracted.

Lookup fields control how those records are applied to the destination.

Both must be configured correctly for reliable incremental synchronization.

Common misconfiguration patterns:

HWM field is correct, but lookup fields are missing → duplicates
Lookup fields exist, but are not unique → incorrect updates
Lookup fields reference columns not present in the destination → runtime failures

Key Takeaway

Lookup fields are not optional metadata.

They are the mechanism that turns “changed records” into correct inserts or updates.

If HWM defines when a record should be processed, lookup fields define where it belongs in the destination table.

Getting them right is essential for correct HWM replication behavior.

Calculating High Watermark Value on the Fly Using SQL

Why This Matters

Automatic HWM tracking assumes:

Only this Flow modifies the destination table

If that assumption is false, HWM may drift or skip records.

Calculating HWM from the destination table fixes this.

IMPORTANT: this only works when destination is a relational database or cloud data warehouse, such as Snowflake.

How It Works

Provide an SQL query in High Watermark Field Value under Transformation > Configure > Change Replication that returns the current maximum HWM value from the destination.

Example:

SELECT MAX(last_updated) FROM orders

This query:

Runs against the destination connection
Overrides the internally tracked HWM value

Using the {TABLE} Token

When working with wildcard sources, use {TABLE} token:

SELECT MAX(id) FROM {TABLE}

Etlworks replaces {TABLE} with the actual destination table name.

Ignoring Exceptions When Executing HWM Query

The Edge Case

If:

The destination table is created automatically
The HWM query runs before the table exists

Then the Flow fails.

Solution

Set High Watermark Exception to Ignore.

Example value:

object does not exist

If the exception message contains this text:

HWM value is set to NULL
The Flow continues normally

Calculating HWM Value Using JavaScript

When to Use JavaScript

Use JavaScript when:

HWM value needs transformation
You need custom logic
Numeric or timestamp normalization is required

Provide a JavaScript expression in Change High Watermark Field Value under Transformation > Configure > Change Replication that returns the current maximum HWM value from the destination.

Example

highWatermark == null ? 0 : java.lang.Math.round(highWatermark)

Rules

The last evaluated expression is returned
NULL handling is your responsibility

Available Variables

You can reference:

dataSet
etlConfig
scenario
source
destination

These allow advanced logic when required.

Integrating High Watermark Logic into Custom Source Queries

Why This Section Exists

The most common and simplest HWM scenario is:

Source table is replicated as-is
Etlworks automatically adds the HWM filter
No custom SQL is involved

This is the happy path.

However, many real-world Flows do not extract data as-is.

Instead, users often define a custom source query to:

Join multiple tables
Filter records
Transform fields
Derive new columns
Normalize or denormalize data

Once you provide a custom source query, Etlworks can no longer safely modify it automatically.

At that point, you are responsible for explicitly integrating HWM logic into the query.

This section explains how to do that correctly.

The Core Rule

If a source query is explicitly defined, Etlworks does not automatically inject HWM filtering.

You must reference the previous High Watermark value manually inside the query.

Failing to do so means:

All records are extracted on every run
HWM is effectively ignored
Incremental replication does not occur

Where the Previous HWM Value Comes From

After each successful run, Etlworks stores the last calculated HWM value in metadata storage.

That value can be referenced using a token with the following format:

{TRANSFORMATION_NAME_HIGH_WATERMARK}

Example:

{ORDERS_HIGH_WATERMARK}

The token always resolves to:

The last stored HWM value, or
NULL if the Flow has never completed successfully

Bind Variables vs String Substitution

When a custom source query references the HWM token, Etlworks supports two ways of passing the value into the query:

String substitution (default, backward-compatible behavior)
Bind variables (optional, explicit behavior)

This behavior is controlled by the checkbox:

Use Bind Variables for High Watermark Values

When the option is disabled (default)

The HWM token is replaced directly in the SQL text
This preserves backward compatibility
This mode is required when the HWM value represents:
- A list of IDs
- A dynamically generated SQL fragment
- Any non-scalar expression

Example use case:

WHERE order_id IN ({ORDERS_HIGH_WATERMARK})

at runtime replaced with:

WHERE order_id IN (1,2,3)

When the option is enabled

The HWM value is passed as a bind variable
This is safer and usually faster
This mode requires the HWM value to be a single scalar value (DATE, TIMESTAMP, or NUMERIC)

Example use case:

WHERE updated_at > {ORDER_HIGH_WATERMARK}

at runtime replaced with:

WHERE updated_at > ?

and bind variable value is set to the value of HWM.

Use this option when your HWM represents a single value and you want better performance and predictable SQL execution.

Handling the First Run Correctly (NULL HWM)

On the first run, there is no stored HWM value.

If your query does not account for this, it will often return zero rows or fail.

To handle this safely, always use the following pattern:

{TOKEN} IS NULL OR hwm_field > {TOKEN}

Example:

{ORDERS_HIGH_WATERMARK} IS NULL OR hwm_field > {ORDERS_HIGH_WATERMARK}

This guarantees that:

First run loads all data
Subsequent runs load only changes

IMPORTANT: pattern above works when Use Bind Variables for High Watermark Values is enabled.

Example: Adding HWM to an Existing Source Query

Original Source Query (No HWM)

SELECT o.id,
o.customer_id,
o.updated_at,
c.country
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'ACTIVE'

This query works, but it always reloads all matching records.

Updated Source Query with HWM Logic

Assumptions:

Transformation Name: ORDERS
High Watermark Field: o.updated_at
Use Bind Variables for High Watermark Values is enabled

SELECT o.id,
o.customer_id,
o.updated_at,
c.country
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'ACTIVE'
AND (
{ORDERS_HIGH_WATERMARK} IS NULL
OR o.updated_at > {ORDERS_HIGH_WATERMARK})

Now the query behaves correctly:

First run loads all active orders
Subsequent runs load only newly updated orders

Using HWM in Queries with Derived Fields

If the HWM field is derived inside the query, reference the derived value consistently.

Example:

SELECT *,
COALESCE(updated_at, created_at) AS hwm_field
FROM orders
WHERE (
{ORDERS_HIGH_WATERMARK} IS NULL
OR COALESCE(updated_at, created_at) > {ORDERS_HIGH_WATERMARK}
)

The High Watermark Field must be set to:

hwm_field

Using the Previous HWM Value in HWM Calculation Queries

The same token can be used inside the HWM calculation query itself.

This is useful when you want to calculate the new HWM only from newly processed data, not from the entire destination table.

Example:

SELECT MAX(updated_at)
FROM orders_archive
WHERE (
{ORDERS_HIGH_WATERMARK} IS NULL
OR updated_at > {ORDERS_HIGH_WATERMARK}
)

This ensures the HWM advances only when new data is actually processed.

Common Mistakes to Avoid

Forgetting to add HWM logic to a custom source query
Using the HWM token without handling NULL
Referencing a field name that does not match the configured HWM field
Changing the transformation name after HWM values are already stored

Any of these will cause incorrect or confusing behavior.

Key Takeaway

Once you define a custom source query:

Etlworks stops modifying it automatically
HWM becomes explicit, not implicit
You must integrate HWM logic yourself

If done correctly, HWM replication works exactly the same as in the happy path scenario, just with full control over extraction logic.

If not, incremental replication will silently stop working.

Manually Setting the High Watermark Value

There are rare cases where you must control the initial or current HWM.

High Watermark Field Value on First Run

Used only once
Ignored on subsequent runs
Useful when destination already contains historical data

Timestamp format:

YYYY-MM-DD HH:mm:SS.mil

High Watermark Field Value

Overrides all automatic behavior
Can be a literal value or SQL query
Applies to every run

Use with caution.

HWM Replication with Wildcard Sources

Wildcard sources create multiple transformations dynamically.

Critical Rule

Transformation names must be stable.

Configuration

1. Use wildcard in source

2. Open Parameters tab

3. Set Transformation Name to:

{source}

This ensures:

Each source object gets its own HWM
Stored values can be located correctly

Source Queries in Wildcard Transformations

Use {TABLE} in Source Queries
Use {HWM_TOKEN} to reference Previous HWM value

Example:

select * from {table} 
where {HWM_TOKEN} IS NULL or last_update > {HWM_TOKEN}

Other Notes

All source objects must share the same HWM field
Use Bind Variables for High Watermark Values must be enabled
Use {TABLE} in HWM SQL queries: select max(last_update) from {table}

Saving HWM Only on Full Success

The Problem

In nested flows:

Partial success may still update HWM
This can skip records unintentionally

Solution

Enable Save metrics only on success under Flow->Parameters

This ensures:

HWM is saved only if all steps succeed
Partial runs do not advance HWM

Incremental Replication for Non-Database Sources

HWM replication is not limited to databases.

Requirements:

High Watermark Field must be DATE, TIMESTAMP, or NUMERIC
Change replication must be enabled

Etlworks handles filtering automatically.

How HWM Replication with Non-Databases Sources Works

Flat Sources (CSV, Excel)

Data is streamed from the source record by record
The filter is applied for each record based on the value of the HWM field stored in Etlworks storage.

Nested Sources (JSON, XML)

The entire dataset is extracted from the source into memory
The filter is applied to entire dataset based on the value of the HWM field stored in Etlworks storage.

Conflict Resolution

If the same record is modified in both source and destination:

Source always wins
Destination changes are overwritten

HWM replication does not perform conflict detection.

Troubleshooting

HWM Was Reset Unexpectedly

HWM is reset if:

Source name changes
Destination name changes
Transformation order changes
Transformation is enabled or disabled

Reason

HWM is stored using the transformation name and index.

Solutions

Non-wildcard transformations:

Set a fixed transformation name

Wildcard transformations:

Use {source} or {destination}

Best solution:

Calculate HWM on the fly

Flow Fails on First Run with HWM Query

Reason:

Destination table does not exist yet

Solution:

Ignore HWM query exception

Recommendations

For reliable HWM replication:

Always set a stable transformation name
Prefer calculating HWM from destination state
Use Save metrics only on success for nested flows
Treat HWM as state, not magic

Articles in this section

Overview

What Problem HWM Replication Solves

Key Concepts You Must Understand

1. HWM Is Tracked Per Transformation

2. A Valid High Watermark Field Is Required

3. HWM Is Updated Only After a Successful Run

The High Watermark Field

What Is a High Watermark Field

Supported Data Types

Recommended Characteristics

Using Fully Qualified Field Names

Using Multiple Fields as a High Watermark

How HWM Replication Works Internally

First Successful Run

Subsequent Runs

How Filtering Is Applied

Database source

File or API source

Configuring HWM Change Replication

Step 1. Create the Flow

Step 2. Enable Change Replication

Step 3. Optional but Recommended: Calculate HWM on the Fly

Step 4. Handle First-Run Edge Case

Step 5. Configure Destination Action and Lookup Fields (Relational Databases Only)

Applying Changes to a Database Destination (Insert and Update)

What HWM Replication Supports

Choosing the Correct SQL Action

Default Behavior (Insert Only)

Where to Configure the Action

Recommended Actions for HWM Replication

Option 1 (Preferred): MERGE (Upsert)

Option 2: BULK MERGE and BULK DELETE/INSERT

Option 3: IfExist (Insert or Update)

Option 4: CUSTOM (Custom Upsert Template)

Option 5: UPDATE Only

Conditional Action

What This Means for HWM Replication

Lookup Fields (How Etlworks Identifies Existing Records)

What Lookup Fields Represent

Simple Lookup Field Configuration

Predict Lookup Fields (Automatic Detection)

Advanced Lookup Fields Configuration (Multiple Tables or Wildcards)

Lookup Fields and HWM Replication

Calculating High Watermark Value on the Fly Using SQL

Why This Matters

How It Works

Using the {TABLE} Token

Ignoring Exceptions When Executing HWM Query

The Edge Case

Solution

Calculating HWM Value Using JavaScript

When to Use JavaScript

Rules

Available Variables

Integrating High Watermark Logic into Custom Source Queries

Why This Section Exists

The Core Rule

Where the Previous HWM Value Comes From

Bind Variables vs String Substitution

Use Bind Variables for High Watermark Values

Handling the First Run Correctly (NULL HWM)

Example: Adding HWM to an Existing Source Query

Original Source Query (No HWM)

Updated Source Query with HWM Logic

Using HWM in Queries with Derived Fields

Using the Previous HWM Value in HWM Calculation Queries

Common Mistakes to Avoid

Key Takeaway

Manually Setting the High Watermark Value

High Watermark Field Value on First Run

High Watermark Field Value

HWM Replication with Wildcard Sources

Critical Rule

Configuration

Source Queries in Wildcard Transformations

Other Notes

Saving HWM Only on Full Success

The Problem

Solution