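Output-side and data-quality operations control how data lands at the destination, filter out duplicates, validate rows, or keep the dataset in memory for downstream steps. Partitioning, duplicate removal, and validation are configured under Transformation → MAPPING. A Memory Connection is instead selected as the transformation's destination.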
Partition-by
Split a large dataset into smaller output files. Available only when the destination is a file.
There are two modes. The type of value you enter in the Partition By field selects the mode:
Partition by record count. Enter a numeric value — the maximum number of records per file. With Partition By = 100 and 1,000 source records, you get 10 files of 100 records each. Filenames: original_name + _ + index + original_extension.
Partition by field values. Enter a comma-separated list of column names. Etlworks creates one file per unique combination of those columns. With Partition By = last_name,first_name, every unique (last, first) pair gets its own file. Filenames: original_name + _ + partition_value + original_extension.
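The two modes above can be illustrated with a small Python sketch. It is not Etlworks code and only models the documented behavior: chunking by a maximum record count, and grouping by each unique combination of column values. Two details are assumptions because the documentation does not specify them: the record-count index starts at 1, and values from multiple partition columns are joined with "_" in the filename.

```python
import os


def partition_by_count(records, max_per_file, filename):
    """Split records into files of at most max_per_file records each."""
    if max_per_file <= 0:
        raise ValueError("max_per_file must be positive")
    name, ext = os.path.splitext(filename)
    files = {}
    for start in range(0, len(records), max_per_file):
        index = start // max_per_file + 1  # assumption: 1-based file index
        files[f"{name}_{index}{ext}"] = records[start:start + max_per_file]
    return files


def partition_by_fields(records, fields, filename):
    """Create one file per unique combination of the given column values."""
    name, ext = os.path.splitext(filename)
    files = {}
    for rec in records:
        # Assumption: multi-column partition values are joined with "_".
        value = "_".join(str(rec[f]) for f in fields)
        files.setdefault(f"{name}_{value}{ext}", []).append(rec)
    return files


rows = [{"id": i} for i in range(1000)]
by_count = partition_by_count(rows, 100, "orders.csv")
print(len(by_count))  # 10 files of 100 records each

people = [
    {"last_name": "Smith", "first_name": "Ann"},
    {"last_name": "Smith", "first_name": "Bob"},
    {"last_name": "Smith", "first_name": "Ann"},
]
by_fields = partition_by_fields(people, ["last_name", "first_name"], "people.csv")
print(sorted(by_fields))  # ['people_Smith_Ann.csv', 'people_Smith_Bob.csv']
```

The field-value mode groups records in a single pass, so the number of output files equals the number of distinct value combinations. Choose partition columns with a bounded number of distinct values.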
Ignore the original filename. By default, each partition filename includes the source filename as a prefix, e.g. order_1234.csv. To get clean per-partition filenames without that prefix, enable MAPPING → Complex Transformations → Ignore Original File Name.
Configure: MAPPING → Complex Transformations → Partition.
Remove duplicates
Drop subsequent records that match an earlier record on a configurable set of fields. Etlworks compares each incoming row to what it has already processed and ignores matches.
Configure: MAPPING → Complex Transformations → Remove Duplicates. Enter the comma-separated list of fields that define a duplicate.
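The duplicate check described above can be sketched in Python. It is a rough illustration, not Etlworks' implementation. The function takes the key fields as a comma-separated string, the way the UI does. It keeps the first record for each key and drops every later record with the same key. The name `remove_duplicates` is hypothetical.

```python
def remove_duplicates(records, fields):
    """Keep the first record per key; drop later records with the same key.

    `fields` is a comma-separated list of field names, e.g. "last_name, first_name".
    """
    key_fields = [f.strip() for f in fields.split(",") if f.strip()]
    seen = set()
    result = []
    for rec in records:
        key = tuple(rec.get(f) for f in key_fields)
        if key in seen:
            continue  # matches an earlier record on every key field: drop it
        seen.add(key)
        result.append(rec)
    return result


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "a@example.com"},
]
print([r["id"] for r in remove_duplicates(rows, "email")])  # [1, 2]
```

Because every processed key is remembered, memory use grows with the number of distinct keys rather than the total number of rows.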
Validation
Define rules that reject a row, reject the entire dataset, or halt the flow when source data fails a check, e.g., a missing required field, a value out of range, or a type mismatch.
Configure: MAPPING → Additional Transformations → Validation. Specify the rule set and the failure action (reject row / reject dataset / halt flow).
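Here is a minimal Python sketch of the three failure actions. It is not Etlworks' rule format. The `Rule` class, the action strings, and the `FlowHalted` exception are hypothetical. The sketch also assumes an order of precedence when one row fails several rules: halting the flow outranks rejecting the dataset, which outranks rejecting the row.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
ACTIONS = ("reject_row", "reject_dataset", "halt_flow")  # hypothetical names


class FlowHalted(Exception):
    """Raised when a rule with the halt_flow action fails."""


@dataclass
class Rule:
    name: str
    check: Callable[[Row], bool]  # returns True when the row passes
    action: str                   # one of ACTIONS


def validate(rows: List[Row], rules: List[Rule]) -> List[Row]:
    for rule in rules:
        if rule.action not in ACTIONS:
            raise ValueError(f"unknown action {rule.action!r}")
    accepted = []
    for row in rows:
        failed = {rule.action for rule in rules if not rule.check(row)}
        if "halt_flow" in failed:
            raise FlowHalted(f"validation halted the flow on row {row}")
        if "reject_dataset" in failed:
            return []  # discard the whole dataset
        if failed:
            continue  # only reject_row rules failed: skip this row
        accepted.append(row)
    return accepted


rules = [
    Rule("id required", lambda r: r.get("id") is not None, "reject_row"),
    Rule("qty in range",
         lambda r: isinstance(r.get("qty"), int) and 0 <= r["qty"] <= 1000,
         "reject_row"),
    Rule("price is numeric",
         lambda r: isinstance(r.get("price"), (int, float)), "halt_flow"),
]
rows = [
    {"id": 1, "qty": 5, "price": 9.5},
    {"id": None, "qty": 5, "price": 1.0},   # missing required field
    {"id": 3, "qty": 5000, "price": 2.0},   # value out of range
]
print([r["id"] for r in validate(rows, rules)])  # [1]
```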
Memory Connection as destination
A Memory Connection holds an entire dataset in RAM during flow execution. Almost any source-to-destination transformation can use a Memory Connection as its destination — the transformation extracts from the source and stores the result in memory, where other transformations in the same flow (or nested flows) can read it.
When to use it:
- Lookups and enrichment against a small or medium reference dataset.
- Caching reference data once and reusing it across multiple transformations.
- Parsing a web service response before writing it elsewhere.
- Calculations spanning multiple transformations.
- Passing datasets between a parent flow and nested flows.
- Avoiding temporary staging tables for short-lived data.
Constraint. The dataset must fit in RAM. Use staging tables instead for larger datasets.
Setup:
- Create a connection of type Memory Connection.
- Create or open a flow.
- Add a source-to-destination transformation. Source = any supported source (database, file, API, …). Destination = the Memory Connection created in the first step.
- Name the transformation — the name is how downstream transformations reference the dataset.
- Downstream transformations in the same flow can use the Memory Connection as their source.
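Conceptually, the setup above works like a dictionary of datasets keyed by transformation name. One transformation writes its result under its name, and a downstream transformation reads it back by that name. The Python sketch below models this with a lookup-and-enrichment step. `MemoryStore` and `enrich` are hypothetical stand-ins, not Etlworks APIs, and the sketch ignores how Etlworks actually shares data with nested flows.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


class MemoryStore:
    """Hypothetical stand-in for a Memory Connection: datasets held in RAM by name."""

    def __init__(self) -> None:
        self._datasets: Dict[str, List[Row]] = {}

    def write(self, name: str, rows: List[Row]) -> None:
        self._datasets[name] = list(rows)  # the whole dataset must fit in memory

    def read(self, name: str) -> List[Row]:
        if name not in self._datasets:
            raise KeyError(f"no dataset named {name!r} in memory")
        return self._datasets[name]


def enrich(rows: List[Row], store: MemoryStore, dataset_name: str,
           join_field: str, value_field: str) -> List[Row]:
    """Copy value_field from the named in-memory dataset, matching on join_field."""
    lookup = {ref[join_field]: ref[value_field] for ref in store.read(dataset_name)}
    return [{**row, value_field: lookup.get(row[join_field])} for row in rows]


store = MemoryStore()
# Upstream transformation named "countries": source -> memory.
store.write("countries", [
    {"country": "US", "country_name": "United States"},
    {"country": "DE", "country_name": "Germany"},
])
# Downstream transformation: reads "countries" by name to enrich orders.
orders = [{"id": 1, "country": "US"}, {"id": 2, "country": "FR"}]
print(enrich(orders, store, "countries", "country", "country_name"))
```

The reference data is loaded once and reused by every transformation that reads it by name. This is the caching use case listed above, and it is also why the RAM constraint applies.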