This article covers transformations where a file is the destination. Etlworks extracts data from any supported source — database, API, social, NoSQL, … — transforms it, and writes the result to a file.
For end-to-end ETL setup, see Extract, Transform, and Load (ETL) data — destination is a file.
How the destination filename is calculated
Destination filename = connection Directory + transformation TO field.
+
= Inprogress/test.csv
Global variables in the destination filename
Any part of TO can contain a global variable in {token} syntax — replaced at runtime by the variable's value. Full details: parameterization of transformations.
Destination filename same as source filename
When processing files by wildcard, set TO to *. Each output file gets the same base name as its source.
Calculate the destination filename with replace(regex, replacement)
Use replace(regex, replacement) anywhere inside the TO field. The system runs a regex replace on the source filename and concatenates the result with the rest of TO.
Example 1 — substring replace:
- FROM = *_processing.xml
- TO = replace(_processing, _done)
Source files order1_processing.xml, order2_processing.xml, order3_processing.xml → outputs order1_done.xml, order2_done.xml, order3_done.xml.
Example 2 — replace spaces with underscores:
- FROM = *.xml
- TO = replace(' ', _)
Source files order1 processing.xml, order2 processing.xml, order3 processing.xml → outputs order1_processing.xml, order2_processing.xml, order3_processing.xml.
Create files with unique names
Set Add Suffix When Creating Files in Transformation on the destination connection to automatically append a suffix to every file created via a source-to-destination transformation. For unique suffixes in file operations (copy, move, rename, zip), use the equivalent setting there.
| Suffix type | Format | Example (patient.csv becomes…) |
|---|---|---|
| timestamp | MMddyyyyhhmmss | patient_03092017040523.csv |
| uuid | Globally Unique Identifier | patient_f2f0fdb6-0132-11e7-93ae-92361f002671.csv |
| timestampdatefirst | ddMMyyyyHHmmss | patient_09032017040523.csv |
| timestampyearfirst | yyyyMMddHHmmss | patient_20170309040523.csv |
| timestampyeardatefirst | yyyyddMMHHmmss | patient_20170903040523.csv |
Once enabled, every file written via the connection gets the suffix.
Destination filename as a variable
When chaining source-to-destination transformations — where a file produced by step N is the source for step N+1 — you often can't hardcode the filename (wildcards, unique-name suffixes). Etlworks stores the actual destination filename in flow-scoped global variables every time it writes a file.
If the transformation name is *.CSV TO PIPE.JSON 1 and the destination file is /user/local/temp/pipe_4467ea69.json, three variables are created:
| Variable | Value (for the example above) |
|---|---|
| {transformation}.FULL.FILE.NAME.TO.WRITE | /user/local/temp/pipe_4467ea69.json |
| {transformation}.FILE.NAME.TO.WRITE | pipe_4467ea69.json |
| {transformation}.BASE.FILE.NAME.TO.WRITE | pipe_4467ea69 |
The transformation name = the Transformation Name. Click View Flow XML to see it — it's the name tag under destination.
Reference any of these {variables} in connection parameters, FROM/TO fields, or scripts.
Access the destination filename in JavaScript
var destinationFileName = com.toolsverse.config.SystemConfig.instance()
.getProperties().get('{transformation}.FULL.FILE.NAME.TO.WRITE');