File as a transformation destination – Etlworks Support

This article covers transformations where a file is the destination. Etlworks extracts data from any supported source — database, API, social, NoSQL, … — transforms it, and writes the result to a file.

For end-to-end ETL setup, see Extract, Transform, and Load (ETL) data — destination is a file.

How the destination filename is calculated

Destination filename = connection Directory + transformation TO field.

Connection Directory

Transformation TO

= Inprogress/test.csv

Global variables in the destination filename

Any part of TO can contain a global variable in {token} syntax — replaced at runtime by the variable's value. Full details: parameterization of transformations.

Destination filename same as source filename

When processing files by wildcard, set TO to *. Each output file gets the same base name as its source.

TO = *

Calculate the destination filename with replace(regex, replacement)

Use replace(regex, replacement) anywhere inside the TO field. The system runs a regex replace on the source filename and concatenates the result with the rest of TO.

Example 1 — substring replace:

FROM = *_processing.xml
TO = replace(_processing, _done)

Source files order1_processing.xml, order2_processing.xml, order3_processing.xml → outputs order1_done.xml, order2_done.xml, order3_done.xml.

Example 2 — replace spaces with underscores:

FROM = *.xml
TO = replace(' ', _)

Source files order1 processing.xml, order2 processing.xml, order3 processing.xml → outputs order1_processing.xml, order2_processing.xml, order3_processing.xml.

Create files with unique names

Set Add Suffix When Creating Files in Transformation on the destination connection to automatically append a suffix to every file created via a source-to-destination transformation. For unique suffixes in file operations (copy, move, rename, zip), use the equivalent setting there.

Suffix type	Format	Example (patient.csv becomes…)
timestamp	MMddyyyyhhmmss	patient_03092017040523.csv
uuid	Globally Unique Identifier	patient_f2f0fdb6-0132-11e7-93ae-92361f002671.csv
timestampdatefirst	ddMMyyyyHHmmss	patient_09032017040523.csv
timestampyearfirst	yyyyMMddHHmmss	patient_20170309040523.csv
timestampyeardatefirst	yyyyddMMHHmmss	patient_20170903040523.csv

Once enabled, every file written via the connection gets the suffix.

Suffix options

Destination filename as a variable

When chaining source-to-destination transformations — where a file produced by step N is the source for step N+1 — you often can't hardcode the filename (wildcards, unique-name suffixes). Etlworks stores the actual destination filename in flow-scoped global variables every time it writes a file.

If the transformation name is *.CSV TO PIPE.JSON 1 and the destination file is /user/local/temp/pipe_4467ea69.json, three variables are created:

Variable	Value (for the example above)
{transformation}.FULL.FILE.NAME.TO.WRITE	/user/local/temp/pipe_4467ea69.json
{transformation}.FILE.NAME.TO.WRITE	pipe_4467ea69.json
{transformation}.BASE.FILE.NAME.TO.WRITE	pipe_4467ea69

The transformation name = the Transformation Name. Click View Flow XML to see it — it's the name tag under destination.

View Flow XML

Destination name in XML

Reference any of these {variables} in connection parameters, FROM/TO fields, or scripts.

Variable in TO

Access the destination filename in JavaScript

var destinationFileName = com.toolsverse.config.SystemConfig.instance()
    .getProperties().get('{transformation}.FULL.FILE.NAME.TO.WRITE');

Articles in this section