File as a transformation source – Etlworks Support

This article covers transformations where a file (or files matching a wildcard) is the source. The transformation extracts the file's contents, optionally applies a format, transforms rows, and loads them into the destination.

For end-to-end ETL setup, see Extract, Transform, and Load (ETL) data — source is a file.

How the source filename is calculated

The actual file Etlworks reads depends on two settings on the connection and on whether you used a wildcard in FROM:

Enable Wildcard File Name — whether wildcards are allowed at the connection level.
Override Wildcard File Name in Transformation — whether the transformation's FROM field overrides the connection's filename. On by default.

Connection settings

The combination produces the actual filename:

Connection filename	FROM in transformation	Override Wildcard On?	Actual file path
*.csv	test.csv	Off	/dir/*.csv (connection wins)
abc.csv	test.csv	Off	/dir/test.csv
*.csv	test.csv	On	/dir/test.csv (transformation wins)
abc.csv	test.csv	On	/dir/test.csv

In every case, the connection's Directory is prepended.
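
The rule in the table can be sketched as a small function. This is only an illustrative model of the documented behavior, not Etlworks' implementation; the names resolveSourcePath and hasWildcard are hypothetical, and treating only * and ? as wildcard characters is an assumption.

```javascript
// Illustrative model of the table above; not Etlworks' actual implementation.
// All names here (resolveSourcePath, hasWildcard) are hypothetical.

// True if the filename contains a wildcard character.
// Assumption: only "*" and "?" count as wildcards.
function hasWildcard(name) {
  return /[*?]/.test(name);
}

function resolveSourcePath(directory, connectionFilename, from, overrideOn) {
  // The connection's filename wins only when it is a wildcard
  // and the override setting is off.
  const useConnection = !overrideOn && hasWildcard(connectionFilename);
  const filename = useConnection ? connectionFilename : from;
  // The connection's Directory is always prepended.
  return directory.replace(/\/+$/, '') + '/' + filename;
}

console.log(resolveSourcePath('/dir', '*.csv', 'test.csv', false));
console.log(resolveSourcePath('/dir', '*.csv', 'test.csv', true));
```

With the override off and a wildcard on the connection, the sketch returns the connection's pattern; in every other combination it returns the FROM value.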

Work with wildcard filenames

Etlworks can process files that match a wildcard pattern, e.g. *.csv.

By default, Etlworks processes one file at a time. If several files match the wildcard, the algorithm selected for the connection's Enable Wildcard File Name setting decides which one is read. To process every matching file in one run, see Process all files in a folder.

On the connection, pick an algorithm for Enable Wildcard File Name:
- oldest / newest — sorted by modification time.
- ascending / descending — sorted alphabetically by filename.
- largest / smallest — sorted by file size.
Then, in the transformation, set FROM to the wildcard pattern (e.g. Inbound*.csv).
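
To make the selection concrete, here is a minimal sketch of how one file could be chosen from several matches. It is not Etlworks code: the helpers wildcardToRegExp and pickFile are hypothetical, and it assumes that matching is case-sensitive, that * matches any run of characters, and that each algorithm picks the first file in its ordering (for example, ascending picks the alphabetically first name).

```javascript
// Hypothetical helpers for illustration; Etlworks' real matching
// and tie-breaking rules may differ.

// Convert a simple wildcard pattern to a RegExp.
// Assumption: "*" matches any run of characters, "?" exactly one.
function wildcardToRegExp(pattern) {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*').replace(/\?/g, '.') + '$');
}

// One comparator per algorithm; the selected file sorts first.
const comparators = {
  oldest: (a, b) => a.modified - b.modified,
  newest: (a, b) => b.modified - a.modified,
  ascending: (a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0),
  descending: (a, b) => (a.name > b.name ? -1 : a.name < b.name ? 1 : 0),
  largest: (a, b) => b.size - a.size,
  smallest: (a, b) => a.size - b.size,
};

// Return the single file the given algorithm would select, or null.
function pickFile(files, pattern, algorithm) {
  const compare = comparators[algorithm];
  if (!compare) throw new Error('Unknown algorithm: ' + algorithm);
  const re = wildcardToRegExp(pattern);
  const matches = files.filter((f) => re.test(f.name)).sort(compare);
  return matches.length > 0 ? matches[0] : null;
}

// Made-up sample data: modified is a timestamp, size is in bytes.
const files = [
  { name: 'Inbound_b.csv', modified: 200, size: 10 },
  { name: 'Inbound_a.csv', modified: 300, size: 50 },
  { name: 'Other.csv', modified: 100, size: 99 },
];

console.log(pickFile(files, 'Inbound*.csv', 'oldest').name); // Inbound_b.csv
```

Other.csv is the oldest file overall but is never selected, because it does not match the Inbound*.csv pattern.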

Process all files in a folder

For processing every file matching a wildcard in a single flow run, see Process files in an ETL flow using wildcard filenames in Get started with files.

Source filename as a variable

Each time Etlworks reads a source file, it stores the filename in flow-scoped global variables. This is useful with wildcards or generated filenames, where the name cannot be hardcoded.

The variables are named after the transformation. If the transformation name is *.CSV TO PIPE.JSON 1 and the actual file is /user/local/temp/pipe.csv, three variables are created:

Variable	Value (for the example above)
{transformation}.FULL.FILE.NAME.TO.READ	/user/local/temp/pipe.csv
{transformation}.FILE.NAME.TO.READ	pipe.csv
{transformation}.BASE.FILE.NAME.TO.READ	pipe
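
The sketch below shows how the three values relate to the full path. It only illustrates the table and is not how Etlworks computes them: sourceFileVariables is a hypothetical helper, and it assumes that / is the path separator, that the base name drops only the last extension, and that the variable name is the transformation name followed by the suffix.

```javascript
// Hypothetical helper: derives the three values from a full path.
// Assumptions: "/" is the path separator, and the base name drops
// only the last extension.
function sourceFileVariables(transformationName, fullPath) {
  const fileName = fullPath.substring(fullPath.lastIndexOf('/') + 1);
  const dot = fileName.lastIndexOf('.');
  const baseName = dot > 0 ? fileName.substring(0, dot) : fileName;
  // Assumed key format: transformation name followed by the suffix.
  return {
    [transformationName + '.FULL.FILE.NAME.TO.READ']: fullPath,
    [transformationName + '.FILE.NAME.TO.READ']: fileName,
    [transformationName + '.BASE.FILE.NAME.TO.READ']: baseName,
  };
}

const vars = sourceFileVariables('*.CSV TO PIPE.JSON 1', '/user/local/temp/pipe.csv');
console.log(vars['*.CSV TO PIPE.JSON 1.BASE.FILE.NAME.TO.READ']); // pipe
```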

The transformation name is the value of the name tag under source in the flow XML. To see it, click View Flow XML on the flow.

View Flow XML

Source filename in XML

Reference any of these {variables} in connection parameters, FROM/TO fields, scripts, etc.

Variable in FROM

Access the source filename in JavaScript

From any script that has access to the dataSet object:

// Store the source filename in the global properties under a key of your choice.
com.toolsverse.config.SystemConfig.instance().getProperties().put(
    'my-key',
    com.toolsverse.util.FilenameUtils.getName(dataSet.getFileNameToRead())
);

Read it back elsewhere:

// Use the same key that was passed to put().
var sourceFileName = com.toolsverse.config.SystemConfig.instance()
    .getProperties().get('my-key');
