This article covers transformations where a file (or files matching a wildcard) is the source. The transformation extracts the file's contents, optionally applies a format, transforms rows, and loads them into the destination.
For end-to-end ETL setup, see Extract, Transform, and Load (ETL) data — source is a file.
How the source filename is calculated
The actual file Etlworks reads depends on two settings on the connection and on whether you used a wildcard in FROM:
- Enable Wildcard File Name — whether wildcards are allowed at the connection level.
- Override Wildcard File Name in Transformation — whether the transformation's FROM field overrides the connection's filename. On by default.
The combination produces the actual filename:
| Connection filename | FROM in transformation | Override Wildcard On? | Actual file path |
|---|---|---|---|
| *.csv | test.csv | Off | /dir/test.csv |
| *.csv | *test*.csv | Off | /dir/*.csv (connection wins) |
| abc.csv | test.csv | Off | /dir/test.csv |
| *.csv | test.csv | On | /dir/test.csv |
| *.csv | *test*.csv | On | /dir/*test*.csv (transformation wins) |
| abc.csv | test.csv | On | /dir/test.csv |
In every case, the connection's Directory is prepended.
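The rule implied by the table can be written as a small function. This is a minimal sketch inferred from the six rows only, not Etlworks code: the names `isWildcard` and `resolveSourcePath` are hypothetical, and only `*` is treated as a wildcard character because it is the only one the examples use.

```javascript
// Illustrative model of the table above; not Etlworks source code.
// A name counts as a wildcard here if it contains '*'.
function isWildcard(name) {
  return name.indexOf('*') !== -1;
}

// directory:          the connection's Directory, e.g. '/dir'
// connectionFilename: filename set on the connection, e.g. '*.csv'
// from:               FROM field of the transformation, e.g. '*test*.csv'
// overrideWildcard:   state of Override Wildcard File Name in Transformation
function resolveSourcePath(directory, connectionFilename, from, overrideWildcard) {
  // The connection's filename is used only when FROM is itself a wildcard
  // and the override is off; in every other row of the table FROM is used.
  var useConnection = isWildcard(from) && !overrideWildcard;
  var filename = useConnection ? connectionFilename : from;
  // The connection's Directory is always prepended.
  return directory.replace(/\/+$/, '') + '/' + filename;
}

console.log(resolveSourcePath('/dir', '*.csv', '*test*.csv', false)); // /dir/*.csv
console.log(resolveSourcePath('/dir', '*.csv', '*test*.csv', true));  // /dir/*test*.csv
console.log(resolveSourcePath('/dir', 'abc.csv', 'test.csv', false)); // /dir/test.csv
```

Read this way, the override setting only changes the outcome when FROM contains a wildcard; a literal FROM such as test.csv is used in all of the listed combinations.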
Work with wildcard filenames
Etlworks can process files that match a wildcard pattern, e.g. *.csv.
By default, Etlworks processes one file at a time. If several files match the wildcard, the algorithm selected in the connection's Enable Wildcard File Name setting decides which one is read. To process every matching file, see Process all files in a folder.
- On the connection, pick an algorithm for Enable Wildcard File Name:
  - oldest / newest: sorted by modification time.
  - ascending / descending: sorted alphabetically by filename.
  - largest / smallest: sorted by file size.
- In the transformation, set FROM to the wildcard pattern (e.g. Inbound*.csv).
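To make the selection step concrete, here is a hedged sketch of how one file could be chosen from several matches. It is not Etlworks code: `wildcardToRegExp`, `COMPARATORS`, and `pickFile` are hypothetical names, the file list is made-up sample data, and the sketch assumes that the file sorting first under the chosen algorithm is the one processed (for example, ascending picks the alphabetically first name).

```javascript
// Hypothetical helper: turn a wildcard pattern such as 'Inbound*.csv'
// into an anchored regular expression. Only '*' is treated as a wildcard.
function wildcardToRegExp(pattern) {
  var escaped = pattern.split('*').map(function (part) {
    return part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  });
  return new RegExp('^' + escaped.join('.*') + '$');
}

// One comparator per algorithm; the file that sorts first is selected.
var COMPARATORS = {
  oldest: function (a, b) { return a.modified - b.modified; },
  newest: function (a, b) { return b.modified - a.modified; },
  ascending: function (a, b) { return a.name < b.name ? -1 : a.name > b.name ? 1 : 0; },
  descending: function (a, b) { return a.name < b.name ? 1 : a.name > b.name ? -1 : 0; },
  largest: function (a, b) { return b.size - a.size; },
  smallest: function (a, b) { return a.size - b.size; }
};

// files: array of { name, modified (epoch millis), size (bytes) }
function pickFile(files, pattern, algorithm) {
  if (!Object.prototype.hasOwnProperty.call(COMPARATORS, algorithm)) {
    throw new Error('Unknown algorithm: ' + algorithm);
  }
  var re = wildcardToRegExp(pattern);
  var matches = files.filter(function (f) { return re.test(f.name); });
  matches.sort(COMPARATORS[algorithm]);
  return matches.length > 0 ? matches[0].name : null;
}

// Made-up sample data for illustration.
var files = [
  { name: 'Inbound_b.csv', modified: 1000, size: 300 },
  { name: 'Inbound_a.csv', modified: 3000, size: 100 },
  { name: 'Outbound.csv', modified: 2000, size: 900 }
];
console.log(pickFile(files, 'Inbound*.csv', 'oldest'));    // Inbound_b.csv
console.log(pickFile(files, 'Inbound*.csv', 'ascending')); // Inbound_a.csv
console.log(pickFile(files, 'Inbound*.csv', 'largest'));   // Inbound_b.csv
```

Outbound.csv is never considered because it does not match the pattern; the algorithm only orders the files that do.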
Process all files in a folder
For processing every file matching a wildcard in a single flow run, see Process files in an ETL flow using wildcard filenames in Get started with files.
Source filename as a variable
Each time Etlworks reads a source file, it stores the filename in flow-scoped global variables. This is useful when you work with wildcards or generated filenames and cannot hardcode the name.
The variables are named after the transformation. If the transformation name is *.CSV TO PIPE.JSON 1 and the actual file is /user/local/temp/pipe.csv, three variables are created:
| Variable | Value (for the example above) |
|---|---|
| {transformation}.FULL.FILE.NAME.TO.READ | /user/local/temp/pipe.csv |
| {transformation}.FILE.NAME.TO.READ | pipe.csv |
| {transformation}.BASE.FILE.NAME.TO.READ | pipe |
In the variable names, {transformation} stands for the Transformation Name. To find it, click View Flow XML on the flow; it is the value of the name tag under source.
Reference any of these {variables} in connection parameters, FROM/TO fields, scripts, etc.
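The relationship between the three values can be illustrated with a short sketch. It is not how Etlworks builds the variables internally: `fileNameVariables` is a hypothetical helper that assumes `/` as the path separator and assumes each variable name is the transformation name followed by the suffix shown in the table.

```javascript
// Sketch of how the three values relate to the full path of the source file.
function fileNameVariables(transformationName, fullPath) {
  // Everything after the last '/' is the filename.
  var fileName = fullPath.substring(fullPath.lastIndexOf('/') + 1);
  // The base name drops the last extension, if there is one.
  var dot = fileName.lastIndexOf('.');
  var baseName = dot > 0 ? fileName.substring(0, dot) : fileName;
  var vars = {};
  vars[transformationName + '.FULL.FILE.NAME.TO.READ'] = fullPath;
  vars[transformationName + '.FILE.NAME.TO.READ'] = fileName;
  vars[transformationName + '.BASE.FILE.NAME.TO.READ'] = baseName;
  return vars;
}

var vars = fileNameVariables('*.CSV TO PIPE.JSON 1', '/user/local/temp/pipe.csv');
console.log(vars['*.CSV TO PIPE.JSON 1.FULL.FILE.NAME.TO.READ']); // /user/local/temp/pipe.csv
console.log(vars['*.CSV TO PIPE.JSON 1.FILE.NAME.TO.READ']);      // pipe.csv
console.log(vars['*.CSV TO PIPE.JSON 1.BASE.FILE.NAME.TO.READ']); // pipe
```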
Access the source filename in JavaScript
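From any script that has access to the dataSet object, store the filename in the global properties under a key of your choice ('my-key' in this example):
// Keep only the filename part of the path that was read
// and save it so that later steps can look it up by key.
com.toolsverse.config.SystemConfig.instance().getProperties().put(
    'my-key',
    com.toolsverse.util.FilenameUtils.getName(dataSet.getFileNameToRead())
);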
Read it back elsewhere:
var sourceFileName = com.toolsverse.config.SystemConfig.instance()
    .getProperties().get('my-key');