Overview
Etlworks provides multiple ways to process files in a folder using wildcard filenames. This allows processing multiple files matching a pattern while controlling selection, processing order, and destination handling.
This article specifically covers processing files by a wildcard in ETL transformations where the source is a file or files.
For other options, see:
- Process files using file operations (copy, move, etc.)
- File loop
- Process files in bulk load flows (Snowflake, Redshift, etc.)
Configuring a Flow to Process Files by a Wildcard Filename
Use this method to extract, transform, and load multiple files that match a wildcard pattern.
Step 1. Create a Source Connection
Ensure the source Connection supports wildcard filenames.
Step 2. Create a Destination Connection
Step 3. Create a Flow that reads a file and loads it into the destination.
Step 4. Define the Transformation
When creating a source-to-destination transformation, enter the wildcard filename (e.g., *.csv) in the FROM field.
Step 5. Optionally Exclude or Include Specific Files
Optionally, enter a comma-separated list of files to exclude or files to include.
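The selection described in Steps 4 and 5 can be pictured with a small sketch. This is only an illustration of wildcard matching with include and exclude lists, not the Etlworks implementation; the function names `wildcardToRegExp` and `selectFiles` are hypothetical, and the sketch assumes that list entries may be exact filenames or wildcards.

```javascript
// Illustrative only: approximates wildcard selection, not the Etlworks implementation.
// Convert a wildcard such as "*.csv" into an anchored regular expression.
function wildcardToRegExp(pattern) {
  // Escape every regex metacharacter except the asterisk.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Hypothetical helper: `include` and `exclude` are comma-separated lists.
function selectFiles(files, from, include, exclude) {
  const toPatterns = (list) =>
    (list || '')
      .split(',')
      .map((entry) => entry.trim())
      .filter(Boolean)
      .map(wildcardToRegExp);
  const matchesAny = (name, patterns) => patterns.some((p) => p.test(name));

  const fromPattern = wildcardToRegExp(from);
  const includes = toPatterns(include);
  const excludes = toPatterns(exclude);

  return files.filter(
    (name) =>
      fromPattern.test(name) &&
      (includes.length === 0 || matchesAny(name, includes)) &&
      !matchesAny(name, excludes)
  );
}

const folder = ['orders.csv', 'customers.csv', 'orders_backup.csv', 'notes.txt'];
const selected = selectFiles(folder, '*.csv', '', 'orders_backup.csv');
console.log(selected); // [ 'orders.csv', 'customers.csv' ]
```

In this sketch an empty include list means "no restriction", and the exclude list is applied last.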
Handling Processed Files
By default, files remain in the source folder after processing. However, you can configure the Flow to delete them, move them, or mark them as processed.
Delete processed files
Enable Delete loaded source files under MAPPING > Parameters.
You can also enable Delete source files on error to remove files if an error occurs during processing.
Skip Already Processed Files
Instead of deleting files, the Flow can mark files as processed and keep them in the folder. This prevents reprocessing without physically removing files.
To enable this feature, use the following UI controls under MAPPING > Parameters:
• Skip Previously Processed Files – Ensures only new or modified files are processed.
• File Retention in Cache (ms) – Controls how long processed file information is stored in cache.
• Custom Cache File Name – Specifies a custom file for storing processed file records.
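One way to picture how these three settings interact is the sketch below. It is a conceptual model only: the class `ProcessedFileCache`, its methods, and the rule that a file counts as modified when its last-modified timestamp changes are all assumptions made for illustration, not a description of how Etlworks stores or compares processed file records.

```javascript
// Conceptual model only; names and comparison rules are assumptions.
class ProcessedFileCache {
  constructor(retentionMs) {
    this.retentionMs = retentionMs; // analogous to File Retention in Cache (ms)
    this.entries = new Map(); // file name -> { lastModified, processedAt }
  }

  // Record that a file was processed at time `now` (milliseconds).
  markProcessed(name, lastModified, now) {
    this.entries.set(name, { lastModified, processedAt: now });
  }

  // Decide whether a file can be skipped at time `now`.
  shouldSkip(name, lastModified, now) {
    const entry = this.entries.get(name);
    if (!entry) {
      return false; // never seen: process it
    }
    if (now - entry.processedAt > this.retentionMs) {
      this.entries.delete(name); // record expired: process it again
      return false;
    }
    // Skip only if the file has not changed since it was processed.
    return entry.lastModified === lastModified;
  }
}

const cache = new ProcessedFileCache(60000);
cache.markProcessed('orders.csv', 1000, 0);
console.log(cache.shouldSkip('orders.csv', 1000, 5000)); // true
console.log(cache.shouldSkip('orders.csv', 2000, 5000)); // false
```

Time is passed in explicitly so the behavior is easy to reason about; a real cache would also be persisted, which is the role the Custom Cache File Name setting suggests.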
Move Processed Files
Instead of deleting or marking files, you can move them to another location.
To configure this:
• Go to Flow > Connections tab and set Move processed files to.
• This moves processed files to a different location instead of leaving them in the source folder.
• This option is ignored if Delete loaded source files or Skip Previously Processed Files (marking files as processed) is enabled.
Processing Priority of These Options
If multiple options are enabled, they are applied in the following order:
1. Delete loaded source files → Always takes priority and removes the file.
2. Mark files as processed → If deletion is disabled, files are retained but skipped in future runs.
3. Move processed files → If neither deletion nor marking is enabled, files are moved.
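The order above can be expressed as a short decision function. This is a minimal sketch of the stated priority only; the option property names (`deleteLoadedSourceFiles`, `skipPreviouslyProcessedFiles`, `moveProcessedFilesTo`) are hypothetical and simply mirror the UI labels.

```javascript
// Sketch of the documented priority; property names are hypothetical.
function resolvePostProcessingAction(options) {
  if (options.deleteLoadedSourceFiles) {
    return 'delete'; // deletion always takes priority
  }
  if (options.skipPreviouslyProcessedFiles) {
    return 'mark'; // file is kept and skipped in future runs
  }
  if (options.moveProcessedFilesTo) {
    return 'move'; // applied only when neither of the above is enabled
  }
  return 'keep'; // default: the file stays in the source folder
}

console.log(
  resolvePostProcessingAction({
    deleteLoadedSourceFiles: false,
    skipPreviouslyProcessedFiles: true,
    moveProcessedFilesTo: '/archive',
  })
); // mark
```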
Configure destination name (TO)
Default behavior
The TO field can include an asterisk (*). The Flow automatically replaces * with the source filename (without the extension).
Example:
• FROM: *.json
• TO: public.*
If the folder contains:
• test1.json, test2.json, test3.json
The system will create/update:
• public.test1, public.test2, public.test3
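The default substitution can be sketched as follows. The helper names `baseName` and `resolveDestination` are hypothetical; the sketch only reproduces the rule stated above (replace * in TO with the source filename without its extension).

```javascript
// Sketch of the default TO substitution; helper names are hypothetical.
// Strip the last extension from a filename, e.g. "test1.json" -> "test1".
function baseName(fileName) {
  const dot = fileName.lastIndexOf('.');
  return dot > 0 ? fileName.substring(0, dot) : fileName;
}

// Replace the asterisk in TO with the source filename without its extension.
function resolveDestination(to, sourceFileName) {
  // A function is used as the replacement so "$" in a filename is taken literally.
  return to.replace('*', () => baseName(sourceFileName));
}

const sources = ['test1.json', 'test2.json', 'test3.json'];
console.log(sources.map((file) => resolveDestination('public.*', file)));
// [ 'public.test1', 'public.test2', 'public.test3' ]
```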
Programmatically changing destination name (TO)
It is possible to modify the destination name dynamically using Regular Expressions or JavaScript.
In both cases, the input string is the actual source name (see example above).
In the Flow editor, go to MAPPING and select the Parameters tab. Enter a regular expression or JavaScript in the Calculate Destination Object Name field.
The system automatically detects whether the entered code is valid JavaScript; if it is not, the system assumes it is a regular expression.
If you use a regular expression, the Flow extracts the part of the input string that matches it.
Example:
- Source (FROM): *-*_cdc_stream_*.csv
- Destination (TO): *
- Calculate Destination Object Name: -(.*?)_cdc_stream
The system will process all the files that match the *-*_cdc_stream_*.csv wildcard. It will set the destination name to the part of the string between - and _cdc_stream.
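To check what this expression extracts before putting it in a Flow, it can be tried against sample names in plain JavaScript. The sketch below assumes the result is the text captured by the parentheses, which is what the description of the outcome implies; the function name `destinationFromRegex` and the sample source names are made up for illustration.

```javascript
// Try the expression from the example against sample source names.
// Assumption: the destination name is the text captured by the first group.
const pattern = /-(.*?)_cdc_stream/;

function destinationFromRegex(sourceName) {
  const match = pattern.exec(sourceName);
  return match ? match[1] : null; // null when the name does not match
}

console.log(destinationFromRegex('mysql-orders_cdc_stream_20240101')); // orders
console.log(destinationFromRegex('prod-db-orders_cdc_stream_1')); // db-orders
console.log(destinationFromRegex('orders_20240101')); // null
```

Note that matching starts at the first hyphen, so a name with several hyphens keeps everything after the first one.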
JavaScript can be used to transform the input variable name into the destination name (TO). The last evaluated expression in the JavaScript is used to replace the asterisk (*) in TO.
Example:
- Source (FROM): *-*_cdc_stream_*.csv
- Destination (TO): *
- Calculate Destination Object Name:
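// name holds the source name; find the first '-' and the '_cdc_stream' marker
var start = name.indexOf('-');
var end = name.indexOf('_cdc_stream');
// the last evaluated expression becomes the destination name
value = name.substring(start + 1, end);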
The system will process all the files that match the *-*_cdc_stream_*.csv wildcard. It will set the destination name (TO) to the value returned by the JavaScript.
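The script can be tried outside Etlworks by wrapping it in a function, as in the sketch below. The wrapper `calculateDestinationName` and the guarded variant `calculateDestinationNameSafe` are hypothetical test helpers, and the sample names are made up. The guard is worth considering because `substring` returns an empty string when neither marker is found.

```javascript
// Local test harness, not part of Etlworks: `name` is passed in as a parameter.
function calculateDestinationName(name) {
  var start = name.indexOf('-');
  var end = name.indexOf('_cdc_stream');
  return name.substring(start + 1, end);
}

// Hypothetical guarded variant: return the unchanged name when a marker is missing.
function calculateDestinationNameSafe(name) {
  var start = name.indexOf('-');
  var end = name.indexOf('_cdc_stream', start + 1);
  if (start < 0 || end < 0) {
    return name;
  }
  return name.substring(start + 1, end);
}

console.log(calculateDestinationName('mysql-orders_cdc_stream_20240101')); // orders
console.log(calculateDestinationName('orders_20240101')); // prints an empty string
console.log(calculateDestinationNameSafe('orders_20240101')); // orders_20240101
```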
Process files in a specific order
When using a wildcard filename (e.g., *.csv), the Flow:
1. Captures the list of matching files.
2. Sorts the files using the selected File processing order from the source Connection.
Available options:
- Disabled: default sorting for the Connection, most likely by filename in ascending order
- oldest: oldest files first
- newest: newest files first
- ascending: by filename in ascending order
- descending: by filename in descending order
- largest: largest files first
- smallest: smallest files first
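The effect of each option can be illustrated with a set of comparators. This is a sketch of the ordering semantics listed above, not Etlworks code; the file record shape (`name`, `size`, `lastModified`) and the function `sortFiles` are assumptions, and Disabled is modeled as leaving the list in the order it was received.

```javascript
// Sketch of the ordering options; the file record shape is an assumption.
const byName = (a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0);

const comparators = {
  oldest: (a, b) => a.lastModified - b.lastModified,
  newest: (a, b) => b.lastModified - a.lastModified,
  ascending: byName,
  descending: (a, b) => byName(b, a),
  largest: (a, b) => b.size - a.size,
  smallest: (a, b) => a.size - b.size,
};

// Return a sorted copy; unknown options (e.g. "Disabled") keep the input order.
function sortFiles(files, order) {
  const comparator = comparators[order];
  return comparator ? [...files].sort(comparator) : [...files];
}

const listing = [
  { name: 'b.csv', size: 300, lastModified: 2000 },
  { name: 'a.csv', size: 100, lastModified: 3000 },
  { name: 'c.csv', size: 200, lastModified: 1000 },
];

console.log(sortFiles(listing, 'oldest').map((f) => f.name)); // [ 'c.csv', 'b.csv', 'a.csv' ]
console.log(sortFiles(listing, 'largest').map((f) => f.name)); // [ 'b.csv', 'c.csv', 'a.csv' ]
```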