Overview
In Etlworks, there are several ways to process files in a folder by a wildcard filename.
Process
Configure Flow to process files by a wildcard filename
Use the following technique if you want to extract, transform, and load multiple files matching the wildcard mask:
Step 1. Create a Connection for the source files. Make sure the Connection supports wildcard filenames.
Step 2. Create a destination Connection.
Step 3. Create a Flow that reads a file and loads it into the destination. When creating a source-to-destination transformation, enter the wildcard filename into the FROM field.
Step 4. Click MAPPING and select the Parameters tab. Make sure that Process all files is enabled (it is enabled by default).
Step 5. Optionally, enter a comma-separated list of the files to exclude and/or files to include.
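The matching described in the steps above can be pictured with a short JavaScript sketch. It is an illustration only and not Etlworks code: the helper names wildcardToRegExp and selectFiles are hypothetical, and it assumes that the include and exclude lists are themselves wildcard masks and that an empty include list means "include everything".

```javascript
// Illustrative only: Etlworks performs this matching internally.
// Convert a wildcard mask such as "*.json" into an anchored regular expression.
function wildcardToRegExp(mask) {
  // Escape regex metacharacters (everything except "*"), then turn "*" into ".*".
  const escaped = mask.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Hypothetical helper: keep files that match the mask, then apply the optional
// include and exclude lists (assumed here to be wildcard masks as well).
function selectFiles(files, mask, { include = [], exclude = [] } = {}) {
  const matchesAny = (name, masks) =>
    masks.some((m) => wildcardToRegExp(m).test(name));
  return files.filter(
    (name) =>
      wildcardToRegExp(mask).test(name) &&
      (include.length === 0 || matchesAny(name, include)) &&
      !matchesAny(name, exclude)
  );
}

const files = ['test1.json', 'test2.json', 'notes.txt', 'test3.json'];
const selected = selectFiles(files, '*.json', { exclude: ['test2.json'] });
console.log(selected);
```

In this sketch, selected contains test1.json and test3.json: notes.txt does not match the mask, and test2.json is excluded.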
Delete processed files
Loaded files remain in the source folder after processing unless you enable the Delete loaded source files parameter under MAPPING / Parameters. You can also enable Delete source files on error, which causes the Flow to delete files in the source folder if an error occurs during the load.
Configure destination name (TO)
Default behavior
The destination name (TO) can include an asterisk character (*). By default, the asterisk is replaced with the actual source name. For example, if the source (FROM) is *.json and there are the following JSON files in the source folder:
test1.json
test2.json
test3.json
and the destination (TO) is configured as public.*, the following destination tables will be created/updated:
public.test1
public.test2
public.test3
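The default substitution can be sketched in JavaScript as follows. This is a minimal illustration, not Etlworks code: resolveDestination is a hypothetical helper, and it assumes, based on the example above, that the file extension is dropped before the name replaces the asterisk.

```javascript
// Hypothetical helper, not an Etlworks API: replace "*" in TO with the
// source name stripped of its extension, as in the example above.
function resolveDestination(to, sourceFile) {
  const dot = sourceFile.lastIndexOf('.');
  const baseName = dot > 0 ? sourceFile.slice(0, dot) : sourceFile;
  // A replacer function keeps characters such as "$" in the name literal.
  return to.replace('*', () => baseName);
}

const destinations = ['test1.json', 'test2.json', 'test3.json'].map((file) =>
  resolveDestination('public.*', file)
);
console.log(destinations);
```

Here destinations holds public.test1, public.test2, and public.test3, matching the tables listed above.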
Programmatically changing destination name (TO)
It is possible to programmatically change the destination name (TO) using either a regular expression or JavaScript (Python is not supported at this time).
In both cases, the input string is the actual source name (see example above).
In the Flow editor, go to MAPPING and select the Parameters tab. Enter a regular expression or JavaScript in the Calculate Destination Object Name field.
The system automatically detects whether the entered code is valid JavaScript and, if it is not, assumes that it is a regular expression.
If you use a regular expression, the Flow extracts the part of the input string that matches the regular expression.
Example:
- Source (FROM): *-*_cdc_stream_*.csv
- Destination (TO): *
Calculate Destination Object Name: -(.*?)_cdc_stream
The system will process all the files that match the *-*_cdc_stream_*.csv wildcard. It will set the destination name to the part of the string between - and _cdc_stream.
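To see what the pattern extracts, it can be tried against a sample file name. The sketch below uses the JavaScript RegExp engine purely for illustration, since this simple pattern behaves the same way in common regex engines. The file name is hypothetical, and the fallback to the unchanged name when nothing matches is an assumption, not documented Etlworks behavior.

```javascript
// Pattern from the example above; capture group 1 is the text
// between "-" and "_cdc_stream".
const pattern = /-(.*?)_cdc_stream/;

// Hypothetical file name that matches the *-*_cdc_stream_*.csv wildcard.
const sourceName = 'prod-orders_cdc_stream_20240101.csv';

const match = sourceName.match(pattern);
// Assumption: keep the original name if the pattern does not match.
const destinationName = match ? match[1] : sourceName;
console.log(destinationName);
```

For this sample name, destinationName is orders.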
JavaScript can be used to transform the input variable, name, into the destination name (TO). The last evaluated expression in the JavaScript is used to replace the asterisk (*) in TO.
Example:
- Source (FROM): *-*_cdc_stream_*.csv
- Destination (TO): *
Calculate Destination Object Name:
var start = name.indexOf('-');
var end = name.indexOf('_cdc_stream');
value = name.substring(start + 1, end);
The system will process all the files that match the *-*_cdc_stream_*.csv wildcard. It will set the destination name (TO) to the value returned by the JavaScript.
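One way to check such a script before pasting it into the Calculate Destination Object Name field is to wrap the same logic in a function and run it locally, for example with Node.js. The wrapper calculateDestinationName, the sample file names, and the guard for missing markers are all hypothetical additions for this sketch. Inside Etlworks, name is supplied by the Flow and no wrapper is needed.

```javascript
// Hypothetical wrapper for trying the script outside Etlworks.
function calculateDestinationName(name) {
  var start = name.indexOf('-');
  var end = name.indexOf('_cdc_stream');
  // Assumption: return the unchanged name when either marker is missing
  // or when "_cdc_stream" appears before the first "-".
  if (start < 0 || end <= start) {
    return name;
  }
  return name.substring(start + 1, end);
}

const samples = ['prod-orders_cdc_stream_20240101.csv', 'unrelated.csv'];
const results = samples.map((sample) => calculateDestinationName(sample));
console.log(results);
```

With these samples, results contains orders for the first name and the unchanged unrelated.csv for the second.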
Process files in a specific order
When processing files by a wildcard filename, for example, *.csv, the Flow first captures the list of files to process, then sorts the list. The list is sorted using the algorithm selected for the source Connection.
Available options:
- Disabled: default sorting for the Connection, most likely by filename in ascending order
- oldest: oldest files first
- newest: newest files first
- ascending: by filename in ascending order
- descending: by filename in descending order
- largest: largest files first
- smallest: smallest files first
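The effect of each option can be illustrated with a small JavaScript sketch. It is not how Etlworks implements sorting: the file fields (name, size, modified), the sortFiles helper, and the sample listing are hypothetical, and Disabled is modeled as leaving the listing order unchanged.

```javascript
// Plain code-unit comparison of file names, independent of locale.
const byName = (a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0);

// Comparators keyed by the option names listed above.
const comparators = new Map([
  ['oldest', (a, b) => a.modified - b.modified],
  ['newest', (a, b) => b.modified - a.modified],
  ['ascending', byName],
  ['descending', (a, b) => byName(b, a)],
  ['largest', (a, b) => b.size - a.size],
  ['smallest', (a, b) => a.size - b.size],
]);

function sortFiles(files, option) {
  const compare = comparators.get(option);
  // Assumption: "Disabled" or an unknown option keeps the listing order.
  return compare ? [...files].sort(compare) : [...files];
}

const listing = [
  { name: 'b.csv', size: 300, modified: Date.parse('2024-01-02T00:00:00Z') },
  { name: 'a.csv', size: 100, modified: Date.parse('2024-01-03T00:00:00Z') },
  { name: 'c.csv', size: 200, modified: Date.parse('2024-01-01T00:00:00Z') },
];

const order = (option) => sortFiles(listing, option).map((f) => f.name);
console.log(order('oldest'));
console.log(order('largest'));
```

For this listing, oldest yields c.csv, b.csv, a.csv and largest yields b.csv, c.csv, a.csv.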