Overview
Etlworks can extract data from files, transform it, and load it into any supported destination, for example, a database, other files, or a third-party system via its API.
Read how to configure a transformation when the source is a file.
Source filename
In a FROM-TO transformation, Etlworks calculates the source (FROM) file name based on two factors: whether wildcard file names are allowed (the default), and whether Override Wildcard File Name in Transformation is enabled (enabled by default).
Both can be set when configuring a Connection:
Here are some examples that allow wildcard file names:
Case 1. Override Wildcard File Name is disabled.
- If the CONNECTION is configured using a wildcard file name, and the FROM in the FROM-TO transformation is not a wildcard file name, the actual file name will be Connection directory + FROM file name. For example, if the Connection directory is /usr/local/temp, the Connection file(s) is *.csv, and the FROM is test.csv, then the actual file name will be /usr/local/temp/test.csv.
- If the CONNECTION is configured using a wildcard file name, and the FROM in the FROM-TO transformation is also a wildcard file name, the actual file name will be Connection directory + Connection file name. For example, if the Connection directory is /usr/local/temp, the Connection file(s) is *.csv, and the FROM is *test*.csv, then the actual file name will be /usr/local/temp/*.csv.
- If the CONNECTION is configured using a non-wildcard file name, and the FROM in the FROM-TO transformation is also a non-wildcard file name, the actual file name will be Connection directory + FROM file name. For example, if the Connection directory is /usr/local/temp, the Connection file(s) is abc.csv, and the FROM is test.csv, then the actual file name will be /usr/local/temp/test.csv.
Case 2. Override Wildcard File Name is enabled.
- If the CONNECTION is configured using a wildcard file name, and the FROM in the FROM-TO transformation is not a wildcard file name, the actual file name will be Connection directory + FROM file name. For example, if the Connection directory is /usr/local/temp, the Connection file(s) is *.csv, and the FROM is test.csv, then the actual file name will be /usr/local/temp/test.csv.
- If the CONNECTION is configured using a wildcard file name, and the FROM in the FROM-TO transformation is also a wildcard file name, the actual file name will be Connection directory + FROM file name. For example, if the Connection directory is /usr/local/temp, the Connection file(s) is *.csv, and the FROM is *test*.csv, then the actual file name will be /usr/local/temp/*test*.csv. This is different from Case 1, where the Connection file name is used instead.
- If the CONNECTION is configured using a non-wildcard file name, and the FROM in the FROM-TO transformation is also a non-wildcard file name, the actual file name will be Connection directory + FROM file name. For example, if the Connection directory is /usr/local/temp, the Connection file(s) is abc.csv, and the FROM is test.csv, then the actual file name will be /usr/local/temp/test.csv.
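The rules in Case 1 and Case 2 can be summarized in a few lines of JavaScript. This is only a sketch of the behavior described above, not Etlworks code. The function names are hypothetical, and treating any name containing * or ? as a wildcard is an assumption. The combination the cases do not cover (a non-wildcard Connection file name with a wildcard FROM) falls into the default branch here, which is also an assumption.

```javascript
// Illustrative model of the rules in Case 1 and Case 2; not Etlworks source code.

// Hypothetical helper: a name counts as a wildcard if it contains * or ?.
function isWildcard(name) {
  return /[*?]/.test(name);
}

// Hypothetical helper: join a directory and a file name with a single slash.
function joinPath(dir, name) {
  return dir.replace(/\/+$/, '') + '/' + name;
}

// Returns the file name (or pattern) that would be read, per the cases above.
function resolveSourceFileName(connectionDir, connectionFile, from, overrideWildcard) {
  var bothWildcards = isWildcard(connectionFile) && isWildcard(from);
  // Only one combination keeps the Connection's pattern: override disabled
  // and both the Connection and FROM are wildcards (Case 1, second bullet).
  if (!overrideWildcard && bothWildcards) {
    return joinPath(connectionDir, connectionFile);
  }
  // Every other combination listed above uses the FROM file name.
  return joinPath(connectionDir, from);
}

console.log(resolveSourceFileName('/usr/local/temp', '*.csv', '*test*.csv', false));
console.log(resolveSourceFileName('/usr/local/temp', '*.csv', '*test*.csv', true));
```

The two calls print /usr/local/temp/*.csv and /usr/local/temp/*test*.csv, which is the single difference between the two cases.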
Work with wildcard file names
Etlworks can process files that match a wildcard file name, for example, *.csv.
By default, Etlworks processes only one file at a time. For example, if multiple files match the wildcard *.xml, and oldest has been selected as the algorithm for wildcard file processing, Etlworks will process the oldest file in the folder that matches the wildcard. Read how to process all files that match a wildcard filename.
Step 1. When creating a Connection, select one of the available algorithms for wildcard file processing using the field Enable Wildcard File Name:
- oldest: files in the folder are sorted by modification time from oldest to newest, then the least recently modified file is selected for processing.
- newest: files in the folder are sorted by modification time from newest to oldest, then the most recently modified file is selected for processing.
- ascending: files in the folder are sorted alphabetically in ascending order, then the first file is selected for processing.
- descending: files in the folder are sorted alphabetically in descending order, then the first file is selected for processing.
- largest: files in the folder are sorted by size, then the largest file is selected for processing.
- smallest: files in the folder are sorted by size, then the smallest file is selected for processing.
Step 2. Create a Flow where the source (FROM) is a file and the destination (TO) is a database, another file, or a web service. When creating the transformation, enter a wildcard filename, for example Inbound*.csv, into the FROM field.
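To illustrate how the six algorithms from Step 1 narrow a set of matching files down to one, here is a minimal sketch in plain JavaScript. It models the behavior described above and is not Etlworks code. The file objects, the selectFile and wildcardToRegExp helpers, and the case-sensitive matching of * and ? are all assumptions made for the example.

```javascript
// Illustrative model of the six selection algorithms; not Etlworks source code.
// Each file is a plain object: { name, modified (epoch ms), size (bytes) }.

// Hypothetical helper: translate a simple wildcard (* and ?) into a RegExp.
function wildcardToRegExp(pattern) {
  var escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*').replace(/\?/g, '.') + '$');
}

// Comparators that put the file to be selected first.
var comparators = {
  oldest: function (a, b) { return a.modified - b.modified; },
  newest: function (a, b) { return b.modified - a.modified; },
  ascending: function (a, b) { return a.name < b.name ? -1 : a.name > b.name ? 1 : 0; },
  descending: function (a, b) { return a.name < b.name ? 1 : a.name > b.name ? -1 : 0; },
  largest: function (a, b) { return b.size - a.size; },
  smallest: function (a, b) { return a.size - b.size; }
};

// Returns the single file that the chosen algorithm would pick, or null.
function selectFile(files, pattern, algorithm) {
  if (!Object.prototype.hasOwnProperty.call(comparators, algorithm)) {
    throw new Error('Unknown algorithm: ' + algorithm);
  }
  var regExp = wildcardToRegExp(pattern);
  var matches = files.filter(function (f) { return regExp.test(f.name); });
  matches.sort(comparators[algorithm]);
  return matches.length > 0 ? matches[0] : null;
}

var files = [
  { name: 'Inbound_b.csv', modified: 1000, size: 300 },
  { name: 'Inbound_a.csv', modified: 2000, size: 100 },
  { name: 'Other.csv', modified: 500, size: 900 }
];

console.log(selectFile(files, 'Inbound*.csv', 'oldest').name);
console.log(selectFile(files, 'Inbound*.csv', 'ascending').name);
```

With the sample data, oldest picks Inbound_b.csv and ascending picks Inbound_a.csv. Other.csv is never considered because it does not match Inbound*.csv.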
Process all files in a folder
Read how to configure the Flow to process all files in a folder.
Source file name as a variable
It is fairly common to store the name of the source file that the Flow needs to access in a variable, for example, by placing it in a database. Because it is not always possible to hardcode the source file name, Etlworks can work with wildcard file names, such as *.csv. With a wildcard, the actual file name is known only at run time, and keeping it in a variable removes that ambiguity.
Access source file name using global variables
Each time Etlworks reads a source file, it stores the file's name as a global variable. The global variable is available only to that particular Flow and, if that Flow is a nested Flow, to all Flows within it.
Let's assume that the source name is *.CSV TO PIPE.JSON 1.
Tip: source name = Transformation Name. You can always check the source name by clicking View Flow XML for that particular Flow and reading the value of the tag name below the tag source.
Let's also assume that the actual source file name is /user/local/temp/pipe.csv.
So, for the source name *.CSV TO PIPE.JSON 1 and the actual source file name /user/local/temp/pipe.csv, three global variables will be created:
SOURCE NAME.FULL.FILE.NAME.TO.READ: *.CSV TO PIPE.JSON 1.FULL.FILE.NAME.TO.READ = /user/local/temp/pipe.csv
SOURCE NAME.FILE.NAME.TO.READ: *.CSV TO PIPE.JSON 1.FILE.NAME.TO.READ = pipe.csv
SOURCE NAME.BASE.FILE.NAME.TO.READ: *.CSV TO PIPE.JSON 1.BASE.FILE.NAME.TO.READ = pipe
You can then reference any of these {variables} in the Connection parameters (such as the URL), or in the FROM / TO fields for the transformation.
You can also access any of the global variables above using the following JavaScript:
var sourceFileName = SystemConfig.instance().getProperties().
get('variable name above');
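To make the naming scheme concrete, the following plain JavaScript sketch derives the three variable names and values from a source name and a full file name. It does not call Etlworks. sourceVariables is a hypothetical helper, and it assumes forward-slash paths and treats only the last extension as the extension.

```javascript
// Illustrative only: builds the three variable names and values described above.
// It does not call Etlworks; sourceVariables is a hypothetical helper.
function sourceVariables(sourceName, fullFileName) {
  // File name without the directory part.
  var fileName = fullFileName.substring(fullFileName.lastIndexOf('/') + 1);
  // File name without its last extension.
  var dot = fileName.lastIndexOf('.');
  var baseName = dot > 0 ? fileName.substring(0, dot) : fileName;

  var variables = {};
  variables[sourceName + '.FULL.FILE.NAME.TO.READ'] = fullFileName;
  variables[sourceName + '.FILE.NAME.TO.READ'] = fileName;
  variables[sourceName + '.BASE.FILE.NAME.TO.READ'] = baseName;
  return variables;
}

var vars = sourceVariables('*.CSV TO PIPE.JSON 1', '/user/local/temp/pipe.csv');
Object.keys(vars).forEach(function (key) {
  console.log(key + ' = ' + vars[key]);
});
```

The keys printed here are the strings you would pass as the variable name when reading the value inside a Flow.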
Access the source file name using JavaScript
The simplest way to access the source file name is to call dataSet.getFileNameToRead(), provided the dataSet object is available in that particular JavaScript code. For example:
SystemConfig.instance().getProperties().
put('my unique key for the source file name',
FilenameUtils.getName(dataSet.getFileNameToRead()));
And then access it somewhere else:
var sourceFileName = SystemConfig.instance().getProperties().
get('my unique key for the source file name');
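The store-then-read pattern above can be tried outside Etlworks with a plain Map standing in for the properties store. This is only a sketch: the Map, the helper names, and the key are hypothetical, and baseFileName merely approximates what FilenameUtils.getName does for simple paths. Failing loudly on a missing key is a design choice of this example, not documented Etlworks behavior.

```javascript
// Stand-in for the shared properties store; in Etlworks this role is played by
// SystemConfig.instance().getProperties(). The Map here is only for illustration.
var properties = new Map();

// Hypothetical helper: file name without its directory (forward or back slashes).
function baseFileName(path) {
  return path.split(/[\\/]/).pop();
}

// Hypothetical helper: store the file name under a key chosen by the Flow author.
function rememberSourceFile(key, fullPath) {
  properties.set(key, baseFileName(fullPath));
}

// Hypothetical helper: read it back, failing loudly if it was never stored.
function recallSourceFile(key) {
  if (!properties.has(key)) {
    throw new Error('No source file name stored under key: ' + key);
  }
  return properties.get(key);
}

rememberSourceFile('orders flow source file', '/usr/local/temp/test.csv');
console.log(recallSourceFile('orders flow source file'));
```

Choosing a key that no other Flow uses avoids one script overwriting a value that another script expects to read.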