When to use loops
You can create a pipeline (nested Flow) where some of the steps (inner Flows) are configured to be executed in a loop.
Typical use cases for executing Flows in a loop:
- Extract data from the HTTP endpoints with parameters. The parameters can be provided at runtime by using the loop.
- Configure the transformations and Connections by reading configuration parameters from the database.
- Dynamically setting up the credentials for the Connections.
- Executing Flow a fixed number of times.
Also, read about executing Flows conditionally.
Set loop parameters
Step 1. Create a nested Flow and add steps (inner Flows).
Step 2. Select the Flow to be executed in a loop and go to Flow parameters.
Step 3. Enter loop parameters.
Execute a Flow while JavaScript expression returns not null
Configure JavaScript loop
Step 1. Create a nested Flow and add steps (inner Flows).
Step 2. Select the Flow to be executed in a loop and go to Flow parameters.
Step 3. Specify the Loop Type
as JavaScript
.
Step 4. Specify the JavaScript
loop conditions
.
Step 5. Optionally, configure the maximum number of the Loop Threads
to use when executing the Flow in a loop. If the number is greater than one, each iteration of the loop will be executed in its own thread.
Loop Script
The Flow continues to run in a loop while the JavaScript loop conditions
returns a not null
value.
Example:
// this flow will be executed while the value of the "NEXT_MARKER" property is not null
var nextCursorMark = com.toolsverse.config.SystemConfig.instance().getProperties().
get("NEXT_MARKER");
value = !com.toolsverse.util.Utils.isNothing(nextCursorMark) ? nextCursorMark : null;
Available variables
The following variables can be referenced by name from JavaScript and Python code:
Name | Class name / JavaDoc | Packages |
---|---|---|
etlConfig | com.toolsverse.etl.core.config.EtlConfig | com.toolsverse.etl.core.config |
scenario | com.toolsverse.etl.core.engine.Scenario | com.toolsverse.etl.core.engine |
evals | 1-based number of loop iterations | |
loopKeyValues |
Map<String, String> - global and flow variables set for each loop iteration |
Execute the Flow a fixed number of times
Step 1. Configure a JavaScript loop.
Step 2. Enter the following Loop Script
(example):
// this flow will be executed 40 times
value = evals <= 40 ? evals : null;
Read records from a database and execute the Flow for each record
This technique is also called a database loop. The idea is that you can write a SELECT SQL
statement which, when executed, creates a cursor driving the loop. The Flow in a database loop is executed as many times as the number of records returned by the cursor. For every field in each record in a cursor, the system automatically sets the global and Flow variables.
Configure database loop
Step 1. Create a nested Flow and add steps (inner Flows).
Step 2. Select the Flow to be executed in a loop and go to Flow parameters.
Step 3. Specify the loop type as SQL.
Step 4. Specify the database Connection
used for the SQL loop.
Step 5. Enter the SELECT SQL
statement to drive the loop in the Loop Script
field. Optionally enable Execute as
Script
if the loop SQL is in fact script, such as PL/SQL, TSql, etc. Note that script must return a cursor.
Step 6. Optionally, configure the maximum number of the Loop Threads
to use when executing the Flow in a loop. If the number is greater than one, each iteration of the loop will be executed in its own thread.
Example of the database loop
Let's assume that there is a role table in the database that contains information about roles and permissions. In this example, we are going to be creating CSV files for each role in a database.
Step 1. Create a database Connection that will be used to execute the driving (loop of) SQL.
Step 2. Create any Flow, for example, a Flow that reads data from a database and creates files, or a Flow that executes a web service. We're assuming that you want to create a loop that executes this Flow for each record in the result set.
Step 3. Create a nested Flow and add the Flow created in Step 2.
Step 4. Click the pen
icon, next to the Flow name. The Parameters
pop-up window will open.
Step 5. Select SQL
for Loop Type
.
Step 6. Select the Connection created in Step 1 for the Connection.
Step 7. Enter the driving SQL, for example, select roleid from role
as the Loop Script
.
Step 8. Modify the Flow created in Step 2 to use fields from the result set. Enclose the field name in curly brackets: {field name}
. For example: {roleid}
.
You can use the {field name}
in:
- the
FROM
andTO
fields for a transformation
- an HTTP Connection URL (or any other Connection property)
- anywhere in the source or destination query
When referencing {parameters}
, make sure you are using the same CaSe
for the field name as returned by the database. For example, some databases enforce UPPERCASE
for column names returned by executing the SQL query: select roleid from role --> ROLEID
, while other databases keep the original: select roleid from role --> roleid
.
Debug database loop
Read how to debug a database loop.
Process all files in a folder using loop
There is an easy way to process all files in a folder.
Use a loop if want to execute more complex scenarios than just Extract-data-from-source-transform-and-load-in-destination.
Configure loop for this example
Step 1. Create a nested Flow and add steps (inner Flows).
Step 2. Select the Flow to be executed in a loop and go to Flow parameters.
Step 3. Specify the Loop Type
as Files by wildcard
.
Step 4. Specify the file Connection
used for the loop.
Step 5. Enter the wildcard filename to drive the loop in the File path
field.
Step 6. Optionally, configure the maximum number of the Loop Threads
to use when executing the Flow in a loop. If the number is greater than one, each iteration of the loop will be executed in its own thread.
Example of the wildcard filename loop
Step 1. Create a Connection for the source files. Make sure the Connection supports wildcard filenames. Store the name of the Connection.
Step 2. Create a Flow where the source Connection is the Connection created in Step 1 and the destination is whatever you like.
Step 3. Create a nested Flow and add the Flow created in Step 2.
Step 4. Click the pen
icon next to the Flow name and the Parameters
pop-up window will open.
Step 5. Select Files by wildcard
for Loop Type
.
Step 6. Select the Connection created in Step 1 for Connection
.
Step 7. Enter a wildcard filename, such as *.csv
, for Files path
.
Files will remain in the source folder after processing, so you might want to delete or move them to a different folder.
Global and Flow variables available in the file loop
The following global and Flow variables are available in the file loop:
- loop_file_name: the name of the file processed by the loop. The name includes the extension but does not include the path.
- loop_base_file_name: the base name of the file processed by the loop. The name does not include the extension and the path.
- evals: 0-based loop iteration.
Retry loop iteration in case of error
When configuring a loop, you can set parameters for retrying a loop iteration in case of any error:
Loop retries
: number of retries for each loop iteration in case of error.Retries delay
: number of milliseconds to wait between retries.
Stop the loop programmatically
The loop is running until there are records in the result set returned by the loop query, or until there are files to process, or until the JavaScript condition returns a not null
value. In some cases, you may want to stop the loop early, based on conditions.
Solution:
Anywhere in the JavaScript or Python code, executed within the loop (for example in After Extract
transformation) add the following line of code. This call is thread-safe.
etlConfig.setRequestStop(true);
Example:
if (dataSet.getRecordCount() == 0) {
etlConfig.log("Empty dataset, hence exiting the loop");
etlConfig.setRequestStop(true);
}
Comments
0 comments
Please sign in to leave a comment.