In Etlworks, it is possible to create a pipeline of Flows (nested Flow). The steps in a pipeline (inner Flows) can be executed:
- Sequentially (one by one)
- In parallel threads (out of order)
- Conditionally (if the condition is true)
- In a loop
When executing inner Flows in a loop, you can configure parallel execution by setting the Loop Threads parameter to a value greater than 1.
It makes sense to configure parallel loops for the following use cases:
- Making a large number of HTTP requests.
- Iterating through the pages of a paginated API.
- Loading multiple files into the database.
- Any other case where you need to perform multi-stage processing of data using repeatable steps.
Configure parallel loop
To configure a parallel loop, set the Loop Threads parameter to a value greater than 1.
The parameter controls the maximum number of parallel threads that can be created to execute all iterations of the loop.
How a parallel loop works
Using the loop conditions above, let's assume that there are 100 roles in the system, so the configured database loop executes the inner Flow 100 times, once for each record returned by the SQL statement select role_id, role_name from roles.
With Loop Threads set to 10, the system creates a so-called thread pool with the maximum number of threads set to 10. It then runs the actual loop, iterating through all the records in the cursor and creating a task for each iteration. Each task is submitted to the thread pool, which assigns an available thread to it:
- If there is an available thread: the task starts immediately, essentially executing the inner Flow in a parallel thread.
- If there are no available threads (remember, we set the limit to 10): the task waits until one of the threads becomes available.
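The behavior described above can be illustrated with a minimal sketch. It is not Etlworks code: it uses Python's standard concurrent.futures thread pool as a stand-in, and the names LOOP_THREADS, roles, run_inner_flow, and run_parallel_loop are hypothetical, chosen only for this illustration. The list of 100 records stands in for the cursor returned by the SQL statement, and the sleep simulates the work done by the inner Flow.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

LOOP_THREADS = 10  # hypothetical stand-in for the Loop Threads parameter

# Stand-in for the cursor returned by: select role_id, role_name from roles
roles = [{"role_id": i, "role_name": f"role_{i}"} for i in range(1, 101)]

lock = threading.Lock()
running = 0  # inner Flows executing right now
peak = 0     # highest number observed running at the same time


def run_inner_flow(record):
    """Hypothetical inner Flow: processes one loop record, returns its role_id."""
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.01)  # simulate work, such as an HTTP call
    with lock:
        running -= 1
    return record["role_id"]


def run_parallel_loop(records, loop_threads):
    # One task per loop iteration. The pool runs at most loop_threads tasks
    # at once; the remaining tasks wait until a thread becomes available.
    with ThreadPoolExecutor(max_workers=loop_threads) as pool:
        futures = [pool.submit(run_inner_flow, r) for r in records]
        return [f.result() for f in futures]


results = run_parallel_loop(roles, LOOP_THREADS)
print(len(results), "iterations completed, peak parallelism:", peak)
```

In this sketch all 100 iterations complete, but the observed peak never exceeds the configured limit of 10, which mirrors how the maximum number of threads caps parallel execution of the loop.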
Parametrization in the parallel loop
It is important to understand exactly what is happening when the inner Flow is executed in multiple parallel threads.
- The task to execute the Flow is created and added to the thread pool.
- When the database loop is used, the system automatically sets the global and Flow variables. The automatically set global variables are thread-safe, meaning multiple parallel threads can use them at the same time. A good example would be a Flow that gets the user_id from the SQL driving the database loop and uses it to call the HTTP endpoint with a dynamic URL set as