About parallel loops
In Etlworks Integrator, it is possible to create a pipeline of flows (a nested flow). The steps in the pipeline (inner flows) can be executed:
- Sequentially (one by one).
- In parallel threads (out of order).
- Conditionally (if the condition is true).
- In a loop.
When inner flows are executed in a loop, you can enable parallel execution by setting the Loop Threads parameter to a value greater than 1.
It makes sense to configure parallel loops for the following use cases:
- Making a large number of HTTP requests.
- Iterating through the pages of a paginated API.
- Loading multiple files into a database.
- Any other case where you need to perform multi-stage processing of data using repeatable steps.
Configuring a parallel loop
To configure a parallel loop, set the Loop Threads parameter to a value greater than 1.
This parameter controls the maximum number of parallel threads that can be created to execute the iterations of the loop.
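As a rough illustration of what that maximum means, the sketch below computes, for an idealized loop in which every iteration takes the same time, how many iterations can run at once and how many rounds ("waves") the loop needs. The function `describeLoopThreads` is hypothetical and not part of Etlworks. It also assumes that a Loop Threads value of 1 or less means the iterations run one by one.

```javascript
// Hypothetical helper (not an Etlworks API): shows how a Loop Threads value
// bounds concurrency, assuming every iteration takes the same amount of time.
function describeLoopThreads(iterations, loopThreads) {
  // Assumption: values below 2 mean the iterations run one by one.
  const threads = loopThreads > 1 ? Math.floor(loopThreads) : 1;
  // A pool never runs more tasks at once than there are iterations.
  const maxConcurrent = Math.min(threads, iterations);
  // Number of rounds if all iterations have equal duration.
  const waves = maxConcurrent === 0 ? 0 : Math.ceil(iterations / maxConcurrent);
  return { parallel: maxConcurrent > 1, maxConcurrent, waves };
}

console.log(describeLoopThreads(100, 10)); // { parallel: true, maxConcurrent: 10, waves: 10 }
console.log(describeLoopThreads(100, 1));  // { parallel: false, maxConcurrent: 1, waves: 100 }
console.log(describeLoopThreads(25, 10));  // { parallel: true, maxConcurrent: 10, waves: 3 }
```

In practice, iterations finish at different times and a freed thread picks up the next iteration right away, so the wave count is only an approximation of how the work is spread over time.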
How a parallel loop works
For example, using the loop conditions above, let's assume that there are 100 roles in the system and that the configured database loop executes the flow 100 times, once for each record returned by the SQL statement
select role_id, role_name from roles.
With Loop Threads set to 10, the system creates a so-called thread pool with a maximum of 10 threads. It then runs the actual loop, iterating through all the records in the cursor and creating a task for each iteration. Each task is submitted to the thread pool, which assigns an available thread to it.
- If a thread is available, the task starts immediately, effectively executing the inner flow in a parallel thread.
- If no threads are available (remember, the limit is 10), the thread pool waits until one of the running tasks finishes and releases its thread.
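The scheduling described above can be modeled in a few lines of JavaScript. This is a sketch under stated assumptions, not how Etlworks implements the loop: JavaScript runs on a single thread, so the ten "threads" here are concurrency slots served by promises. `runInnerFlow` and `parallelLoop` are hypothetical names, the `loopThreads` argument plays the role of the Loop Threads parameter, and the 100 records are hard-coded stand-ins for the rows returned by the SQL statement. Each worker stands in for one pool thread that keeps taking the next record until the cursor is exhausted, which gives the same scheduling as a fixed-size pool that queues tasks until a thread is free.

```javascript
// Hypothetical stand-in for the inner flow: simulates work with a short,
// record-dependent delay and returns the record's role_id.
function runInnerFlow(record) {
  return new Promise((resolve) =>
    setTimeout(() => resolve(record.role_id), 1 + (record.role_id % 5))
  );
}

// Executes task(record) for every record, with at most loopThreads tasks
// in flight at any moment. Returns the results (in record order) and the
// highest number of tasks observed running at the same time.
async function parallelLoop(records, loopThreads, task) {
  let next = 0;    // position of the "cursor": next record without a task
  let running = 0; // tasks currently executing
  let peak = 0;    // highest value of `running` seen so far
  const results = new Array(records.length);

  // One worker stands in for one pool thread: it keeps taking the next
  // waiting record until the cursor is exhausted. Records that no worker
  // has taken yet are the tasks waiting for a free thread.
  async function worker() {
    while (next < records.length) {
      const i = next++; // claim the next iteration
      running++;
      peak = Math.max(peak, running);
      results[i] = await task(records[i]);
      running--;
    }
  }

  // Never start more workers than there are records (and at least one).
  const threads = Math.max(1, Math.min(loopThreads, records.length));
  await Promise.all(Array.from({ length: threads }, () => worker()));
  return { results, peak };
}

// 100 hard-coded records standing in for the rows of the roles query.
const roles = Array.from({ length: 100 }, (_, i) => ({
  role_id: i + 1,
  role_name: `role_${i + 1}`,
}));

parallelLoop(roles, 10, runInnerFlow).then(({ results, peak }) => {
  // prints "iterations: 100, peak concurrency: 10"
  console.log(`iterations: ${results.length}, peak concurrency: ${peak}`);
});
```

Passing 1 instead of 10 turns the same function into a sequential loop, which makes it easy to compare the two modes on the same records.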
Parametrization in the parallel loop
It is important to understand exactly what happens when the inner flow is executed in multiple parallel threads.
- The task to execute the flow is created and added to the thread pool.
- When the database loop is used, the system automatically sets global and flow variables. The automatically set global variables are thread-safe, meaning they can be used by multiple parallel threads at the same time. A good example would be a flow that gets the user_id from the SQL driving the database loop and uses it to call the HTTP endpoint with a dynamic URL set as
var props = SystemConfig.instance().getContextProperties();