This technique lets you configure an ETL Flow to retry failed or not yet executed steps (transformations) on restart. It applies to Flows that have multiple source-to-destination transformations or a single transformation configured with a wildcard source.
The database-to-database Flow is configured to extract data from multiple tables whose names match a wildcard, for example, public.*, and load it into multiple destination tables, like in this how-to article.
The Flow had been running for hours, extracting and loading data from hundreds of tables, some of them with hundreds of millions of records, when it failed because the source database server closed the Connection.
Before it failed, the Flow successfully extracted and loaded data for most of the source-to-destination pairs, but a few pairs had not started yet, and a few pairs had failed.
How it works
When a Flow configured to retry failed or not yet executed steps is restarted (manually or by the scheduler), it skips the steps that were executed successfully, retries the steps that failed, and executes the steps that have not been executed yet.
Here are the rules:
- If the Flow has never been executed, it will execute all steps (transformations).
- If the Flow has been executed successfully, it will execute all steps on the next run.
- If some of the steps were not executed yet, the Flow will execute these steps on the next run.
- On the next run, the Flow will execute failed steps and skip successfully executed steps.
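The rules above can be sketched as a small selection function. This is only an illustration of the described behavior, not the product's implementation; the `Status` enum, the `steps_to_run` function, and the table names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical outcome of a step in the previous run."""
    NOT_EXECUTED = "not executed"
    SUCCESS = "success"
    FAILED = "failed"


def steps_to_run(all_steps, previous):
    """Return the steps to execute on the next run.

    all_steps: ordered list of step (transformation) names.
    previous:  dict of step name -> Status from the last run;
               empty if the Flow has never been executed.
    """
    # Rule 1: the Flow has never been executed, so run everything.
    if not previous:
        return list(all_steps)

    # A step with no recorded status is treated as not yet executed.
    statuses = [previous.get(step, Status.NOT_EXECUTED) for step in all_steps]

    # Rule 2: the last run succeeded completely, so run everything again.
    if all(status is Status.SUCCESS for status in statuses):
        return list(all_steps)

    # Rules 3 and 4: run failed and not yet executed steps, skip successful ones.
    return [
        step
        for step, status in zip(all_steps, statuses)
        if status is not Status.SUCCESS
    ]
```

For example, if `public.a` succeeded, `public.b` failed, and `public.c` never started, only `public.b` and `public.c` are selected for the next run.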
Here's how you can retry failed or not yet executed steps:
Step 1. Create an ETL Flow.
Step 2. If the Flow loads data into a relational database, enable auto-commit for the destination database Connection. Alternatively, you can enable Create new database connection for destination for the source-to-destination transformation.
Step 3. Select the Parameters tab for the Flow and enable Retry Failed Transformations.
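The steps above do not say why Step 2 is needed. One plausible reason (an assumption) is that when all loads share a single uncommitted transaction, a failure rolls back the tables that were already loaded, so there would be nothing valid to skip on restart. The sketch below illustrates that general database behavior with Python's built-in `sqlite3` module; the `load_tables` function, the `loaded` table, and the simulated failure are hypothetical and unrelated to the product itself.

```python
import sqlite3


def load_tables(autocommit):
    """Simulate loading three tables, with a failure on the third one.

    Returns the names of the tables whose rows survive the failure.
    """
    # isolation_level=None puts sqlite3 in auto-commit mode; the empty string
    # keeps the default behavior of opening a transaction before each INSERT.
    conn = sqlite3.connect(":memory:", isolation_level=None if autocommit else "")
    conn.execute("CREATE TABLE loaded (table_name TEXT)")
    conn.commit()
    try:
        for name in ["public.a", "public.b", "public.c"]:
            if name == "public.c":
                # Simulated failure: the source server closes the connection.
                raise ConnectionError("source closed the connection")
            conn.execute("INSERT INTO loaded VALUES (?)", (name,))
        conn.commit()
    except ConnectionError:
        # Without auto-commit this discards every row inserted so far.
        conn.rollback()
    rows = conn.execute(
        "SELECT table_name FROM loaded ORDER BY table_name"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(load_tables(autocommit=False))  # rows loaded before the failure are rolled back
print(load_tables(autocommit=True))   # rows loaded before the failure are kept
```

With auto-commit enabled, the data loaded before the failure stays in the destination, which is what makes skipping those steps on the next run safe.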
The specific source-to-destination transformation can be executed successfully, skipped, or executed with an error.
To check the status of each transformation, open the Flow status dashboard.
The dashboard shows Metrics for each source-to-destination transformation, such as the number of records extracted and loaded, as well as the status of the transformation.