Overview
This technique lets you configure an ETL Flow that has multiple source-to-destination transformations, or a single transformation configured with a wildcard source, to retry failed or not yet executed steps (transformations) on restart.
Example
The database-to-database Flow is configured to extract data from multiple tables whose names match a wildcard, for example, public.*, and load it into multiple destination tables, like in this how-to article.
The Flow ran for hours, extracting and loading data from hundreds of tables, some of them with hundreds of millions of records, before it failed because the source database server closed the Connection.
Before it failed, the Flow successfully extracted and loaded data for most of the source-to-destination pairs, but a few pairs have not started yet, and a few pairs failed.
How it works
When a Flow configured to retry failed or not yet executed steps is restarted (manually or by the scheduler), it skips the steps that executed successfully, retries the steps that failed, and executes the steps that have not yet run.
Here are the rules:
- If the Flow has never been executed, it will execute all steps (transformations).
- If the Flow has been executed successfully, it will execute all steps on the next run.
- If some of the steps were not executed yet, the Flow will execute these steps on the next run.
- On the next run, the Flow will execute failed steps and skip successfully executed steps.
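The rules above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the product's implementation: the `Status` enum, the `steps_to_run` function, and the table names are hypothetical, and the sketch assumes that a step with no recorded status counts as not yet executed.

```python
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    # Hypothetical status values, used for illustration only.
    SUCCESS = "success"
    FAILED = "failed"
    NOT_EXECUTED = "not_executed"


def steps_to_run(steps: List[str], previous: Optional[Dict[str, Status]]) -> List[str]:
    """Return the steps to execute on the next run, in their original order."""
    # Rule 1: the Flow has never been executed, so run every step.
    if not previous:
        return list(steps)

    # Assumption: a step with no recorded status was not yet executed.
    statuses = {s: previous.get(s, Status.NOT_EXECUTED) for s in steps}

    # Rule 2: the previous run succeeded for every step, so run every step again.
    if all(st is Status.SUCCESS for st in statuses.values()):
        return list(steps)

    # Rules 3 and 4: run failed and not yet executed steps, skip successful ones.
    return [s for s in steps if statuses[s] is not Status.SUCCESS]


if __name__ == "__main__":
    tables = ["public.orders", "public.customers", "public.items"]
    last_run = {
        "public.orders": Status.SUCCESS,
        "public.customers": Status.FAILED,
    }
    # public.orders is skipped; the failed and the not yet started steps run.
    print(steps_to_run(tables, last_run))
```

In this sketch, a run that fully succeeded resets the cycle, so the next run processes every step again rather than skipping everything.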
Process
Here's how you can retry failed or not yet executed steps:
Step 1. Create an ETL Flow.
Step 2. If the Flow loads data into a relational database, enable auto-commit for the destination database Connection.
Alternatively, you can enable Create new database connection for destination for the source-to-destination transformation.
Step 3. Select the Parameters tab for the Flow and enable Retry Failed Transformations.
Transformation Status
Each source-to-destination transformation ends in one of three states: executed successfully, skipped, or executed with an error.
To check the status of each transformation, open the Flow status dashboard.
Metrics display Records Metrics for the specific source-to-destination transformation, such as the number of records extracted and loaded, as well as the status of the transformation.