- Startup
- Business
- Enterprise
- On-Premise
- Add-on
Overview
The flow inspection service analyzes executed flows to identify potential performance bottlenecks and structural issues. It generates comprehensive reports in JSON format, stored in the {app.data}/errors
directory, that describe critical, major, minor, and informational issues which could impact performance or integrity. These reports are especially useful when dealing with complex nested flows, as they provide detailed insight into every part of the flow, including nested components.
How it works
When a flow is executed, the inspection service evaluates its structure and performance based on predefined criteria. It checks for inefficiencies such as disabled batch processing, improper use of streaming, or issues with database connections. The service categorizes the detected issues into the following severity levels:
- Critical: critical issue with immediate impact on performance or structure
- Major: major issue that significantly affects performance or structure
- Minor: minor issue with limited impact on performance or structure
- Info: informational findings and recommendations with no impact on performance or structure
For nested flows, the inspection includes every sub-flow, ensuring that no inefficiencies are missed. The generated reports contain detailed descriptions of the detected issues, an explanation of the root cause, and suggestions for remediation.
Example of the report
{
  "severity": "CRITICAL",
  "flowIssues": [
    {
      "flowId": "613",
      "flowName": "Flow with source query",
      "severity": "CRITICAL",
      "issues": [
        {
          "severity": "CRITICAL",
          "issueType": "PERFORMANCE",
          "description": "There is a Source query for the transformation which extracts from a flat dataset",
          "why": "Using a Source query disables streaming, causing the transformation to load the entire source dataset into memory, which severely impacts performance",
          "suggestion": "Remove the Source query",
          "transformationName": "CATEGORY.CSV TO CATEGORY 1",
          "taskName": null,
          "columnName": null
        },
        {
          "severity": "INFO",
          "issueType": "STRUCTURE",
          "description": "In the ETL transformation where the file is a source, the option \"Ignore when there is no file\" is disabled",
          "why": "When this option is disabled and the file does not exist, the flow will fail with an error",
          "suggestion": "Enable \"Ignore when there is no file\"",
          "transformationName": "CATEGORY.CSV TO CATEGORY 1",
          "taskName": null,
          "columnName": null
        }
      ]
    }
  ]
}
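Because the report is plain JSON, it can also be processed programmatically. Below is a minimal sketch, assuming the report layout shown above and the same JavaScript scripting environment used in the Delete old reports script later in this article (etlConfig, Java.type). The report path and file name are hypothetical and would need to be adjusted to your {app.data}/errors folder; this is an illustration, not part of the product.

var Files = Java.type("java.nio.file.Files");
var Paths = Java.type("java.nio.file.Paths");

// Hypothetical report file; adjust to a real file in your {app.data}/errors folder
var reportPath = "/data/errors/613_Flow_with_source_query_critical_20240821110000180.json";

// Read the report as UTF-8 text and parse the JSON document
var json = new java.lang.String(Files.readAllBytes(Paths.get(reportPath)), "UTF-8");
var report = JSON.parse(json);

// Walk flowIssues -> issues and log a one-line summary for each finding
report.flowIssues.forEach(function(flowIssue) {
    flowIssue.issues.forEach(function(issue) {
        etlConfig.log(issue.severity + " [" + issue.issueType + "] in flow '"
            + flowIssue.flowName + "': " + issue.description);
    });
});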
Configuration
section of the UI. Users can choose to report only issues at or above a specific severity level (e.g., Major, Critical) or not report any issues at all. The default value is Do not report.
To enable the service, set Minimum Severity Level to Report
to a value other than Do not report.
How to use
Once configured, the inspection service automatically runs during the execution of flows. The user simply needs to execute the flow as normal, and the service will inspect the flow, generating a report if any issues are found.
These reports are available in the {app.data}/errors
folder. Each file is named using the following naming convention: flowId_flowName_severity_timestamp.json.
Here is an example: 616_Main_flow_to_report_issues_critical_20240821110000180.json.
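If you need to filter or group reports programmatically, the components of the name can be pulled apart with a regular expression. The snippet below is only an illustration of the naming convention above, using the example file name; it is not part of the product.

// Example file name from above: flowId_flowName_severity_timestamp.json
var name = "616_Main_flow_to_report_issues_critical_20240821110000180.json";

// Capture flowId, flowName, severity, and the 17-digit timestamp
var parts = name.match(/^(\d+)_(.+)_(critical|major|minor|info)_(\d{17})\.json$/i);

// parts[1] = "616"
// parts[2] = "Main_flow_to_report_issues"
// parts[3] = "critical"
// parts[4] = "20240821110000180"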
The reports are generated for the main account and each individual tenant. Each tenant has a dedicated /errors folder where their specific reports are stored.
By reviewing these reports, users can identify potential issues and take corrective actions based on the suggestions provided. The service can also be configured to report issues only at or above a certain severity level, helping prioritize critical performance and structural improvements.
Step-by-step instructions
Step 1. Create a Server storage connection that points to the {app.data}/errors
folder.
Under main account:
Under tenant:
Step 2. Navigate to the errors folder in Explorer to check whether any reports were generated and to browse the issues. Filter by the severity of the issue (which is part of the file name) if needed:
Step 3. Select the report file and drill down to the detailed report:
Delete old reports
Reports are generated for each executed flow if the service detects any issues at or above the configured severity level. Over time, this can result in thousands of reports, making navigation and browsing challenging. We recommend creating a flow that automatically deletes older reports matching the pattern flowId_flowName_*.json
, retaining only the latest report for each flow to improve manageability.
For example, if there are three reports: 616_Main_flow_to_report_issues_critical_20240822110000180.json,
616_Main_flow_to_report_issues_critical_20240821110000180.json,
and 616_Main_flow_to_report_issues_critical_20240820110000180.json,
the flow will keep only 616_Main_flow_to_report_issues_critical_20240822110000180.json
and delete the other two.
Here are step-by-step instructions for creating a flow that deletes old reports:
Step 1. Create a scripting flow.
Step 2. Add the following script:
var alias = com.toolsverse.etl.core.task.common.FileManagerTask.getAlias(etlConfig, 'Errors');
var folder = com.toolsverse.util.FilenameUtils.getFullPath(alias.getUrl());

var File = Java.type("java.io.File");
var FilenameFilter = Java.type("java.io.FilenameFilter");
var Files = Java.type("java.nio.file.Files");
var Paths = Java.type("java.nio.file.Paths");
var Comparator = Java.type("java.util.Comparator");
var Collectors = Java.type("java.util.stream.Collectors");

function deleteOldFiles(directory) {
    etlConfig.log("Will delete old reports in " + directory);

    var dir = new File(directory);
    if (!dir.exists() || !dir.isDirectory()) {
        etlConfig.log("Directory does not exist or is not a directory: " + directory);
        return;
    }

    var filter = new FilenameFilter(function(dir, name) {
        return name.matches("\\d+_.+_(critical|major|minor|info|CRITICAL|MAJOR|MINOR|INFO)(:.+)?_\\d{17}\\.json");
    });

    var files = dir.listFiles(filter);
    if (files == null || files.length === 0) {
        etlConfig.log("No files matching the pattern found.");
        return;
    }

    // Group files by the base name (without timestamp)
    var filesGroupedByBaseName = java.util.Arrays.stream(files)
        .collect(Collectors.groupingBy(function(file) {
            return file.getName().replaceAll("_\\d{17}\\.json$", "");
        }));

    // Iterate over each group and delete all but the most recent file
    filesGroupedByBaseName.forEach(function(baseName, fileList) {
        if (fileList.size() > 1) {
            var sortedFiles = fileList.stream()
                .sorted(Comparator.comparingLong(function(file) {
                    return file.lastModified();
                }).reversed())
                .collect(Collectors.toList());

            // Keep the most recent file, delete the others
            for (var i = 1; i < sortedFiles.size(); i++) {
                var fileToDelete = sortedFiles.get(i);
                Files.delete(Paths.get(fileToDelete.getAbsolutePath()));
                etlConfig.log("Deleted: " + fileToDelete.getName());
            }
        }
    });
}

deleteOldFiles(folder);
Step 3. Add a named connection that points to the /errors folder. The script above looks it up under the name Errors (see the getAlias call).
Step 4. Schedule this flow to run periodically. We recommend running it at least once a day. However, if you have hundreds of flow executions per day, running it every hour may be more effective for managing the volume of reports.