Analyzing Flows for Performance Optimization and Structural Integrity – Etlworks Support

Overview

Etlworks' flow inspector identifies performance bottlenecks, structural problems, and misconfigurations in flows before they cause failures or degradation in production. Findings are surfaced in three different places, and the right one depends on what you are trying to do:

The Inspect Flow button in the flow editor — the fastest way to see whether the flow you are currently working on has issues. Runs on demand, updates automatically as you edit, and shows severity + count directly on the button. Recommended during flow development and testing. See Inspecting a flow in the editor.
The Flow Findings dashboard — an organization-wide view that aggregates findings from every flow run across your instance. Recommended for operators reviewing the overall health of the account. See Flow Findings dashboard.
Inspection report files written to {app.data}/errors on every flow execution when the reporting threshold is enabled. Useful for offline analysis, audit records, or programmatic post-processing. See Browsing generated reports from the errors folder.

The Inspect Flow button works on demand at any time and needs no configuration. The dashboard and the report files depend on the flow inspector being enabled at a chosen severity threshold — see Enabling inspection reports on flow runs.

What the inspector checks and how findings are classified

When the inspector evaluates a flow, it walks the entire flow structure — including nested sub-flows — and checks for known performance and structural problems: disabled batch processing, misconfigured streaming, database connection issues, schema drift, transformation shapes that will not scale, unusual step configurations, and dozens of similar patterns. For each issue found, the inspector records a severity, a short description, an explanation of why it matters, and a suggested fix.

The severity model is shared by all three surfaces — the button badge, the dashboard, and the report files:

Severity	What it means
Critical	Immediate impact on performance or structure. Address before running in production.
Major	Significant impact on performance or structure. Should be addressed.
Minor	Limited impact. Usually best-practice or optimization opportunities.
Info	Informational recommendation. No impact on performance or correctness; act at your discretion.

Inspecting a flow in the editor

The Inspect Flow button is the primary and recommended way to check a flow. It works on demand, does not require the flow to have executed previously, and provides feedback while you are still editing.

The Inspect Flow button

Open any flow (regular or nested) in the editor. The Inspect Flow button is in the bottom toolbar and is represented by a crosshair icon.

Starting in version 9.6.11 the button is state-aware:

When the flow editor finishes loading, Etlworks automatically runs the inspection in the background and updates only the button — you don't have to click anything to see whether the flow has issues.
The background check runs again whenever the flow changes, including changes to linked connections and formats that can affect the inspection results. Some inspection findings originate from the resources a flow uses (connection settings, format definitions), not just from the flow definition itself.
The check is triggered by editor events, not by timer-based polling — no background load while you are not editing.
If any findings are present, the button shows the total count (for example, Inspect flow (2)) and changes color to match the highest severity found: Critical, Major, Minor, or Info. When there are no findings, the button returns to its neutral state.
The button updates in place; the rest of the editor is not refreshed.
For new or incomplete flows that are not yet ready to be inspected, the button skips the background check until the flow has enough information to inspect. This prevents spurious validation errors while you are still building the flow.

Opening the detailed report

Click the Inspect Flow button at any time to open the inspection results dialog. The dialog contains every issue found across the flow and any nested flows, grouped by severity: Critical, Major, Minor, and Info. For each issue you see:

The name of the affected flow or transformation.
A short description of the problem.
An explanation of why it matters.
A suggestion for how to fix it.

When multiple flows are involved (nested flows), the highest severity level found is displayed at the top of the popup, so the overall impact is immediately clear.

Interpreting the results

Critical and Major findings should be addressed immediately — they can cause runtime errors or serious performance problems. Minor findings usually relate to best practices and optimization opportunities and can be prioritized alongside other cleanup. Info entries are purely informational and can be addressed at your discretion. When no issues are found, the dialog states that the flow passed inspection.

When to use this path

The Inspect Flow button is the right entry point when you are working on a specific flow — during initial development, when investigating a specific problem, or when reviewing a change before shipping it. The rest of this article covers the two other entry points, which serve different needs: the Flow Findings dashboard for an organization-wide view, and the generated report files for offline or programmatic analysis.

Flow Findings dashboard

The Flow Findings dashboard aggregates automatic flow-inspection reports across your instance. Whenever a flow runs, the engine writes an inspection report capturing observed issues — schema drift, performance anomalies, recurrent errors, data quality concerns. The dashboard reads those reports and surfaces the flows that need attention.

What's a finding?

A finding is a flow-level inspection report with one or more detected issues. Each report has a severity (ALL_GOOD, INFO, MINOR, MEDIUM, MAJOR, CRITICAL) computed from the worst issue in the report. Each issue has:

Issue type — e.g., schema drift, slow step, recurrent error, data quality.
Severity.
Description — what was detected.
Why — why it's flagged.
Suggestion — what to do about it.

What the dashboard shows

The dashboard lists every flow that has an inspection report within the configured retention window (1–36 months). For each flow you see:

Flow name, ID, and severity badge.
Issue count.
Last executed timestamp.
Report last-updated timestamp.
An indicator if the flow still exists (reports outlive deleted flows).

Click a flow to expand its report and see the per-issue detail.

Filters

Filter	What it does
Severity	Show only findings at or above the selected severity.
Flow	Text search by name or ID.
Tenant	Multi-tenant filter, super-admin only.
Report Age	Sliding window 1–36 months. Older reports are not shown.

When to use the dashboard vs. the Inspect Flow button

You want to…	Use
Check the flow you are currently editing	Inspect Flow button
See findings update automatically while you edit	Inspect Flow button
See every flow in the org that has any finding	Flow Findings dashboard
Filter findings by severity, tenant, or age across the whole instance	Flow Findings dashboard
Query findings programmatically	Dashboard's API (see the linked Insights article)
Store historical inspection artifacts for audit or offline analysis	Generated report files

The dashboard is populated only for flows whose runs produce inspection reports. Reports are written when the reporting threshold is set — see the next section.

Enabling inspection reports on flow runs

Both the Flow Findings dashboard and the generated report files depend on the same underlying mechanism: when a flow runs, the inspection service checks it and, if any finding is at or above the configured threshold, writes a JSON report to {app.data}/errors. The default threshold is Do not report, which means no reports are written until you enable it explicitly.

Configuration:

Open Settings → Flows.
Set Minimum Severity Level to Report to a value other than Do not report. Common choices:
- Info — write a report for every flow with any finding (largest volume).
- Minor — skip pure info findings; capture everything else.
- Major — capture only findings that need attention.
- Critical — capture only findings that block correct operation.

Note. Reports are generated for the main account and for each individual tenant. Each tenant has its own {app.data}/errors/<tenant>/ subdirectory where its reports are stored.

Once reports are being generated, the Flow Findings dashboard picks them up automatically — no further configuration is required for the dashboard. The offline browsing workflow described in the next section requires one additional step: a Server Storage connection pointed at the errors folder.

Browsing generated reports from the errors folder (advanced)

This path is useful when you want to inspect the raw JSON reports directly — for example, when building custom reporting or auditing outside the Etlworks UI, when correlating inspection findings with external logs, or when reviewing historical reports for a specific incident. For most day-to-day use the Inspect Flow button and the Flow Findings dashboard are simpler and cover the same information.

Set up a Server Storage connection to the errors folder

Step 1. Create a Server Storage connection pointing at {app.data}/errors. In a multi-tenant deployment, create one connection per tenant, each pointing at the tenant's errors subdirectory.

Under the main account:

Under a tenant:

Step 2. Open Explorer, navigate to the connection, and browse the reports. Each report's filename encodes the flow ID, flow name, severity, and timestamp, so you can filter by severity by matching on the filename:

flowId_flowName_severity_timestamp.json

Example: 616_Main_flow_to_report_issues_critical_20240821110000180.json.

Step 3. Select a report file and drill down into the detailed content:

Report file structure

Each report is a JSON document containing the flow-level severity, a list of affected sub-flows (for nested pipelines), and per-issue details. Example:

{
   "severity": "CRITICAL",
   "flowIssues": [
      {
        "flowId": "613",
        "flowName": "Flow with source query",
        "severity": "CRITICAL",
        "issues": [
          {
              "severity": "CRITICAL",
              "issueType": "PERFORMANCE",
              "description": "There is a Source query for the transformation which extracts from a flat dataset",
              "why": "Using a Source query disables streaming, causing the transformation to load the entire source dataset into memory, which severely impacts performance",
              "suggestion": "Remove the Source query",
              "transformationName": "CATEGORY.CSV TO CATEGORY 1",
              "taskName": null,
              "columnName": null
          },
          {
              "severity": "INFO",
              "issueType": "STRUCTURE",
              "description": "In the ETL transformation where the file is a source, the option \"Ignore when there is no file\" is disabled",
              "why": "When this option is disabled and the file does not exist, the flow will fail with an error",
              "suggestion": "Enable \"Ignore when there is no file\"",
              "transformationName": "CATEGORY.CSV TO CATEGORY 1",
              "taskName": null,
              "columnName": null
          }
       ]
   }
]}

Deleting old reports

Reports are written for every flow execution that produces a finding at or above the configured threshold. Over time, an active instance can accumulate thousands of reports and Explorer navigation gets slow. The recommended approach is to schedule a small housekeeping flow that keeps only the most recent report per flow + severity.

Retention pattern. For each set of reports with the same base name (flow ID + flow name + severity), keep the newest and delete the older ones. Example — given three reports:

616_Main_flow_to_report_issues_critical_20240822110000180.json
616_Main_flow_to_report_issues_critical_20240821110000180.json
616_Main_flow_to_report_issues_critical_20240820110000180.json

the housekeeping flow keeps 616_Main_flow_to_report_issues_critical_20240822110000180.json and deletes the other two.

Step 1. Create a scripting flow.

Step 2. Add the following script:

var alias = com.toolsverse.etl.core.task.common.
FileManagerTask.getAlias(etlConfig, 'Errors');

var folder = com.toolsverse.util.FilenameUtils.getFullPath(alias.getUrl());

var File = Java.type("java.io.File");
var FilenameFilter = Java.type("java.io.FilenameFilter");
var Files = Java.type("java.nio.file.Files");
var Paths = Java.type("java.nio.file.Paths");
var Comparator = Java.type("java.util.Comparator");
var Collectors = Java.type("java.util.stream.Collectors");

function deleteOldFiles(directory) {
   etlConfig.log("Will delete old reports in " + directory);

   var dir = new File(directory);
   if (!dir.exists() || !dir.isDirectory()) {
      etlConfig.log("Directory does not exist or is not a directory: " + directory);
      return;
    }

    var filter = new FilenameFilter(function(dir, name) {
      return name.matches("\\d+_.+_(critical|major|minor|info|CRITICAL|MAJOR|MINOR|INFO)(:.+)?_\\d{17}\\.json");
    });

    var files = dir.listFiles(filter);

    if (files == null || files.length === 0) {
      etlConfig.log("No files matching the pattern found.");
      return;
    }

    // Group files by the base name (without timestamp)
    var filesGroupedByBaseName = java.util.Arrays.stream(files)
      .collect(Collectors.groupingBy(function(file) {
     return file.getName().replaceAll("_\\d{17}\\.json$", "");
    }));

    // Iterate over each group and delete all but the most recent file
    filesGroupedByBaseName.forEach(function(baseName, fileList) {
    if (fileList.size() > 1) {
      var sortedFiles = fileList.stream()
       .sorted(Comparator.comparingLong(function(file) {
         return file.lastModified();
     }).reversed()).collect(Collectors.toList());

     // Keep the most recent file, delete the others
     for (var i = 1; i < sortedFiles.size(); i++) {
       var fileToDelete = sortedFiles.get(i);
       Files.delete(Paths.get(fileToDelete.getAbsolutePath()));
       etlConfig.log("Deleted: " + fileToDelete.getName());
     }
    }
});
}

deleteOldFiles(folder);

Step 3. Add a named connection called Errors that points at the /errors folder.

Step 4. Schedule this flow to run periodically. Once a day is enough for most instances; if you have hundreds of flow executions per day, once an hour is more effective.

Choosing the right approach

Situation	Recommended path
You are building or editing a specific flow and want to know whether it has issues	Inspect Flow button
You want the check to update automatically as you edit	Inspect Flow button (updates on flow, connection, and format changes)
You want to see every flow in your account that has issues	Flow Findings dashboard
You want to filter findings by severity, tenant, or report age across the instance	Flow Findings dashboard
You need programmatic access to findings (status page, custom report, external tool)	Dashboard API — see the Insights article
You want to review raw report JSONs offline or for audit purposes	Generated reports in the errors folder
You want an inspection to run automatically on every flow execution	Enable inspection reports at the desired severity threshold in Settings → Flows

Articles in this section