Overview
Etlworks collects and aggregates usage, execution, and scheduling data across all Flows.
This data is available in the Insights section.
Insights helps you:
- Monitor system activity in real time
- Analyze execution patterns and performance
- Identify bottlenecks and gaps in schedules
- Troubleshoot issues using execution-level visibility
- Optimize workload distribution
Only Admin and SuperAdmin users can access Insights.
Insights
The Insights menu (previously called Statistics) provides two tabs:
- Statistics – high-level usage and account metrics
- Dashboards – detailed operational views for executions, tasks, and schedules
Multi-tenant view (SuperAdmin)
SuperAdmin users can switch to multi-tenant mode.
In this mode:
- Metrics are aggregated across selected tenants
- Dashboards reflect combined activity
- Each metric can be expanded to see per-tenant breakdown
Why this is useful:
- Monitor usage across all tenants
- Compare activity between tenants
- Identify heavy usage or anomalies
Statistics
The Statistics tab provides a summary of system usage and activity.
You can see:
- Total records processed
- Total executions (scheduled, on-demand, listener)
- Daily and monthly trends
- Execution success vs error distribution
- Total number of Flows, Connections, Schedules, Formats, and Listeners
Use the month selector to view historical data.
This view is useful for:
- Tracking overall usage
- Monitoring growth trends
- Identifying spikes in processing volume
Overage monitoring
Insights tracks usage against subscription limits.
Displays:
- Daily usage vs limit
- Monthly usage vs limit
- Potential overage
Notifications:
- Daily overage notice
- Monthly overage notice
Both notices are sent to the account owner when the corresponding limit is exceeded.
Dashboards
The Dashboards tab provides detailed, purpose-built views for monitoring and analysis.
Each dashboard focuses on a specific aspect of system activity.
Executions
The Executions dashboard shows all flow executions for a selected day.
Key capabilities:
- Real-time view of running and completed executions
- Filter by: Name, Status, Records, Duration, Start and End time
- Partial name matching supported
- Combine filters for precise results
Click any execution to:
- View status and errors
- Open logs
- Inspect execution details
Export data:
- Download filtered executions as CSV
- Useful for reporting and external analysis
Why this is useful:
- Troubleshoot failures quickly
- Identify slow or large executions
- Audit system activity
Running Flow Tasks
Shows currently running tasks across all flows.
Includes:
- Task type (extract, load, transform, etc.)
- Flow name
- Owner
- Execution details
Why this is useful:
- Real-time visibility into what is running
- Detect stuck or long-running tasks
- Understand pipeline activity at step level
Executions Heatmap
Visualizes execution activity over time.
Features:
- Heatmap by hour, day, week, or month
- Color intensity reflects execution volume
- Shows: Average executions, Records processed, Average duration, Peak activity window
Click any cell to drill down into executions.
Why this is useful:
- Identify busy time windows
- Detect uneven workload distribution
- Plan scheduling and scaling
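The hour-level heatmap can be pictured as a count of executions per (weekday, hour) cell. The following is a minimal sketch of that idea, not the dashboard's actual implementation; the function names and the sample start times are hypothetical, and it assumes you already have execution start times (for example, from an exported CSV).

```python
from collections import Counter
from datetime import datetime


def hourly_heatmap(start_times):
    """Count executions per (weekday, hour) cell; weekday 0 is Monday."""
    return Counter((t.weekday(), t.hour) for t in start_times)


def peak_cell(cells):
    """Return the busiest ((weekday, hour), count) pair, or None if empty."""
    top = cells.most_common(1)
    return top[0] if top else None


# Hypothetical start times; 2024-01-01 is a Monday.
sample_starts = [
    datetime(2024, 1, 1, 9, 5),
    datetime(2024, 1, 1, 9, 40),
    datetime(2024, 1, 2, 9, 10),
    datetime(2024, 1, 1, 14, 0),
]
cells = hourly_heatmap(sample_starts)
print(peak_cell(cells))
```

A cell with a much higher count than its neighbors is a candidate for rescheduling some flows into a quieter window.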
Scheduled Activity
Analyzes schedules and predicts workload.
Includes multiple views:
Expected Timeline
- Simulated execution timeline based on schedules
- Shows projected runs and durations
Gaps
- Identifies available time windows with no executions
- Helps find capacity for new workloads
Scheduled vs Actual
- Compares planned executions vs actual runs
- Highlights deviations
Ask (AI-assisted)
- Query schedules using natural language
- Example: show me the gaps between midnight and noon
Why this is useful:
- Optimize schedule distribution
- Avoid overlaps and bottlenecks
- Identify unused capacity
- Improve overall system efficiency
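To illustrate what the Gaps view reports, here is a minimal sketch that finds uncovered time windows in a list of projected runs. It is an illustration under stated assumptions, not Etlworks code: the `find_gaps` function and the sample runs are hypothetical, and each run is assumed to be a (start, end) pair.

```python
from datetime import datetime, timedelta


def find_gaps(runs, window_start, window_end):
    """Return (start, end) pairs inside the window that no run covers."""
    gaps = []
    cursor = window_start  # earliest moment not yet known to be covered
    for start, end in sorted(runs):
        if end <= cursor:
            continue  # run is entirely in already-covered time
        if start >= window_end:
            break  # remaining runs start after the window
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < window_end:
        gaps.append((cursor, window_end))
    return gaps


# Hypothetical projected runs between midnight and noon.
midnight = datetime(2024, 1, 1)
runs = [
    (midnight + timedelta(hours=1), midnight + timedelta(hours=2)),
    (midnight + timedelta(hours=1, minutes=30), midnight + timedelta(hours=3)),
    (midnight + timedelta(hours=6), midnight + timedelta(hours=6, minutes=30)),
]
for gap_start, gap_end in find_gaps(runs, midnight, midnight + timedelta(hours=12)):
    print(gap_start.time(), "to", gap_end.time())
```

Overlapping runs are merged implicitly because the cursor only moves forward, so a gap is reported only where no projected run is active.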
Resource utilization
Tracks system-level resources across the cluster: CPU, memory, disk, JVM threads, and file descriptors.
Available to SuperAdmin users.
Tracked metrics:
- CPU — system load average and processor count
- Memory — total, used, and free RAM; JVM heap; thread count and peak
- Disk — OS disk and data disk free / total space
- File descriptors — open count and max allowed
- Running flows count and sampled thread roots
A per-node breakdown is available for multi-node deployments.
Includes the following views and features:
Overview
- Per-node summary cards with current status (normal, warning, critical)
- Time-series charts for CPU, RAM, disk, threads, and file descriptors
- Top flows by estimated resource consumption, with CPU time, memory average and max, and a confidence level
- Per-execution thread samples for deep diagnostics
Automatic incident tracking
The Resource Usage dashboard opens an incident when a tracked resource (CPU, RAM, or disk) breaches a configured threshold and stays above it for the configured sustain duration. An incident closes automatically once the resource stays below the threshold for the configured settle period. Each incident captures:
- Resource — CPU, RAM, or disk.
- Severity — Info, Minor, Medium, Major, or Critical (driven by warning vs. critical threshold breaches).
- Opened at / Resolved at — ISO timestamps.
- Trigger value — the percentage at which the threshold fired.
- Sustained — how long the resource was above threshold.
- Flow impact — the flows running on the affected node during the incident.
- Sample count — monitoring samples observed during the incident.
Thresholds and sustain / settle durations are configured per resource. In the defaults below, "Sustain to close" is the settle period:
| Resource | Warning | Critical | Sustain to open | Sustain to close |
|---|---|---|---|---|
| CPU | 70% | 90% | 30 s (warning), 60 s (critical) | 120 s |
| RAM | 75% | 90% | 30 s (warning), 60 s (critical) | 120 s |
| Disk | 80% | 95% | 60 s (warning), 60 s (critical) | 300 s |
Override these on the Configure tab of the Resource Usage dashboard. Settings apply per node; threshold breaches on different nodes are tracked independently.
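The open / close behavior can be sketched as a small state machine over monitoring samples. This is an illustration only, not the dashboard's actual algorithm. It assumes that "above" means strictly greater than the threshold, that the sustain and settle timers start at the first sample of an uninterrupted run, and that samples arrive in time order. The function name and sample data are hypothetical.

```python
def track_incidents(samples, threshold, open_after, close_after):
    """Derive incidents from (t_seconds, percent) samples in time order.

    Returns a list of (opened_at, resolved_at); resolved_at is None
    when the incident is still open after the last sample.
    """
    incidents = []
    opened_at = None     # set while an incident is open
    streak_start = None  # start of the current run toward open or close
    for t, value in samples:
        above = value > threshold
        if opened_at is None:
            if above:
                if streak_start is None:
                    streak_start = t
                if t - streak_start >= open_after:
                    opened_at, streak_start = t, None
            else:
                streak_start = None  # breach did not last long enough
        else:
            if not above:
                if streak_start is None:
                    streak_start = t
                if t - streak_start >= close_after:
                    incidents.append((opened_at, t))
                    opened_at, streak_start = None, None
            else:
                streak_start = None  # resource climbed back above threshold
    if opened_at is not None:
        incidents.append((opened_at, None))
    return incidents


# CPU warning defaults from the table: 70%, 30 s to open, 120 s to close.
# Hypothetical samples taken every 10 seconds.
cpu = [(0, 50), (10, 75), (20, 80), (30, 85), (40, 90)]
cpu += [(t, 60) for t in range(50, 180, 10)]
print(track_incidents(cpu, threshold=70, open_after=30, close_after=120))
```

A short spike that drops back before the sustain duration elapses resets the timer and never opens an incident, which is the point of requiring a sustained breach.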
Incidents are listed in a table beneath the resource-usage timeline. Click an incident to see its flow impact and the time-series context around the open / close events. Incidents are persisted to Postgres — the table survives restarts and is queryable through the API at GET /rest/v1/metrics/resource-usage/incidents.
Ask (AI-assisted)
- Query resource-utilization data in natural language
- Example: which flows used the most CPU between 9am and noon yesterday?
Controls:
- Date range picker with from / to and time granularity
- Node selector to filter by cluster node
- Configurable warning and critical thresholds per resource (CPU, RAM, disk)
- Capture intervals and raw-data retention
- Alert email recipients
Webhook event: a resource-usage webhook fires when a tracked resource crosses a configured warning or critical threshold. The payload includes the resource key, node, current value, the previous and current levels (normal, warning, critical), and the state transition.
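A receiver for this webhook might summarize the state transition before forwarding it to a chat or paging tool. The sketch below is an assumption-heavy illustration: the JSON field names (`resource`, `node`, `value`, `previousLevel`, `currentLevel`) are hypothetical placeholders for the payload items listed above, so check an actual payload before relying on them.

```python
import json

# Level names come from the text above; the numeric ranks are an assumption.
LEVEL_RANK = {"normal": 0, "warning": 1, "critical": 2}


def describe_transition(raw_body):
    """Turn a webhook body (JSON text) into a one-line summary."""
    event = json.loads(raw_body)
    previous = event["previousLevel"]  # hypothetical field name
    current = event["currentLevel"]    # hypothetical field name
    if LEVEL_RANK[current] > LEVEL_RANK[previous]:
        direction = "escalated"
    elif LEVEL_RANK[current] < LEVEL_RANK[previous]:
        direction = "recovered"
    else:
        direction = "unchanged"
    return (
        f"{event['resource']} on {event['node']} {direction}: "
        f"{previous} -> {current} at {event['value']}%"
    )


# Hypothetical payload for illustration.
body = json.dumps({
    "resource": "cpu",
    "node": "node-1",
    "value": 93.5,
    "previousLevel": "warning",
    "currentLevel": "critical",
})
print(describe_transition(body))
```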
Why this is useful:
- Detect node pressure before it impacts flows
- Correlate slow executions with resource contention
- Find which flows are the heaviest consumers
- Right-size capacity for the workload
AI usage
Tracks Simba activity, token consumption, billing, and knowledge-base sync.
Available to Admin and SuperAdmin users on tenants where AI features are enabled. SuperAdmins see an additional cross-tenant breakdown.
Tracked metrics (current month, resets at month start):
- Total tokens — prompt, completion, total
- Agentic vs non-agentic tokens — distinguishes state-changing tool calls from Q&A and knowledge-base search
- Conversations and total requests
- Daily usage chart for the current month
Billing visibility:
- Total cost, billable cost, and non-billable cost; agentic-only cost
- Wallet balance, monthly drain, and auto-recharge status
- Monthly cap with current usage percentage
- Tenants on BYOK (their own OpenAI key) show zero billable cost — their OpenAI account is billed directly
Per-tenant breakdown (SuperAdmin only):
- All metrics aggregated by tenant
- Tenant display name, environment label, and billing flags
RAG (knowledge base) status:
- Total indexed chunks across sources (knowledge-base articles, CLI commands, templates, marketing pages)
- Last sync time, duration, and status (success, failed, never)
- Next scheduled sync
- Trigger a full or incremental reindex from the dashboard, with progress polling
Why this is useful:
- Monitor monthly AI spend against the allowance and cap
- See which tenants and users are driving consumption
- Confirm the knowledge base is current — last sync, source coverage, scheduled refresh
- Trigger a reindex after major documentation or template changes
Flow Findings dashboard
The Flow Findings dashboard aggregates automatic flow-inspection reports across your instance. Whenever a flow runs, the engine writes an inspection report capturing observed issues — schema drift, performance anomalies, recurrent errors, data quality concerns. The dashboard reads those reports and surfaces the flows that need attention.
What's a finding?
A finding is a flow-level inspection report with one or more detected issues. Each report has a severity (ALL_GOOD, INFO, MINOR, MEDIUM, MAJOR, CRITICAL) computed from the worst issue in the report. Each issue has:
- Issue type — e.g., schema drift, slow step, recurrent error, data quality.
- Severity.
- Description — what was detected.
- Why — why it's flagged.
- Suggestion — what to do about it.
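The "worst issue wins" rule can be sketched as a maximum over an ordered severity list. This sketch assumes the severities are ranked in the order they are listed above and that a report with no issues counts as ALL_GOOD. Both function names are hypothetical.

```python
# Assumed ranking, lowest to highest, following the order given in the text.
SEVERITY_ORDER = ["ALL_GOOD", "INFO", "MINOR", "MEDIUM", "MAJOR", "CRITICAL"]


def report_severity(issue_severities):
    """Return the report-level severity: the worst severity among its issues."""
    if not issue_severities:
        return "ALL_GOOD"  # assumption: an empty report is clean
    return max(issue_severities, key=SEVERITY_ORDER.index)


def at_or_above(severity, minimum):
    """True when `severity` ranks at or above `minimum`."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(minimum)


print(report_severity(["INFO", "MAJOR", "MINOR"]))
print(at_or_above("INFO", "MEDIUM"))
```

The same ordering could also drive a minimum-severity filter, as `at_or_above` shows.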
What the dashboard shows
The dashboard lists every flow that has an inspection report within the configured retention window (1–36 months). For each flow you see:
- Flow name, ID, and severity badge.
- Issue count.
- Last executed timestamp.
- Report last-updated timestamp.
- An indicator of whether the flow still exists (reports outlive deleted flows).
Click a flow to expand its report and see the per-issue detail.
Filters
| Filter | What it does |
|---|---|
| Severity | Show only findings at or above the selected severity. |
| Flow | Text search by name or ID. |
| Tenant | Multi-tenant filter, SuperAdmin only. |
| Report Age | Sliding window 1–36 months. Older reports are not shown. |
Where the data lives
Inspection reports are written to {app.data}/errors/<tenant>/<flow-id>_<timestamp>.json by the flow runtime. The dashboard reads the files directly and aggregates them at query time — there's no separate findings table. Old reports are pruned per the retention policy.
API: GET /rest/v1/metrics/flow-findings?tenants=<csv>&months=<1-36>.
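As a sketch of how a client might assemble this request, the helper below builds the URL from a tenant list and a month count and rejects values outside the 1 to 36 range. The base URL, the tenant names, and the `flow_findings_url` helper are hypothetical. Authentication and the response format are not covered here because they are not described in this article.

```python
from urllib.parse import urlencode


def flow_findings_url(base_url, tenants, months):
    """Build the flow-findings URL for the given tenants and report age."""
    if not 1 <= months <= 36:
        raise ValueError("months must be between 1 and 36")
    # safe="," keeps the comma-separated tenant list readable in the query.
    query = urlencode({"tenants": ",".join(tenants), "months": months}, safe=",")
    return f"{base_url.rstrip('/')}/rest/v1/metrics/flow-findings?{query}"


# Hypothetical host and tenant names.
print(flow_findings_url("https://etl.example.com", ["acme", "globex"], 6))
```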