ETL threads info API – Etlworks Support

Startup
Business
Enterprise
On-Premise
Add-on

Overview

Use this API to monitor the resource consumption (CPU and memory) of ETL processes across nodes and to identify performance bottlenecks in specific flows.

This API provides a snapshot of the system’s state at the time it is called. It captures information about all currently running ETL threads and their child threads, including their CPU and memory usage.

Important Note: This API does not continuously monitor or aggregate metrics. Each call returns a one-time snapshot of the running threads.

To monitor system performance over time, either call the API periodically or use additional tools for long-term monitoring and aggregation of metrics.
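Because each call is a single snapshot, observing a flow over time means sampling repeatedly. The following is a minimal sketch of that idea, not an Etlworks feature. It assumes a hypothetical `fetch` callable that performs one API call and returns the parsed response; the name `collect_snapshots`, its parameters, and the sampling interval are illustrative.

```python
import time
from typing import Any, Callable, List, Tuple


def collect_snapshots(
    fetch: Callable[[], Any],
    count: int,
    interval_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[float, Any]]:
    """Call `fetch` `count` times and keep each result with the time it was taken.

    `fetch` is a hypothetical callable that performs one API call.
    The pause happens only between calls, not after the last one.
    """
    samples: List[Tuple[float, Any]] = []
    for i in range(count):
        samples.append((time.time(), fetch()))
        if i < count - 1:
            sleep(interval_seconds)
    return samples
```

The timestamped samples can then be stored or compared by whatever long-term monitoring tool is in use.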

Authentication

Only a user with the super-admin role can use this API.

Before making a call to the built-in API, the user must receive a JWT token from the authentication API.

Step 1. Use any user with the super-admin role to call an Etlworks authentication endpoint and receive an access token.

Step 2. Use the access token received in Step 1 to call the Etlworks API endpoints. The access token must be submitted as a header parameter, as in: Authorization:Bearer access-token.

Note: Access tokens in Etlworks are short-lived and self-expiring. An access token gives you access to the APIs for approximately 10 minutes. It is recommended that you refresh the access token before each call to the API.
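The two steps and the refresh recommendation can be sketched as follows. This article does not give the authentication endpoint, so `get_token` is a hypothetical callable that is assumed to return a new access token; `send` is likewise a hypothetical callable that performs the actual request with the given headers.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def auth_headers(access_token: str) -> Dict[str, str]:
    """Build the Authorization header described in Step 2."""
    return {"Authorization": f"Bearer {access_token}"}


def call_with_fresh_token(
    get_token: Callable[[], str],
    send: Callable[[Dict[str, str]], T],
) -> T:
    """Obtain a new token right before the call, then pass the headers to `send`.

    Both callables are assumptions made for this sketch.
    """
    return send(auth_headers(get_token()))
```

Requesting a token per call costs one extra round trip but avoids failures caused by a token expiring between calls.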

The API endpoint parameters

PATH: /rest/v1/system/metrics/etl-threads{?optional_query_parameters}

EXAMPLE: https://etlworks/rest/v1/system/metrics/etl-threads

METHOD: GET
HEADER: Authorization:Bearer access-token
REQUEST BODY: none
REQUEST CONTENT TYPE: none
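As an illustration of the parameters above, here is a minimal sketch that prepares the request with Python's standard library without sending it. The function name `build_request` is hypothetical, and the base URL is whatever host your Etlworks instance runs on.

```python
from urllib.request import Request

ETL_THREADS_PATH = "/rest/v1/system/metrics/etl-threads"


def build_request(base_url: str, access_token: str) -> Request:
    """Prepare a GET request with the bearer token and no request body."""
    url = base_url.rstrip("/") + ETL_THREADS_PATH
    return Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )
```

The prepared request can be sent with `urllib.request.urlopen(request)` and the body decoded with `json.load`.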

Optional Query parameters

flows (optional): A comma-separated list of flow IDs (string). When set, the response is limited to the listed flows.

Example: flows=5236,5237

allNodes (optional, default = false): A flag to indicate whether to collect thread data from all nodes or only the current node.

Example: allNodes=true

Usage Example

To get detailed information about all running ETL threads from all nodes, use the following query:

GET /rest/v1/system/metrics/etl-threads?allNodes=true

If you only want information about specific flows (e.g., flow IDs 5236 and 5237):

GET /rest/v1/system/metrics/etl-threads?flows=5236,5237
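The same URLs can be assembled programmatically. This is a minimal sketch; the helper name `etl_threads_url` and its arguments are illustrative. Commas are kept unescaped so the result matches the examples above.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

ETL_THREADS_PATH = "/rest/v1/system/metrics/etl-threads"


def etl_threads_url(
    base_url: str,
    flows: Optional[Iterable[int]] = None,
    all_nodes: bool = False,
) -> str:
    """Build the endpoint URL, adding only the query parameters that are set."""
    params = {}
    flow_ids = list(flows or [])
    if flow_ids:
        params["flows"] = ",".join(str(flow_id) for flow_id in flow_ids)
    if all_nodes:
        params["allNodes"] = "true"
    url = base_url.rstrip("/") + ETL_THREADS_PATH
    # safe="," keeps the comma-separated list readable instead of "%2C"
    query = urlencode(params, safe=",")
    return f"{url}?{query}" if query else url
```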

Example response

[
   {
     "flowId": 5236,
     "requestId": "97409",
     "name": "pool-15-thread-1-main-etl-thread 5236:97409",
     "id": 318,
     "memoryUsage": 2849849,
     "cpuTime": 9586000,
     "node": "https://node1.etlworks.com",
     "aggregatedMemoryUsage": 2849849,
     "aggregatedCpuTime": 9586000,
     "children": null
   },
   {
     "flowId": 5237,
     "requestId": "97407",
     "name": "pool-11-thread-1-main-etl-thread 5237:97407",
     "id": 258,
     "memoryUsage": 23162683,
     "cpuTime": 77912000,
     "node": "https://node2.etlworks.com",
     "aggregatedMemoryUsage": 37938436,
     "aggregatedCpuTime": 127613000,
     "children": [
       {
          "flowId": 5237,
          "requestId": "97407",
          "name": "etl-parallel-97407-debezium-mysqlconnector-localhost3306-change-event-source-coordinator",
          "id": 267,
          "memoryUsage": 12903700,
          "cpuTime": 43404000,
          "node": "https://node2.etlworks.com",
          "aggregatedMemoryUsage": 12903700,
          "aggregatedCpuTime": 43404000,
          "children": null
       },
       {
          "flowId": 5237,
          "requestId": "97407",
          "name": "etl-parallel-97407-debezium-mysqlconnector-localhost3306-SignalProcessor",
          "id": 268,
          "memoryUsage": 1872053,
          "cpuTime": 6297000,
          "node": "https://node2.etlworks.com",
          "aggregatedMemoryUsage": 1872053,
          "aggregatedCpuTime": 6297000,
          "children": null
       }
     ]
   }
]
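Since threads are nested under their parents, a client usually needs to walk the tree. Here is a minimal sketch of one way to do that, using a trimmed copy of the example response; the helper name `iter_threads` is illustrative.

```python
import json
from typing import Any, Dict, Iterator, List, Optional


def iter_threads(threads: Optional[List[Dict[str, Any]]]) -> Iterator[Dict[str, Any]]:
    """Yield every thread in the response, each parent before its children."""
    for thread in threads or []:  # "children" is null (None) for leaf threads
        yield thread
        yield from iter_threads(thread.get("children"))


# Trimmed copy of the example response, keeping only the fields used here.
sample = json.loads("""
[
  {"flowId": 5236, "id": 318, "cpuTime": 9586000, "children": null},
  {"flowId": 5237, "id": 258, "cpuTime": 77912000, "children": [
    {"flowId": 5237, "id": 267, "cpuTime": 43404000, "children": null},
    {"flowId": 5237, "id": 268, "cpuTime": 6297000, "children": null}
  ]}
]
""")

thread_ids = [t["id"] for t in iter_threads(sample)]
# Thread with the highest own CPU time across all levels of the tree
busiest = max(iter_threads(sample), key=lambda t: t["cpuTime"])
```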

Response Fields:

flowId: (integer) The ID of the ETL flow associated with this thread.
requestId: (string) A unique request identifier for the ETL process.
name: (string) The name of the thread.
id: (integer) The thread ID.
memoryUsage: (integer) Estimated memory usage of the thread (in bytes).
cpuTime: (integer) CPU time used by the thread (in nanoseconds).
node: (string) The URL of the node where the thread is running.
aggregatedMemoryUsage: (integer) Total memory usage of the thread and its children (in bytes).
aggregatedCpuTime: (integer) Total CPU time of the thread and its children (in nanoseconds).
children: (array or null) A list of child threads with the same structure as the parent thread, or null if the thread has no children.
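The raw values are in bytes and nanoseconds, which are hard to read at a glance. This minimal sketch converts them for display; the helper names and the output format are illustrative choices, not part of the API.

```python
from typing import Any, Dict


def to_mib(size_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return size_bytes / (1024 * 1024)


def to_seconds(nanoseconds: int) -> float:
    """Convert nanoseconds to seconds."""
    return nanoseconds / 1_000_000_000


def format_thread(thread: Dict[str, Any]) -> str:
    """One-line summary using the aggregated (thread plus children) values."""
    memory = to_mib(thread["aggregatedMemoryUsage"])
    cpu = to_seconds(thread["aggregatedCpuTime"])
    return f"{thread['name']}: {memory:.1f} MiB, {cpu:.3f} s CPU"
```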

Notes

Memory Usage: The estimated memory consumption of each thread; the aggregated value also includes child threads, if any. The JVM uses a shared memory model, meaning threads within the same process share the available heap space, and no API provides precise memory usage for individual threads. This value is therefore an approximation rather than an exact measurement: it is derived from the total heap memory used by the JVM and the proportion of CPU time each thread consumes.
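To illustrate what a CPU-proportional estimate looks like, here is a minimal sketch. The exact formula Etlworks uses is not given here, so the function below is an assumption: it simply splits the used heap among threads by their share of CPU time. In the example response above, the ratio of memoryUsage to cpuTime is roughly 0.297 bytes per nanosecond for every thread, which is consistent with a split of this kind.

```python
from typing import Dict


def estimate_thread_memory(heap_used_bytes: int, cpu_times: Dict[int, int]) -> Dict[int, int]:
    """Split used heap among threads in proportion to their CPU time.

    `cpu_times` maps a thread ID to its CPU time in nanoseconds.
    This is an assumed formula for illustration, not the documented one.
    """
    total_cpu = sum(cpu_times.values())
    if total_cpu == 0:
        return {thread_id: 0 for thread_id in cpu_times}
    return {
        thread_id: heap_used_bytes * cpu // total_cpu  # integer bytes, rounded down
        for thread_id, cpu in cpu_times.items()
    }
```

Under this model, a thread that used 30% of the measured CPU time is attributed 30% of the used heap, regardless of what it actually allocated.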

CPU Time: CPU time is calculated for each thread, with child threads included in the aggregate totals.
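The relationship between a thread's own values and its aggregated totals can be checked against the example response. In this minimal sketch the helper name `aggregate` is illustrative; it recomputes a total as the thread's own value plus the totals of its children.

```python
from typing import Any, Dict


def aggregate(thread: Dict[str, Any], field: str) -> int:
    """Sum `field` over the thread and all of its descendants."""
    children = thread.get("children") or []  # null means no children
    return thread[field] + sum(aggregate(child, field) for child in children)


# Values copied from the flow 5237 entry in the example response.
parent = {
    "cpuTime": 77912000,
    "memoryUsage": 23162683,
    "aggregatedCpuTime": 127613000,
    "aggregatedMemoryUsage": 37938436,
    "children": [
        {"cpuTime": 43404000, "memoryUsage": 12903700, "children": None},
        {"cpuTime": 6297000, "memoryUsage": 1872053, "children": None},
    ],
}

cpu_matches = aggregate(parent, "cpuTime") == parent["aggregatedCpuTime"]
memory_matches = aggregate(parent, "memoryUsage") == parent["aggregatedMemoryUsage"]
```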

Children: If a thread spawns additional child threads, these will be displayed in the children array, following the same structure as the parent.

Response codes

200 for success
401 and 403 for not authorized
500 for an internal error
504 for not responding (timeout)
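A client can map these codes to outcomes before attempting to parse the body. This is a minimal sketch; the function name and the returned labels are illustrative, and any code not listed above is reported as unexpected.

```python
def describe_status(status_code: int) -> str:
    """Translate the documented response codes into short labels."""
    if status_code == 200:
        return "success"
    if status_code in (401, 403):
        return "not authorized"
    if status_code == 500:
        return "internal error"
    if status_code == 504:
        return "timeout"
    return "unexpected"
```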
