Working with APIs

This article covers consuming APIs — calling someone else's REST/SOAP/OData/GraphQL service from Etlworks. To go the other way and expose a flow as a REST endpoint, see Building custom APIs.

Choosing the right connector

Etlworks gives you four ways to talk to an external API. The table below lists them from least to most configuration. Pick the first row that fits your target system: less configuration is better.

Connector	When to use	Auth
Connected Apps	Salesforce, HubSpot, QuickBooks, NetSuite, ~80 SaaS platforms. Data exposed as relational tables / views / stored procedures. Drag-and-drop or SQL.	Built-in OAuth2 / API key, per-app.
Pre-Configured API Connectors	Popular APIs like Google, Microsoft Graph, Salesforce, HubSpot, QuickBooks, ZOHO, Atlassian. Auto-detects pagination + tokens.	Browser-based OAuth2, one-click.
Generic HTTP Connector	Any REST/SOAP/OData/GraphQL API. Full control over URL, headers, payload, auth, pagination. Build your own custom connector. Worked OAuth2 examples for Xero, Spotify, and DocuSign.	None / Basic / JWT / OAuth2 / API key / custom.
Specialized connectors	Google Analytics, Google Sheets, Google Ads, and similar — APIs with their own dedicated connection type ("well-known APIs").	Provider-specific.

ETL with APIs

The most common reason to integrate with an API is moving data through one — either pulling data out and loading it into a warehouse/database/file, or pushing data from a system of record to a remote API. Etlworks ships a dedicated flow type for each direction.

Direction	What it does	Walkthrough
API → destination	HTTP connection is the source. Destination is a database, warehouse, file, NoSQL store, queue, or another API. Flow types: Web Service to Database / File / NoSQL / Queue / Email / Web Service.	How to extract data from an API and load it into any destination
Source → API	Source is a database, file, warehouse, NoSQL store, or email. Destination is an HTTP connection. Flow types: Database / File / NoSQL / Email to Web Service.	ETL data from any source to API
Connected App ↔ anywhere	Salesforce, HubSpot, QuickBooks, etc. handled as JDBC-backed sources/destinations: query with SQL or drag-and-drop, no HTTP plumbing.	How to ETL data from and to a Connected App
Well-known APIs (Google, social)	Specialized connectors for Google Analytics / Sheets / Ads and similar. Single-purpose flow types optimized for these providers.	ETL from selected Google and social media APIs

Call an API endpoint as a flow

If you don't need ETL — you just need a flow whose entire purpose is to call an HTTP endpoint — use the Call HTTP Endpoint flow type. It calls the endpoint, optionally sends a user-defined payload, and does not transform the response.

Use it for: triggering an action on a remote system (start a job, send a notification, fire a webhook), kicking off a downstream pipeline from inside a nested flow, or any orchestration / automation step that isn't data movement.

Full walkthrough: Call HTTP endpoint.

Bulk and pass-through patterns

Four patterns that move data through an API without a full source-to-destination transformation.

Save the HTTP response to a file

Land the raw, unmodified response from a web service into any supported file destination (S3, GCS, Azure Storage, FTP/FTPS/SFTP, Box, Dropbox, Google Drive, OneDrive, SharePoint, WebDAV, server storage, outbound email).

Create an HTTP source connection pointing at the API.
Create the destination connection (any file storage above).
Create a Copy Files flow with the HTTP connection as FROM, the file connection as TO. Specify the filename in either the FROM or TO field.

Send a file's content as the request payload

The mirror — send a file's raw bytes as the HTTP body, no parsing, no transformation.

Create a source connection (any file storage, or HTTP, MongoDB, etc.).
Create a destination HTTP connection pointing at the target endpoint.
Create a Copy Files flow with the file connection as FROM, the HTTP connection as TO. Specify the filename in the FROM field (wildcards supported).

The file's raw content becomes the body of the HTTP request.

Loop a table or files through API calls

One API request per source row or per matching file. Use a database loop or file loop on the flow; map row/file values into the request via payload tokens.

Dynamic URL per call

Each request hits a different URL — per-tenant, per-customer, per-region. Use tokens directly in the URL: https://api.example.com/tenants/{tenant_id}/users. The token resolves from a loop variable, flow parameter, or script.
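To make the per-call URL idea concrete, here is a minimal sketch in plain JavaScript (runnable in Node.js) of what token resolution amounts to when a loop supplies one set of values per row. It is not Etlworks code: the rows array and the resolveUrl helper are hypothetical, and URL-encoding the substituted values is an assumption rather than documented connector behavior.

```javascript
// Hypothetical loop rows, standing in for what a database loop would supply.
var rows = [
  { tenant_id: "acme" },
  { tenant_id: "globex" }
];

// Replace each {token} with the matching value from `values`.
// Tokens with no matching value are left untouched so they are easy to spot.
// Assumption: values are URL-encoded before substitution.
function resolveUrl(template, values) {
  return template.replace(/\{(\w+)\}/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(values, name)
      ? encodeURIComponent(String(values[name]))
      : match;
  });
}

var template = "https://api.example.com/tenants/{tenant_id}/users";

// One resolved URL per loop row, i.e. one request per row.
var urls = rows.map(function (row) {
  return resolveUrl(template, row);
});

console.log(urls.join("\n"));
```

Leaving unknown tokens in place, rather than replacing them with an empty string, makes a missing loop column visible in the request log instead of silently producing a malformed URL.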

HTTP connection configuration

Most connection-level settings live on the HTTP connection (URL, method, default headers, timeout). For the full field-by-field reference, see HTTP API Connector. Two configuration choices come up often enough to call out here.

Auto-retry on transient errors

On the HTTP connection, set:

Number of Retries — total retry attempts after a failure. 0 or blank disables retries.
Delay Between Retries — initial delay (ms). The connector uses exponential backoff: the delay doubles each attempt up to 1 minute, then stays at 1 minute for all further attempts.

Note: Non-retriable status codes. These codes stop retries immediately because they almost always indicate a request that won't succeed on retry: 400, 404, 405, 406, 410, 411, 413, 414, 415, 422, 426, 428.
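The backoff rule above can be sketched as a small calculation. This is an illustration in plain JavaScript (runnable in Node.js), not the connector's implementation: the function names are hypothetical, and it assumes the first retry waits for the configured initial delay before the doubling starts.

```javascript
// Status codes listed above as non-retriable.
var NON_RETRIABLE = [400, 404, 405, 406, 410, 411, 413, 414, 415, 422, 426, 428];

// The delay is capped at 1 minute.
var MAX_DELAY_MS = 60000;

// Return the wait (in ms) before each retry attempt.
// Assumption: attempt 1 uses the initial delay, and each later attempt doubles it.
function retryDelays(initialDelayMs, retries) {
  var delays = [];
  var delay = initialDelayMs;
  for (var i = 0; i < retries; i++) {
    delays.push(Math.min(delay, MAX_DELAY_MS));
    delay = Math.min(delay * 2, MAX_DELAY_MS);
  }
  return delays;
}

// A failed call is retried only while attempts remain and the code is retriable.
function shouldRetry(statusCode, attemptsSoFar, retries) {
  return attemptsSoFar < retries && NON_RETRIABLE.indexOf(statusCode) === -1;
}

console.log(retryDelays(5000, 6).join(", "));
// prints "5000, 10000, 20000, 40000, 60000, 60000"
```

With a 5-second initial delay and 6 retries, the total wait under these assumptions is a little over 3 minutes, which is worth keeping in mind when sizing flow timeouts.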

Capture the response code and headers

Enable Save HTTP Code into Global Variable on the HTTP connection — Etlworks stores the response code in the thread-safe global variable http_processor_last_http_code:

var httpCode = com.toolsverse.config.SystemConfig.instance()
    .getContextProperties()
    .get("http_processor_last_http_code");

For headers, enable Save Response Header into Global Variable. The headers land as a key=value string in http_processor_last_response_headers:

var headers = com.toolsverse.config.SystemConfig.instance()
    .getContextProperties()
    .get("http_processor_last_response_headers");
var theHeaders = Utils.getProperties(headers);
for each (var entry in theHeaders.entrySet()) {
    etlConfig.log(entry.getKey() + "=" + entry.getValue());
}
// theHeaders.getProperty("header-name") for a specific header

Important: Thread-local + overwritten on next call. Both values are only valid within the current execution thread and are overwritten by the next HTTP call. Read them immediately after the HTTP operation.
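A script that reads http_processor_last_http_code usually needs to branch on it. The following is a minimal sketch of such a branch in plain JavaScript (runnable in Node.js). The classifyHttpCode function and its category labels are hypothetical, and the sketch assumes the stored value may arrive as either a number or a string.

```javascript
// Map a captured HTTP code to a coarse category a script can branch on.
// Assumption: `raw` may be a number, a numeric string, or null/undefined.
function classifyHttpCode(raw) {
  var code = parseInt(raw, 10);
  if (isNaN(code)) {
    return "unknown";        // nothing captured, or not a number
  }
  if (code >= 200 && code < 300) {
    return "success";
  }
  if (code === 429 || code >= 500) {
    return "transient";      // rate limit or server error: a retry may help
  }
  if (code >= 400) {
    return "client-error";   // the request itself needs fixing
  }
  return "other";            // 1xx and 3xx
}

console.log(classifyHttpCode("503")); // prints "transient"
```

In a flow script, the argument would be the value read from the global variable immediately after the HTTP operation, as described above.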

Pagination

The HTTP connector has built-in support for five common pagination styles: page-based, offset-based, cursor-based, nextLink, and time-based. Configure each pattern declaratively from the connection settings.

For APIs that don't fit the built-in patterns (custom stop conditions, multi-step pagination, conditional looping), use script-based pagination.

Work with Paginated APIs Using the HTTP Connector — the five built-in patterns, each with a worked example.
Advanced Pagination with Scripts — script-driven pagination for APIs that need full programmatic control.
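For readers new to pagination, the sketch below shows the general shape of a cursor-based loop: request a page, collect its items, and continue until the API stops returning a cursor. It is plain JavaScript (runnable in Node.js) against a hypothetical in-memory API. The fetchPage and fetchAll names and the nextCursor field are assumptions for illustration, not Etlworks settings or the connector's implementation.

```javascript
// Hypothetical data set behind the fake API.
var DATA = [1, 2, 3, 4, 5, 6, 7];

// Fake API call: returns up to `limit` items starting at `cursor`,
// plus the cursor for the next page (null when there is no next page).
function fetchPage(cursor, limit) {
  var start = cursor === null ? 0 : cursor;
  var items = DATA.slice(start, start + limit);
  var next = start + limit < DATA.length ? start + limit : null;
  return { items: items, nextCursor: next };
}

// Keep requesting pages until the API reports no further cursor.
function fetchAll(limit) {
  var all = [];
  var cursor = null;
  do {
    var page = fetchPage(cursor, limit);
    all = all.concat(page.items);
    cursor = page.nextCursor;
  } while (cursor !== null);
  return all;
}

console.log(fetchAll(3).join(", ")); // prints "1, 2, 3, 4, 5, 6, 7"
```

The other styles differ mainly in what is carried between requests: a page number, an offset, a full next-page URL, or a timestamp.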

Authentication

The HTTP connector supports public endpoints, Basic, Bearer/JWT, API key, OAuth2 (client credentials, authorization code, JWT grant), AWS SigV4, and custom auth via scripts. Pre-Configured Connectors handle OAuth2 interactively.

Full reference: Authentication Methods for the HTTP API Connector. Concrete OAuth2 walkthroughs (Xero, Spotify, DocuSign JWT): Examples of the custom API connectors.
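As a reminder of what the two simplest schemes put on the wire, here is a minimal sketch of the Authorization header values for Basic and Bearer authentication. It is plain JavaScript for Node.js (Buffer is a Node.js API and is not available in every script engine). The helper names are hypothetical, and the connector builds these headers for you when the matching auth type is selected.

```javascript
// Basic auth: "Basic " + base64("user:password"), as defined in RFC 7617.
function basicAuthHeader(user, password) {
  var credentials = user + ":" + password;
  return "Basic " + Buffer.from(credentials, "utf8").toString("base64");
}

// Bearer auth: the token is sent as-is after the "Bearer " prefix.
function bearerAuthHeader(token) {
  return "Bearer " + token;
}

// The RFC 7617 example credentials.
console.log(basicAuthHeader("Aladdin", "open sesame"));
// prints "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
```

Base64 is an encoding, not encryption, so Basic credentials are only as safe as the HTTPS connection that carries them.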

Data exchange formats

Etlworks parses responses and builds payloads in any supported structured format:

CSV, Fixed Length Text, JSON, JSON Dataset, XML, XML Dataset, Avro, Parquet, Key-Value, HL7 v2.x, HL7 FHIR.

Attach the format to the HTTP connection (or per transformation) and the connector handles parsing and serialization.

Working with nested documents

APIs often return deeply nested JSON, XML, or EDI. Etlworks can read, flatten, transform, and generate nested data structures — whether you're loading nested data into a flat table or composing structured payloads for the next API call. See the dedicated Nested and Hierarchical Data category for the full toolkit.
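To illustrate what flattening means in practice, the sketch below turns a nested JSON document into a single level of dot-separated keys. It is plain JavaScript (runnable in Node.js) and only one possible convention. The flatten function, the dot separator, and the use of array indexes in keys are assumptions for this example, not a description of how Etlworks flattens data.

```javascript
// Recursively copy leaf values into `out`, joining nested keys with dots.
// Arrays are treated like objects whose keys are the element indexes.
// Note: empty objects and empty arrays produce no output keys.
function flatten(value, prefix, out) {
  out = out || {};
  prefix = prefix || "";
  if (value !== null && typeof value === "object") {
    var keys = Object.keys(value);
    for (var i = 0; i < keys.length; i++) {
      var path = prefix ? prefix + "." + keys[i] : keys[i];
      flatten(value[keys[i]], path, out);
    }
  } else {
    out[prefix] = value;
  }
  return out;
}

// Hypothetical API response.
var order = {
  id: 7,
  customer: { name: "Ann", address: { city: "Oslo" } },
  items: [{ sku: "A1" }, { sku: "B2" }]
};

console.log(JSON.stringify(flatten(order), null, 2));
```

The result has the keys id, customer.name, customer.address.city, items.0.sku, and items.1.sku, each of which could map to a column in a flat table.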

Parametrization and dynamic URLs

Use tokens ({token}) in URLs, headers, and payloads. Tokens are replaced at runtime — great for dynamic authentication, per-row endpoints, and request customization.

https://api.example.com/v1/customers/{customer_id}/orders?from={start_date}

Payload templates with tokens

Enter a payload template with {token} placeholders in the connection's Payload field, then resolve the tokens from your flow:

{
  "Device": "{name}",
  "HostName": "{host}",
  "DomainName": "{domain}",
  "FullyQualifiedDomainName": "{domain_full}",
  "Description": "{description}"
}

Tokens can be resolved from any of these sources — pick whichever fits your flow design:

Global variables set in a script (Pre/Post-processor, mapping script, etc.)
Database-loop parameters — one set of values per source row
File-loop parameters — one set per matching file
Flow variables set at the flow level
Variables entered when running a flow manually, or in the schedule configuration
Variables set on an Integration Agent flow configuration
URL or payload parameters when triggering the flow via call-flow-by-name / call-flow-by-id APIs
URL or path parameters from a user-defined API (Building custom APIs)
HTTP preprocessor script on the connection
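Because tokens can come from many sources, it helps to check that every placeholder in a template has a value before sending the request. The sketch below, in plain JavaScript (runnable in Node.js), lists the tokens in a template and reports the ones with no value. The findTokens and missingTokens helpers are hypothetical, and the sketch assumes token names consist of letters, digits, and underscores.

```javascript
// Collect the distinct {token} names in a template.
// The JSON object's own braces are ignored because they do not
// wrap a bare identifier.
function findTokens(template) {
  var names = [];
  var re = /\{([A-Za-z_]\w*)\}/g;
  var m;
  while ((m = re.exec(template)) !== null) {
    if (names.indexOf(m[1]) === -1) {
      names.push(m[1]);
    }
  }
  return names;
}

// Return the token names that have no entry in `values`.
function missingTokens(template, values) {
  return findTokens(template).filter(function (name) {
    return !Object.prototype.hasOwnProperty.call(values, name);
  });
}

var payloadTemplate =
  '{\n  "Device": "{name}",\n  "HostName": "{host}",\n  "DomainName": "{domain}"\n}';

console.log(findTokens(payloadTemplate).join(", "));
// prints "name, host, domain"
console.log(missingTokens(payloadTemplate, { name: "sw-01", host: "sw01" }).join(", "));
// prints "domain"
```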

Advanced response handling

Use a Memory Connection to work with the response

A Memory Connection holds the response in-flight so you can inspect or branch on it without writing to disk.

Pattern A — capture the raw response for later use. Useful for logging or chaining steps that need the unmodified body.

Create a Memory Connection.
Create a source Format matching the response (JSON, XML, CSV, …).

Add a Preprocessor on the source format that captures the raw response:

// store the raw response in a global variable
com.toolsverse.config.SystemConfig.instance()
    .getProperties()
    .put('unique-key', message);
value = message; // return the unmodified response

Create a Web Service to File-style flow: source connection = HTTP, source format = the Format created above (with the Preprocessor attached), FROM = the endpoint, destination connection = Memory, destination format = the same Format, TO = memory.

Pattern B — parse the response and branch on it. Useful when one flow drives another and you need to react to the called flow's status. Example: a "caller" flow runs another flow via run-flow-by-name and throws if it failed.

Build a source-to-destination flow whose source is the run-flow-by-name API (call synchronously) and whose destination is a Memory Connection. JSON on both sides.

In MAPPING → Additional Transformations → After Extract, parse the response and throw on bad statuses:

// The run-flow response is a single record: read its status and exception fields.
var status = dataSet.getFieldValue(dataSet.getRecord(0), 'status');
var exception = dataSet.getFieldValue(dataSet.getRecord(0), 'exception');

// Statuses 0-3 and 5 abort the caller flow; 4 (success) falls through.
if (status == 5) {
    throw 'Flow finished with an exception: ' + exception;
} else if (status == 0 || status == 1) {
    throw 'Flow is still running but expected to finish already';
} else if (status == 2) {
    throw 'Flow has been canceled';
} else if (status == 3) {
    throw 'Flow has not been executed';
}

Status codes returned by the run-flow API: 0 just started, 1 running, 2 canceled, 3 not executed, 4 success, 5 error.

Test and explore endpoints

You don't need an external HTTP client to exercise an API — the Etlworks Explorer connects directly to an HTTP connection and lets you browse responses, run SQL-like queries, and validate the shape before you build a flow.

Create an HTTP Connection to the API.
Create a Format that matches the response (commonly JSON, XML, or CSV).
Open Etlworks Explorer, select the connection, and link the Format.
From Explorer: browse endpoints and fields as metadata, view rows in grid or raw mode, run SQL-like queries over the response, inspect field relationships.

Troubleshooting

Can't connect. Check the HTTP connection's URL, method, headers, and auth against the provider's docs. Use the Explorer to test — same error a flow would surface, but faster. Common causes and fixes: Cannot Connect to a Web Service.
Auth fails (401/403). See the troubleshooting matrix in Authentication Methods for the HTTP API Connector.
Empty / partial results. The endpoint paginates and pagination isn't configured — see Pagination.
Transient 5xx or rate-limit 429. Enable auto-retry; for higher-volume rate limiting, also use a throttled scheduler.
Need to inspect the raw request/response. See How to debug HTTP requests, or use the Memory Connection pattern.
