Efficient handling of files and APIs is critical for high-performance data integration. Below are best practices for improving performance in Etlworks when working with files, APIs, and database destinations.
Use Streaming When Possible
Streaming allows Etlworks to process data record by record, minimizing memory usage.
To enable streaming:
- Go to Mapping → Parameters
- Enable the option Stream Data
Streaming is not available when:
- The source or destination is a nested structure (e.g., complex JSON or XML)
- You are using nested mappings or after-extract transformations
When streaming is not available automatically, consider using Force Streaming.
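The difference between the two modes can be illustrated with a short, hypothetical sketch (this is plain Python, not the Etlworks API): a non-streaming run materializes the whole dataset in memory, while a streaming run handles one record at a time.

```python
import csv
import io

# Sample CSV data standing in for a large source file (hypothetical).
CSV_DATA = "id,name\n1,alice\n2,bob\n3,carol\n"

def load_all(reader):
    # Non-streaming: the whole dataset is materialized in memory first.
    return [row for row in reader]

def stream_records(reader):
    # Streaming: yield one record at a time; peak memory stays flat
    # regardless of how large the source is.
    for row in reader:
        yield row

processed = []
for record in stream_records(csv.DictReader(io.StringIO(CSV_DATA))):
    processed.append(record["name"].upper())

print(processed)  # ['ALICE', 'BOB', 'CAROL']
```

With `load_all`, memory grows with the file size; with `stream_records`, only one record is held at a time, which is the property the Stream Data option provides.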
Use Force Streaming When Regular Streaming Is Not Available
In some cases, streaming is automatically disabled even if it is explicitly enabled, such as when:
- The source is a non-flat file (e.g., JSON or XML)
- The source is a web service response that is filtered with SQL queries
To improve performance in these scenarios:
- Go to Mapping → Parameters
- Enable the option Force Streaming
How Force Streaming Works
- The system still loads the full dataset into memory
- Records are then processed one at a time
- SQL statements are executed per record, not accumulated
- Bind variables and batching (if enabled) can still be used
This approach reduces memory usage and improves performance when working with large or complex datasets, especially when writing to relational databases.
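The per-record behavior described above can be sketched in plain Python (a conceptual illustration, not the Etlworks implementation; the JSON payload and table are hypothetical): the full document is parsed into memory, then each record is written to the destination individually with bind variables.

```python
import json
import sqlite3

# Hypothetical nested JSON payload: true streaming is unavailable for it,
# so the full document is parsed into memory first (as Force Streaming does).
payload = json.loads('[{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]')

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

# Records are then handed to the destination one at a time, using
# bind variables (?) instead of accumulating one giant SQL string.
for record in payload:
    conn.execute(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        (record["id"], record["name"]),
    )
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 2
```

Executing a small parameterized statement per record keeps memory bounded on the write side even when the source had to be fully loaded.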
Read about Streaming vs Force Streaming
Split Large Files
Processing large files, especially large XML documents, can be slow and memory-intensive. A better approach is to:
- Split the large file into smaller, manageable chunks
- Process each chunk in a loop
Learn more: How to split large files
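The split-and-loop pattern can be sketched as follows (plain Python with made-up sample data, not the Etlworks flow itself): the file is read sequentially into fixed-size chunks, and each chunk is processed independently.

```python
import io

# Stand-in for a large source file (hypothetical data).
big_file = io.StringIO("".join(f"row{i}\n" for i in range(10)))

CHUNK_SIZE = 4  # records per chunk

def split_into_chunks(fh, size):
    # Read the file sequentially, emitting small, manageable chunks.
    chunk = []
    for line in fh:
        chunk.append(line.rstrip("\n"))
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

processed_chunks = 0
for chunk in split_into_chunks(big_file, CHUNK_SIZE):
    # Each chunk is processed independently inside the loop,
    # keeping peak memory bounded by CHUNK_SIZE.
    processed_chunks += 1

print(processed_chunks)  # 3
```

Because no chunk ever exceeds `CHUNK_SIZE` records, memory use stays flat no matter how large the original file is.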
Use Server Storage When Possible
It is generally faster to process files stored in server storage than remote files or cloud storage.
Recommended Approach:
- Step 1: Create a flow that copies or moves files to server storage.
- Step 2: Create a second flow that processes the files from server storage.
- Step 3: Create a nested flow that runs Steps 1 and 2 in sequence.
This setup ensures faster I/O operations and improves the reliability of file-based integrations.
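The copy-then-process sequence can be sketched in plain Python (the directories below are hypothetical stand-ins for remote storage and Etlworks server storage):

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical directories standing in for remote/cloud storage and
# the server storage local to the processing engine.
remote = Path(tempfile.mkdtemp(prefix="remote_"))
server = Path(tempfile.mkdtemp(prefix="server_"))
(remote / "orders.csv").write_text("id\n1\n2\n")

# Step 1: copy the file from remote storage to fast local server storage.
local_copy = server / "orders.csv"
shutil.copy(remote / "orders.csv", local_copy)

# Step 2: process the local copy; local I/O is faster and more reliable
# than re-reading the remote file for every operation.
line_count = len(local_copy.read_text().splitlines())
print(line_count)  # 3
```

Separating the transfer from the processing also makes each step independently retryable, which mirrors the nested-flow setup in Step 3.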
Use Parallel Loops for Large Batches
When executing a large number of identical operations (e.g., file processing or API calls), parallel loops can significantly reduce execution time.
Learn more: Using parallel loops
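Conceptually, a parallel loop fans the identical operations out to a pool of workers; a minimal Python sketch (the `process_item` function is a hypothetical stand-in for one unit of work, e.g. an API call):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one identical unit of work
# (e.g., an API call or processing a single file).
def process_item(item):
    return item * 2

items = list(range(8))

# Run the identical operations concurrently instead of one by one;
# for I/O-bound work (API calls, file transfers) this cuts wall-clock time.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_item, items))

print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

With 4 workers, up to four operations are in flight at once, so total time approaches the longest single operation times `len(items) / 4` rather than the sum of all operations.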
Enable Parallel File Operations with Wildcard File Names
If you are processing multiple files using a wildcard file name, you can improve performance by enabling parallel file operations. This allows multiple files to be processed at the same time.
Learn more: Enabling parallel file operations
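The combination of wildcard matching and concurrent processing can be sketched in plain Python (the file names and processing function are hypothetical; this is not the Etlworks setting itself):

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Create a few sample files matched by a wildcard (hypothetical names).
folder = Path(tempfile.mkdtemp())
for name in ("a.csv", "b.csv", "c.csv"):
    (folder / name).write_text("x\n")

def process_file(path):
    # Stand-in for the per-file processing step.
    return path.name

# glob("*.csv") plays the role of the wildcard file name;
# the executor then processes the matched files at the same time.
matched = sorted(folder.glob("*.csv"))
with ThreadPoolExecutor(max_workers=3) as pool:
    names = list(pool.map(process_file, matched))

print(names)  # ['a.csv', 'b.csv', 'c.csv']
```

`pool.map` preserves input order, so results line up with the matched file list even though the files are processed concurrently.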