Efficient handling of files and APIs is critical for high-performance data integration. Below are best practices for improving performance in Etlworks when working with files, APIs, and database destinations.
Use Streaming When Possible
Streaming allows Etlworks to process data record by record, minimizing memory usage.
To enable streaming:
- Go to Mapping → Parameters
- Enable the option Stream Data
Streaming is not available when:
- The source or destination is a nested structure (e.g., complex JSON or XML)
- You are using nested mappings or after-extract transformations
When streaming is not available automatically, consider using Force Streaming.
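The difference between the two modes can be illustrated with a short, hypothetical sketch (this is plain Python, not the Etlworks API): a non-streaming run materializes the whole dataset in memory, while a streaming run handles one record at a time.

```python
import csv
import io

# Sample CSV data standing in for a large source file (hypothetical).
CSV_DATA = "id,name\n1,alice\n2,bob\n3,carol\n"

def load_all(reader):
    # Non-streaming: the whole dataset is materialized in memory first.
    return [row for row in reader]

def stream_records(reader):
    # Streaming: yield one record at a time; peak memory stays flat
    # regardless of how large the source is.
    for row in reader:
        yield row

processed = []
for record in stream_records(csv.DictReader(io.StringIO(CSV_DATA))):
    processed.append(record["name"].upper())

print(processed)  # ['ALICE', 'BOB', 'CAROL']
```

With `load_all`, memory grows with the file size; with `stream_records`, only one record is held at a time, which is the property the Stream Data option provides.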
Use Force Streaming When Regular Streaming Is Not Available
In some cases, streaming is automatically disabled even if it is explicitly enabled, such as when:
- The source is a non-flat file (e.g., JSON or XML)
- The source is a web service response that is filtered with SQL queries
To improve performance in these scenarios:
- Go to Mapping → Parameters
- Enable the option Force Streaming
How Force Streaming Works
- The system still loads the full dataset into memory
- Records are then processed one at a time
- SQL statements are executed per record, not accumulated
- Bind variables and batching (if enabled) can still be used
This approach reduces memory usage and improves performance when working with large or complex datasets, especially when writing to relational databases.
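The per-record behavior described above can be sketched in plain Python (a conceptual illustration, not the Etlworks implementation; the JSON payload and table are hypothetical): the full document is parsed into memory, then each record is written to the destination individually with bind variables.

```python
import json
import sqlite3

# Hypothetical nested JSON payload: true streaming is unavailable for it,
# so the full document is parsed into memory first (as Force Streaming does).
payload = json.loads('[{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]')

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

# Records are then handed to the destination one at a time, using
# bind variables (?) instead of accumulating one giant SQL string.
for record in payload:
    conn.execute(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        (record["id"], record["name"]),
    )
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 2
```

Executing a small parameterized statement per record keeps memory bounded on the write side even when the source had to be fully loaded.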
Read about Streaming vs Force Streaming
Split Large Files
Processing large files, especially large XML documents, can be slow and memory-intensive. A better approach is to:
- Split the large file into smaller, manageable chunks
- Process each chunk in a loop
Learn more: How to split large files
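The split-and-loop pattern can be sketched as follows (plain Python with made-up sample data, not the Etlworks flow itself): the file is read sequentially into fixed-size chunks, and each chunk is processed independently.

```python
import io

# Stand-in for a large source file (hypothetical data).
big_file = io.StringIO("".join(f"row{i}\n" for i in range(10)))

CHUNK_SIZE = 4  # records per chunk

def split_into_chunks(fh, size):
    # Read the file sequentially, emitting small, manageable chunks.
    chunk = []
    for line in fh:
        chunk.append(line.rstrip("\n"))
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

processed_chunks = 0
for chunk in split_into_chunks(big_file, CHUNK_SIZE):
    # Each chunk is processed independently inside the loop,
    # keeping peak memory bounded by CHUNK_SIZE.
    processed_chunks += 1

print(processed_chunks)  # 3
```

Because no chunk ever exceeds `CHUNK_SIZE` records, memory use stays flat no matter how large the original file is.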
Use Server Storage When Possible
It is generally faster to process files stored in server storage than remote files or cloud storage.
Recommended Approach:
- Step 1: Create a flow that copies or moves files to server storage.
- Step 2: Create a second flow that processes the files from server storage.
- Step 3: Create a nested flow that runs Steps 1 and 2 in sequence.
This setup ensures faster I/O operations and improves the reliability of file-based integrations.
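The copy-then-process sequence can be sketched in plain Python (the directories below are hypothetical stand-ins for remote storage and Etlworks server storage):

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical directories standing in for remote/cloud storage and
# the server storage local to the processing engine.
remote = Path(tempfile.mkdtemp(prefix="remote_"))
server = Path(tempfile.mkdtemp(prefix="server_"))
(remote / "orders.csv").write_text("id\n1\n2\n")

# Step 1: copy the file from remote storage to fast local server storage.
local_copy = server / "orders.csv"
shutil.copy(remote / "orders.csv", local_copy)

# Step 2: process the local copy; local I/O is faster and more reliable
# than re-reading the remote file for every operation.
line_count = len(local_copy.read_text().splitlines())
print(line_count)  # 3
```

Separating the transfer from the processing also makes each step independently retryable, which mirrors the nested-flow setup in Step 3.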
Use Parallel Loops for Large Batches
When executing a large number of identical operations (e.g., file processing or API calls), parallel loops can significantly reduce execution time.
Learn more: Using parallel loops
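Conceptually, a parallel loop fans the identical operations out to a pool of workers; a minimal Python sketch (the `process_item` function is a hypothetical stand-in for one unit of work, e.g. an API call):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one identical unit of work
# (e.g., an API call or processing a single file).
def process_item(item):
    return item * 2

items = list(range(8))

# Run the identical operations concurrently instead of one by one;
# for I/O-bound work (API calls, file transfers) this cuts wall-clock time.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_item, items))

print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

With 4 workers, up to four operations are in flight at once, so total time approaches the longest single operation times `len(items) / 4` rather than the sum of all operations.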
Enable Parallel File Operations with Wildcard File Names
If you are processing multiple files using a wildcard file name, you can improve performance by enabling parallel file operations. This allows multiple files to be processed at the same time.
Learn more: Enabling parallel file operations
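The combination of wildcard matching and concurrent processing can be sketched in plain Python (the file names and processing function are hypothetical; this is not the Etlworks setting itself):

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Create a few sample files matched by a wildcard (hypothetical names).
folder = Path(tempfile.mkdtemp())
for name in ("a.csv", "b.csv", "c.csv"):
    (folder / name).write_text("x\n")

def process_file(path):
    # Stand-in for the per-file processing step.
    return path.name

# glob("*.csv") plays the role of the wildcard file name;
# the executor then processes the matched files at the same time.
matched = sorted(folder.glob("*.csv"))
with ThreadPoolExecutor(max_workers=3) as pool:
    names = list(pool.map(process_file, matched))

print(names)  # ['a.csv', 'b.csv', 'c.csv']
```

`pool.map` preserves input order, so results line up with the matched file list even though the files are processed concurrently.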