When to use this Flow type
Etlworks can work with files directly. The following file-based operations are supported:
- Copy files
- Move files
- Rename files
- Delete files
- Create folder(s)
- Zip files
- Unzip files
- Check number of files in folder
- All file management operations
When working directly with files, Etlworks does not modify the file contents. If you need to change the files, use the extract data from source, transform, and load it to the destination Flow type.
Create Flow
Here are the steps to create a Flow:
Step 1. Start creating a Flow by clicking `Add flow` in the `Flows` window.
Step 2. In the opened box, select `Files`.
Step 3. Select one of the file operations above.
Step 4. Continue by adding transformations and configuring Flow parameters.
Copy files
Use this Flow type to copy files between Connections. Here's how you can do this:
Step 1. Start creating the Flow in the `Flows` window by clicking `+` and typing in `copy files`.
Step 2. Select the source Connection. It can be any one of the following Connection types:
- Amazon S3
- Google Cloud Storage
- Microsoft Azure Storage
- Server Storage
- FTP
- FTPS
- SFTP
- WebDAV
- SMB Share
- Box
- Dropbox
- Google Drive
- OneDrive for Business
- SharePoint
- HTTP (web service)
- Redis
- Google Sheets
- Google Analytics
- Inbound Email
- MongoDB
Step 3. In the `FROM` field, enter the file name, or a wildcard file name, for the file(s) to copy.
Step 4. Select the destination Connection. It can be any one of the following Connection types:
- Amazon S3
- Google Cloud Storage
- Microsoft Azure Storage
- Server Storage
- FTP
- FTPS
- SFTP
- Box
- Dropbox
- Google Drive
- OneDrive for Business
- SharePoint
- WebDAV
- SMB Share
- HTTP (web service)
- Redis
- Google Sheets
- Outbound Email
- MongoDB
Step 5. Optionally, enter a new file name or a new wildcard file name into the `TO` field. Read how the system calculates a destination file name in file operations.
Step 6. Click `MAPPING`, select the `Parameters` tab, and modify the following parameters, if necessary:
- `Add Suffix to the Destination File Name`: select one of the predefined suffixes for the files created by this file operation. For example, if you select `uuid` as a suffix and the original file name is `dest.csv`, Etlworks will create files with the name `dest_uuid.csv`, where `uuid` is a globally unique identifier such as `21EC2020-3AEA-4069-A2DD-08002B30309D`.
- `Do not copy files which have already been copied`: if this option is enabled, Etlworks will not copy files that have already been copied.
- `Capture metrics`: if this option is selected, the information about each processed file will be captured and displayed in the Flow dashboard. Disable it if you expect to process a large number of files (> 1K).
- `Wait before moving to next file`: the number of milliseconds to wait before starting to copy the next file. This parameter is used to prevent throttling, specifically when the destination is an HTTP endpoint. Read about battling the throttling.
- `Maximum Simultaneous Operations`: Etlworks can copy each file in its own thread. Use this property to set the maximum number of simultaneous file operations. Read about parallel processing.
- `Maximum Number of Files to Process`: if the value of this property is greater than 0, the Flow will stop copying files after the number of processed files reaches the configured threshold.
- `On Exception`: by default, any error halts execution. When `ignore` is selected, errors are ignored and execution continues. Read how to configure the Flow to ignore all or specific exceptions.
- `Exception Mask`: specify which errors should be ignored while still halting execution for all other errors. Enter part or all of the exception string. This field works only when the `Ignore On Exception` option is selected.
- `Execute if Error`: if this option is selected, Etlworks will execute this file operation when an error occurs. It can be useful if, for example, you want to copy files that haven't been processed yet, due to an error, to a failed folder.
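The suffix rule above can be sketched in plain JavaScript. `addSuffix` is an illustrative helper, not an Etlworks API: the suffix is inserted between the base name and the extension.

```javascript
// Illustrative sketch: insert a suffix before the file extension,
// mirroring how Etlworks turns dest.csv into dest_<uuid>.csv.
// addSuffix is a hypothetical helper, not part of the Etlworks API.
function addSuffix(fileName, suffix) {
  const dot = fileName.lastIndexOf('.');
  if (dot < 0) {
    return fileName + '_' + suffix; // no extension: append at the end
  }
  return fileName.slice(0, dot) + '_' + suffix + fileName.slice(dot);
}

// Example with a uuid suffix:
console.log(addSuffix('dest.csv', '21EC2020-3AEA-4069-A2DD-08002B30309D'));
// dest_21EC2020-3AEA-4069-A2DD-08002B30309D.csv
```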
Only copy new files
If you want to copy only the files that haven't been copied yet, enable the `Do not copy files which have already been copied` option under `MAPPING` > `Parameters`.
Move files
Using this Flow type, you can move files between Connections. When a file is moved from the source to a destination, it is first copied to the destination, then deleted from the source.
Step 1. Start creating a Flow in the `Flows` window by clicking `+` and typing in `move files`.
Step 2. Select the source Connection. It can be any one of the following Connection types:
- Amazon S3
- Google Cloud Storage
- Microsoft Azure Storage
- Server Storage
- FTP
- FTPS
- SFTP
- Box
- Dropbox
- Google Drive
- OneDrive for Business
- SharePoint
- WebDAV
- SMB Share
- HTTP (web service)
- Redis
- Google Sheets
- Google Analytics
- Inbound Email
- MongoDB
Step 3. In the `FROM` field, enter the file name, or a wildcard file name, for the file(s) to move.
Step 4. Select the destination Connection. It can be any one of the following Connection types:
- Amazon S3
- Google Cloud Storage
- Microsoft Azure Storage
- Server Storage
- FTP
- FTPS
- SFTP
- Box
- Dropbox
- Google Drive
- OneDrive for Business
- SharePoint
- WebDAV
- SMB Share
- HTTP (web service)
- Google Sheets
- Outbound Email
- Redis
- MongoDB
Step 5. Optionally, enter a new file name or a new wildcard file name into the `TO` field. Read how the system calculates a destination file name in file operations.
Step 6. Click `MAPPING`, select the `Parameters` tab, and modify the following parameters, if necessary:
- `Add Suffix to the Destination File Name`: select one of the predefined suffixes for the files created by this file operation. For example, if you select `uuid` as a suffix and the original file name is `dest.csv`, Etlworks will create files with the name `dest_uuid.csv`, where `uuid` is a globally unique identifier such as `21EC2020-3AEA-4069-A2DD-08002B30309D`.
- `Do not move files which have already been moved`: if this option is enabled, Etlworks will not move files that have already been moved.
- `Capture metrics`: if this option is selected, the information about each processed file will be captured and displayed in the Flow dashboard. Disable it if you expect to process a large number of files (> 1K).
- `Wait before moving to next file`: the number of milliseconds to wait before starting to move the next file. This parameter is used to prevent throttling, specifically when the destination is an HTTP endpoint. Read about battling the throttling.
- `Maximum Simultaneous Operations`: Etlworks can move each file in its own thread. Use this property to set the maximum number of simultaneous file operations. Read about parallel processing.
- `Maximum Number of Files to Process`: if the value of this property is greater than 0, the Flow will stop moving files after the number of processed files reaches the configured threshold.
- `On Exception`: by default, any error halts execution. When `ignore` is selected, errors are ignored and execution continues.
- `Exception Mask`: specify which errors should be ignored while still halting execution for all other errors. Enter part or all of the exception string. This field works only when the `Ignore On Exception` option is selected.
- `Execute if Error`: if this option is selected, Etlworks will execute this file operation when an error occurs. It can be useful if, for example, you want to move files that haven't been processed yet, due to an error, to a failed folder.
Move only new files
If you want to move only the files that haven't been moved yet, enable the `Do not move files which have already been moved` option under `MAPPING` > `Parameters`.
Rename files
Using this Flow type, you can rename files.
Step 1. Start creating a Flow in the `Flows` window by clicking `+` and typing in `rename files`.
Step 2. Select the source Connection. It can be any one of the following Connection types:
- Amazon S3
- Google Cloud Storage
- Microsoft Azure Storage
- Server Storage
- FTP
- FTPS
- SFTP
- Box
- Dropbox
- Google Drive
- OneDrive for Business
- SharePoint
- WebDAV
- SMB Share
- Redis
Step 3. In the `FROM` field, enter a file name or a wildcard file name of the file(s) to rename.
Step 4. Select a destination Connection. It can be any one of the following Connection types:
- Amazon S3
- Google Cloud Storage
- Microsoft Azure Storage
- Server Storage
- FTP
- FTPS
- SFTP
- Box
- Dropbox
- Google Drive
- OneDrive for Business
- SharePoint
- WebDAV
- SMB Share
- Redis
Step 5. Enter a new file name into the `TO` field. Read how the system calculates a destination file name in file operations.
Step 6. Click the `MAPPING` button, select the `Parameters` tab, and modify the following parameters, if necessary:
- `Add Suffix to the Destination File Name`: select one of the predefined suffixes for the files created by this file operation. For example, if you select `uuid` as a suffix and the original file name is `dest.csv`, Etlworks will create files with the name `dest_uuid.csv`, where `uuid` is a globally unique identifier such as `21EC2020-3AEA-4069-A2DD-08002B30309D`.
- `Capture metrics`: if this option is selected, the information about each processed file will be captured and displayed in the Flow dashboard. Disable it if you expect to process a large number of files (> 1K).
- `Maximum Number of Files to Process`: if the value of this property is greater than 0, the Flow will stop renaming files after the number of processed files reaches the configured threshold.
- `On Exception`: by default, any error halts execution. When `ignore` is selected, errors are ignored and execution continues.
- `Exception Mask`: specify which errors should be ignored while still halting execution for all other errors. Enter part or all of the exception string. This field works only when the `Ignore On Exception` option is selected.
- `Execute if Error`: if this option is selected, Etlworks will execute this file operation when an error occurs. It can be useful if, for example, you want to copy files that haven't been processed yet, due to an error, to a failed folder.
Delete files
Using this Flow type, you can delete files.
Step 1. Start creating a Flow in the `Flows` window by clicking `+` and typing in `delete files`.
Step 2. Select the source Connection. It can be any one of the following Connection types:
- Amazon S3
- Google Cloud Storage
- Microsoft Azure Storage
- Server Storage
- FTP
- FTPS
- SFTP
- Box
- Dropbox
- Google Drive
- OneDrive for Business
- SharePoint
- WebDAV
- SMB Share
- HTTP (web service): Etlworks will attempt to execute HTTP requests using the `HTTP DELETE` method.
- Redis
Step 3. In the `FROM` field, enter a file name, or a wildcard file name, of the file(s) to delete.
Step 4. Click `MAPPING`, select the `Parameters` tab, and modify the following parameters, if necessary:
- `Wait before moving to next file`: the number of milliseconds to wait before starting to delete the next file. This parameter is used to prevent throttling, specifically when the destination is an HTTP endpoint. Read about battling the throttling.
- `Capture metrics`: if this option is selected, the information about each processed file will be captured and displayed in the Flow dashboard. Disable it if you expect to process a large number of files (> 1K).
- `Maximum Number of Files to Process`: if the value of this property is greater than 0, the Flow will stop deleting files after the number of processed files reaches the configured threshold.
- `On Exception`: by default, any error halts execution. When `ignore` is selected, errors are ignored and execution continues.
- `Exception Mask`: specify which errors should be ignored while still halting execution for all other errors. Enter part or all of the exception string. This field works only when the `Ignore On Exception` option is selected.
- `Execute if Error`: if this option is selected, Etlworks will execute this file operation when an error occurs. It can be useful if, for example, you want to delete the files that haven't been processed due to the error.
Create folder(s)
Using this Flow type, you can create a folder in any supported file storage.
Step 1. Start creating a Flow in the `Flows` window by clicking `+` and typing in `create folder`.
Step 2. Select the source Connection. It can be any one of the following Connection types:
- Amazon S3
- Google Cloud Storage
- Microsoft Azure Storage
- Server Storage
- FTP
- FTPS
- SFTP
- Box
- Dropbox
- Google Drive
- OneDrive for Business
- SharePoint
- WebDAV
- SMB Share
Step 3. In the `FROM` field, enter the name of the folder to create. Use `folder1/folder2/foldern` to create multiple nested folders.
The folder will be created under the base `Directory` specified in the Connection.
Step 4. Click `MAPPING`, select the `Parameters` tab, and modify the following parameters, if necessary:
- `On Exception`: by default, any error halts execution. When `ignore` is selected, errors are ignored and execution continues.
- `Exception Mask`: specify which errors should be ignored while still halting execution for all other errors. Enter part or all of the exception string. This field works only when the `Ignore On Exception` option is selected.
- `Execute if Error`: if this option is selected, Etlworks will execute this file operation when any error occurs.
Zip files
Using this Flow type, you can create an archived file in the `zip` or `gzip` format.
Step 1. Start creating a Flow in the `Flows` window by clicking the `+` button and typing in `zip files`.
Step 2. Select the source Connection. It should have the following Connection type:
Step 3. In the `FROM` field, enter the file name, or a wildcard file name, of the file(s) to zip.
Step 4. Select the destination Connection. It should be the following Connection type:
Step 5. Enter the name of the archive file into the `TO` field.
By default, Etlworks creates archives in the `zip` format. Enter a file name with the `gzip` extension to create an archive in the `gzip` format.
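The naming rule above can be sketched as follows. `archiveFormat` is an illustrative helper, not an Etlworks API: the archive format is chosen from the destination file extension, falling back to the default `zip`.

```javascript
// Illustrative sketch of the rule above: a .gzip extension selects the
// gzip format, anything else uses the default zip format.
// archiveFormat is a hypothetical helper, not part of the Etlworks API.
function archiveFormat(fileName) {
  return fileName.toLowerCase().endsWith('.gzip') ? 'gzip' : 'zip';
}

console.log(archiveFormat('dest.gzip')); // gzip
console.log(archiveFormat('dest.zip'));  // zip
```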
Step 6. Click `MAPPING`, select the `Parameters` tab, and modify the following parameters, if necessary:
- `Action`: choose either `Zip` or `Zip and Delete`. The latter will delete the original source files after creating the archive.
- `Add Suffix to the Zip File Name`: select one of the predefined suffixes for the files created by this file operation. For example, if you select `uuid` as a suffix and the original file name is `dest.zip`, Etlworks will create a file with the name `dest_uuid.zip`, where `uuid` is a globally unique identifier such as `21EC2020-3AEA-4069-A2DD-08002B30309D`.
- `Capture metrics`: if this option is selected, the information about each processed file will be captured and displayed in the Flow dashboard. Disable it if you expect to process a large number of files (> 1K).
- `Maximum Simultaneous Operations`: Etlworks can process each file in its own thread. Use this property to set the maximum number of simultaneous file operations. Read about parallel processing.
- `Maximum Number of Files to Process`: if the value of this property is greater than 0, the Flow will stop archiving files after the number of processed files reaches the configured threshold.
- `Zip password`: optional password for the zip file. Only `zip` files can be password-protected.
- `On Exception`: by default, any error halts execution. When `ignore` is selected, errors are ignored and execution continues.
- `Exception Mask`: specify which errors should be ignored while still halting execution for all other errors. Enter part or all of the exception string. This field works only when the `Ignore On Exception` option is selected.
- `Execute if Error`: if this option is selected, Etlworks will execute this file operation when any error occurs.
Unzip file
Using this Flow type, you can unzip an archived file in the `zip` or `gzip` format.
Step 1. Start creating a Flow in the `Flows` window by clicking `+` and typing in `unzip files`.
Step 2. Select the source Connection. It should have the following Connection type:
Step 3. In the `FROM` field, enter the archived file name. Wildcard file names are supported.
Step 4. Select the destination Connection. It should have the following Connection type:
Step 5. Click `MAPPING`, select the `Parameters` tab, and modify the following parameters, if necessary:
- `Action`: choose one of the following: `Unzip`, `Unzip and Delete`, `UnGZip`, or `UnGZip and Delete`. The actions with `Delete` will delete the original archived file.
- `Capture metrics`: if this option is selected, the information about each processed file will be captured and displayed in the Flow dashboard. Disable it if you expect to process a large number of files (> 1K).
- `Maximum Simultaneous Operations`: Etlworks can process each file in its own thread. Use this property to set the maximum number of simultaneous file operations. Read about parallel processing.
- `Zip password`: optional password for the zipped file.
- `Do not create subfolders when unzipping files with nested folders`: if this option is enabled and the zip file has nested subfolders, the Flow will unzip files directly into the destination folder without creating subfolders. If this option is disabled (the default), the Flow will create a subfolder for each folder in the zip file.
- `On Exception`: by default, any error halts execution. When `ignore` is selected, errors are ignored and execution continues.
- `Exception Mask`: specify which errors should be ignored while still halting execution for all other errors. Enter part or all of the exception string. This field works only when the `Ignore On Exception` option is selected.
- `Execute if Error`: if this option is selected, Etlworks will execute this file operation when any error occurs.
Check number of files in folder
Using this Flow type, you can compare the number of files in a folder with a given constant or evaluate a boolean expression. When using a constant, if the number of files is not equal to the expected number, Etlworks will generate an exception. The most common use case is to generate an exception if the actual number of matching files is not `0`.
When using an expression, Etlworks evaluates the expression and, if the returned value is boolean `false`, throws an exception. Example of a boolean expression: `filesCount > 5`.
Step 1. Start creating a Flow in the `Flows` window by clicking `+` and typing in `check number of files`.
Step 2. Select the source Connection. It can be any one of the following Connection types:
- Amazon S3
- Google Cloud Storage
- Microsoft Azure Storage
- Server Storage
- FTP
- FTPS
- SFTP
- Box
- Dropbox
- Google Drive
- OneDrive for Business
- SharePoint
- WebDAV
- SMB Share
Step 3. In the `FROM` field, enter the file name or the wildcard file name to look for.
Step 4. Click `MAPPING`, select the `Parameters` tab, and modify the following parameters, if necessary:
- `Expected Number of Files or Expression`: enter the expected number of files, for example `0`, or a boolean expression, for example `filesCount > 5`. The expression can be any JavaScript or Python code that returns boolean `false` or `true`. The following objects are available by reference:
  - `filesCount`: the actual number of files whose names match the entered file name or wildcard file name.
  - `files`: a `java.util.ArrayList` containing the files whose names match the entered file name or wildcard file name.
- `On Exception`: by default, any error halts execution. When `ignore` is selected, errors are ignored and execution continues.
- `Exception Mask`: specify which errors should be ignored while still halting execution for all other errors. Enter part or all of the exception string. This field works only when the `Ignore On Exception` option is selected.
- `Execute if Error`: if this option is selected, Etlworks will execute this file operation when any error occurs.
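The behavior described above can be sketched in plain JavaScript. `checkNumberOfFiles` is an illustrative helper, not Etlworks internals: a numeric value is compared to the actual count, while anything else is evaluated as a boolean expression with `filesCount` and `files` in scope.

```javascript
// Illustrative sketch of the "Check number of files" behavior described
// above. This is not Etlworks internals; names are ours.
function checkNumberOfFiles(files, expectedOrExpression) {
  const filesCount = files.length;
  const asNumber = Number(expectedOrExpression);
  if (!Number.isNaN(asNumber)) {
    // A constant: generate an exception unless the count matches exactly
    if (filesCount !== asNumber) {
      throw new Error('Expected ' + asNumber + ' files, found ' + filesCount);
    }
    return;
  }
  // An expression: filesCount and files are available by reference
  const result = new Function('filesCount', 'files',
      'return (' + expectedOrExpression + ');')(filesCount, files);
  if (result === false) {
    throw new Error('Expression "' + expectedOrExpression + '" returned false');
  }
}

// Passes silently: 6 matching files, expression returns true
checkNumberOfFiles(['a', 'b', 'c', 'd', 'e', 'f'], 'filesCount > 5');
```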
All file management operations
The File Management Flow allows you to create multiple transformations, such as copy, move, rename, etc., within a single Flow. You can also choose to use the specific Flow types: copy, move, etc.
Step 1. Create the source (`FROM`) and destination (`TO`) Connections, which can be one of the following:
- Amazon S3
- Google Cloud Storage
- Microsoft Azure Storage
- Server Storage
- FTP
- FTPS
- SFTP
- Box
- Dropbox
- Google Drive
- OneDrive for Business
- SharePoint
- WebDAV
- SMB Share
- HTTP (web service)
- Redis
- Google Sheets
- Google Analytics
- Inbound Email
- MongoDB
Step 2. Select the `File Management` Flow type from the list.
Step 3. Create a transformation where the source (`FROM`) is a location with the source files and the destination (`TO`) is a location where files should be created (they can be the same, which is typical for the rename operation).
Step 4. Select or enter the source and destination file names in the `Mappings` box. Both can be wildcard names, such as `*.csv`.
Step 5. Continue by specifying the transformation parameters:
- `Action`: the file operation: `Copy`, `Move`, `Rename`, or `Delete` the files, `Create Folder(s)`, or `Check number of files` in the folder.
- `Do not process files which have already been processed`: if this option is enabled, Etlworks will not process files that have already been processed.
- `Capture metrics`: if this option is selected, the information about each processed file will be captured and displayed in the Flow dashboard. Disable it if you expect to process a large number of files (> 1K).
- `Wait before moving to the next file`: the number of milliseconds to wait before starting to process the next file. This parameter is used to prevent throttling, especially when the destination is an HTTP endpoint. Read about battling the throttling.
- `Maximum Simultaneous Operations`: Etlworks can process each file in its own thread. Use this property to set the maximum number of simultaneous file operations. If not set, the default is `10`. Read about parallel processing.
- `Maximum Number of Files to Process`: if the value of this property is greater than 0, the Flow will stop processing files after the number of processed files reaches the configured threshold.
- `On Exception`: by default, any error halts execution. When `ignore` is selected, errors are ignored and execution continues.
- `Exception Mask`: specify which errors should be ignored while still halting execution for all other errors. Enter part or all of the exception string. This field works only when the `Ignore On Exception` option is selected.
- `Execute if Error`: if this option is selected and an error occurs, Etlworks will execute the chosen file operation. It can be useful if, for example, you want to move files that haven't been processed due to an error to the failed folder.
When selecting `FROM` and `TO`:
- For `Copy` actions: choose the source (`FROM`) and destination (`TO`) Connections. Enter the file name, or a wildcard file name such as `*.csv`, into the source (`FROM`) field. The files from the source location (`FROM` Connection) will be copied to the destination location (`TO` Connection).
- For `Move` actions: choose the source (`FROM`) and destination (`TO`) Connections (they can be the same). Enter the file name, or a wildcard file name such as `*.csv`, into the source (`FROM`) field. Files from the source location (`FROM` Connection) will be moved to the destination location (`TO` Connection).
- For `Rename` actions: choose the source (`FROM`) and destination (`TO`) Connections (they can be the same). Enter the file name, or a wildcard file name such as `*.csv`, into the source (`FROM`) field. Enter a new file name, or a wildcard file name such as `dest.*`, into the destination (`TO`) field. Files from the source location (`FROM` Connection) will be moved to the destination location (`TO` Connection) and renamed in the process.
- For `Delete` actions: choose the source (`FROM`) Connection. Enter the file name, or a wildcard file name such as `*.csv`, into the source (`FROM`) field. Files in the source location (`FROM` Connection) which match the string entered in the `FROM` field will be deleted.
- For `Create Folder(s)` actions: choose the source (`FROM`) Connection. Enter the name of the folder to be created into the `FROM` field. If that folder doesn't exist, it will be created under the `URL`/`Directory` of the `FROM` Connection.
- For `Check Number of Files` actions: choose the source (`FROM`) Connection. Enter the file name, or a wildcard file name such as `*.csv`, into the source (`FROM`) field. The system will calculate the number of files whose names match the `FROM` field, compare it to the entered number of files, and generate an exception if they are not equal.
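Wildcard file names such as `*.csv` and `dest.*` can be understood as simple patterns where `*` matches any sequence of characters and `?` matches a single character. Below is a minimal sketch of such a matcher; it is illustrative only, and Etlworks' own matcher may differ in details.

```javascript
// Illustrative sketch: translate a wildcard pattern into a RegExp.
// Regex metacharacters are escaped first, then * and ? are translated.
function wildcardToRegExp(pattern) {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*').replace(/\?/g, '.') + '$');
}

const matcher = wildcardToRegExp('*.csv');
console.log(matcher.test('orders.csv'));  // true
console.log(matcher.test('orders.json')); // false
```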
Filter files using a script
It is quite typical to need to filter (or group) files by filename, size, or other attributes, create buckets, and process each bucket independently.
In Etlworks, you can use a script Flow to create the buckets. Then, instead of providing a file name or a wildcard filename in the file operations, use a bucket name.
Example: create a flow that moves files smaller than 10KB to the 'small files' folder and those larger or equal to 10KB to the 'large files' folder.
Also read about file loop with filter.
Step 1. Create a new script flow.
Step 2. Add a named source connection to the flow. Use the connection that contains the source files. The name that you assign to the connection is going to be used in the next step.
Step 3. Add the JavaScript or Python code to filter or split the files into buckets. Below is an example in JavaScript which splits the files into large-files and small-files buckets.
// Get the list of the files in the folder matching the wildcard.
// For the connection, use the same name as you assigned
// to the named connection in step 2.
var list = com.toolsverse.etl.core.task.common.FileManagerTask.list(etlConfig, 'source', '*.*');
// create buckets
var largeFiles = new java.util.ArrayList();
var smallFiles = new java.util.ArrayList();
// split files in buckets
if (list != null) {
for each (var file in list) {
if (file.getSize()>=10000) {
largeFiles.add(file);
} else {
smallFiles.add(file);
}
}
}
// add named buckets to the common object storage
etlConfig.setValue('large_files', largeFiles);
etlConfig.setValue('small_files', smallFiles);
Available file attributes:
- `file.getName()`: the name of the file with an extension but without the folder.
- `file.getSize()`: the size of the file in bytes.
- `file.getPath()`: the full path.
- `file.getLastModified()`: the Unix-epoch timestamp when the file was last modified.
Read about commonly used packages and classes (with examples).
Step 4. Create a new flow to process files. It can be any of the following:
Step 5. Add a source-to-destination transformation for each bucket.
When creating a transformation, enter `list(bucket_name)` in `FROM`.
Set `DESTINATION CONNECTION` to the connection which points to the specific destination folder. Or you can reuse the same connection by adding the destination folder name in `TO`, for example `/large/*`.
Step 6. Combine flows created in steps 1 and 4 into the nested flow.
Enable parallel file operations
When processing files by a wildcard name, for example `*.csv`, it is possible to configure parallel processing by setting the `Maximum Simultaneous Operations` parameter to a value greater than 1.
The following file operations support parallel processing:
If you set the `Wait before moving to next file` parameter to a value greater than zero, parallel processing will be automatically disabled. This parameter controls how long the system should wait before processing the next file and is designed to battle the throttling enforced by the owners of third-party services.
Battle the throttling
Some owners of third-party services enforce throttling, designed to prevent a large number of requests in a short time frame.
In Etlworks, you can set the `Wait before moving to next file` parameter to a value greater than zero. This causes the system to wait the configured amount of time before processing the next file in the queue.
Additionally, you can set the `Maximum Number of Files to Process` property to a value greater than zero. The Flow will then stop processing files after the number of processed files reaches the configured threshold. This property is useful if you have a large number of files in the queue (thousands) and would prefer to process them in chunks.
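Together, these two parameters can be sketched as a simple processing loop. The function and parameter names below are illustrative, not Etlworks APIs:

```javascript
// Illustrative sketch of the throttling controls described above:
// wait a configured number of milliseconds between files and stop
// after a configured maximum number of files.
async function processInChunks(files, handler, waitMs, maxFiles) {
  let processed = 0;
  for (const file of files) {
    if (maxFiles > 0 && processed >= maxFiles) {
      break; // Maximum Number of Files to Process reached
    }
    await handler(file);
    processed++;
    if (waitMs > 0) {
      // Wait before moving to next file
      await new Promise(resolve => setTimeout(resolve, waitMs));
    }
  }
  return processed;
}
```

Note that a per-file wait implies sequential processing, which is why setting it to a value greater than zero disables parallel file operations.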
Process files in the specific order
When processing files by a wildcard filename, for example `*.csv`, the file management Flows (`Copy`, `Move`, `Rename`, `Delete`, `Zip`) first capture the list of files to process, then sort the list. The list is sorted using the algorithm selected for the source Connection.
Available options:
- `Disabled`: default sorting for the Connection, most likely by filename in ascending order.
- `oldest`: oldest files first.
- `newest`: newest files first.
- `ascending`: by filename in ascending order.
- `descending`: by filename in descending order.
- `largest`: largest files first.
- `smallest`: smallest files first.
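The sorting options can be sketched as comparators over the file attributes (name, size, last-modified). The option names match the list above; the comparator code itself is illustrative, not Etlworks internals:

```javascript
// Illustrative comparators for the sorting options listed above,
// applied to file objects with name, size and lastModified attributes.
const comparators = {
  oldest:     (a, b) => a.lastModified - b.lastModified,
  newest:     (a, b) => b.lastModified - a.lastModified,
  ascending:  (a, b) => a.name.localeCompare(b.name),
  descending: (a, b) => b.name.localeCompare(a.name),
  largest:    (a, b) => b.size - a.size,
  smallest:   (a, b) => a.size - b.size,
};

function sortFiles(files, option) {
  const cmp = comparators[option];
  // 'Disabled' (or unknown option): keep the Connection's default order
  return cmp ? [...files].sort(cmp) : [...files];
}
```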