Overview
This is the most commonly used Flow type in Etlworks. Almost all available connectors can be used as both sources and destinations.
When to use this Flow type
This Flow extracts data from a source connection, applies one or more transformations, and then loads the data into a destination.
The following variations of this Flow are available:
- The source is a relational database.
- The destination is a relational database.
- The source is a Document: File, API endpoint, NoSQL database, or inbound email.
- The destination is a Document: File, API endpoint, NoSQL database, or outbound email.
- The source is a well-known API.
- The destination is a well-known API.
- The source is a message queue or broker.
- The destination is a message queue or broker.
- Any To Any ETL.
If you only need to copy, move, rename, delete, or zip/unzip files without modifying data, consider using file operation flows instead.
The source is a relational database
When the source is a relational database, the system extracts data using one of two methods:
• The system can execute a default query to select all records from a table, view, or synonym. This is done by entering the table name in the FROM field. If a wildcard is used (for example, public.*), all database objects matching the wildcard pattern will be extracted.
• A custom SQL query can be entered in the Source Query field. This allows filtering, aggregation, or complex joins before loading the data.
Create Flow when the source is a relational database
To create a Flow, go to the Flows window, click +, and type in “database to”. Select one of the available Flow types and add one or multiple source-to-destination transformations.
Options for Defining the Source (FROM)
Extract data from a table, view, or synonym
Select or enter a database object name (e.g., table, view, synonym).
Etlworks will automatically execute:
SELECT * FROM object_name;
Execute a user-defined SQL query
Enter a custom SELECT SQL statement in the “Source Query” field.
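For example, a custom source query can filter and aggregate records before they are loaded. The following is a minimal sketch; the table and column names are hypothetical:
-- Aggregate orders per customer, keeping only recent records.
SELECT customer_id,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY customer_id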
Executing SQL Scripts in Source Queries
If the database supports stored procedures or anonymous SQL blocks, you can execute a multi-statement SQL script as the source query, as sketched below.
Conditions for using SQL scripts:
• The script must return a cursor.
• Enable “Execute as Script” in the Source Query settings.
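For example, here is a minimal sketch of a multi-statement source script in SQL Server syntax; the tables and columns are hypothetical, and the exact form depends on the database. With “Execute as Script” enabled, the final SELECT returns the cursor that the transformation reads from:
-- Stage intermediate results in a temporary table.
SELECT o.customer_id, SUM(o.amount) AS total_amount
INTO #order_totals
FROM orders o
WHERE o.status = 'shipped'
GROUP BY o.customer_id;

-- The last statement returns the result set consumed by the Flow.
SELECT * FROM #order_totals;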
Available transformations when the source is a relational database
- SQL
- Source-to-destination
- Per-field Mapping
- Validation
- Remove duplicates
- Scripting transformations
- Transpose
- Wrap columns
- Denormalize (pivot)
- Normalize as key-value pairs
- Denormalize key-value pairs
Available parameters when the source is a relational database
- Transformation name
- Linked Source Name
- Stream Data
- Parallel
- Ignore when there are no records
- Change replication
- Handling exceptions
The destination is a relational database
When the destination is a relational database, Etlworks can either:
• Automatically generate and execute SQL statements (INSERT, UPDATE, DELETE, MERGE), as sketched below.
• Execute user-defined SQL statements entered in the Destination Query field.
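For illustration only, an automatically generated INSERT for a hypothetical three-column table might look roughly like the following, with bind variables supplying the values from each source record:
-- Hypothetical table and columns; each ? is a bind variable populated
-- from the corresponding field of the source record.
INSERT INTO customer (id, first_name, last_name)
VALUES (?, ?, ?)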
Create Flow when the destination is a relational database
To create a Flow, go to the Flows window, click +, and type in “to database”. Select one of the available Flow types and add one or multiple source-to-destination transformations.
Options for Defining the Destination (TO)
Enter a database object name
Enter or select a database object name (e.g., table, view, synonym) in the TO field. If the source uses a wildcard, the * character can be used in the destination name; it is dynamically replaced with the matched source object name.
Enter a custom SQL statement in Destination Query
Enter a custom SQL statement in the Destination Query field. If multiple SQL statements are needed, they should be separated by semicolons (;).
Using a custom destination query disables bind variables, which may impact performance. Leave the Destination Query field empty to let the system generate optimized queries automatically.
In the following example, data is extracted from a staging table (locust_staging), and the results are used to update multiple tables (hotlineevent and locust_staging) in the destination.
Source query:
SELECT locust_staging.id,
       locust_update_id.hotlineeventid,
       recorded_time AS calldatetime,
       CASE
         WHEN date_completed IS NULL THEN 2
         ELSE 1
       END AS eventstatusid
FROM locust_staging
Destination query:
update hotlineevent
set calldatetime = {calldatetime}
where hotlineeventid = {hotlineeventid};

update locust_staging
set HotlineEventId = {hotlineeventid}
where id = {id};
Key Features of Custom Destination Queries
• {} placeholders dynamically insert values from the source dataset into the SQL statement.
• Multiple statements are executed sequentially per record.
• Transactional behavior is supported: if one statement fails, the Flow may roll back, depending on database settings.
• Bind variables are disabled when using custom queries, so performance may be impacted.
Available transformations when the destination is a relational database
- SQL
- Source-to-destination
- Per-field mapping
- Filter (only if the source is not a database)
- Validation
- Remove duplicates
- Scripting transformations
- Extract dimensions (only if the source is not a database)
- Transpose
- Wrap columns
- Denormalize (pivot)
- Normalize as key-value pairs
- Denormalize key-value pairs
Available parameters when the destination is a relational database
- Transformation name
- Linked Source Name
- Stream Data
- Use Bind Variables
- Parallel
- Ignore when there are no records
- Use Native SQL (deprecated)
- Target Table Creation & Matching
- Create Indexes
- Before SQL
- After SQL
- SQL Action
- Change replication
- Handling exceptions
The source is a Document
The source in a Flow can take multiple forms. It can be:
• A flat, structured, or semi-structured file such as CSV, JSON, XML, Excel, or Parquet
• A response from an API endpoint such as REST, SOAP, or GraphQL
• A key in a key-value storage system such as Redis or Memcached
• A document in a NoSQL database such as MongoDB, DynamoDB, or CouchDB
• An email body or attachment from an inbound email connection
Regardless of the source type, Etlworks parses the data, transforms it into an internal format, and then converts it to the appropriate destination format before sending it to the target system.
Here is an example of a Flow that extracts data from a web service, transforms it, and loads it into a database.
Flow Types When the Source is a Document
The table below lists the most common Flow types and how they handle source objects.
Flow type | Source Connection Type | Source object name
---|---|---
File to... | File storage (local, server, or cloud) | Actual file name or wildcard (for example, orders_*.csv)
Web service to... | HTTP | API response name (for example, patient.json)
Email to... | Inbound email Connection | Attachment name or other identifier
NoSQL to... | A key-value storage (such as Redis) | A key in a key-value pair
NoSQL to... | A NoSQL database (such as MongoDB) | Document ID
Create a Flow When the Source is a Document
To create a Flow:
Step 1. Open the Flows window
Step 2. Click + to add a new Flow
Step 3. Select one of the following options based on the source type:
• For file sources, type “file to”
• For HTTP API sources, type “web service to”
• For NoSQL sources, type “nosql to”
• For email sources, type “email to”
Step 4. Add one or multiple source-to-destination transformations
Use SQL to transform data
Even when the source is not a relational database, users can apply SQL transformations after the data is extracted. This is particularly useful for processing structured and semi-structured data from files, API responses, and NoSQL databases like MongoDB.
One of the most common use cases for SQL transformations is extracting data from nested documents in JSON, XML, and other hierarchical formats. For example, when processing a JSON API response or a MongoDB document, SQL can be used to:
• Flatten nested structures by selecting specific fields
• Extract and transform arrays into tabular format
• Filter and aggregate values from within nested objects
Other use cases for SQL transformations include:
• Summarizing and grouping data from files and API responses
• Filtering or removing specific records before writing to the destination
• Reshaping the data structure by renaming or reordering columns
• Performing calculations, joins, and conditional transformations
For MongoDB sources, SQL is automatically converted into MongoDB’s internal query language, allowing users to filter, aggregate, and transform data directly within the database before it is loaded into the destination.
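As an illustration of the flattening use case above, here is a minimal sketch of a SQL transformation applied to a parsed JSON document. It assumes nested fields can be addressed with dot notation, and all object and field names are hypothetical:
-- Flatten a nested customer object and keep only active records.
SELECT id,
       customer.name         AS customer_name,
       customer.address.city AS city
FROM patient
WHERE customer.active = 'true'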
Read more about SQL transformation.
Work with wildcard file names
When the source is a file, it is possible to configure a Flow to process multiple files dynamically by specifying a wildcard pattern in the source name. The system will automatically read all matching files, apply transformations, and load them into the destination.
For example, to process all CSV files in a directory:
*.csv
Read how to work with wildcard file names and how to process all files matching a wildcard.
Available transformations when the source is a document
- SQL
- Source-to-destination
- Per-field Mapping
- Filter
- Order-by
- Validation
- Remove duplicates
- Scripting transformations
- Extract dimensions
- Transpose
- Wrap columns
- Denormalize (pivot)
- Normalize as key-value pairs
- Denormalize key-value pairs
Available parameters when the source is a document
- Transformation name
- Linked Source Name
- Stream Data
- Force Streaming
- Parallel
- Ignore when there are no records
- Ignore when there is no file
- Process all files
- Delete loaded source files
- Delete source files on error
- Change replication
- Handling exceptions
The destination is a Document
The destination in a Flow can take multiple forms. It can be:
• A flat, structured, or semi-structured file such as CSV, JSON, XML, Excel, or Parquet
• A payload sent to an API endpoint, such as a REST, SOAP, or GraphQL request
• A document stored in a NoSQL database such as MongoDB, DynamoDB, or CouchDB
• An email body or attachment in an outbound email connection
Etlworks converts the data extracted from the source into the appropriate destination format by applying transformations and mapping rules.
Flow Types When the Destination is a Document
Flow Type | Destination Connection Type | Destination object name
---|---|---
... to file | File storage (local, network, or cloud) | A specific file name or a dynamically generated name
... to web service | HTTP | API payload name (for example, patient.json)
... to email | Outbound email Connection | Attachment name or any appropriate identifier
... to NoSQL | A key-value storage (such as Redis) | A key in a key-value pair
... to NoSQL | A NoSQL database (such as MongoDB) | Document ID
If the destination is a file storage system, a new file will be created each time the transformation runs. If a file with the same name already exists, it will be overwritten unless configured otherwise.
Create a Flow When the Destination is a Document
To create a Flow:
Step 1. Open the Flows window.
Step 2. Click + to add a new Flow.
Step 3. Select one of the following options based on the destination type:
• For file destinations, type “to file”
• For HTTP API destinations, type “to web service”
• For NoSQL destinations, type “to nosql”
• For email destinations, type “to email”
Step 4. Add one or more source-to-destination transformations.
Available transformations when the destination is a document
- SQL
- Source-to-destination
- Per-field Mapping
- Filter
- Order-by
- Validation
- Remove duplicates
- Scripting transformations
- Extract dimensions
- Transpose
- Wrap columns
- Denormalize (pivot)
- Normalize as key-value pairs
- Denormalize key-value pairs
Available parameters when the destination is a document
- Transformation name
- Linked Source Name
- Stream Data
- Force Streaming
- Parallel
- Ignore when there are no records
- Ignore when there is no file
- Process all files
- Delete loaded source files
- Delete source files on error
- Change replication
- Handling exceptions
The source is a well-known API
A well-known API in Etlworks is an API for which there is a built-in, native connector. This differs from:
• A generic HTTP API, which requires user-defined request and response handling.
• SaaS platform connectors, which expose objects as tables or views and are used with database-like Flow types.
Examples of well-known APIs with built-in read connectors in Etlworks include:
• Google Sheets
• Google Analytics
• Google AdWords
• Facebook Feed
• Twitter Feed
These connectors simplify the process of authenticating, extracting, and transforming data, eliminating the need for manual API request configurations.
Here is an example of a Flow that extracts data from a well-known API, transforms it, and loads it into a database.
Create Flow when the source is a well-known API
To create a Flow that extracts data from a well-known API:
Step 1. Open the Flows window.
Step 2. Click + to add a new Flow.
Step 3. Select “well-known api to” and choose the specific API Flow type.
Step 4. Add one or more source-to-destination transformations.
Available transformations when the source is a well-known API
- SQL
- Source-to-destination
- Per-field Mapping
- Filter
- Order-by
- Validation
- Remove duplicates
- Scripting transformations
- Extract dimensions
- Transpose
- Wrap columns
- Denormalize (pivot)
- Normalize as key-value pairs
- Denormalize key-value pairs
Available parameters when the source is a well-known API
- Transformation name
- Linked Source Name
- Stream Data
- Force Streaming
- Parallel
- Ignore when there are no records
- Ignore when there is no file
- Process all files
- Delete loaded source files
- Delete source files on error
- Change replication
- Handling exceptions
The destination is a well-known API
A well-known API in Etlworks is an API for which there is a built-in, native connector. This differs from:
• A generic HTTP API, which requires user-defined request and response handling.
• SaaS platform connectors, which expose objects as tables or views and are used with database-like Flow types.
Examples of well-known APIs with built-in write connectors in Etlworks include:
• Google Sheets
• Facebook Update
• Twitter Update
Create Flow when the destination is a well-known API
To create a Flow that loads data into a well-known API:
Step 1. Open the Flows window.
Step 2. Click + to add a new Flow.
Step 3. Select “to well-known api” and choose the specific API Flow type.
Step 4. Add one or more source-to-destination transformations.
Available transformations when the destination is a well-known API
- SQL
- Source-to-destination
- Per-field Mapping
- Filter
- Order-by
- Validation
- Remove duplicates
- Scripting transformations
- Extract dimensions
- Transpose
- Wrap columns
- Denormalize (pivot)
- Normalize as key-value pairs
- Denormalize key-value pairs
Available parameters when the destination is a well-known API
- Transformation name
- Linked Source Name
- Stream Data
- Force Streaming
- Parallel
- Ignore when there are no records
- Ignore when there is no file
- Process all files
- Delete loaded source files
- Delete source files on error
- Change replication
- Handling exceptions
The source is a message queue or broker
A message queue or broker is a system that facilitates asynchronous communication between different applications by storing, routing, and delivering messages. Unlike traditional data sources such as databases or APIs, message queues do not store persistent datasets but instead stream real-time events that need to be processed as they arrive.
Common message queues and brokers include:
• Kafka – A distributed event-streaming platform used for real-time analytics and event-driven architectures
• RabbitMQ – A widely used message broker that supports multiple messaging protocols
• AWS SQS and SNS – Cloud-based queuing and notification services
• ActiveMQ, Google Pub/Sub, and others
Processing Data from Message Queues
Etlworks supports two processing modes for consuming data from message queues:
• Streaming mode – Messages are processed and sent to the destination in real time, as they arrive. There is no batching, and the Flow runs continuously.
• ETL (batch) mode – Messages are collected in batches before being transformed and loaded into the destination. Batches can be processed based on:
- Fixed batch size (e.g., process 100 messages at a time)
- Time intervals (e.g., process all messages received in the last 5 minutes)
Flow Types for Processing Message Queues
There are two types of Flows designed for working with message queues:
• Streaming Flows – Select a “Stream messages from…” Flow type to process messages in real time. These Flows continuously listen for new messages and send them to the destination without delay. Read about streaming flows.
• ETL (batch) Flows – Select a “Queue to…” Flow type to process messages in batches. These Flows read multiple messages at once, apply transformations, and load them into the destination at scheduled intervals. Read about ETL flows.
Create a Flow When the Source is a Message Queue
To create a Flow:
Step 1. Open the Flows window
Step 2. Click + to add a new Flow
Step 3. Select a Flow type based on the processing mode:
• For real-time streaming, select a “Stream messages from…” Flow type
• For batch processing, select a “Queue to…” Flow type
Step 4. Add one or more source-to-destination transformations.
The destination is a message queue or broker
A message queue or broker is a system that facilitates asynchronous communication by receiving and distributing messages between different applications. Instead of storing data in a persistent structure like a database or file system, message queues temporarily hold messages until they are processed by consumers.
Etlworks supports writing data to various message queues and brokers, including:
• Kafka – A distributed event-streaming platform for real-time data processing
• RabbitMQ – A message broker supporting multiple messaging protocols
• AWS SQS and SNS – Cloud-based queuing and notification services
• ActiveMQ, Google Pub/Sub, and others
When a message queue is used as a destination, the Flow transforms the incoming data into the required format and publishes messages to the target queue or topic. Depending on the queue type, messages can be processed by a single consumer (point-to-point messaging) or multiple consumers (publish-subscribe model).
Flows that send data to a message queue typically process data in batches or as individual messages, depending on the configuration. This allows for real-time streaming or bulk publishing of messages based on the use case.
Read about ETL flows that write messages to a queue.
Create a Flow When the Destination is a Message Queue
To create a Flow:
Step 1. Open the Flows window
Step 2. Click + to add a new Flow
Step 3. Select a “to queue” Flow type.
Step 4. Add one or more source-to-destination transformations.
Common flow parameters
Flow variables
User-defined flow variables can be added to the Flow configuration. To do this, open the Parameters tab and add variables as key-value pairs. Variable names should be in uppercase and must not contain spaces. These variables can be referenced in transformations, SQL queries, and scripts within the Flow.
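For example, assuming a flow variable can be referenced with the same {NAME} placeholder syntax used elsewhere in this article, a hypothetical LOAD_DATE variable defined on the Parameters tab could be used in a source query like this:
-- LOAD_DATE is a hypothetical flow variable defined on the Parameters tab.
SELECT *
FROM orders
WHERE order_date >= {LOAD_DATE}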
Maximum number of parallel threads
Etlworks supports executing multiple source-to-destination transformations, as well as wildcard transformations, in parallel to improve performance.
• By default, the system limits the number of parallel threads to 5.
• To change this setting, open the Parameters tab and update the Maximum Number of Parallel Threads value.
• The maximum allowed number of threads is 99.
Increasing this value can speed up processing but may also increase database load and memory usage, so it should be adjusted carefully based on system capacity.
Retry failed transformations
If a Flow contains multiple source-to-destination transformations or processes a wildcard source, it is possible to automatically retry failed or not yet executed transformations when the Flow restarts.
To enable this feature:
1. Open the Parameters tab.
2. Enable the Retry Failed Transformations option.
This ensures that the Flow continues processing even if some transformations fail initially.
Read more about retrying failed transformations.
Error if Mapped Field is Missing in Source
By default, if a field in the mapping configuration does not exist in the actual source dataset, the behavior of the Flow depends on the destination type and streaming settings:
• If the destination is a file or an API endpoint, the Flow continues without an error.
• If the destination is a database and streaming or bind variables are disabled, the Flow continues without an error.
• If the destination is a database and streaming or bind variables are enabled, the Flow fails with an error.
To force the Flow to always generate an error when a mapped field does not exist in the source, open the Parameters tab and enable Error if Mapped Field is Missing in Source.
Handling Extra Fields in Source
This setting changes how source fields that are not in the Mapping are processed. Previously, any new columns appearing in the source, whether due to schema changes, evolving datasets, or unexpected data variations, were automatically included in the destination. This was the default behavior and how most users expected mapping to work.
With the introduction of Handling Extra Fields in Source, users now have control over how extra fields are handled. Instead of always including new fields by default, the flow can be configured to:
1. Ignore extra fields – Any columns in the source that are not explicitly mapped will be ignored and not included in the destination.
2. Error on extra fields – If extra columns appear in the source that are not part of the Mapping, the flow will fail before running any transformations, allowing early detection of unexpected schema changes.
This is a major change from the previous behavior. In the past, extra fields were always added automatically unless the “ALTER TABLE” feature was disabled for database destinations. Now, users have the ability to enforce stricter mapping rules and prevent unintended data propagation.
Key considerations:
- If the setting remains unchanged (default behavior), extra columns will continue to be processed automatically, as before.
- If “Ignore extra fields” is enabled, unmapped columns will be excluded from the transformation.
- If “Error on extra fields” is enabled, the flow will stop execution if unexpected columns are found in the source.
Named connections
A Flow inherits all connections from the source-to-destination transformations. However, you can also add named connections under the Connections tab.
Named connections can be used in:
• Mappings to dynamically resolve connections at runtime
• JavaScript transformations to perform lookups or enrich data from external sources. Example: calculate a column's value using a lookup.