Overview
This is the most commonly used Flow type. Almost all available connectors can be used as a source, as well as a destination.
When to use this Flow type
Etlworks extracts data from the source Connection, transforms it by applying one or multiple transformations, and then automatically loads it into the destination.
The following variations of this Flow are available:
- The source is a relational database.
- The destination is a relational database.
- The source is a file, API endpoint, NoSQL database, or inbound email Connection.
- The destination is a file, API endpoint, NoSQL database, or outbound email Connection.
- The source is a well-known API.
- The destination is a well-known API.
- Any To Any ETL.
This transformation modifies the source data before loading it into the destination. Use the copy, move, rename, delete, zip, and unzip files Flow if all you need is to copy, move, rename, or delete files, or to perform bulk operations on the data without modifying it.
The source is a relational database
When the source is a relational database, Etlworks extracts data from the database table, view, or synonym or executes a user-defined SQL query.
The user has two options:
- Use the FROM field to select or enter a database object name, such as a table, view, or synonym. In this case, Etlworks will automatically execute the SQL query select * from object. Note that fully qualified wildcard names are supported, for example, public.*. In this case, the system will extract the data from all database objects that match the wildcard name public.* and load it into the destination.
- Enter the actual SELECT SQL statement in the Source query field.
Source query is a script
When the source Connection is a relational database that supports stored procedures and anonymous SQL blocks, it is possible to run SQL scripts (with variables, conditions, loops, etc.) as a part of the Source query. The only condition is that the script must return a cursor.
Enter the script in Source query and enable the Execute as script parameter.
Create Flow when the source is a relational database
To create a Flow, go to the Flows window, click +, and type in database to. Select one of the available Flow types and add one or multiple source-to-destination transformations.
Available transformations when the source is a relational database
- SQL
- Source-to-destination
- Per-field Mapping
- Validation
- Remove duplicates
- Scripting transformations
- Transpose
- Wrap columns
- Denormalize (pivot)
- Normalize as key-value pairs
- Denormalize key-value pairs
Available parameters when the source is a relational database
- The Transformation name
- Linked Source Name
- Stream Data
- Parallel
- Ignore when there are no records
- Change replication
- Handling exceptions
The destination is a relational database
When loading data into a database, Etlworks either automatically generates and executes the SQL DML statement (INSERT, UPDATE, DELETE, or MERGE) or executes user-created SQL statements.
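As a rough illustration, an automatically generated upsert for a hypothetical customers table keyed on customer_id might look like the sketch below. The table, columns, and exact SQL are assumptions for illustration only; what Etlworks actually generates depends on the destination database dialect.

```sql
-- Illustrative only: a MERGE-style upsert with bind-variable
-- placeholders (?), keyed on customer_id.
MERGE INTO customers AS t
USING (SELECT ? AS customer_id, ? AS name, ? AS email) AS s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN
    UPDATE SET name = s.name, email = s.email
WHEN NOT MATCHED THEN
    INSERT (customer_id, name, email)
    VALUES (s.customer_id, s.name, s.email);
```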
When the destination is a relational database, the user has two options:
- Use the TO field to select or enter the database object name (table, view, or synonym). If the source (FROM) is configured with a wildcard, you can use * as a part of the destination name (TO). The system will substitute the * with the source name at run time.
- Enter the actual INSERT, UPDATE, or DELETE SQL statement in the Destination query field.
The destination query can consist of more than one SQL statement. Use ; to separate the SQL statements.
A user-defined destination query disables bind variables and is therefore much slower than the automatically generated query. It is highly recommended to keep the Destination query field empty and let the Flow generate the query automatically.
In the example below, two queries will be executed. Fields within the { and } brackets will be replaced with the actual field values from the source.
Source query:
SELECT locust_staging.id,
       locust_staging.hotlineeventid,
       recorded_time AS calldatetime,
       CASE
           WHEN date_completed IS NULL
               THEN 2
           ELSE 1
       END AS eventstatusid
FROM locust_staging
Destination query:
update hotlineevent
set calldatetime = {calldatetime}
where hotlineeventid = {hotlineeventid};
update locust_staging
set HotlineEventId = {hotlineeventid}
where id = {id};
Create Flow when the destination is a relational database
To create a Flow, go to the Flows window, click +, and type in to database. Select one of the available Flow types and add one or multiple source-to-destination transformations.
Available transformations when the destination is a relational database
- SQL
- Source-to-destination
- Per-field mapping
- Filter (only if the source is not a database)
- Validation
- Remove duplicates
- Scripting transformations
- Extract dimensions (only if the source is not a database)
- Transpose
- Wrap columns
- Denormalize (pivot)
- Normalize as key-value pairs
- Denormalize key-value pairs
Available parameters when the destination is a relational database
- The Transformation name
- Linked Source Name
- Stream Data
- Use Bind Variables
- Parallel
- Ignore when there are no records
- Use Native SQL
- Create Tables and Adjust Fields
- Create Indexes
- Before SQL
- After SQL
- SQL Action
- Change replication
- Handling exceptions
The source is a file, API endpoint, NoSQL database, or inbound email Connection
The source can come in different forms. It can be any of the following:
- A flat, structured, or semi-structured file
- A response from the API endpoint
- A document in the NoSQL database
- An email body or attachment
Etlworks parses the source document and transforms it into the internal data structure, then converts it to the appropriate Format and sends it to the destination.
Here is an example of a Flow that extracts data from a web service, transforms it, and loads it into the database.
When creating the transformation, you first need to define the source by selecting a source (FROM) Connection and a source (FROM) Format. Once you do this, you can select or enter a source (FROM) object name based on the source Connection type:
Source Connection Type | Source object name
---|---
File storage (server or cloud) | An actual file name or a wildcard file name.
Web service | Any appropriate name.
Inbound email Connection | Either the name of the attached file or any appropriate name.
A key-value storage (such as Redis) | A key.
A NoSQL database (such as MongoDB) | Document id.
Use SQL to transform data
Similar to when the source is a relational database, the user has the option to enter a SQL Source query, which will be executed after the data has been read from the source. For example, if the source is a CSV file, the user can write a query that aggregates the data in the file and creates a summary view. This view will then drive the destination data.
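For instance, assuming a hypothetical source CSV file with customer_id and amount columns, a Source query like the one below could aggregate the rows into a per-customer summary that is then loaded into the destination. The orders name here is a stand-in for the dataset created from the file; adjust it to match how your source object is named.

```sql
-- Hypothetical: aggregate the rows read from the CSV file
-- into one summary row per customer.
SELECT customer_id,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id
```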
Read more about SQL transformation.
Work with wildcard file names
If your source data object is a file, it is possible to configure the source (FROM) so that it reads files by a wildcard name, for example, order_*.csv.
Read how to work with wildcard file names and how to process all files by the wildcard.
Create Flow when the source is a file
To create a Flow, go to the Flows window and click +.
- If the source is a file, inbound email Connection, Redis, or MongoDB, type in file to.
- If the source is a web service, type in web service to.
Select one of the available Flow types and add one or multiple source-to-destination transformations.
Available transformations when the source is a file
- SQL
- Source-to-destination
- Per-field Mapping
- Filter
- Order-by
- Validation
- Remove duplicates
- Scripting transformations
- Extract dimensions
- Transpose
- Wrap columns
- Denormalize (pivot)
- Normalize as key-value pairs
- Denormalize key-value pairs
Available parameters when the source is a file
- The Transformation name
- Linked Source Name
- Stream Data
- Force Streaming
- Parallel
- Ignore when there are no records
- Ignore when there is no file
- Process all files
- Delete loaded source files
- Delete source files on error
- Change replication
- Handling exceptions
The destination is a file, API endpoint, NoSQL database, or outbound email Connection
The destination can come in different forms. It can be any of the following:
- A flat, structured, or semi-structured file
- A payload for the API endpoint
- A document in the NoSQL database
- An email body or attachment
Etlworks creates the destination document from the source using the appropriate data exchange Format and by applying transformations and Mapping.
When creating the transformation, you first need to define the destination by selecting a destination (TO) Connection and a destination (TO) Format. Once you do this, you can select or enter a destination (TO) object name based on the destination Connection type:
Destination Connection Type | Destination object name
---|---
File storage (server or cloud) | An actual file name. Read how Etlworks calculates the destination file name.
Web service | Any appropriate name.
Outbound email Connection | Either the name of the attached file or any appropriate name.
A key-value storage (such as Redis) | Any appropriate name.
A NoSQL database (such as MongoDB) | Document id.
When the destination Connection is a file storage, a new file will be created each time the transformation is executed. If the file already exists, it will be overwritten.
Create Flow when the destination is a file
To create a Flow, go to the Flows window and click +.
- If the destination is a file, outbound email Connection, Redis, or MongoDB, type in to file.
- If the destination is a web service, type in to web service.
Select one of the available Flow types and add one or multiple source-to-destination transformations.
Available transformations when the destination is a file
- SQL
- Source-to-destination
- Per-field Mapping
- Filter
- Order-by
- Validation
- Remove duplicates
- Scripting transformations
- Extract dimensions
- Transpose
- Wrap columns
- Denormalize (pivot)
- Normalize as key-value pairs
- Denormalize key-value pairs
Available parameters when the destination is a file
- The Transformation name
- Linked Source Name
- Stream Data
- Force Streaming
- Parallel
- Ignore when there are no records
- Ignore when there is no file
- Process all files
- Change replication
- Handling exceptions
The source is a well-known API
A well-known API is a business application or API for which Etlworks has a native connector, for example, Google Sheets, Google Analytics, etc.
Here is an example of a Flow that extracts data from a well-known API, transforms it, and loads it into the database.
Create Flow when the source is a well-known API
To create a Flow, go to the Flows window, click +, select Well-known API to, and select the Flow that extracts data from the well-known API.
Add one or multiple source-to-destination transformations.
Available transformations when the source is a well-known API
- SQL
- Source-to-destination
- Per-field Mapping
- Filter
- Order-by
- Validation
- Remove duplicates
- Scripting transformations
- Extract dimensions
- Transpose
- Wrap columns
- Denormalize (pivot)
- Normalize as key-value pairs
- Denormalize key-value pairs
Available parameters when the source is a well-known API
- The Transformation name
- Linked Source Name
- Stream Data
- Force Streaming
- Parallel
- Ignore when there are no records
- Ignore when there is no file
- Process all files
- Change replication
- Handling exceptions
The destination is a well-known API
A well-known API is a business application or API for which Etlworks has a native connector, for example, Google Sheets.
Create Flow when the destination is a well-known API
To create a Flow, go to the Flows window, click +, select to Well-known API, and select the Flow that loads data into the well-known API.
Add one or multiple source-to-destination transformations.
Available transformations when the destination is a well-known API
- SQL
- Source-to-destination
- Per-field Mapping
- Filter
- Order-by
- Validation
- Remove duplicates
- Scripting transformations
- Extract dimensions
- Transpose
- Wrap columns
- Denormalize (pivot)
- Normalize as key-value pairs
- Denormalize key-value pairs
Available parameters when the destination is a well-known API
- The Transformation name
- Linked Source Name
- Stream Data
- Force Streaming
- Parallel
- Ignore when there are no records
- Ignore when there is no file
- Process all files
- Change replication
- Handling exceptions
Common flow parameters
Flow variables
To add user-defined flow variables, open the Parameters tab and add the variables as key-value pairs. Use uppercase characters and no spaces for variable names.
Maximum number of parallel threads
It is possible to execute multiple source-to-destination transformations (extract-load) in parallel threads.
By default, Etlworks limits the number of threads to 5. To change the limit, open the Parameters tab and set the maximum number of threads in Maximum Number of Parallel Threads. This value cannot be greater than 99.
Retry failed transformations
It is possible to configure an ETL Flow that has multiple source-to-destination transformations, or a single transformation with a wildcard source, to retry failed or not-yet-executed steps (transformations) on restart.
To enable retrying failed transformations, open the Parameters tab and check the Retry Failed Transformations flag.
Read more about retrying failed transformations.
By default, if a source field in the Mapping does not actually exist in the source, the Flow behaves differently depending on the type of the destination Connection and on whether streaming and bind variables are enabled.
Specifically:
- If the destination is a file or an API endpoint, the Flow will continue without an error.
- If the destination is a database and streaming or bind variables are disabled, the Flow will continue without an error.
- If the destination is a database and streaming and bind variables are enabled, the Flow will generate an error.
If you want the Flow to consistently generate an error when a field in the Mapping does not exist in the source, open the Parameters tab and check the Error if field in mapping is not in the source flag.
Named connections
Flow inherits all the Connections from the source-to-destination transformations. You can add additional named Connections under the Connections tab.
Use named Connections
You can then use the named Connections in mapping and JavaScript transformations. For example, you can use a named connection to calculate a column's value using a lookup.