This is the most common transformation. Integrator extracts data from the source and loads it into the destination.
To add a new source-to-destination transformation, select a flow type of the form source to destination (for example, web service to database) and click the add button.
A typical transformation includes a source (FROM) and a destination (TO). The source and destination can be any data source supported by Integrator: tables in a relational database, SQL statements, files, web services, API endpoints, or social media websites.
Read more about mapping a source to a destination.
The other required attributes are the source name and the destination name.
- if the source or destination is a relational database, the name is the name of a table, view, or synonym, or the name of an SQL statement.
- if the source or destination is in file storage (server or cloud), the name is an actual file name or a wildcard file name.
- if the source or destination is a web service, the name is any appropriate name. For example, patient.json.
- if the source or destination is an email connection, the name is either the name of the attached file or any appropriate name.
- if the source or destination is a key-value storage, such as Redis, the name is a key name.
- if the source or destination is a NoSQL database, such as MongoDB, the name is a document id.
One flow can include multiple transformations. Transformations are typically executed in order, but they can also run out of order if parallel execution is enabled.
Selecting FROM and TO objects from the list.
If the source and/or destination uses a live connection, the system automatically populates the available objects (tables, files, endpoints) from that connection. Use the metadata selector to select an object from the list.
You can add new transformations; existing transformations can be reordered, disabled, and deleted. Disabled transformations will not be executed.
Most of the transformations include per-field mapping and additional parameters.
In Integrator, you can customize your source-to-destination transformation by entering parameters.
To open the Parameters section while still on the MAPPING screen, select one of the available tabs (for example, On Exception).
The actual list of available parameters and high-level transformations is dynamic and depends on the type of flow and on the source and destination connections.
The Transformation name
Each source-to-destination transformation has either an automatically generated or a user-assigned name. In most cases, the name is generated automatically using the following algorithm: FROM + " TO " + TO + " " + INDEX. For example, if the FROM (source) is employee.json, the TO (destination) is employee.csv, and the index of the transformation is 1, the transformation name will be employee.json TO employee.csv 1. If you override the transformation name under mapping->parameters, that name will be used instead.
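For illustration, the default name is built roughly as follows (a sketch, not Integrator's actual code):
<pre><code>
// Default transformation name: FROM + " TO " + TO + " " + INDEX,
// e.g. defaultName("employee.json", "employee.csv", 1)
// returns "employee.json TO employee.csv 1".
static String defaultName(String from, String to, int index) {
    return from + " TO " + to + " " + index;
}
</code></pre>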
Linked Source Name
By default, Integrator creates pairs of sources and destinations (From-To); each source is linked to a single destination. It is possible to link a single source to multiple destinations by specifying this property. For example, let's assume that there is a source (From) employee.json. You can link it to multiple destinations (To) by creating several From-To pairs and setting the Linked Source Name to employee.json for each of these pairs. The From part of each pair will be ignored and replaced with the linked source, employee.json.
Parameters used to improve performance
When this option is enabled (by default), Integrator will stream data from the source to the destination. When streaming is enabled, Integrator never reads the whole dataset (table, view, etc.) into memory. Instead it reads data from the source in chunks and immediately writes data to the destination. In general, streaming is much faster and saves a lot of memory, so it is enabled by default. However, streaming is not always possible. Integrator will make every attempt to detect situations where streaming won't work and will automatically disable it. In rare cases, when Integrator cannot automatically disable streaming, you might need to do this manually, hence the streaming option.
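The difference can be illustrated with a minimal sketch (file names are made up; this is not Integrator's actual code):
<pre><code>
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Paths;

// Streaming in a nutshell: read one record, write it immediately, move on.
// The whole dataset is never materialized in memory.
static void streamCopy() throws Exception {
    try (BufferedReader in = Files.newBufferedReader(Paths.get("employee.csv"));
         BufferedWriter out = Files.newBufferedWriter(Paths.get("employee_copy.csv"))) {
        String line;
        while ((line = in.readLine()) != null) {
            out.write(line);   // each record is written as soon as it is read
            out.newLine();
        }
    }
}
</code></pre>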
Use Bind Variables
The fastest way to execute SQL queries is to use bind variables. When multiple identical SQL statements are executed in a batch and the only difference between them is the actual field values, a lot of runtime is saved because the SQL statement is prepared (compiled) only once. Using bind variables also automatically enables streaming and batch processing; it is much faster when SQL statements are processed in batches. Integrator automatically sets the batch window to 100, but it is also possible to set the batch window manually using the Batch Window field. Bind variables (enabled by default) are subject to the same restrictions as streaming, so the same rules apply.
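At the JDBC level, bind variables with a batch window of 100 look roughly like this (a sketch with made-up table and column names, not Integrator's internal code):
<pre><code>
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

// The INSERT is prepared (compiled) once; only the bind values change.
// Rows are sent to the database in batches of 100 (the batch window).
static void loadWithBindVariables(Connection src, Connection dest) throws Exception {
    String insert = "INSERT INTO employee_copy (id, name) VALUES (?, ?)";
    try (Statement s = src.createStatement();
         ResultSet rs = s.executeQuery("SELECT id, name FROM employee");
         PreparedStatement ps = dest.prepareStatement(insert)) {
        int pending = 0;
        while (rs.next()) {
            ps.setLong(1, rs.getLong("id"));
            ps.setString(2, rs.getString("name"));
            ps.addBatch();
            if (++pending == 100) {   // batch window
                ps.executeBatch();
                pending = 0;
            }
        }
        if (pending != 0) {
            ps.executeBatch();        // flush the last, partial batch
        }
    }
}
</code></pre>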
Our ETL engine is written in Java and uses the JDBC protocol to connect to databases. In JDBC, the setFetchSize(int) method is very important for performance and memory management within the JVM, as it controls the number of network calls from the JVM to the database and, correspondingly, the amount of RAM used for ResultSet processing. Many JDBC drivers (for example, Oracle) enforce the fetch size by default; others (for example, MySQL and Redshift) do not. The Fetch Size field can be used to set the maximum number of records retrieved in one database call, thus limiting the memory consumption within the JVM.
Setting Fetch Size at the transformation level overrides Fetch Size set at the connection level.
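For reference, the Fetch Size field corresponds to the standard JDBC call shown in this sketch (table name is made up):
<pre><code>
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

// Limit each network round trip to 500 rows, which also caps the RAM
// needed to hold the ResultSet on the client side.
static void readWithFetchSize(Connection src) throws Exception {
    try (Statement s = src.createStatement()) {
        s.setFetchSize(500);
        try (ResultSet rs = s.executeQuery("SELECT * FROM employee")) {
            while (rs.next()) {
                // process the row
            }
        }
    }
}
</code></pre>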
By default, our ETL engine extracts all the records from the source, limited only by the source query or filter conditions. You can use this field to set the upper limit on the total number of records which can be extracted from the source.
Setting Max Records at the transformation level overrides Max Records set at the connection level.
When this option is enabled (the default), the particular transformation will be executed in a parallel thread. From a practical standpoint, this means that Integrator can run multiple transformations in parallel, which is generally much faster. However, when there are dependencies between transformations (for example, when the destination table "items" depends on the destination table "order") you might want to disable parallel execution.
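Conceptually, it works like the sketch below (only an illustration of the idea, not Integrator's scheduler; the transformation names are made up): independent transformations run in a thread pool, while dependent ones are kept in order.
<pre><code>
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// "customers" has no dependencies, so it can run in parallel.
// "items" depends on "order", so those two run sequentially in one thread.
static void runTransformations(Runnable loadCustomers, Runnable loadOrders,
                               Runnable loadItems) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    pool.submit(loadCustomers);
    pool.submit(() -> {
        loadOrders.run();
        loadItems.run();
    });
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
}
</code></pre>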
In many cases, when the source is a file or a response from a web service and the destination is a database table, streaming (and the ability to use bind variables) is automatically disabled by the engine, even when it is enabled for the specific transformation. For example, this happens if the source is an XML or JSON document, or if SQL queries are used to filter out records in the response from the web service call.
In Integrator it is possible to force streaming, even when it would normally be disabled, by enabling the Force Streaming option. When streaming is forced, the system automatically creates a streamable view from the non-streamable source. It still keeps all the records in memory, but it does not keep all the SQL statements in memory, and it is possible to use bind variables with a streamable view.
Use Native SQL
If this option is enabled (disabled by default), the native SQL dialect of the destination connection (for example, PL/SQL for Oracle) will be used. Using native SQL is generally slower because it automatically disables the ability to use bind variables.
Parameters which control whether the transformation will be executed
Ignore when there are no records
If this parameter is enabled and the source dataset has no records, the entire transformation will be ignored: for example, the file will not be loaded into the database, the API call will not be made, etc.
Ignore when there is no file
If this parameter is enabled and there are no source files, the entire transformation will be ignored: for example, the file will not be loaded into the database, the API call will not be made, etc.
Parameters available only when the source is a file
Process all files
If this parameter is enabled and there are multiple files matching a wildcard file name, all matching files will be processed (extracted and loaded).
Parameters available only when the destination is a database
If this parameter is enabled (the default) and the destination table does not exist, Integrator will automatically create it. Also, when this option is enabled, the insert, update and merge statements will only include fields which exist in the destination table, so it is recommended that you keep it enabled.
If this parameter is enabled (disabled by default) and the source table has indexes, the same indexes will be created for the destination table.
The SQL to be executed on the destination connection before the extract.
The SQL to be executed on the destination connection after the load.
SQL action (available only if the destination is a database)
By default, Integrator INSERTS records into the destination table. You can change this to a different SQL action. MERGE is natively supported for the Oracle, MS SQL Server, PostgreSQL, MySQL, DB2, Informix and Sybase databases. The alternative to MERGE is the IfExist action: Integrator will check to see if the record already exists and, if it does, will execute UPDATE; otherwise it executes INSERT. The action can also be conditional.
Lookup Fields
Allows you to enter a comma-separated list of the fields which will be used to uniquely identify a record for the MERGE and IfExist actions. If the action is MERGE and the database is PostgreSQL or MySQL, there must be a unique index which includes all lookup fields. For other databases, such an index is optional but nice to have, since it enhances performance.
With a conditional action, a condition defines which action (for example, INSERT, UPDATE or MERGE) should be used. Read how to configure a conditional action.
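As a rough illustration of what the IfExist action does under the hood (a simplified sketch with made-up table, column, and lookup field names, not Integrator's internal code):
<pre><code>
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// IfExist: look the record up by the lookup field(s); UPDATE when found,
// INSERT otherwise. Here "id" plays the role of the lookup field.
static void upsertEmployee(Connection dest, long id, String name) throws Exception {
    try (PreparedStatement check = dest.prepareStatement(
            "SELECT 1 FROM employee_copy WHERE id = ?")) {
        check.setLong(1, id);
        try (ResultSet rs = check.executeQuery()) {
            String sql = rs.next()
                ? "UPDATE employee_copy SET name = ? WHERE id = ?"
                : "INSERT INTO employee_copy (name, id) VALUES (?, ?)";
            try (PreparedStatement write = dest.prepareStatement(sql)) {
                write.setString(1, name);
                write.setLong(2, id);
                write.executeUpdate();
            }
        }
    }
}
</code></pre>
MERGE collapses the lookup and the write into a single statement executed by the database itself, which is why it is preferred when the database supports it natively.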
In Integrator, it is possible to create a flow that tracks changes in a data source (for example, a database or an API) and loads the changed records into the destination (for example, another database or API). Another name for change replication is data synchronization.
Integrator supports unidirectional change replication: from the source to the destination. To replicate changes in the opposite direction--from the destination to the source--a separate flow must be created, where the source and the destination swap places.
Read more about configuring change replication in Integrator.
High Watermark Field
A field used as a high watermark for the change replication.
Enable Change Replication
Enable or disable change replication. This works together with High Watermark Field.
High Watermark Field Value on First Run
The optional value of the high watermark field when running the change replication scenario for the first time. If the high watermark field is a TIMESTAMP, use the <code>YYYY-MM-DD HH:mm:SS.mil</code> format, for example <code>2018-08-01 19:38:31.377</code>. The value set for this field will be used only when the change replication flow runs for the first time and will be ignored on subsequent runs.
High Watermark Field Value
To make a change replication flow bulletproof, you can specify a SQL query that calculates the high watermark field value on the fly, based on the current state of the destination table. The flow will then pull only the records in the source which do not exist in the destination or have changed in the source since the last run of the flow.
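For example (a simplified sketch with made-up table and column names, not Integrator's actual code): the high watermark can be calculated as the maximum watermark value already present in the destination, and only newer source records are then extracted.
<pre><code>
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.Timestamp;

// 1. Calculate the high watermark from the current state of the destination.
// 2. Extract only the source records that are newer than that watermark.
static void incrementalExtract(Connection src, Connection dest) throws Exception {
    Timestamp watermark;
    try (Statement s = dest.createStatement();
         ResultSet rs = s.executeQuery("SELECT MAX(updated_at) FROM employee_copy")) {
        rs.next();
        watermark = rs.getTimestamp(1); // null on the very first run
    }
    String query = (watermark == null)
        ? "SELECT * FROM employee"
        : "SELECT * FROM employee WHERE updated_at > ?";
    try (PreparedStatement ps = src.prepareStatement(query)) {
        if (watermark != null) {
            ps.setTimestamp(1, watermark);
        }
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // load the new or changed record into the destination
            }
        }
    }
}
</code></pre>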
The parameters in this group can be found under the On Exception tab.
By default, any error causes a halt of execution. When Ignore is selected, errors will be ignored and execution will continue. Read how to configure the flow to ignore all or specific exceptions.
You can specify which errors should be ignored while still halting the execution for all other errors. Enter part or all of the exception string. This field works only when the Ignore On Exception option is selected.
Rollback on Exception
Some databases (for example, PostgreSQL) require transaction rollback for all exceptions.