ETL with relational databases – Etlworks Support

Etlworks reads from one relational database and writes into another. This is the most common database integration pattern.

How does a database-to-database flow work?

You create a source database connection, a destination database connection, and a flow with one or more source-to-destination transformations. Each transformation maps a source table or query to a destination table. The flow controls how rows are written (INSERT, UPDATE, MERGE, etc.) and how the destination schema is created or modified.

How do I set this up?

Create a source database connection.
Create a destination database connection.
In Flows, click + and select Database to database from the Gallery.
Add a source-to-destination transformation.
Configure the source side: see when the source is a database.
Configure the destination side: see when the destination is a database.
Pick an SQL action: INSERT, UPDATE, DELETE, MERGE, IFEXIST, or CUSTOM.
To extract only changed rows, configure high watermark change replication or another change replication technique.

To move many tables at once, see how to move tables matching a wildcard name from one schema or database to another.

How do I handle auto-generated destination columns (identity, sequence, etc.)?

Exclude the auto-generated column from the mapping so Etlworks doesn't try to write to it.

exclude auto-generated column

How do I insert images or other binary data from files into a database?

This example loads image files into a SQL Server image table. The pattern works for any database that supports loading a binary file in SQL.

The target table:

CREATE TABLE image (
    Id INT IDENTITY(1,1) PRIMARY KEY,
    FileName VARCHAR(255) NOT NULL,
    Image VARBINARY(max) NOT NULL
);

The folder containing the images must be accessible by the target database server.

Create a database connection to the target database.
Create a Server Storage connection pointing at the folder with the images. Call it Local images.

Create a JavaScript flow that builds the list of files. The URL passed to FileManagerTask.list must match the URL in step 2.

var javaImports = new JavaImporter(
    com.toolsverse.etl.core.task.common,
    com.toolsverse.util);

with (javaImports) {
    var alias = new Alias();
    alias.setUrl('the same URL as in step 2');

    etlConfig.setValue('image files to load',
        FileManagerTask.list(alias, '*.*'));
}

Create an Execute SQL flow against the database connection from step 1. The token {FILE_TO_LOAD} is replaced at runtime:
```
INSERT INTO Image (FileName, Image)
VALUES (
    '{FILE_TO_LOAD}',
    (SELECT * FROM OPENROWSET(BULK N'{FILE_TO_LOAD}', SINGLE_BLOB) AS Image)
);
```
Note: This is SQL Server syntax. Other databases have their own binary-file loading syntax — check the target database's documentation.
Create a Delete files flow using the connection from step 2. Set FROM to {FILE_TO_DELETE}.
Combine steps 4 and 5 (in that order) into a nested flow.

Create a final nested flow:

Add the JavaScript flow from step 3.
Add the nested flow from step 6. Click the Edit condition (pen) icon and paste this loop script. It pops one file from the list, sets the tokens, and continues until the list is empty:

var javaImports = new JavaImporter(
    com.toolsverse.etl.core.task.common,
    com.toolsverse.util);

with (javaImports) {
    var files = etlConfig.getValue('image files to load');

    if (!files.isEmpty()) {
        var variable = scenario.getExecute().get(0)
            .getVariable('FILE_TO_LOAD');

        if (variable == null) {
            variable = new Variable();
            variable.setName('FILE_TO_LOAD');
            scenario.getExecute().get(0).addVariable(variable);
        }

        variable.setValue(files.get(0).getPath());
        SystemConfig.instance().getProperties()
            .put('FILE_TO_DELETE', files.get(0).getName());

        files.remove(0);
    }

    value = files.isEmpty() ? null : 'continue';
}

Articles in this section

How does a database-to-database flow work?

How do I set this up?

How do I handle auto-generated destination columns (identity, sequence, etc.)?

How do I insert images or other binary data from files into a database?

Related articles