In the Etlworks Integrator, it is possible to read data from one dataset and load it into multiple database tables.
There is a common use case where the source dataset is denormalized and contains information about multiple related entities, for example, orders and items. The example below demonstrates how to normalize the dataset by loading data from a single dataset into multiple linked database tables.
Let's assume there is a flat dataset, like the one below:
As you can see, the values in the first 6 columns are repeated across multiple rows, grouped as the first 3 rows and the last 2 rows, while the values in the last 2 columns are unique to each row. This is because the table above was created by transforming a nested dataset into a denormalized, flat dataset.
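To make the shape of such a dataset concrete, here is a minimal sketch in Python. The column names other than DocumentXRef are illustrative assumptions, not taken from the article: the order-level values repeat on every row of a group, while the item-level values differ per row.

```python
# Hypothetical flat (denormalized) dataset. The first three columns
# (order-level) repeat within each group; the last two (item-level)
# are unique per row. Only DocumentXRef comes from the article.
rows = [
    {"DocumentXRef": "A100", "Customer": "Acme",   "OrderDate": "2021-01-05", "ItemName": "Bolt",   "Quantity": 10},
    {"DocumentXRef": "A100", "Customer": "Acme",   "OrderDate": "2021-01-05", "ItemName": "Nut",    "Quantity": 10},
    {"DocumentXRef": "A100", "Customer": "Acme",   "OrderDate": "2021-01-05", "ItemName": "Washer", "Quantity": 20},
    {"DocumentXRef": "B200", "Customer": "Globex", "OrderDate": "2021-01-06", "ItemName": "Screw",  "Quantity": 5},
    {"DocumentXRef": "B200", "Customer": "Globex", "OrderDate": "2021-01-06", "ItemName": "Nail",   "Quantity": 8},
]
```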
Now, let's assume that we want to read the dataset above and load it into 2 database tables instead of just one:
DocumentXRef is the primary key in the
Order table and a foreign key in the
Item table. The relationship between
Order and
Item is one-to-many (one order matches many items).
If our example has
Item.ItemId as an auto-generated primary key in the
Item table, then we don't have to worry about generating a value for
Item.ItemId. The database will take care of it automatically.
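The two destination tables described above can be sketched with SQLite DDL (run from Python). This is an assumption about the schema, not the article's exact DDL: only DocumentXRef and ItemId are named in the article, and the other columns are illustrative.

```python
import sqlite3

# Sketch of the two destination tables. "Order" must be quoted because
# it is a reserved SQL keyword. ItemId is auto-generated, so no value
# needs to be supplied for it when inserting items.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE "Order" (
    DocumentXRef TEXT PRIMARY KEY,   -- primary key in Order
    Customer     TEXT,               -- illustrative column
    OrderDate    TEXT                -- illustrative column
);
CREATE TABLE Item (
    ItemId       INTEGER PRIMARY KEY AUTOINCREMENT,        -- auto-generated
    DocumentXRef TEXT REFERENCES "Order"(DocumentXRef),    -- foreign key
    ItemName     TEXT,               -- illustrative column
    Quantity     INTEGER             -- illustrative column
);
""")
```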
The technique we will use in this example is fairly simple: read the same source dataset (file) twice and use it to insert data into two tables.
We will be creating two transformations with the same source and different destinations (two tables). The advantage of this technique is that it does not require any programming.
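The overall technique can be sketched in plain Python with SQLite standing in for the destination database. This is not how the Integrator implements it internally, just a minimal illustration under assumed column names (only DocumentXRef comes from the article): pass 1 keeps the first row per DocumentXRef and loads the order-level columns; pass 2 loads every row's item-level columns.

```python
import sqlite3

# The same flat source rows are processed twice:
# (DocumentXRef, Customer, ItemName, Quantity) -- illustrative columns.
rows = [
    ("A100", "Acme",   "Bolt",  10),
    ("A100", "Acme",   "Nut",   10),
    ("B200", "Globex", "Screw",  5),
]

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE "Order" (DocumentXRef TEXT PRIMARY KEY, Customer TEXT);
CREATE TABLE Item (ItemId INTEGER PRIMARY KEY AUTOINCREMENT,
                   DocumentXRef TEXT, ItemName TEXT, Quantity INTEGER);
""")

# Pass 1: Order table -- remove duplicates on DocumentXRef, first row wins.
seen = set()
for xref, customer, _name, _qty in rows:
    if xref not in seen:
        seen.add(xref)
        con.execute('INSERT INTO "Order" VALUES (?, ?)', (xref, customer))

# Pass 2: Item table -- one row per source row; ItemId is auto-generated.
for xref, _customer, name, qty in rows:
    con.execute(
        "INSERT INTO Item (DocumentXRef, ItemName, Quantity) VALUES (?, ?, ?)",
        (xref, name, qty),
    )
con.commit()
```

The key point is that neither pass requires any programming in the Integrator itself; each pass corresponds to one transformation with its own Mapping.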
Here are detailed instructions on how to do this:
Step 1. Create a source Connection and Format, which will be used to read data from the source dataset.
Step 2. Create a destination which is a database Connection.
Step 3. Start creating a Flow in the
Flows window by clicking
+, and selecting
File to database (or something similar, as long as a source is a data object and the destination is a database).
Step 4. Create the first transformation, which inserts data into the
Order table. Start by selecting a source (
FROM) Connection and Format, and destination (
TO) Connection. Continue by entering a filename into the
FROM field, for example,
test.xml, and entering
order into the
TO field.
Step 5. Click
MAPPING and create a Mapping that only includes columns from the table
Order and excludes columns from the table
Item.
Step 6. Select the
Additional Transformations tab, scroll down to the
Remove Duplicates field, and enter
DocumentXRef. By entering a column name in the
Remove Duplicates field, you activate a transformation that excludes all records with the same value in the
DocumentXRef column, except for the first occurrence.
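The behavior of this transformation can be illustrated with a short sketch (not the Integrator's actual code): keep only the first record for each DocumentXRef value, in source order.

```python
# Illustration of the Remove Duplicates behavior: for each distinct
# value of the key column, only the first record is kept.
def remove_duplicates(records, key):
    seen, result = set(), []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            result.append(rec)
    return result

orders = remove_duplicates(
    [{"DocumentXRef": "A100"}, {"DocumentXRef": "A100"}, {"DocumentXRef": "B200"}],
    "DocumentXRef",
)
# orders now holds one record each for A100 and B200
```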
Step 7. Select the
Parameters tab, uncheck
Parallel, and save the transformation.
Step 8. Continue by adding a second transformation, which inserts data into the
Item table. Use the same
FROM and
TO Connections, Format, and filename as in Step 4. Enter
item into the
TO field.
Step 9. Click
MAPPING and only include columns from the table
Item while excluding columns from the table
Order. Only the column
DocumentXRef exists in both.
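The effect of the two Mappings can be sketched as splitting each source record into two disjoint column sets that share only DocumentXRef. The column names other than DocumentXRef are illustrative assumptions:

```python
# Each transformation's Mapping keeps a different slice of the source
# columns; DocumentXRef appears in both slices (it links the tables).
ORDER_COLUMNS = {"DocumentXRef", "Customer", "OrderDate"}
ITEM_COLUMNS = {"DocumentXRef", "ItemName", "Quantity"}

def apply_mapping(record, columns):
    return {k: v for k, v in record.items() if k in columns}

row = {"DocumentXRef": "A100", "Customer": "Acme", "OrderDate": "2021-01-05",
       "ItemName": "Bolt", "Quantity": 10}
order_row = apply_mapping(row, ORDER_COLUMNS)
item_row = apply_mapping(row, ITEM_COLUMNS)
```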
Step 10. Select the
Parameters tab, uncheck
Parallel, and save the transformation.
Step 11. Save the Flow.