Overview
In Etlworks, it is possible to read data from one dataset and load it into multiple database tables.
Use Case
There is a common use case in which the source dataset is denormalized and contains information about multiple related entities, for example, orders and items. The example below demonstrates how to normalize the dataset by loading data from a single dataset into multiple linked database tables.
Let's assume there is a flat dataset, like the one below:
As you can see, the values in the first 6 columns are repeated for multiple rows, grouped by the first 3 and the last 2 rows. The values in the last 2 columns are unique. This is because the table above was created by transforming a nested dataset into a denormalized, flat dataset.
Now, let's assume that we want to read the dataset above and load it into 2 database tables instead of just one:
Basically, DocumentXRef is the primary key in the Order table and a foreign key in the Item table. The relationship between Order and Item is one-to-many (one order matches many items).
If, as in our example, Item.ItemId is an auto-generated primary key in the Item table, then we don't have to worry about generating a value for Item.ItemId: the database will take care of it automatically.
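The schema described above can be sketched in SQL (shown here through Python's sqlite3 so it is runnable). The table and key names follow the example; the column types and the extra columns OrderDate and ItemName are assumptions for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE "order" (
    DocumentXRef TEXT PRIMARY KEY,               -- one row per order
    OrderDate    TEXT                            -- hypothetical order-level column
);
CREATE TABLE item (
    ItemId       INTEGER PRIMARY KEY AUTOINCREMENT,          -- auto-generated by the database
    DocumentXRef TEXT REFERENCES "order"(DocumentXRef),      -- foreign key: many items per order
    ItemName     TEXT                                        -- hypothetical item-level column
);
""")

# ItemId is omitted on insert; the database assigns it automatically.
cur = con.execute('INSERT INTO item (DocumentXRef, ItemName) VALUES (?, ?)', ("A1", "bolt"))
print(cur.lastrowid)
```

Because ItemId is auto-generated, the insert only needs to supply the foreign key and the item-level columns.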
Process
The technique we will use in the example above is fairly simple: read the same source dataset (file) twice and use it to insert data into two tables.
We will be creating two transformations with the same source and different destinations (two tables). The advantage of this technique is that it does not require any programming.
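The no-code flow is equivalent in spirit to the following sketch: the same flat dataset is read twice, once to produce order rows (one per DocumentXRef) and once to produce item rows. The column names besides DocumentXRef and the sample records are hypothetical:

```python
# Hypothetical flat (denormalized) rows, as read from the source file.
rows = [
    {"DocumentXRef": "A1", "OrderDate": "2024-01-05", "ItemName": "bolt"},
    {"DocumentXRef": "A1", "OrderDate": "2024-01-05", "ItemName": "nut"},
    {"DocumentXRef": "B2", "OrderDate": "2024-01-06", "ItemName": "washer"},
]

# First transformation: order table.
# Keep only order-level columns and the first row per DocumentXRef.
seen = set()
orders = []
for row in rows:
    if row["DocumentXRef"] not in seen:
        seen.add(row["DocumentXRef"])
        orders.append({"DocumentXRef": row["DocumentXRef"], "OrderDate": row["OrderDate"]})

# Second transformation: item table.
# Keep only item-level columns plus the foreign key; every source row becomes an item.
items = [{"DocumentXRef": r["DocumentXRef"], "ItemName": r["ItemName"]} for r in rows]

print(len(orders), len(items))  # -> 2 3
```

In Etlworks, the deduplication in the first pass is handled by the Remove Duplicates field rather than code.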
Here are detailed instructions on how to do this:
Step 1. Create a source Connection and Format, which will be used to read data from the source dataset.
Step 2. Create a destination, which is a database Connection.
Step 3. Start creating a Flow in the Flows window by clicking + and selecting File to database (or something similar, as long as the source is a data object and the destination is a database).
Step 4. Create the first transformation, which inserts data into the Order table. Start by selecting a source (FROM) Connection and Format and a destination (TO) Connection. Continue by entering a filename into the FROM field, for example, test.xml, and entering order into the TO field.
Step 5. Click MAPPING and create a Mapping that only includes columns from the Order table and excludes columns from the Item table.
Step 6. Select the Additional Transformations tab, scroll down to the Remove Duplicates field, and enter DocumentXRef.
By entering DocumentXRef in the Remove Duplicates field, you activate a transformation that excludes all records with the same value in the DocumentXRef column, except for the first occurrence.
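The keep-first semantics of Remove Duplicates can be expressed as a few lines of code (the records here are illustrative):

```python
def remove_duplicates(rows, key):
    """Keep the first record for each value of `key`; drop later repeats."""
    seen = set()
    out = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out

rows = [{"DocumentXRef": "A1"}, {"DocumentXRef": "A1"}, {"DocumentXRef": "B2"}]
print(remove_duplicates(rows, "DocumentXRef"))
# -> [{'DocumentXRef': 'A1'}, {'DocumentXRef': 'B2'}]
```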
Step 7. Uncheck Parallel and save the transformation.
Step 8. Continue by adding a second transformation, which inserts data into the Item table. Use the same FROM and TO Connections, Format, and filename as in Step 4. Enter item into the TO field.
Step 9. Click MAPPING and only include columns from the Item table while excluding columns from the Order table. Only the DocumentXRef column exists in both.
Step 10. Select the Parameters tab, uncheck Parallel, and save the transformation.
Step 11. Save the Flow.