A common use case is a "denormalized" source dataset that contains information about multiple related entities, for example, orders and items. The example below demonstrates how to "normalize" such a dataset by loading data from a single dataset into multiple linked database tables.
Let's assume there is a "flat" dataset, like the one below:
As you can see, the values in the first 6 columns are repeated across multiple rows: the first 3 rows form one group and the last 2 rows form another. The values in the last 2 columns are unique to each row. This is because the table above was created by transforming a nested dataset into a denormalized, "flat" dataset.
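To make the description concrete, here is a hypothetical flat dataset in that shape. All column names except DocumentXRef are invented for illustration:

```
DocumentXRef | OrderDate  | Customer | Address   | City   | State | Sku   | Quantity
ORD-1        | 2021-01-01 | Acme     | 1 Main St | Dayton | OH    | A-100 | 2
ORD-1        | 2021-01-01 | Acme     | 1 Main St | Dayton | OH    | B-200 | 1
ORD-1        | 2021-01-01 | Acme     | 1 Main St | Dayton | OH    | C-300 | 4
ORD-2        | 2021-01-02 | Globex   | 9 Oak Ave | Fresno | CA    | D-400 | 5
ORD-2        | 2021-01-02 | Globex   | 9 Oak Ave | Fresno | CA    | E-500 | 3
```

The first 6 columns are order-level attributes (repeated for every item in the order), while the last 2 columns are item-level attributes (unique per row).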
Now, let's assume that we want to read the dataset above and load it into 2 database tables instead of just one:
Basically, DocumentXRef is the primary key in the Order table and a foreign key in the Item table. The relationship between Order and Item is one-to-many (one order has many items).
If our example has Item.ItemId as an auto-generated primary key in the Item table, then we don't have to worry about generating a value for Item.ItemId - the database will take care of it automatically.
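A minimal sketch of this schema, using SQLite for illustration. The table and key names come from the example above; all other column names are assumptions made up for this sketch:

```python
import sqlite3

# In-memory database just for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "order" (
    DocumentXRef TEXT PRIMARY KEY,   -- primary key in Order
    OrderDate    TEXT,
    Customer     TEXT
);
CREATE TABLE item (
    ItemId       INTEGER PRIMARY KEY AUTOINCREMENT,          -- auto-generated
    DocumentXRef TEXT REFERENCES "order"(DocumentXRef),      -- foreign key to Order
    Sku          TEXT,
    Quantity     INTEGER
);
""")

conn.execute('INSERT INTO "order" VALUES (?, ?, ?)',
             ("ORD-1", "2021-01-01", "Acme"))

# ItemId is omitted from the INSERT; the database generates it automatically.
conn.executemany(
    "INSERT INTO item (DocumentXRef, Sku, Quantity) VALUES (?, ?, ?)",
    [("ORD-1", "A-100", 2), ("ORD-1", "B-200", 1)],
)

item_ids = [row[0] for row in conn.execute("SELECT ItemId FROM item ORDER BY ItemId")]
```

Because ItemId is auto-generated, each inserted item receives a sequential key without any work on our part.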
The technique we are going to use in this example is fairly simple: read the same source dataset (file) twice and use it to insert data into two tables. We will be creating two transformations with the same source and different destinations (two tables). The advantage of this technique is that it does not require any programming.
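The two passes over the source can be sketched in plain Python. This is not what the tool does internally, just an illustration of the logic, assuming the flat source has already been parsed into a list of dicts (column names other than DocumentXRef are invented):

```python
# Hypothetical flat rows, as parsed from the source file.
flat = [
    {"DocumentXRef": "ORD-1", "Customer": "Acme",   "Sku": "A-100", "Qty": 2},
    {"DocumentXRef": "ORD-1", "Customer": "Acme",   "Sku": "B-200", "Qty": 1},
    {"DocumentXRef": "ORD-2", "Customer": "Globex", "Sku": "C-300", "Qty": 5},
]

# Pass 1 (Order table): keep order-level columns only and drop
# repeated rows, keeping the first occurrence of each DocumentXRef.
orders, seen = [], set()
for row in flat:
    if row["DocumentXRef"] not in seen:
        seen.add(row["DocumentXRef"])
        orders.append({"DocumentXRef": row["DocumentXRef"],
                       "Customer": row["Customer"]})

# Pass 2 (Item table): keep item-level columns plus the foreign
# key DocumentXRef, and keep every row.
items = [{"DocumentXRef": r["DocumentXRef"], "Sku": r["Sku"], "Qty": r["Qty"]}
         for r in flat]
```

Each pass corresponds to one transformation in the flow: same source, different destination table, different column mapping.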
Step 1. Create a source connection and format which will be used to read data from the source dataset.
Step 2. Create a destination which is a database connection.
Step 3. Start creating a flow in the Flow Builder window by clicking the
+ button and selecting Databases/Extract data from files, load into database (or something similar, as long as a source is a data object and the destination is a database).
Step 4. Let's create the first transformation, which inserts data into the Order table. Start by selecting a source (FROM) connection and format, and a destination (TO) connection. Continue by entering a filename into the FROM field, for example, test.xml, and entering
order into the TO field.
Step 5. Click the
MAPPING button and create a mapping which only includes columns from the table Order and excludes columns from the table Item:
Step 6. Select the Additional Transformations tab, scroll down to the Remove Duplicates field and enter
DocumentXRef. Entering a column name in the Remove Duplicates field activates a transformation which excludes all records with the same value in DocumentXRef, except for the first occurrence.
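The Remove Duplicates behavior described above can be modeled as a small keep-first filter. This is only a sketch of the semantics, with hypothetical row data:

```python
def remove_duplicates(records, key):
    """Keep only the first record for each distinct value of `key`."""
    seen = set()
    result = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            result.append(rec)
    return result

# Hypothetical flat rows: the first three share one DocumentXRef,
# the last two share another.
rows = [
    {"DocumentXRef": "ORD-1", "Sku": "A-100"},
    {"DocumentXRef": "ORD-1", "Sku": "B-200"},
    {"DocumentXRef": "ORD-1", "Sku": "C-300"},
    {"DocumentXRef": "ORD-2", "Sku": "D-400"},
    {"DocumentXRef": "ORD-2", "Sku": "E-500"},
]
deduped = remove_duplicates(rows, "DocumentXRef")
```

Only one row per order survives, which is exactly what the Order table needs.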
Step 7. Uncheck Parallel and save the transformation.
Step 8. Continue by adding a second transformation, which inserts data into the Item table. Use the same FROM and TO connection, format, and filename as in Step 4. Enter
item into the TO field.
Step 9. Click the
MAPPING button and only include columns from the table Item while excluding columns from the table Order. Only the column DocumentXRef exists in both:
Step 10. Select the Parameters tab, uncheck Parallel and save the transformation.
Step 11. Save the flow.