Overview
Use drag&drop mapping editor to create source-to-destination transformations where the source and (or) the destination are nested documents. These are some of the most common use cases:
- Transform nested documents from one format (for example JSON) to another (for example XML).
- Change the structure of the nested document.
- Transform nested document into a flat array.
- Transform the flat array into a nested document.
- Denormalize nested document by creating an array of elements from the deepest level and adding elements from the parent level. Example:
<orders><order><items><item>
to<items><item><order>
.
The Mapping editor should be able to handle most of the scenarios. Still, in some extreme cases, when the destination document is so complex that using the drag-and-drop editor is not the best option, you can create nested documents with JavaScript. Read more.
Difference between nested and flat mapping
Nested mapping is a transformation executed on a source document before loading the data into the destination and right after the extraction. The flat mapping is executed when the flow loads data into the destination.
Flat mapping has limited capabilities compared to nested mapping and supports only two use cases when working with nested documents, but it can be faster than nested.
Flat mapping uses the default mapping concept: if the field is not in the mapping, it is automatically added to the destination as is. Nested mapping requires all destination fields to be added to the list.
Nested mapping enforces the order of fields defined in the mapping editor. Flat mapping adds new fields (fields which don't exist in the source) at the end of the record.
The flat mapping works much faster when both the source and the destination are flat. When nested mapping is used and the destination is a database table it is highly recommended to enable force streaming.
Create nested mapping
When Etlworks reads a nested data object (JSON, XML, etc.), it parses it and displays the fields in a tree-like structure (see the screenshot below).
The initial mapping is created by clicking the Create Mapping
button. For nested mapping, select Create with nested destination
instead of Create
. If you made a flat mapping, you can later enforce the nested mapping by selecting the Enforce nested mapping
checkbox.
Use buttons Make this field a child of the one above
and Move child field up one level
on the right side to control the level of nestedness for a specific field in the destination document. You can also move fields up and down.
Click +
button to add new fields to the mapping. Read about other controls.
Enforce nested mapping
Nested mapping is a transformation. It can be enforced even if the destination is not nested. Here are some use cases when you want to enforce the nested mapping.
- Transform nested document into a flat array.
- Denormalize nested document.
- Preserve the order of columns defined in the mapping editor.
To enforce the nested mapping, select the Enforce nested mapping
checkbox.
The nested mapping is always enforced when the destination is nested.
Test nested mapping
To test the nested mapping, click the Test Transformation
button at the bottom of the mapping screen.
It will open the Test Results
tab with the generated document in the destination format.
Use cases
Transform nested documents from one format to another
Data exchange formats that support nested documents
Example
In this example, we transform the source JSON document into destination XML. We add a new field itemsCount
, which calculates the number of items in each order. We also configure itemId
as an XML attribute and itemsCount
as a node with CDATA section.
Source JSON document:
{
"orders": [
{
"orderId": "1",
"orderDate": "01/01/2024",
"items": [
{
"itemId": "1-1",
"name": "item1-1",
"price": 123,
"vendor": {
"name": "vendor1",
"address": "1 Main st"
}
}]
}]
}
Expected destination XML:
<?xml version='1.0' encoding='UTF-8'?>
<orders>
<order>
<orderId>1</orderId>
<orderDate>01/01/2024</orderDate>
<items>
<item id="1-1">
<name>item1-1</name>
<price>123</price>
<vendor>
<name>vendor1</name>
<address>1 Main st</address>
</vendor>
</item>
<itemsCount><![CDATA[1]]></itemsCount>
</order>
</orders>
Source-to-destination transformation:
Mapping:
Parent|row in XML
Notice mapping for orders
and items
columns in the screenshot above:
orders
->orders|order
items
->items|item
XML, unlike JSON, typically requires an enclosing parent tag for an array of nodes. By default, the transformation will use the same name for the parent tag in the array and the row node:
<orders>
<orders>
<orders>
<orders>
To give a unique name for the row tag, set the destination in mapping to parent|row
, for example, orders|order
. It will generate the following XML:
<orders>
<order>
<order>
<orders>
Data type modifiers
The data type modifications are used to change how the destination document is rendered.
To access the modifiers, click the data type icon in the mapping row.
Data type modifiers when the destination is XML
The following modifies are available when the destination is XML:
XML attribute
: This field is an XML attribute.Example: <tag attr="value">
XML CDATA section
: This field is an XML CDATA section. Example:<tag><![CDATA[text]]></tag>
XML tag with value and attributes
: This field is an XML tag with value and attributes. Example:<tag ContentID="11-22-33-44" anotherattr="attr value">some value</tag>
. Enable this modifier for the "value" field. EnableXML attribute
for "attribute" fields.XML hide repeating element root tag
: This repeating field root tag will not be added to the nested XML. Example:<row>value</row>
instead of<owner><row>value</row></owner>
XML hide row tag
: This repeating field wrapping <row> tag will not be added to the nested XML. Example:<images><image>url</image></images>
instead of<images><row><image>url</image></row></images>
Data type modifiers when the destination is JSON
The following modifies are available when the destination is JSON:
This field is a JSON array of strings
: if you need to create a JSON array of strings, for example,"node_name":["value1","value2,"value3"]
set the field data type toARRAY
:
JSON array
: When creating JSON documents, the transformation "decides" whether to create JSON arrays based on the number of elements in the nested dataset. Arrays are used if the number of rows is more than 1. Enable this modifier to enforce the creation of arrays regardless of the number of elements in the nested dataset.JSON array values are not strings
: This field is a JSON array with literal values (not strings). Example:[1,2,3]
instead of["1","2","3"]
Change the data type of the value
: to change the data type of the value in"node": value
select theData Type
from the dropdown.
Change the structure of the nested document
You can change the structure of the destination document by adding, hiding, renaming, and moving (up and down) the fields. Use buttons Make this field a child of the one above
and Move child field up one level
on the right side to control the level of nestedness for a specific field in the destination document.
Source JSON document
{
"orders": [
{
"orderId": "1",
"orderDate": "01/01/2024",
"items": [
{
"itemId": "1-1",
"name": "item1-1",
"price": 123,
"vendor": {
"name": "vendor1",
"address": "1 Main st"
}
}]
}
}
Expected destination JSON document
{
"metadata": {
"timestamp": "01/01/2024 00:00:00",
"count": 1
},
"data": [
{
"orderId": "1",
"orderDate": "01/01/2024",
"items": [
{
"itemId": "1-1",
"name": "item1-1",
"price": 123
}]
}
}
Source-to-destination transformation
Mapping
Transform nested document into a flat array
You can transform the nested source document into a flat array by reducing the nestedness of the fields on the right (destination) and hiding the parent fields.
Source JSON document
{
"orders": [
{
"orderId": "1",
"orderDate": "01/01/2024",
"items": [
{
"itemId": "1-1",
"name": "item1-1",
"price": 123,
"vendor": {
"name": "vendor1",
"address": "1 Main st"
}
}]
}
}
Expected destination JSON document
[
{
"orderId": "1",
"orderDate": "01/01/2024",
"items": [
{
"itemId": "1-1",
"name": "item1-1",
"price": "123"
}
]
Source-to-destination transformation
Mapping
Transform the flat array into a nested document
You can transform the flat source document into a nested one by increasing the nestedness of the fields on the right (destination) and adding the parent fields.
Source JSON document
[
{
"orderId": "1",
"orderDate": "01/01/2024",
"items": [
{
"itemId": "1-1",
"name": "item1-1",
"price": 123,
"vendor": {
"name": "vendor1",
"address": "1 Main st"
}
}]
}
}
Expected destination JSON document
{
"data": [
{
"orderId": "1",
"orderDate": "01/01/2024",
"items": [
{
"itemId": "1-1",
"name": "item1-1",
"price": "123"
}]
}]
}
Source-to-destination transformation
Mapping
Denormalize nested document
You can flatten and denormalize the nested source document by extracting data from the deepest level and combining it with the elements of the parent node.
In this example, a source JSON document includes an array of orders, each having its own array of items. The goal is to create a destination JSON document with an array of items where each item has some of the attributes of the parent order.
It is accomplished by setting the level of nestedness of the parent field (order) to the same as that of the deepest field in the nested document (item) and hiding the top-level parent fields (orders and items).
Source JSON document
{
"orders": [
{
"orderId": "1",
"orderDate": "01/01/2024",
"items": [
{
"itemId": "1-1",
"name": "item1-1",
"price": 123
},
{
"itemId": "1-2",
"name": "item1-2",
"price": 234
}]
},
{
"orderId": "2",
"orderDate": "01/01/2024",
"items": [
{
"itemId": "2-1",
"name": "item2-1",
"price": 123
},
{
"itemId": "2-2",
"name": "item2-2",
"price": 234,
}]
}]
}
Expected destination JSON document
[
{
"orderId": "1",
"orderDate": "01/01/2024",
"id": "1-1",
"name": "item1-1",
"price": "123"
},
{
"orderId": "1",
"orderDate": "01/01/2024",
"id": "1-2",
"name": "item1-2",
"price": "234"
},
{
"orderId": "1",
"orderDate": "01/01/2024",
"id": "1-3",
"name": "item1-3",
"price": "456"
},
{
"orderId": "2",
"orderDate": "01/01/2024",
"id": "2-1",
"name": "item2-1",
"price": "123"
},
{
"orderId": "2",
"orderDate": "01/01/2024",
"id": "2-2",
"name": "item2-2",
"price": "234"
}
]
Source-to-destination transformation
Mapping
Calculate field value
By default, a DESTINATION
field gets its value from its corresponding SOURCE
field. The value of any field can be changed or calculated using JavaScript or Python.
Select or enter the DESTINATION
field, click the pen
icon to the right, and enter a program in the scripting language to calculate the field's value.
Get a field's value
dataSet.getFieldValue(currentRow, "field name") // any field by name
or
fieldValue // current field in the context
Set a field's value
value = calculated_field_value // current field in the context
or
dataSet.setFieldValue(currentRow, "field name", calculated_field_value) // any field by name
Available objects
The following objects can be referenced by name from JavaScript or Python code:
Object name | Class name / JavaDoc | Description |
---|---|---|
dataSet | com.toolsverse.etl.common.DataSet | current destination dataSet for that specific field |
currentRow | com.toolsverse.etl.common.DataSetRecord | current record in the current source dataSet for that specific field |
etlConfig | com.toolsverse.etl.core.config.EtlConfig | ETL config |
scenario | com.toolsverse.etl.core.engine.Scenario | ETL scenario |
sourceDataSet | com.toolsverse.etl.common.DataSet | the source dataSet for that specific field |
actualSourceDataSet | com.toolsverse.etl.common.DataSet | the original source dataSet |
fieldValue | the value of the field in the source dataSet | |
row | current 0-based row number |
Streaming is disabled when nested mapping is used
Etlworks can read and write nested documents in JSON, XML, JSON, Parquet, Avro, EDI, and other supported formats.
By default, it attempts to use streaming to load data into the destination. Streaming is faster and has a very low memory footprint. Unfortunately, streaming cannot be used when reading and writing complex nested data objects, so Etlworks automatically disables it.
If the source is a nested dataset and the destination is a database, it is highly recommended to enable the Force Streaming option. When streaming is forced, the system automatically creates a stream-able view from a non-stream-able source. It still keeps all the records in memory but does not keep all the SQL statements in memory, and it is possible to use bind variables with a stream-able view.
Comments
0 comments
Please sign in to leave a comment.