Overview
This technique can be used to create documents in any supported Format. XML and JSON are the most commonly used data exchange Formats.
It is possible to use Mapping Editor to create complex nested documents without writing a single line of code. The Mapping editor can also be used to flatten complex nested documents. Below, we will look at some extreme cases where the destination document is so complex that using the drag-and-drop editor is not the best option.
When Etlworks parses the source file, it creates an internal data structure called a DataSet. Its methods can be used to create a new dataset, which will then be used by the flow to automatically create a destination file. Here's a step-by-step instruction on how to do this:
Step 1. Create a Flow where the destination is a file or API.
Step 2. Select Mapping
> Parameters
and disable Streaming
.
Step 3. Enter the transformational JavaScript code into the After Extract
field.
Code template
Copy and paste this template to start writing a transformation:
importPackage(com.toolsverse.etl.common);
// create a staging dataset
var root = new DataSet();
// create fields and data in the staging dataset
...
// update the dataset, which will be used to create a final documemnt,
// by copying data and fields from the staging dataset
dataSet.setFields(root.getFields());
dataSet.setData(root.getData());
Techniques
The following techniques can be used to create a data set that will be automatically serialized as a nested XML or JSON document:
- Understand the DataSet
- Add field
- Renaming Field
- Hiding Field
- Using SQL
- Using Lookup
- Using UNION
- Using INTERSECT and MINUS
- Using JOIN and other transformations
- Creating complex XML documents
- Creating complex JSON documents
Understand the DataSet
A DataSet is similar to a database table. It has fields and the rows are stored in DataSetData. Each value in the row can be either a generic value (integer, string, date, etc.) or a DataSet by itself. This allows datasets to have an infinitely complex structure, supporting all types of real-life use cases.
Traverse the original dataset
When creating a staging dataset, you will be traversing the original dataset, which was populated during the extract, then updating fields, and data in the staging dataset.
Read about the DataSet methods, available when traversing the original dataset, and updating the staging dataset.
An example of traversing the original, and updating the staging dataset:
for (row = 0; row < allGenres.getRecordCount(); row++) {
genre.addField("genre", allGenres.getFieldValue(row, 2));
}
Add Field
It is possible to add a new field (with the value) to the dataSet
. If a field with the given name already exists in the given dataSet
, the new record will be added.
dataSet.addField("field name", fieldValue);
Add nested field
data.addField("owner.field", fieldValue);
Example:
staging.addField("patient.address", "123 Main st");
Add field with specific data type
data.addField("field name", fieldValue, Type, nativeType);
Example:
staging.addField("firstName", "Joe", java.sql.Types.VARCHAR, "VARCHAR(255)");
Rename field
It is possible to programmatically rename the field in the document.
Original XML
<SourceID>123456</SourceID>
New XML
<ID>123456</ID>
To rename the field:
// get the reference to the field. Use actual data set
var fld = dataSet.getFieldDef("SourceID");
/* set the name of the field and the real name of the field
as it should appear in the created document. */
fld.setName("ID");
fld.setRealName("ID");
Hide field
It is possible to programmatically hide the field in the document.
Original XML
<ID name="abc" value="xyz">123456</ID>
New XML
<ID value="xyz">123456</ID>
To hide the field in the document:
// get the reference to the field. Use actual data set and field name
var fld = dataSet.getFieldDef("abc");
// hide the field
fld.setVisible(false);
Use SQL
SQL Lookup is a powerful technique that can be used to extract data from the original dataset, even multiple levels deep inside a nested structure.
Read how to write SQL queries, which can be executed on any dataset.
Use CommonEtlUtils.executeSql(dataSet, sql)
to execute SQL on any dataset and return another dataset.
The example below demonstrates how to use SQL lookup to extract deeply-nested data from the original dataset:
importPackage(com.toolsverse.etl.common);
var allGenres = CommonEtlUtils.executeSql(dataSet,
"select localeCode, *
from versions.metaData.genre.values.SMAT.value where localeCode = 'US'");
Notice that in the code from versions.metaData.genre.values.SMAT.value
, each group separated by a .
, represents another dimension in the nested dataset.
Use Lookup
When the data needed to update a staging dataset is located somewhere else, not in the original dataset, you can use lookup. In addition to SQL, lookup requires a Connection name, and a data object name (for example, a database table name, or a file name).
Use UNION
UNION
is available when using SQL, with one restriction: while the keyword UNION
is supported, Etlworks in fact always executes UNION ALL
.
To perform UNION
from the JavaScript:
importPackage(com.toolsverse.etl.common);
var people = CommonEtlUtils.executeSql(dataSet,
'select languageCode,displayName as "Name", "actor" as "Job"
from versions.metaData.castAndCrew.actors.value
where languageCode = "ENG"
union select languageCode,displayName as "Name", "producer" as "Job"
from versions.metaData.castAndCrew.producers.value
where languageCode = "ENG"');
var crew = null;
// merging crew with actors and producers, the crew can be empty.
try {
// get crew
var crew = CommonEtlUtils.executeSql(dataSet,
'select languageCode,displayName as "Name", role as "Job"
from versions.metaData.castAndCrew.crew.value where languageCode = "ENG"');
if (crew != null && crew.getRecordCount() > 0) {
people = CommonEtlUtils.union(people, crew, null, true, null, null);
}
}
catch (e) {
etlConfig.log(e);
}
Use INTERSECT and MINUS
Similar to UNION
, the INTERSECT
and MINUS
set operations are available when using SQL. The same restriction is also true: while keywords INTERSECT
and MINUS
are available, Etlworks always executes INTERSECT ALL
and MINUS ALL
.
To perform INTERSECT
from the JavaScript:
importPackage(com.toolsverse.etl.common);
var crew = CommonEtlUtils.executeSql(dataSet,
'select languageCode,displayName as "Name", role as "Job" from
versions.metaData.castAndCrew.crew.value where languageCode = "ENG"');
people = CommonEtlUtils.union(people, crew, null);
To perform MINUS
from the JavaScript:
importPackage(com.toolsverse.etl.common);
var crew = CommonEtlUtils.executeSql(dataSet,
'select languageCode,displayName as "Name", role as "Job"
from versions.metaData.castAndCrew.crew.value where languageCode = "ENG"');
people = CommonEtlUtils.minus(people, crew, null);
Use JOIN and other transformations
Other high-level transformations, such as JOIN, can be executed by calling static methods of the CommonEtlUtils.
Create complex XML documents
In addition to the rules and techniques used when creating complex XML and JSON documents, the following XML-specific techniques are available:
- Root XML tag
- XML namespaces
- Field is an XML attribute
- Create fields with both attributes and a value
- XML tag with CDATA section
- Rules for repeating elements
Root XML tag
You can define a Root XML tag when configuring a destination for the transformation in the XML Format.
XML namespaces
You can define XML namespaces when configuring a destination for the transformation in the XML Format.
Field is an XML attribute
The example below demonstrates how to add a field, and define it as an XML attribute:
staging.addField("ContentID", "11-22-33-44);
staging.getFieldDef("ContentID").setAttribute(true);
Which will produce the following XML:
<tag ContentID="11-22-33-44"/>
Create fields with both attributes and a value
If you need to create an XML tag that has both: attribute(s) and a value, as shown below:
<tag ContentID="11-22-33-44" anotherattr="attr value">some value</tag>
Use the following technique, as in the example below:
importPackage(com.toolsverse.etl.common);
var tag = new DataSet();
tag.setName("tag");
tag.addField("ContentID", "11-22-33-44");
tag.getFieldDef("ContentID").setAttribute(true);
tag.addField("anotherattr", "attr value");
tag.getFieldDef("anotherattr").setAttribute(true);
tag.addField("value", "some value");
tag.getFieldDef("value").setTagValueField(true);
staging.addField("tag", tag);
XML tag with CDATA section
If you need to create an XML tag with a CDATA
section, as shown below:
<tag><![CDATA[a lot of text]]></tag>
Use the following JavaScript:
owner.getFieldDef("field name").setCdata(true);
Rules for repeating elements
By default, any dataset will be serialized to XML as a root element, followed by the row tags.
<owner>
<row/>
....
<row/>
</owner>
If you want to hide the owner tag in XML, use the following technique, as shown in the example below:
owner.getFieldDef("row field name").setHiddenInNested(true);
Given the example above, the resulting XML is shown below. Notice that there are no longer any <owner>
tags:
<row/>
....
<row/>
Hide a row tag
By default, any dataset will be serialized to XML as a root element, followed by the row tags.
<owner>
<row/>
....
<row/>
</owner>
If you want to hide the <row>
tag in XML, use the following technique, as shown in the example below:
owner.getFieldDef("field name").setHasNoParentRowTag(true);
For example, if the hierarchy of elements in XML was created using the following JavaScript code fragment:
importPackage(com.toolsverse.etl.common);
var images = new DataSet();
detail.addField("images",images);
property.elements.forEach(function(item) {
images.addField("image", item.MediaURL);
});
It will produce the following XML:
<images>
<row/>
<image>url</image>
<row/>
<row/>
<image>url</image>
<row/>
</owner>
If you want to hide the <row>
tag:
<images>
<image>url</image>
<image>url</image>
</owne
You need to modify the code as shown below:
importPackage(com.toolsverse.etl.common);
var images = new DataSet();
detail.addField("images",images);
property.elements.forEach(function(item) {
images.addField("image", item.MediaURL).getKey().setHasNoParentRowTag(true);
});
Notice the dataSet.addField(name, value).getKey().setHasNoParentRowTag(true)
. The addField
(code, value) returns a key/value pair where a key is a FieldDef
object and a value is a DataSet
which has the field.
Create complex JSON documents
In addition to the rules and techniques used when creating complex XML and JSON documents, the following JSON-specific techniques are available:
Define a field as an array of strings
If you need to create a JSON array like the one below:
"node_name":["value1","value2,"value3"]
Use the following technique:
dataSet.addField("node_name", "value1,value2,value3").getKey().
setSqlDataType(java.sql.Types.ARRAY);
or
dataSet.addField("node_name", "value1,value2,value3");
dataSet.getFieldDef("node_name").setSqlDataType(java.sql.Types.ARRAY);
Define a data type of the elements in the JSON array
By default, the JSON connector creates an array of strings (see example above).
If you need to create a JSON array of numbers like below:
"node_name":[1,2,3]
Use the following technique:
dataSet.addField("node_name", "1,2,3");
var field = dataSet.getFieldDef("node_name");
field.setSqlDataType(java.sql.Types.ARRAY);
field.setDataTypeInArray(java.sql.Types.INTEGER);
Define a field as a JSON array
When creating JSON documents, the system decides whether to create JSON arrays based on the number of elements in the nested dataset. Arrays are used if the number of rows is more than 1
.
You can manually enforce or disable the creation of arrays by setting the property FieldDef#setJsonArray(bool)
.
staging.getFieldDef("phones").setJsonArray(true);
Comments
0 comments
Please sign in to leave a comment.