Create XML and JSON documents using JavaScript – Etlworks Support

Overview

Note: This technique can be used to create documents in any supported Format. XML and JSON are the most commonly used data exchange Formats.

It is possible to use Mapping Editor to create complex nested documents without writing a single line of code. The Mapping editor can also be used to flatten complex nested documents. Below, we will look at some extreme cases where the destination document is so complex that using the drag-and-drop editor is not the best option.

When Etlworks parses the source file, it creates an internal data structure called a DataSet. Its methods can be used to create a new dataset, which will then be used by the flow to automatically create a destination file. Here's a step-by-step instruction on how to do this:

Step 1. Create a Flow where the destination is a file or API.

Step 2. Select Mapping > Parameters and disable Streaming.

Step 3. Enter the transformational JavaScript code into the After Extract field.

Code template

Copy and paste this template to start writing a transformation:

importPackage(com.toolsverse.etl.common);

// create a staging dataset
var root = new DataSet();

// create fields and data in the staging dataset
...

// update the dataset, which will be used to create a final documemnt, 
// by copying data and fields from the staging dataset

dataSet.setFields(root.getFields());
dataSet.setData(root.getData());

Techniques

The following techniques can be used to create a data set that will be automatically serialized as a nested XML or JSON document:

Understand the DataSet

A DataSet is similar to a database table. It has fields and the rows are stored in DataSetData. Each value in the row can be either a generic value (integer, string, date, etc.) or a DataSet by itself. This allows datasets to have an infinitely complex structure, supporting all types of real-life use cases.

Traverse the original dataset

When creating a staging dataset, you will be traversing the original dataset, which was populated during the extract, then updating fields, and data in the staging dataset.

Read about the DataSet methods, available when traversing the original dataset, and updating the staging dataset.

An example of traversing the original, and updating the staging dataset:

for (row = 0; row < allGenres.getRecordCount(); row++) {
    genre.addField("genre", allGenres.getFieldValue(row, 2));
}

Add Field

It is possible to add a new field (with the value) to the dataSet. If a field with the given name already exists in the given dataSet, the new record will be added.

dataSet.addField("field name", fieldValue);

Add nested field

data.addField("owner.field", fieldValue);

Example:

staging.addField("patient.address", "123 Main st");

Add field with specific data type

data.addField("field name", fieldValue, Type, nativeType);

Example:

staging.addField("firstName", "Joe", java.sql.Types.VARCHAR, "VARCHAR(255)");

Rename field

It is possible to programmatically rename the field in the document.

Original XML

<SourceID>123456</SourceID>

New XML

<ID>123456</ID>

To rename the field:

// get the reference to the field. Use actual data set
var fld = dataSet.getFieldDef("SourceID");

/* set the name of the field and the real name of the field 
as it should appear in the created document. */ 

fld.setName("ID");
fld.setRealName("ID");

Hide field

It is possible to programmatically hide the field in the document.

Original XML

<ID name="abc" value="xyz">123456</ID>

New XML

<ID value="xyz">123456</ID>

To hide the field in the document:

// get the reference to the field. Use actual data set and field name
var fld = dataSet.getFieldDef("abc");

// hide the field
fld.setVisible(false);

Use SQL

SQL Lookup is a technique that can be used to extract data from the original dataset, even multiple levels deep inside a nested structure.

Read how to write SQL queries, which can be executed on any dataset.

Use CommonEtlUtils.executeSql(dataSet, sql) to execute SQL on any dataset and return another dataset.

The example below demonstrates how to use SQL lookup to extract deeply-nested data from the original dataset:

importPackage(com.toolsverse.etl.common);

var allGenres = CommonEtlUtils.executeSql(dataSet, 
"select localeCode, * 
from versions.metaData.genre.values.SMAT.value where localeCode = 'US'");

Notice that in the code from versions.metaData.genre.values.SMAT.value, each group separated by a., represents another dimension in the nested dataset.

Use Lookup

When the data needed to update a staging dataset is located somewhere else, not in the original dataset, you can use lookup. In addition to SQL, lookup requires a Connection name, and a data object name (for example, a database table name, or a file name).

Use UNION

UNION is available when using SQL, with one restriction: while the keyword UNION is supported, Etlworks in fact always executes UNION ALL.

To perform UNION from the JavaScript:

importPackage(com.toolsverse.etl.common);

var people = CommonEtlUtils.executeSql(dataSet, 
'select languageCode,displayName as "Name", "actor" as "Job" 
from versions.metaData.castAndCrew.actors.value 
where languageCode = "ENG" 
union select languageCode,displayName as "Name", "producer" as "Job" 
from versions.metaData.castAndCrew.producers.value 
where languageCode = "ENG"');

var crew = null;

// merging crew with actors and producers, the crew can be empty.
try {
  // get crew
  var crew = CommonEtlUtils.executeSql(dataSet, 
'select languageCode,displayName as "Name", role as "Job" 
from versions.metaData.castAndCrew.crew.value where languageCode = "ENG"');

  if (crew != null && crew.getRecordCount() > 0) {
       people = CommonEtlUtils.union(people, crew, null, true, null, null);
  }  
}
catch (e) {
  etlConfig.log(e);
}

Use INTERSECT and MINUS

Similar to UNION, the INTERSECT and MINUS set operations are available when using SQL. The same restriction is also true: while keywords INTERSECT and MINUS are available, Etlworks always executes INTERSECT ALL and MINUS ALL.

To perform INTERSECT from the JavaScript:

importPackage(com.toolsverse.etl.common);

var crew = CommonEtlUtils.executeSql(dataSet, 
'select languageCode,displayName as "Name", role as "Job" from 
versions.metaData.castAndCrew.crew.value where languageCode = "ENG"');

people = CommonEtlUtils.union(people, crew, null);

To perform MINUS from the JavaScript:

importPackage(com.toolsverse.etl.common);

var crew = CommonEtlUtils.executeSql(dataSet, 
'select languageCode,displayName as "Name", role as "Job" 
from versions.metaData.castAndCrew.crew.value where languageCode = "ENG"');

people = CommonEtlUtils.minus(people, crew, null);

Use JOIN and other transformations

Other high-level transformations, such as JOIN, can be executed by calling static methods of the CommonEtlUtils.

Create complex XML documents

In addition to the rules and techniques used when creating complex XML and JSON documents, the following XML-specific techniques are available:

Root XML tag
XML namespaces
Field is an XML attribute
Create fields with both attributes and a value
XML tag with CDATA section
Rules for repeating elements

Root XML tag

You can define a Root XML tag when configuring a destination for the transformation in the XML Format.

XML namespaces

You can define XML namespaces when configuring a destination for the transformation in the XML Format.

Field is an XML attribute

The example below demonstrates how to add a field, and define it as an XML attribute:

staging.addField("ContentID", "11-22-33-44);
staging.getFieldDef("ContentID").setAttribute(true);

Which will produce the following XML:

<tag ContentID="11-22-33-44"/>

Create fields with both attributes and a value

If you need to create an XML tag that has both: attribute(s) and a value, as shown below:

<tag ContentID="11-22-33-44" anotherattr="attr value">some value</tag>

Use the following technique, as in the example below:

importPackage(com.toolsverse.etl.common);

var tag = new DataSet();
tag.setName("tag");

tag.addField("ContentID", "11-22-33-44");
tag.getFieldDef("ContentID").setAttribute(true);

tag.addField("anotherattr", "attr value");
tag.getFieldDef("anotherattr").setAttribute(true);

tag.addField("value", "some value");
tag.getFieldDef("value").setTagValueField(true);

staging.addField("tag", tag);

XML tag with CDATA section

If you need to create an XML tag with a CDATA section, as shown below:

<tag><![CDATA[a lot of text]]></tag>

Use the following JavaScript:

owner.getFieldDef("field name").setCdata(true);

Rules for repeating elements

By default, any dataset will be serialized to XML as a root element, followed by the row tags.

<owner>
   <row/>
   ....
   <row/>
</owner>

If you want to hide the owner tag in XML, use the following technique, as shown in the example below:

owner.getFieldDef("row field name").setHiddenInNested(true);

Given the example above, the resulting XML is shown below. Notice that there are no longer any <owner> tags:

   <row/>
   ....
   <row/>

Hide a row tag

By default, any dataset will be serialized to XML as a root element, followed by the row tags.

<owner>
   <row/>
   ....
   <row/>
</owner>

If you want to hide the <row> tag in XML, use the following technique, as shown in the example below:

owner.getFieldDef("field name").setHasNoParentRowTag(true);

For example, if the hierarchy of elements in XML was created using the following JavaScript code fragment:

importPackage(com.toolsverse.etl.common);

var images = new DataSet();
 
detail.addField("images",images);
 
property.elements.forEach(function(item) {
 images.addField("image", item.MediaURL);
});

It will produce the following XML:

<images>
   <row/>
      <image>url</image> 
   <row/>
   <row/>
      <image>url</image> 
   <row/>
</owner>

If you want to hide the <row> tag:

<images>
  <image>url</image> 
  <image>url</image> 
</owne

You need to modify the code as shown below:

importPackage(com.toolsverse.etl.common);

var images = new DataSet();
 
detail.addField("images",images);
 
property.elements.forEach(function(item) {
 images.addField("image", item.MediaURL).getKey().setHasNoParentRowTag(true);
});

Notice the dataSet.addField(name, value).getKey.setHasNoParentRowTag(true). The addField (code, value) returns a key/value pair where a key is a FieldDef object and a value is a DataSet which has the field.

Create complex JSON documents

In addition to the rules and techniques used when creating complex XML and JSON documents, the following JSON-specific techniques are available:

Define a field as an array of strings

If you need to create a JSON array like the one below:

"node_name":["value1","value2,"value3"]

Use the following technique:

dataSet.addField("node_name", "value1,value2,value3").getKey().
    setSqlDataType(java.sql.Types.ARRAY);

or 

dataSet.addField("node_name", "value1,value2,value3");
dataSet.getFieldDef("node_name").setSqlDataType(java.sql.Types.ARRAY);

Define a data type of the elements in the JSON array

By default, the JSON connector creates an array of strings (see example above).

If you need to create a JSON array of numbers like below:

"node_name":[1,2,3]

Use the following technique:

dataSet.addField("node_name", "1,2,3");
var field = dataSet.getFieldDef("node_name");
field.setSqlDataType(java.sql.Types.ARRAY); 
field.setDataTypeInArray(java.sql.Types.INTEGER);

Define a field as a JSON array

When creating JSON documents, the system decides whether to create JSON arrays based on the number of elements in the nested dataset. Arrays are used if the number of rows is more than 1.

You can manually enforce or disable the creation of arrays by setting the property FieldDef#setJsonArray(bool).

staging.getFieldDef("phones").setJsonArray(true);

Articles in this section

Overview

Code template

Techniques

Understand the DataSet

Traverse the original dataset

Add Field

Add nested field

Add field with specific data type

Rename field

Original XML

New XML

To rename the field:

Hide field

Original XML

New XML

To hide the field in the document:

Use SQL

Use Lookup

Use UNION

Use INTERSECT and MINUS

Use JOIN and other transformations

Create complex XML documents

Root XML tag

XML namespaces

Field is an XML attribute

Create fields with both attributes and a value

XML tag with CDATA section

Rules for repeating elements

Hide a row tag

Create complex JSON documents

Define a field as an array of strings

Define a data type of the elements in the JSON array

Define a field as a JSON array

Related articles