Overview
This article explains an alternative approach to handling nested documents (JSON, XML, etc.) in Etlworks. Instead of transforming data in a flow, you can use Format parameters, a JavaScript preprocessor and/or XLST (for XML) to dynamically reshape the source document before it is processed.
How It Works
In Etlworks, when setting up a source-to-destination transformation, the source format defines how the source document (such as a file, API response, or database result set) is interpreted and transformed before it is passed down to the actual source-to-destination transformation.
Before data is processed, the source document can be transformed using one of the following methods:
• Code: A JavaScript preprocessor can modify the document dynamically before processing.
• Parameters: Built-in format settings allow extracting specific parts of the document (e.g., JSON path extraction).
• XSLT: For XML formats, an XSL Style Sheet can be applied to restructure the XML document.
By leveraging these transformation techniques, Etlworks allows greater flexibility in handling complex or nested data formats before they reach the final transformation stage.
When to use format-level transformations
In many cases, the source document does not need transformation—it can be processed as-is. However, there are situations where transformation is necessary, such as:
• Extracting only a specific section of a large or deeply nested document.
• Flattening hierarchical structures into a tabular format.
• Converting the document into a different structure (e.g., XML to JSON).
• Removing unnecessary elements to optimize processing.
Working with JSON Documents
Using Format Parameters to transform JSON
JSON Format provides two key Format parameters to transform JSON source documents before they are processed:
Path to the Node to Parse
• Extracts a specific nested structure from the JSON document.
• Skips all other nodes.
• Format: parent...parent.start
Serialize Nested Elements as Strings
• Converts nested objects/arrays into string representations instead of structured JSON.
Example: Using “Path to the Node to Parse”
When this parameter is set, the JSON parser starts reading from the specified node and ignores everything else.
Input JSON Document
{
"data": {
"boards": [
{"id": 1, "name": "Board A"},
{"id": 2, "name": "Board B"}
],
"next": "123"
}
}
Configuration
• Path to the Node to Parse: data.boards
Transformed Output
[
{"id": 1, "name": "Board A"},
{"id": 2, "name": "Board B"}
]
💡 Effect: The parser extracts the boards array and removes the rest of the JSON document.
Example: Using “Serialize Nested Elements as Strings”
When enabled, this setting converts nested objects/arrays into string representations instead of keeping them as structured JSON.
Input JSON Document
{
"id": 123,
"details": {
"category": "Tech",
"price": 100
}
}
Configuration
• Serialize Nested Elements as Strings: ✅ Enabled
Transformed Output
{
"id": 123,
"details": "{\"category\":\"Tech\",\"price\":100}"
}
💡 Effect: The details object is converted into a JSON-encoded string.
Transforming JSON Documents with JavaScript Preprocessor
A preprocessor is a JavaScript function that modifies the original JSON document stored in the message variable. The modified document is returned as a string.
Configuring the Preprocessor for JSON Format
To configure a Preprocessor for JSON format:
Step 1. Open source JSON format used in a transformation.
Step 2. Locate the Preprocessor field under the Inline transformation section.
Step 3. Enter the JavaScript code in the provided text box. The script should modify the message variable and return the transformed JSON as a string.
Example: Extracting and Flattening a Nested JSON Structure
Let’s assume an API returns the following deeply nested JSON:
{
"status": "success",
"meta": {
"count": 2,
"timestamp": "2025-01-30T12:00:00Z"
},
"data": [
{
"id": 1,
"name": "John Doe",
"profile": {
"age": 30,
"gender": "male"
},
"location": {
"city": "New York",
"country": "USA"
}
},
{
"id": 2,
"name": "Jane Doe",
"profile": {
"age": 28,
"gender": "female"
},
"location": {
"city": "Los Angeles",
"country": "USA"
}
}
]
}
💡 Goal:
• Extract only the data array.
• Flatten nested objects inside data (like contact and address).
• Convert all nested structures into JSON-encoded strings.
Preprocessor Code
var json = JSON.parse(message); // Parse the JSON input
var transformedData = json.data.map(function(item) {
var flattenedItem = {}; // New object to store the transformed structure
Object.keys(item).forEach(function(key) {
if (typeof item[key] === "object" && item[key] !== null) {
flattenedItem[key] = JSON.stringify(item[key]); // Convert nested object to string
} else {
flattenedItem[key] = item[key]; // Keep primitive values as they are
}
});
return flattenedItem;
});
value = JSON.stringify(transformedData); // Return the transformed JSON as a string
Transformed Output
[
{
"id": 1,
"name": "John Doe",
"profile": "{\"age\":30,\"gender\":\"male\"}",
"location": "{\"city\":\"New York\",\"country\":\"USA\"}"
},
{
"id": 2,
"name": "Jane Doe",
"profile": "{\"age\":28,\"gender\":\"female\"}",
"location": "{\"city\":\"Los Angeles\",\"country\":\"USA\"}"
}
]
💡 Effect: JSON is flattened, nested nodes converted to strings
How It Works
1. Extracts the data array from the JSON.
2. Iterates over each item in the data array.
3. Checks each field inside the item:
• If the field is a nested object, it is converted to a string.
• If the field is a primitive value (string, number, boolean, null), it is kept as-is.
4. Returns a flat JSON array where all nested structures are stringified.
Working with XML documents
Transforming XML Documents with JavaScript Preprocessor
In Etlworks, when using a JavaScript preprocessor to manipulate XML, you can process XML as a string (manual approach) or use Java’s built-in SAX parser for structured transformations.
Configuring the Preprocessor for XML Format
To configure a Preprocessor for XML format:
Step 1. Open source XML format used in a transformation.
Step 2. Locate the XSL Style Sheet or Preprocessor field under the Inline transformation section.
Step 3. Enter the JavaScript code in the provided text box. The script should modify the message variable and return the transformed XML as a string.
Step 4. Set Transformation type to preprocessor.
Below are examples demonstrating both approaches.
Working with XML as a String (Manual Approach)
Using string functions, you can modify XML documents before processing. This is useful for simple transformations like removing, replacing, or extracting specific elements.
Example: Remove a Specific XML Node
Input XML
<root>
<items>
<item>
<id>1</id>
<name>Item A</name>
</item>
<item>
<id>2</id>
<name>Item B</name>
</item>
</items>
<metadata>
<timestamp>2025-01-01T12:00:00</timestamp>
</metadata>
</root>
JavaScript Preprocessor Code
// Remove <metadata> node using regex
var transformedXml = message.replace(/<metadata>.*?<\/metadata>/s, '');
value = transformedXml; // Return transformed XML
Transformed Output
<root>
<items>
<item>
<id>1</id>
<name>Item A</name>
</item>
<item>
<id>2</id>
<name>Item B</name>
</item>
</items>
</root>
💡 Effect: The <metadata> node is removed.
Example: Rename a Node
JavaScript Preprocessor Code
// Replace <items> with <list> and </items> with </list>
var transformedXml = message
.replace(/<items>/g, '<list>')
.replace(/<\/items>/g, '</list>');
value = transformedXml; // Return transformed XML
Transformed Output
<root>
<list>
<item>
<id>1</id>
<name>Item A</name>
</item>
<item>
<id>2</id>
<name>Item B</name>
</item>
</list>
</root>
💡 Effect: <items> is renamed to <list>.
Example: Extract a Specific Node’s Content
// Extract content of the <items> node
var match = message.match(/<items>(.*?)<\/items>/s);
value = match ? match[1] : ''; // Extract content inside <items>
Transformed Output
<item>
<id>1</id>
<name>Item A</name>
</item>
<item>
<id>2</id>
<name>Item B</name>
</item>
💡 Effect: Extracts only the contents of <items>.
Working with Java’s SAX Parser in JavaScript
For more complex transformations, you can use Java’s SAX parser in the JavaScript preprocessor. SAX (Simple API for XML) is a fast, memory-efficient way to parse and modify XML.
Example: Removing a Specific XML Node Using Java SAX Parser
JavaScript Preprocessor Code
var SAXParserFactory = Java.type("javax.xml.parsers.SAXParserFactory");
var InputSource = Java.type("org.xml.sax.InputSource");
var StringReader = Java.type("java.io.StringReader");
var DefaultHandler = Java.type("org.xml.sax.helpers.DefaultHandler");
var StringWriter = Java.type("java.io.StringWriter");
var TransformerFactory = Java.type("javax.xml.transform.TransformerFactory");
var Transformer = TransformerFactory.newInstance().newTransformer();
var DOMImplementationLS = Java.type("org.w3c.dom.ls.DOMImplementationLS");
var DocumentBuilderFactory = Java.type("javax.xml.parsers.DocumentBuilderFactory");
// Create XML Parser
var factory = SAXParserFactory.newInstance();
var builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
var doc = builder.parse(new InputSource(new StringReader(message)));
// Remove <metadata> node
var metadataNode = doc.getElementsByTagName("metadata").item(0);
if (metadataNode !== null) {
metadataNode.getParentNode().removeChild(metadataNode);
}
// Convert modified XML document back to a string
var writer = new StringWriter();
Transformer.transform(new javax.xml.transform.dom.DOMSource(doc),
new javax.xml.transform.stream.StreamResult(writer));
value = writer.toString(); // Return modified XML
💡 Effect: The <metadata> node is removed before processing.
Renaming a Node (<items> → <list>)
This example renames the <items> node to <list> dynamically.
Preprocessor Code
var SAXParserFactory = Java.type("javax.xml.parsers.SAXParserFactory");
var InputSource = Java.type("org.xml.sax.InputSource");
var StringReader = Java.type("java.io.StringReader");
var DefaultHandler = Java.type("org.xml.sax.helpers.DefaultHandler");
var DocumentBuilderFactory = Java.type("javax.xml.parsers.DocumentBuilderFactory");
var TransformerFactory = Java.type("javax.xml.transform.TransformerFactory");
var Transformer = TransformerFactory.newInstance().newTransformer();
var StringWriter = Java.type("java.io.StringWriter");
// Create XML Parser
var factory = SAXParserFactory.newInstance();
var builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
var doc = builder.parse(new InputSource(new StringReader(message)));
// Rename <items> to <list>
var itemsNode = doc.getElementsByTagName("items").item(0);
if (itemsNode !== null) {
var newElement = doc.createElement("list");
// Move all child nodes from <items> to <list>
while (itemsNode.hasChildNodes()) {
newElement.appendChild(itemsNode.firstChild);
}
// Replace <items> with <list>
itemsNode.getParentNode().replaceChild(newElement, itemsNode);
}
// Convert modified XML document back to a string
var writer = new StringWriter();
Transformer.transform(new javax.xml.transform.dom.DOMSource(doc),
new javax.xml.transform.stream.StreamResult(writer));
value = writer.toString(); // Return modified XML
💡 Effect: <items> is renamed to <list> dynamically
Extracting Only a Specific XML Node’s Content
This example extracts only the <items> node from the XML.
var SAXParserFactory = Java.type("javax.xml.parsers.SAXParserFactory");
var InputSource = Java.type("org.xml.sax.InputSource");
var StringReader = Java.type("java.io.StringReader");
var DefaultHandler = Java.type("org.xml.sax.helpers.DefaultHandler");
var DocumentBuilderFactory = Java.type("javax.xml.parsers.DocumentBuilderFactory");
var TransformerFactory = Java.type("javax.xml.transform.TransformerFactory");
var Transformer = TransformerFactory.newInstance().newTransformer();
var StringWriter = Java.type("java.io.StringWriter");
// Create XML Parser
var factory = SAXParserFactory.newInstance();
var builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
var doc = builder.parse(new InputSource(new StringReader(message)));
// Extract <items> node
var itemsNode = doc.getElementsByTagName("items").item(0);
if (itemsNode !== null) {
doc.removeChild(doc.getDocumentElement()); // Remove root
doc.appendChild(itemsNode.cloneNode(true)); // Keep only <items>
}
// Convert modified XML document back to a string
var writer = new StringWriter();
Transformer.transform(new javax.xml.transform.dom.DOMSource(doc),
new javax.xml.transform.stream.StreamResult(writer));
value = writer.toString(); // Return modified XML
💡 Effect: Converts XML into JSON before processing.
Using XSL Transformations to transform XML documents
In Etlworks, you can apply XSL transformations (XSLT) to modify XML documents dynamically before they are processed. XSLT is a powerful language for transforming XML documents into different structures or even into other formats like JSON, CSV, or HTML.
Key Benefits of Using XSLT in Etlworks
• Native XML Transformation: No need for additional scripting; works inline.
• Flexible Processing: Can restructure, extract, or convert XML as needed.
• Compatible with Existing XML Workflows: Can be used with external APIs and data integration tasks.
Configuring the XSL transormation for XML Format
To use XSLT in Etlworks, follow these steps:
Step 1. Set the Transformation Type to query.
Step 2. Enter a valid XSL Style Sheet in the XSL Style Sheet or Preprocessor field.
Step 3. The transformation will be applied at runtime, modifying the XML before further processing.
Example : Extracting Specific XML Nodes using XSL
Input XML
<root>
<items>
<item>
<id>1</id>
<name>Item A</name>
</item>
<item>
<id>2</id>
<name>Item B</name>
</item>
</items>
<metadata>
<timestamp>2025-01-01T12:00:00</timestamp>
</metadata>
</root>
XSLT for Extraction
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<filteredItems>
<xsl:copy-of select="//item"/>
</filteredItems>
</xsl:template>
</xsl:stylesheet>
Transformed Output
<filteredItems>
<item>
<id>1</id>
<name>Item A</name>
</item>
<item>
<id>2</id>
<name>Item B</name>
</item>
</filteredItems>
💡 Effect: The <metadata> section is removed, leaving only <item> nodes.
Example : Flattening XML Structure
This example flattens a nested XML structure by converting elements into attributes.
<order>
<customer>
<name>John Doe</name>
<email>john@example.com</email>
</customer>
<details>
<product>Gadget</product>
<price>99.99</price>
</details>
</order>
XSLT for Flattening
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<orderSummary>
<customer name="{order/customer/name}" email="{order/customer/email}"/>
<product name="{order/details/product}" price="{order/details/price}"/>
</orderSummary>
</xsl:template>
</xsl:stylesheet>
Transformed Output
<orderSummary>
<customer name="John Doe" email="john@example.com"/>
<product name="Gadget" price="99.99"/>
</orderSummary>
💡 Effect: The structure is flattened, making it easier to process in tabular formats.
Comments
0 comments
Please sign in to leave a comment.