Overview
Etlworks supports two primary approaches for transforming source documents before they are loaded or sent to the destination:
-
Source Query: Applies a transformation to the entire source document using JavaScript, SQL, or XSLT, before any format-specific or internal processing occurs.
-
Format-Level Configuration: Uses settings defined in the source format (e.g., JSON path, JavaScript preprocessor, or XSLT) to extract or reshape the document during the parsing stage.
Source Query transformations
Source Query is the recommended and most flexible way to transform documents in Etlworks. Instead of configuring transformation logic in a format, you define it directly in the Source Query field of the flow.
Why Use Source Query
-
No need to create or modify formats: Transformations are defined inline within the flow, making them easier to implement, test, and reuse.
-
Executed before internal processing: The transformation is applied to the raw source document before it’s parsed into datasets or mapped to a destination.
-
Supports multiple transformation types: Etlworks automatically detects whether the Source Query is written in SQL, JavaScript, or XSLT.
-
Ideal for JSON-to-JSON flows: Especially useful for web service-to-web service integrations, where unnecessary parsing and serialization can be avoided entirely.
How Source Query Detection Works
Etlworks detects the transformation type based on the content of the Source Query:
-
SQL: If the query is valid SQL, it is executed against the document as a virtual table. This technique is powerful for filtering, reshaping, and querying structured content.
-
JavaScript: If the query is not valid SQL, Etlworks treats it as JavaScript. The raw document is available in the message variable, and the transformed content must be assigned to value.
This is functionally identical to a JavaScript preprocessor in the format, but far more convenient, as it eliminates the need to edit format definitions for each use case.
-
XSLT: If the source format is XML and the query contains valid XSLT, it is automatically applied as a stylesheet transformation. This is equivalent to setting an XSLT in the format but is easier to manage and test directly in the flow.
Example: JavaScript Source Query
var obj = JSON.parse(message);
var result = obj.items.map(function(item) {
return {
id: item.id,
name: item.name,
metadata: JSON.stringify(item.details)
};
});
value = JSON.stringify(result);
What it does:
-
Parses the input JSON.
-
Extracts the items array.
-
Flattens and reshapes each item.
-
Converts nested details objects into strings.
-
Returns the final array as the transformed result.
Real-World Use Case: Web Service to Web Service Flow
Suppose you’re building a flow that pulls data from one REST API and pushes it to another. Both APIs use JSON, but their schemas differ.
Input JSON from source API:
{
"status": "ok",
"data": [
{ "itemNumber": "ABC", "qtyAvailable": 100 },
{ "itemNumber": "XYZ", "qtyAvailable": 250 }
]
}
Expected JSON by destination API:
{
"inventory": [
{ "sku": "ABC", "quantity": 100 },
{ "sku": "XYZ", "quantity": 250 }
]
}
Step 1: Create ETL flow where both source and destination are web service:
Step 2. Configure source to destination transfromation:
To prevent Etlworks from performing any additional parsing or transformations, set both the source and destination formats to CLOB. CLOB is a raw text format that preserves the exact transformed output.
Step 3. Click Configure button
and set Source query:
Source Query (JavaScript):
var obj = JSON.parse(message);
var payload = {
inventory: obj.data.map(function(item) {
return {
sku: item.itemNumber,
quantity: item.qtyAvailable
};
})
};
value = JSON.stringify(payload);
Step 4. Click Test transformation to test the output
Testing JavaScript Source Queries in Explorer
You can test your transformation logic without running a full flow using the Explorer:
Step 1. Open the Explorer and select the source document (file, API endpoint, etc.).
Step 2. Write your JavaScript transformation using the message variable and assign the result to value.
Step 3. From the dropdown next to the play icon, choose Execute Script.
Etlworks will execute the transformation and show the result immediately, perfect for debugging and iteration.
Format-level transformations
Why use format-level transformations
In many cases, the source document can be processed as-is. However, format-level transformations are useful when you need to:
-
Extract a specific section of a large or deeply nested document.
-
Flatten hierarchical structures into a tabular format.
-
Convert documents between formats (e.g., XML → JSON).
-
Remove unnecessary elements to optimize downstream processing.
How format-level transformations work
When setting up a flow, the source format defines how the incoming document is interpreted. This happens after the raw document is loaded but before it is parsed into datasets. The destination format then controls how the data is serialized before delivery.
Etlworks provides several options within the source format for shaping the document:
-
Preprocessor – JavaScript code that modifies the document before parsing.
-
Parameters – Format-specific settings, such as JSON path extraction or flattening options.
-
XSLT – For XML documents, an XSL Style Sheet can be applied to restructure or transform the XML.
Working with JSON Documents
Using Format Parameters to transform JSON
JSON Format provides two key Format parameters to transform JSON source documents before they are processed:
Path to the Node to Parse
• Extracts a specific nested structure from the JSON document.
• Skips all other nodes.
• Format: parent...parent.start
Serialize Nested Elements as Strings
• Converts nested objects/arrays into string representations instead of structured JSON.
Example: Using “Path to the Node to Parse”
When this parameter is set, the JSON parser starts reading from the specified node and ignores everything else.
Input JSON Document
{
"data": {
"boards": [
{"id": 1, "name": "Board A"},
{"id": 2, "name": "Board B"}
],
"next": "123"
}
}
Configuration
• Path to the Node to Parse: data.boards
Transformed Output
[
{"id": 1, "name": "Board A"},
{"id": 2, "name": "Board B"}
]
💡 Effect: The parser extracts the boards array and removes the rest of the JSON document.
Example: Using “Serialize Nested Elements as Strings”
When enabled, this setting converts nested objects/arrays into string representations instead of keeping them as structured JSON.
Input JSON Document
{
"id": 123,
"details": {
"category": "Tech",
"price": 100
}
}
Configuration
• Serialize Nested Elements as Strings: ✅ Enabled
Transformed Output
{
"id": 123,
"details": "{\"category\":\"Tech\",\"price\":100}"
}
💡 Effect: The details object is converted into a JSON-encoded string.
Transforming JSON Documents with JavaScript Preprocessor
A preprocessor is a JavaScript function that modifies the original JSON document stored in the message variable. The modified document is returned as a string.
Configuring the Preprocessor for JSON Format
To configure a Preprocessor for JSON format:
Step 1. Open source JSON format used in a transformation.
Step 2. Locate the Preprocessor field under the Inline transformation section.
Step 3. Enter the JavaScript code in the provided text box. The script should modify the message variable and return the transformed JSON as a string.
Example: Extracting and Flattening a Nested JSON Structure
Let’s assume an API returns the following deeply nested JSON:
{
"status": "success",
"meta": {
"count": 2,
"timestamp": "2025-01-30T12:00:00Z"
},
"data": [
{
"id": 1,
"name": "John Doe",
"profile": {
"age": 30,
"gender": "male"
},
"location": {
"city": "New York",
"country": "USA"
}
},
{
"id": 2,
"name": "Jane Doe",
"profile": {
"age": 28,
"gender": "female"
},
"location": {
"city": "Los Angeles",
"country": "USA"
}
}
]
}
💡 Goal:
• Extract only the data array.
• Flatten nested objects inside data (like contact and address).
• Convert all nested structures into JSON-encoded strings.
Preprocessor Code
var json = JSON.parse(message); // Parse the JSON input
var transformedData = json.data.map(function(item) {
var flattenedItem = {}; // New object to store the transformed structure
Object.keys(item).forEach(function(key) {
if (typeof item[key] === "object" && item[key] !== null) {
flattenedItem[key] = JSON.stringify(item[key]); // Convert nested object to string
} else {
flattenedItem[key] = item[key]; // Keep primitive values as they are
}
});
return flattenedItem;
});
value = JSON.stringify(transformedData); // Return the transformed JSON as a string
Transformed Output
[
{
"id": 1,
"name": "John Doe",
"profile": "{\"age\":30,\"gender\":\"male\"}",
"location": "{\"city\":\"New York\",\"country\":\"USA\"}"
},
{
"id": 2,
"name": "Jane Doe",
"profile": "{\"age\":28,\"gender\":\"female\"}",
"location": "{\"city\":\"Los Angeles\",\"country\":\"USA\"}"
}
]
💡 Effect: JSON is flattened, nested nodes converted to strings
How It Works
1. Extracts the data array from the JSON.
2. Iterates over each item in the data array.
3. Checks each field inside the item:
• If the field is a nested object, it is converted to a string.
• If the field is a primitive value (string, number, boolean, null), it is kept as-is.
4. Returns a flat JSON array where all nested structures are stringified.
Working with XML documents
Transforming XML Documents with JavaScript Preprocessor
In Etlworks, when using a JavaScript preprocessor to manipulate XML, you can process XML as a string (manual approach) or use Java’s built-in SAX parser for structured transformations.
Configuring the Preprocessor for XML Format
To configure a Preprocessor for XML format:
Step 1. Open source XML format used in a transformation.
Step 2. Locate the XSL Style Sheet or Preprocessor field under the Inline transformation section.
Step 3. Enter the JavaScript code in the provided text box. The script should modify the message variable and return the transformed XML as a string.
Step 4. Set Transformation type to preprocessor.
Below are examples demonstrating both approaches.
Working with XML as a String (Manual Approach)
Using string functions, you can modify XML documents before processing. This is useful for simple transformations like removing, replacing, or extracting specific elements.
Example: Remove a Specific XML Node
Input XML
<root>
<items>
<item>
<id>1</id>
<name>Item A</name>
</item>
<item>
<id>2</id>
<name>Item B</name>
</item>
</items>
<metadata>
<timestamp>2025-01-01T12:00:00</timestamp>
</metadata>
</root>
JavaScript Preprocessor Code
// Remove <metadata> node using regex
var transformedXml = message.replace(/<metadata>.*?<\/metadata>/s, '');
value = transformedXml; // Return transformed XML
Transformed Output
<root>
<items>
<item>
<id>1</id>
<name>Item A</name>
</item>
<item>
<id>2</id>
<name>Item B</name>
</item>
</items>
</root>
💡 Effect: The <metadata> node is removed.
Example: Rename a Node
JavaScript Preprocessor Code
// Replace <items> with <list> and </items> with </list>
var transformedXml = message
.replace(/<items>/g, '<list>')
.replace(/<\/items>/g, '</list>');
value = transformedXml; // Return transformed XML
Transformed Output
<root>
<list>
<item>
<id>1</id>
<name>Item A</name>
</item>
<item>
<id>2</id>
<name>Item B</name>
</item>
</list>
</root>
💡 Effect: <items> is renamed to <list>.
Example: Extract a Specific Node’s Content
// Extract content of the <items> node
var match = message.match(/<items>(.*?)<\/items>/s);
value = match ? match[1] : ''; // Extract content inside <items>
Transformed Output
<item>
<id>1</id>
<name>Item A</name>
</item>
<item>
<id>2</id>
<name>Item B</name>
</item>
💡 Effect: Extracts only the contents of <items>.
Working with Java’s SAX Parser in JavaScript
For more complex transformations, you can use Java’s SAX parser in the JavaScript preprocessor. SAX (Simple API for XML) is a fast, memory-efficient way to parse and modify XML.
Example: Removing a Specific XML Node Using Java SAX Parser
JavaScript Preprocessor Code
var SAXParserFactory = Java.type("javax.xml.parsers.SAXParserFactory");
var InputSource = Java.type("org.xml.sax.InputSource");
var StringReader = Java.type("java.io.StringReader");
var DefaultHandler = Java.type("org.xml.sax.helpers.DefaultHandler");
var StringWriter = Java.type("java.io.StringWriter");
var TransformerFactory = Java.type("javax.xml.transform.TransformerFactory");
var Transformer = TransformerFactory.newInstance().newTransformer();
var DOMImplementationLS = Java.type("org.w3c.dom.ls.DOMImplementationLS");
var DocumentBuilderFactory = Java.type("javax.xml.parsers.DocumentBuilderFactory");
// Create XML Parser
var factory = SAXParserFactory.newInstance();
var builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
var doc = builder.parse(new InputSource(new StringReader(message)));
// Remove <metadata> node
var metadataNode = doc.getElementsByTagName("metadata").item(0);
if (metadataNode !== null) {
metadataNode.getParentNode().removeChild(metadataNode);
}
// Convert modified XML document back to a string
var writer = new StringWriter();
Transformer.transform(new javax.xml.transform.dom.DOMSource(doc),
new javax.xml.transform.stream.StreamResult(writer));
value = writer.toString(); // Return modified XML
💡 Effect: The <metadata> node is removed before processing.
Renaming a Node (<items> → <list>)
This example renames the <items> node to <list> dynamically.
Preprocessor Code
var SAXParserFactory = Java.type("javax.xml.parsers.SAXParserFactory");
var InputSource = Java.type("org.xml.sax.InputSource");
var StringReader = Java.type("java.io.StringReader");
var DefaultHandler = Java.type("org.xml.sax.helpers.DefaultHandler");
var DocumentBuilderFactory = Java.type("javax.xml.parsers.DocumentBuilderFactory");
var TransformerFactory = Java.type("javax.xml.transform.TransformerFactory");
var Transformer = TransformerFactory.newInstance().newTransformer();
var StringWriter = Java.type("java.io.StringWriter");
// Create XML Parser
var factory = SAXParserFactory.newInstance();
var builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
var doc = builder.parse(new InputSource(new StringReader(message)));
// Rename <items> to <list>
var itemsNode = doc.getElementsByTagName("items").item(0);
if (itemsNode !== null) {
var newElement = doc.createElement("list");
// Move all child nodes from <items> to <list>
while (itemsNode.hasChildNodes()) {
newElement.appendChild(itemsNode.firstChild);
}
// Replace <items> with <list>
itemsNode.getParentNode().replaceChild(newElement, itemsNode);
}
// Convert modified XML document back to a string
var writer = new StringWriter();
Transformer.transform(new javax.xml.transform.dom.DOMSource(doc),
new javax.xml.transform.stream.StreamResult(writer));
value = writer.toString(); // Return modified XML
💡 Effect: <items> is renamed to <list> dynamically
Extracting Only a Specific XML Node’s Content
This example extracts only the <items> node from the XML.
var SAXParserFactory = Java.type("javax.xml.parsers.SAXParserFactory");
var InputSource = Java.type("org.xml.sax.InputSource");
var StringReader = Java.type("java.io.StringReader");
var DefaultHandler = Java.type("org.xml.sax.helpers.DefaultHandler");
var DocumentBuilderFactory = Java.type("javax.xml.parsers.DocumentBuilderFactory");
var TransformerFactory = Java.type("javax.xml.transform.TransformerFactory");
var Transformer = TransformerFactory.newInstance().newTransformer();
var StringWriter = Java.type("java.io.StringWriter");
// Create XML Parser
var factory = SAXParserFactory.newInstance();
var builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
var doc = builder.parse(new InputSource(new StringReader(message)));
// Extract <items> node
var itemsNode = doc.getElementsByTagName("items").item(0);
if (itemsNode !== null) {
doc.removeChild(doc.getDocumentElement()); // Remove root
doc.appendChild(itemsNode.cloneNode(true)); // Keep only <items>
}
// Convert modified XML document back to a string
var writer = new StringWriter();
Transformer.transform(new javax.xml.transform.dom.DOMSource(doc),
new javax.xml.transform.stream.StreamResult(writer));
value = writer.toString(); // Return modified XML
💡 Effect: Converts XML into JSON before processing.
Using XSL Transformations to transform XML documents
In Etlworks, you can apply XSL transformations (XSLT) to modify XML documents dynamically before they are processed. XSLT is a powerful language for transforming XML documents into different structures or even into other formats like JSON, CSV, or HTML.
Key Benefits of Using XSLT in Etlworks
• Native XML Transformation: No need for additional scripting; works inline.
• Flexible Processing: Can restructure, extract, or convert XML as needed.
• Compatible with Existing XML Workflows: Can be used with external APIs and data integration tasks.
Configuring the XSL transormation for XML Format
To use XSLT in Etlworks, follow these steps:
Step 1. Set the Transformation Type to query.
Step 2. Enter a valid XSL Style Sheet in the XSL Style Sheet or Preprocessor field.
Step 3. The transformation will be applied at runtime, modifying the XML before further processing.
Example : Extracting Specific XML Nodes using XSL
Input XML
<root>
<items>
<item>
<id>1</id>
<name>Item A</name>
</item>
<item>
<id>2</id>
<name>Item B</name>
</item>
</items>
<metadata>
<timestamp>2025-01-01T12:00:00</timestamp>
</metadata>
</root>
XSLT for Extraction
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<filteredItems>
<xsl:copy-of select="//item"/>
</filteredItems>
</xsl:template>
</xsl:stylesheet>
Transformed Output
<filteredItems>
<item>
<id>1</id>
<name>Item A</name>
</item>
<item>
<id>2</id>
<name>Item B</name>
</item>
</filteredItems>
💡 Effect: The <metadata> section is removed, leaving only <item> nodes.
Example : Flattening XML Structure
This example flattens a nested XML structure by converting elements into attributes.
<order>
<customer>
<name>John Doe</name>
<email>john@example.com</email>
</customer>
<details>
<product>Gadget</product>
<price>99.99</price>
</details>
</order>
XSLT for Flattening
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<orderSummary>
<customer name="{order/customer/name}" email="{order/customer/email}"/>
<product name="{order/details/product}" price="{order/details/price}"/>
</orderSummary>
</xsl:template>
</xsl:stylesheet>
Transformed Output
<orderSummary>
<customer name="John Doe" email="john@example.com"/>
<product name="Gadget" price="99.99"/>
</orderSummary>
💡 Effect: The structure is flattened, making it easier to process in tabular formats.
Comments
0 comments
Please sign in to leave a comment.