About nested datasets
The input and output of different web services and APIs can range from simple and flat to highly-nested and complex.
An example of a nested data object:
[
{
"firstName":"Duke",
"lastName":"Java",
"test":[
{
"who":"duke",
"why":"because"
}
],
"age":18,
"streetAddress":"100 Internet Dr",
"city":"JavaTown",
"state":"JA",
"postalCode":"12345",
"phoneNumbers":[
{
"type":"Mobile",
"phoneNumber":"111-111-1111"
},
{
"type":"Home",
"phoneNumber":"222-222-2222"
}
]
}
]
In relational databases and flat files (such as CSV, Excel, etc.), the data is represented as datasets with columns and rows:
EMPNO FIRSTNME MIDINIT LASTNAME WORKDEPT PHONENO
10 CHRISTINE I HAAS A00 3978
20 MICHAEL L THOMPSON B01 3476
30 SALLY A KWAN C01 4738
50 JOHN B GEYER E01 6789
60 IRVING F STERN D11 6423
70 EVA D PULASKI D21 7831
What if the PHONENO
column is an array of phones such as below?
"PHONENO":[
{
"type":"Mobile",
"phoneNumber":"111-111-1111"
},
{
"type":"Home",
"phoneNumber":"222-222-2222"
}
]
As you can see below, this column can be easily represented as a dataset:
type phoneNumber
mobile 111-111-1111
home 222-222-2222
Now, if we say that the PHONENO
column has an ARRAY
data type and each PHONENO
value in a row is a dataset, we have found a way to encode nested data objects.
This is exactly how nested data objects are represented in Etlworks Integrator.
There is no limit to nestedness (or numbers of levels) in Etlworks Integrator, so each column anywhere in the nested object can be a nested object itself.
In Etlworks Integrator, the inner datasets (PHONENO
in the example above) are also called dimensions. The outer dataset (the root nested object) is also called a driving dataset.
Disable streaming when working with nested datasets
Etlworks Integrator can read and write nested data objects, which are encoded in all the supported data exchange Formats, such as JSON, XML, HL7, etc.
By default, Etlworks Integrator attempts to use streaming, which is another way of saying, "please keep as little data in memory as possible." Generally speaking, streaming is faster and has a very low memory footprint. Unfortunately, streaming is not always possible, especially when dealing with complex nested data objects, which are typically returned by web services. In this case, it is always a good idea to disable the streaming.
Step 1. Create a transformation.
Step 2. Select the Parameters
tab.
Step 3. Uncheck Stream Data
.
Work with nested datasets
- Using SQL to extract data from the nested dataset
- The Source is a nested dataset
- The Source is a nested XML
- The Source is a nested dataset, and the destination is a flat file or a database table
- Normalizing a nested dataset into multiple flat files or database tables
- The Destination is a nested dataset
Comments
0 comments
Please sign in to leave a comment.