When to use this format
Etlworks Integrator can automatically parse and create practically any XML document. You can use additional parameters to hint the XML parser how to deal with the XML nodes and attributes when reading and creating XML documents.
Use JSON format when configuring a source-to-destination transformation that reads or writes JSON documents.
Read how to work with nested JSON and XML files in Etlworks.
Creating format
To create a new XML format, go to Connections, select Formats tab, click Add Format button, type in xml in the search field and select XML. Note that it is not the same as an XML Data Set.
Below are the available parameters:
- XML Document - when this option is selected (the default), the parser automatically adds the root tag
xml
to the document. Always select it if you want your XML documents to be created with an encoding different from UTF-8.
-
<?xml version="1.0" encoding="UTF-8"?>
- Start Element Name - the value of this parameter is used as an XML root element.
When creating XML document, the Start Element Name is used as a root node. For example, when the Start Element Name is set to Patient
:
-
<Patient> the rest of the xml document </Patient>
When reading XML documents, the Start Element Name is used as a hint for the XML parser to automatically create a nested data set with the name equal to the value of this field.
- XML Version - the XML standard version. The default is
1.0
. - Namespaces - all applicable XML namespaces. Namespaces can only be set if Start Element Name is not empty. For example, when the Start Element Name is set to
Patient
and Namespaces is set toxmlns="http://www.meditlive.com/datacontract xmlns:i="http://www.w3.org/2001/XMLSchema-instance"
:
-
<Patient xmlns="http://www.meditlive.com/datacontract" xmlns:i="http://www.w3.org/2001/XMLSchema-instance"> the rest of the xml document </Patient>
- Row Tag - the tag used for repeating rows.
When reading XML like below:
-
<Patients> <Patient> <first_name>Joe</first_name> <last_name>Doe</last_name> <dob>01/01/2001</dob> </Patient> .... <Patients>
By default, the system creates nested data sets for each patient "row". If you wish to "flatten" the data set you can set the Row Tag to patient. As a result, the flat data set with the following rows and columns will be created:
-
first_name,last_name,dob joe,doe,01/01/2001 .... jane,doe,01/02/2002
When creating a XML document, the value of the Row Tag field is used literally - to create new repeating XML nodes. For example, with the Row Tag set to ROW
:
-
<Patients> <ROW> </ROW> <ROW> </ROW> <ROW> </ROW> <Patients>
- Null Attribute - if set, a tag with a
null
value will be created using this attribute.
For example (with no Null Attribute):
-
<MiddleName />
Another example (with the Null Attribute set to i:nil="true"
):
-
<MiddleName i:nil="true"/>
- Value Attribute - if set, the values are set using specific attributes.
For example (with no Value Attribute):
-
<FirstName>Joe</FirstName>
Another example (with the Value Attribute set to value
):
-
<FirstName value="Joe"/>
- Template - a template in the XML format. If this field is not empty, Integrator will use it to populate column names and data types. Template is an optional field. For example:
<root><FirstName/><LastName/><Dob/></root>
. - Column names compatible with SQL - convert column names to SQL-compatible column names by removing all characters, except alphanumeric and spaces.
- Parse XML Attributes - Enabling this option will force the system to parse XML attributes:
<tag attr="value">
. - Parse XML Attributes in Root Node - Disable this flag if you don't want the system to parse attributes in the root XML node. This feature only works when Parse XML Attributes is enabled. An example of the attributes in the root node:
<data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1">
- Parse CDATA section - disable this flag if you know for sure that there are no CDATA sections in the XML to parse. This will improve parsing performance.
- Parse Comments - If this option is enabled, the system will add a field with a suffix
-comments
for each node that has comments. - Transformation type - the default is
query
, which is equal to applying an XSL Style Sheet, but you can also selectpreprocessor
. Read more about using a preprocessor to modify content of the source document. - XSL Style Sheet or Preprocessor - an XSL stylesheet can transform an XML document into the desired format. An XSL stylesheet is applied when reading and writing XML documents. Read more about using XSL stylesheets to transform XML.
- Encoding - XML document encoding. The default is
UTF-8
. - All fields are strings - If this option is enabled (it is disabled by default) the system will create all fields with a string data type.
- Date and Time Format - a format for timestamps (date+time).
- Date Format - a format for date (date only, no time).
- Time Format - a format for time (time only, no date).
- Parse Dates - if this option is enabled, and date or time value is not recognized as one of the formats defined above, Integrator will try to parse it using one of the well know date+time formats.
- Trim Strings - if this option is enabled, Integrator will trim leading and trailing white-spaces from the value.
- Treat 'null' as null - if this option is enabled, Integrator will treat string values equal to 'null' as actual nulls (no value).
Comments
0 comments
Please sign in to leave a comment.