Overview
JSON is widely used in APIs and event streams. Native support for nested objects and arrays.
When to use this Format
Etlworks can parse and create practically any JSON document. You can use additional parameters to hint the JSON parser on dealing with the JSON nodes and arrays when reading and creating JSON documents.
Use the JSON Format when configuring a source-to-destination transformation that reads or writes JSON documents.
Read how to work with nested JSON and XML files in Etlworks.
Create Format
To create a new fixed-length Format, go to Connections, select the Formats tab, click Add Format, type in json in the search field, and select JSON. Note that it is not the same as JSON Data Set.
Below are the available parameters:
- First Object Type: the type of the first object in the JSON. This parameter is used as a hint when reading JSON and as an instruction when writing JSON. Options:
- array: [].
- named object: {"root": []}.
- object: {"field":"value"}.
- variable: either array or object, depending on the number of records in the dataset.
- Root Object Name: the name of the first object when named object or variable is used as the First Object Type.
- Nested Object Type: the type of all nested objects in the JSON. This parameter is used as a hint when reading JSON and as an instruction when writing JSON. Options:
- array: [].
- object: {"field":"value"}.
- variable: either array or object, depending on the number of records in the dataset.
- Template: this is a template in JSON Format. If this field is not empty, Etlworks will use it to populate column names and data types. Template is an optional field. For example: {"firstName":"string","lastName":"string","age":23}.
- Column names compatible with SQL: convert column names to SQL-compatible column names by removing all characters, except alphanumeric and spaces.
- All fields are strings: if this option is enabled (it is disabled by default), the system will create all fields with a string data type.
- Date and Time Format: a Format for timestamps (date+time).
- Date Format: a Format for date (date only, no time).
- Time Format: a Format for time (time only, no date).
- Parse Dates: if this option is enabled, and the date or time value is not recognized as one of the Formats defined above, the connector will try to parse it using one of the well-known date+time Formats.
- Trim Strings: if this option is enabled, the connector will trim leading and trailing white spaces from the value.
- Treat 'null' as null: if this option is enabled, the connector will treat string values equal to null as actual nulls (no value).
- Enable key-value: enable this option to handle the special case when there is a single field-nested data set, for example, in different types of phone numbers. As a result, the nested data set will be created with an additional name_type field. For example (showing the fields only): phone_type, number.
- Preprocessor: JavaScript code that modifies the contents of the source document. Read more about how to use a Preprocessor.
Convert a YAML document to JSON on the fly
Overview
In some cases, you may need to work with YAML documents in Etlworks, but currently, Etlworks doesn’t natively support the YAML format. However, you can easily convert a YAML document into JSON on the fly during the ETL process. This allows you to leverage YAML as a data source while continuing to use JSON for data transformations within Etlworks.
This guide provides a step-by-step instruction for converting a source YAML document into JSON in real-time.
Steps to Convert YAML to JSON in Etlworks
Step 1: Create a JSON Format
To start, you need to create a JSON format in Etlworks. This format will be used as the target format after the YAML-to-JSON conversion. Follow these steps to create the JSON format:
- Navigate to Formats in the Etlworks UI.
- Click Add Format and select JSON as the format type.
- Configure the format as needed for your specific transformation and save it.
Step 2: Add a Preprocessor to Convert YAML to JSON
The key step in this process is using a preprocessor script to convert the incoming YAML data into JSON. You will need to add a preprocessor to your flow with the following code:
var Yaml = Java.type('com.github.javafaker.shaded.snakeyaml.Yaml');
var ObjectMapper = Java.type('com.fasterxml.jackson.databind.ObjectMapper');
// Create a new instance of the shaded Yaml parser
var yamlParser = new Yaml();
// Parse the YAML string into a Java Map
var yamlMap = yamlParser.load(message);
// Convert the Java Map to JSON using Jackson's ObjectMapper
var objectMapper = new ObjectMapper();
// Return JSON
value = objectMapper.writeValueAsString(yamlMap);
Explanation of the Preprocessor:
• Yaml: This uses the shaded version of SnakeYAML (bundled with JavaFaker) to parse the incoming YAML data.
• ObjectMapper: We use the Jackson ObjectMapper to convert the parsed YAML (in the form of a Java Map) into a JSON string.
• The message variable represents the incoming YAML document, and the result, value, is the converted JSON string.
Step 3: Use JSON Format in Transformations
Once you have the YAML converted to JSON via the preprocessor, you can use the previously created JSON format as the source format for any further transformations in your ETL flow.
- In your transformation step, select the JSON format you created in Step 1 as the source format.
- Continue with the transformation process as you normally would with a JSON input.