Once you’re familiar with the Mapping Editor, this article goes deeper into field-level behavior, rules, and flow-wide options that affect how mappings work across your transformations.
Overview
In Etlworks, each source-to-destination transformation can include a mapping that controls how data moves from one structure to another.
This mapping can include:
- Direct field-to-field transfers
- Calculated fields
- Excluded fields
- New fields that don’t exist in the source
- Nested fields for hierarchical formats (JSON, XML)
While the Mapping Editor provides the interface, this article explains how mappings behave, how defaults work, and how to control field handling at the flow level.
Default Mapping Behavior
If you don’t explicitly map a field, Etlworks still tries to help.
When no mapping is defined at all
If the Mapping grid is empty:
- All source fields are copied to the destination as-is.
- Field names must match for this to work properly.
- Destination must support auto-mapping (e.g., file, API, most databases).
This is ideal for flat, consistent schemas.
When some mappings are defined
If you’ve added a few field mappings manually:
- Unmapped source fields are still automatically added to the destination.
- You must explicitly exclude any fields you don’t want to include.
This behavior ensures schema evolution doesn’t silently drop fields — unless you ask it to.
Exceptions
Default mapping does not apply when:
- You’re working with nested data structures (JSON, XML).
- You’ve configured the flow to ignore or error on unmapped fields (see below).
Flow-Level Field Handling Settings
Global options in the Flow’s Parameters tab let you control how all mappings behave across transformations.
Error if Mapped Field is Missing in Source
- If enabled: the flow fails if a mapped field doesn’t exist in the source.
- If disabled (default): a warning is logged, and the field is skipped.
Use this when:
- You want to enforce a strict source schema.
- You expect all mapped fields to be present.
Handling Extra Fields in Source
Controls how source fields not present in the Mapping are handled.
| Option | Behavior |
|---|---|
| Automatic (or not set) | Unmapped fields are included |
| Ignore extra fields | Unmapped fields are skipped entirely. |
| Error on extra fields | Flow fails before execution if unexpected fields are found. |
Use this to:
- Detect schema drift early.
- Enforce clean, predictable transformations.
- Avoid downstream issues from unexpected data.
Manual Field Adjustments
Beyond the field pairings you define in the Mapping Editor, you can:
Add fields to the destination that don’t exist in the source
- Enable the Add Field checkbox
- Set a default or calculated value using the Field Function Editor
- Useful for constants, metadata, and calculated columns
Exclude fields
- Use the Exclude checkbox to remove a field from the destination
- Applies to default-mapped and explicitly mapped fields
Change the field data type
By default, the destination field gets its data type from the source field. To change the destination data type, click the Data type button, select the Data type from the drop-down, and if needed, enter a Database specific type (for example, varchar(255)). You can also specify if the field is nullable.
Apply transformations
- Use JavaScript, Python, or SQL (if supported by the destination) to define field value logic
Example:
{"Last Name"} + ", " + {"First Name"}Using scripting languages to calculate field's value
- How to use JavaScript or Python to calculate a field's value.
- Examples of calculating field value using the scripting language
- How to use native SQL to calculate a field's value.
- How to use SQL to Lookup Fields' value.
Column name sanitization
Three per-connector options control how column names are sanitized when the engine reads source data. All three are off by default — turn them on when the destination requires it. Settings are per-format (for file based connections and APIs) or per connection (for database connections), so a single flow can apply different rules on each side.
Column names compatible with SQL
Replaces characters that are not valid in SQL identifiers — anything other than alphanumerics and underscores — with safe substitutes. Use this when the destination is a database and the source columns may contain spaces, dashes, or other special characters that would otherwise need quoting in SQL.
Remove dots from column names
Some downstream systems (for example, certain databases or warehouses) cannot accept column names containing the dot character (.). For flows that target those systems, enable the Remove dots setting on the source format to strip dots from column names automatically.
- Default: off — dotted names are preserved. This matters when the dots are meaningful (for example, when flattening a nested field into a single column like orders.items.price).
- When on: dots are removed during metadata extraction. The sanitized names appear in the source tree of the mapping editor.
The setting is per-format, so a single flow can preserve dots on one side and strip them on the other.
Change column name using JavaScript
For renames beyond the deterministic options above — pattern-based replacement, prefix or suffix, case conversion, or lookups — use a JavaScript expression to compute column names. The script runs once per column at metadata-extraction time.
Bulk Column Name Transformation →
Reorder fields
Drag rows up/down to reorder fields in the output.
Selecting Source and Destination Fields
If the source or destination is connected to a live Connection:
- The field dropdowns are auto-populated with metadata.
- If fields don’t appear, click Clear Cache at the bottom of the screen to reload.
You can always manually type in field names if:
- The metadata is not available
- You’re using a template
- The object is dynamic
Tip: You can also define field templates for faster reuse.
Working with nested datasets
When the source, the destination, or both are hierarchical (JSON, XML, etc.), you have two paths:
Option 1: Continue with flat mapping
Stay in Flat view mode and use flattening techniques on the source side. Flat mapping keeps streaming enabled, runs at load time, and is significantly faster on large volumes. Best when the destination is flat and the source can be flattened with one of the supported techniques.
- Mapping when the source is nested and destination is flat — the supported use cases for flat mapping with a nested source.
- Techniques for working with complex nested datasets — flattening patterns, normalization tasks, and other approaches.
Option 2: Switch to nested mapping
Toggle the mapping editor into Nested view mode. This is the unified mechanism for any transformation that requires explicit control over hierarchy:
- Flat → Nested — build a nested document from flat records.
- Nested → Flat — flatten a nested document with explicit field-level control.
- Nested → Nested — restructure a nested document into a different nested shape, including denormalization.
Nested mode renders both source and destination as expandable trees and unlocks indent/outdent, add child / add sibling, drag-to-reorder within the hierarchy, and per-row data-type modifiers for JSON and XML. Streaming is disabled in this mode — enable Force Streaming when the destination is a database.
Read more about nested mapping →
Special Case: Duplicate Destination Fields
In file formats like CSV or Excel, you can map multiple fields to destination columns with the same name. This is useful for:
- Flattening structures
- Repeating fields
- Structuring tabular outputs
Note: Not supported for relational databases or cloud data warehouses.
Handling null, empty strings, and the literal 'null'
Overview
This article explains how to consistently handle null values (null), empty strings (""), and the string "null" across ETL, CDC, and bulk load flows in Etlworks.
It covers:
-
How different connectors interpret and write nulls
-
Configuration options available in file-based and CDC connectors
-
Special considerations for cloud data warehouses like Snowflake and Redshift
-
Best practices for achieving consistent behavior end-to-end
When This Applies
These settings apply to flows that:
-
Read from files (CSV, JSON, fixed-width, etc.)
-
Write to files
-
Create intermediate files for bulk loading into cloud warehouses (Snowflake, Redshift, etc.)
-
Use CDC connectors that stream changes to files or cloud stages
They do not apply to direct transformations where data is transferred from source to destination (e.g., database to database) without any intermediate file.
File-Based and CDC Connector Options
Most file-based connectors (CSV, JSON, fixed-width, Excel) and CDC connectors support the following configuration options for handling nulls:
Convert 'null' to null when reading
- If enabled, the string 'null' (case-sensitive) in input files is interpreted as an actual null.
- Available in: file-based connectors (CSV, JSON, fixed-width, etc.) and CDC connectors (when wring to file).
- Default: false
Convert empty string to null when reading
- If enabled, "" (empty string) is converted to null when reading.
- Also: when writing, actual nulls will be written as "".
- Available in: file-based connectors (CSV, JSON, fixed-width, etc.)
- Default: true (CSV), false (others)
Value for null when reading
-
Accepts comma-separated values (e.g., NULL,null,-)
-
If specified, any of these values will be treated as null.
-
Overrides other null-handling options during reading.
- Available in: file-based connectors (CSV, JSON, fixed-width, etc.)
Value for null when writing
-
If specified, all nulls will be written using the given value.
-
Example: writing NULL or n/a instead of an empty field.
-
Overrides all other write-time null-handling options.
- Available in: file-based connectors (CSV, JSON, fixed-width, etc.) and CDC connectors (when writing to file)
Do not enclose null in double quotes (CDC only)
-
If enabled, the string 'null' will not be enclosed in double quotes when written to file (i.e., null instead of "null").
-
Important:
-
This setting is ignored if Value for null when writing is specified.
-
Also ignored when CDC writes directly to a destination (e.g., database, message queue).
-
-
Use case:
-
Particularly useful when loading into Greenplum and other databases that treat "null" as a literal string rather than SQL null.
-
Null Handling in Bulk Load Pipelines
When using bulk load pipelines (e.g., database → Snowflake), Etlworks generates intermediate files before loading data. This means:
All null-handling options above apply during:
-
File generation
-
File loading
However, some warehouses require additional or specific settings at load time:
Snowflake-Specific Settings
Snowflake’s bulk loader supports a setting called: String used to convert to and from SQL NULL
This setting must match the value used in Etlworks for Value for null when writing to ensure correct conversion.
Example:
'DATABASE_NULL_VALUE'
Redshift-Specific Settings
Redshift's bulk loader supports a setting called: String used to convert to SQL NULL
This must also be aligned with Etlworks null writing logic. If mismatched, Redshift may load "NULL" or "" as literal strings instead of actual nulls.
Example:
DATABASE_NULL_VALUE
Other Warehouses and Formats
Some systems (BigQuery, Synapse, etc.) may require explicit handling of null markers such as:
-
Setting "\\N" for nulls
-
Avoiding empty strings for numeric/date fields
-
Escaping special null indicators in delimited files
Always consult the target system’s file format requirements and configure Etlworks accordingly using the Value for null when writing setting.
Best Practices
-
Be consistent across read and write: use the same null indicators both ways.
-
Use Value for null when reading to handle legacy or non-standard data (e.g., n/a, -).
-
Use Value for null when writing when the target system expects a specific marker.
-
Use the warehouse-specific null settings (Snowflake, Redshift, etc.) to align with your file configuration.
-
Avoid relying on defaults when working with bulk pipelines — always set null behavior explicitly.