When to use this Format
Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem. Read more about the Parquet format at https://parquet.apache.org/.
Etlworks can read and write Parquet files, including nested Parquet files.
Use the Parquet Format when configuring a source-to-destination transformation that reads or writes Parquet documents. Files in the Parquet format can also be used to load data into Snowflake.
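To picture the kind of nested Parquet data involved, here is a minimal standalone sketch using the pyarrow library. This is purely illustrative and is not Etlworks code; the file name, column names, and codec choice are arbitrary assumptions.

```python
# Standalone illustration (pyarrow, not Etlworks): write and read back
# a nested Parquet file of the kind Etlworks can process.
import pyarrow as pa
import pyarrow.parquet as pq

# Each record has a scalar column and an "orders" array of nested structs.
table = pa.table({
    "customer": ["alice", "bob"],
    "orders": [
        [{"sku": "A1", "qty": 2}],                       # array with one struct
        [{"sku": "B7", "qty": 1}, {"sku": "C3", "qty": 5}],
    ],
})

# Write with an explicit compression codec, then read the file back.
pq.write_table(table, "orders.parquet", compression="snappy")
print(pq.read_table("orders.parquet").to_pydict())
```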
Process
To create a new Parquet Format, go to Connections, select the Formats tab, click Add Format, and type parquet in the Search field.
The following parameters are available when configuring the Parquet Format:
Compression Codec: the compression algorithm used when creating Parquet files. You don't need to select an algorithm if all your files are uncompressed or you are only reading Parquet files.

Normalize nested records with one field: if this option is enabled (it is disabled by default), the Parquet parser will create fewer nested datasets when reading nested Parquet files with an array containing only one field.

Column names compatible with SQL: converts column names to SQL-compatible column names by removing all characters except alphanumeric characters and spaces.

Treat 'null' as null: if this option is enabled, Etlworks will treat string values equal to 'null' as actual nulls (no value).

Trim Strings: if this option is enabled, Etlworks will trim leading and trailing white spaces from the value. The string-handling options are illustrated in the sketch after this list.

Schema: enter the schema used to create Parquet files. You can leave this field empty if you are only reading Parquet files.
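The following standalone sketch approximates the effect of the Trim Strings, Treat 'null' as null, and Column names compatible with SQL options on string data. It uses pandas for illustration only; this is not how Etlworks implements these options, and the sample column and values are made up.

```python
# Illustrative only: approximates the string-handling options described above.
import re
import pandas as pd

df = pd.DataFrame({"ship-to (city)": ["  Boston ", "null", "Chicago"]})

# Column names compatible with SQL: keep only alphanumeric characters and spaces.
df.columns = [re.sub(r"[^A-Za-z0-9 ]", "", c) for c in df.columns]

# Trim Strings: strip leading and trailing white space.
df["shipto city"] = df["shipto city"].str.strip()

# Treat 'null' as null: the literal string 'null' becomes an actual null.
df["shipto city"] = df["shipto city"].replace("null", pd.NA)

print(df)
#   shipto city
# 0      Boston
# 1        <NA>
# 2     Chicago
```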