When to use this format
Etlworks can read and write Parquet files, including nested Parquet files.
To create a new Parquet format, go to Connections, select the Formats tab, click the Add Format button and type parquet in the search field.
The following parameters are available when configuring the Parquet format:
- Compression Codec - the compression algorithm used when creating Parquet files. You don't need to select a codec if you only write uncompressed files or only read Parquet files.
- Normalize nested records with one field - if this option is enabled (it is disabled by default), the Parquet parser creates less deeply nested datasets when reading nested Parquet files that contain an array with only one field.
- Column names compatible with SQL - if this option is enabled, column names are converted to SQL-compatible names by removing all characters except alphanumeric characters and spaces.
- Treat 'null' as null - if this option is enabled, Integrator will treat string values equal to 'null' as actual nulls (no value).
- Trim Strings - if this option is enabled, Integrator will trim leading and trailing whitespace from string values.
- Schema - the schema used when creating Parquet files. You can leave this field empty if you are only reading Parquet files.
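
The effect of the Column names compatible with SQL, Treat 'null' as null, and Trim Strings options can be illustrated with a small Python sketch. This is not Etlworks code: the function names are hypothetical, and several details are assumptions made for illustration only, namely that "alphanumeric" means ASCII letters and digits, that the comparison with 'null' is case-sensitive, and that trimming is applied before the 'null' check.

```python
import re


def sql_compatible_name(name: str) -> str:
    """Remove every character except ASCII letters, digits, and spaces.

    Assumption: "alphanumeric" is limited to ASCII here.
    """
    return re.sub(r"[^A-Za-z0-9 ]", "", name)


def clean_value(value, treat_null_string_as_null=True, trim_strings=True):
    """Apply the Trim Strings and Treat 'null' as null rules to one value.

    Non-string values are returned unchanged.
    Assumption: trimming happens before the 'null' comparison.
    """
    if not isinstance(value, str):
        return value
    if trim_strings:
        value = value.strip()
    if treat_null_string_as_null and value == "null":
        return None
    return value


def clean_record(record: dict, **options) -> dict:
    """Normalize the column names and values of a single record."""
    return {
        sql_compatible_name(key): clean_value(val, **options)
        for key, val in record.items()
    }


# Hypothetical record as it might look after being read from a Parquet file.
record = {"order-id": " 42 ", "customer.name": "null", "total($)": 19.5}
print(clean_record(record))
```

With all three options enabled, the column order-id becomes orderid, the padded string " 42 " becomes "42", and the string 'null' becomes an actual null (None in Python). The numeric value is left untouched.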