When to use this Format
Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem. Read more about the Parquet format at https://parquet.apache.org/.
Etlworks can read and write Parquet files, including nested Parquet files.
Use the Parquet Format when configuring a source-to-destination transformation that reads or writes Parquet documents. Files in the Parquet format can also be used to load data into Snowflake.
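To picture the kind of nested Parquet data involved, here is a minimal standalone sketch using the pyarrow library. This is purely illustrative and is not Etlworks code; the file name, column names, and codec choice are arbitrary assumptions.

```python
# Standalone illustration (pyarrow, not Etlworks): write and read back
# a nested Parquet file of the kind Etlworks can process.
import pyarrow as pa
import pyarrow.parquet as pq

# Each record has a scalar column and an "orders" array of nested structs.
table = pa.table({
    "customer": ["alice", "bob"],
    "orders": [
        [{"sku": "A1", "qty": 2}],                       # array with one struct
        [{"sku": "B7", "qty": 1}, {"sku": "C3", "qty": 5}],
    ],
})

# Write with an explicit compression codec, then read the file back.
pq.write_table(table, "orders.parquet", compression="snappy")
print(pq.read_table("orders.parquet").to_pydict())
```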
Process
To create a new Parquet Format, go to Connections, select the Formats tab, click Add Format, and type parquet in the Search field.
The following parameters are available when configuring the Parquet Format:
Compression Codec: the compression algorithm used when creating Parquet files. You don't need to select an algorithm if all your files are uncompressed or you are only reading Parquet files.

Normalize nested records with one field: if this option is enabled (it is disabled by default), the Parquet parser will create fewer nested datasets when reading nested Parquet files with an array containing only one field.

Column names compatible with SQL: converts column names to SQL-compatible column names by removing all characters except alphanumeric characters and spaces.

Treat 'null' as null: if this option is enabled, Etlworks will treat string values equal to 'null' as actual nulls (no value).

Trim Strings: if this option is enabled, Etlworks will trim leading and trailing white spaces from the value. The string-handling options are illustrated in the sketch after this list.

Schema: enter the schema used to create Parquet files. You can leave this field empty if you are only reading Parquet files.
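The following standalone sketch approximates the effect of the Trim Strings, Treat 'null' as null, and Column names compatible with SQL options on string data. It uses pandas for illustration only; this is not how Etlworks implements these options, and the sample column and values are made up.

```python
# Illustrative only: approximates the string-handling options described above.
import re
import pandas as pd

df = pd.DataFrame({"ship-to (city)": ["  Boston ", "null", "Chicago"]})

# Column names compatible with SQL: keep only alphanumeric characters and spaces.
df.columns = [re.sub(r"[^A-Za-z0-9 ]", "", c) for c in df.columns]

# Trim Strings: strip leading and trailing white space.
df["shipto city"] = df["shipto city"].str.strip()

# Treat 'null' as null: the literal string 'null' becomes an actual null.
df["shipto city"] = df["shipto city"].replace("null", pd.NA)

print(df)
#   shipto city
# 0      Boston
# 1        <NA>
# 2     Chicago
```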