Overview
In Etlworks, you can split a dataset into multiple files based on the value of a column.
Use the Partition By transformation to split a larger dataset into smaller partitions.
This transformation is only available when the destination is a file.
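There are two options for partitioning: split by the maximum number of records in the partition, or split by unique values of the partition-by fields.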
Process
Step 1. To configure a Partition By transformation, go to Transformation / MAPPING / Complex Transformations / Partition.
Step 2. Enter the partition criteria in the Partition By field.
Split by the maximum number of records in the partition
If you enter a numeric value in the Partition By field, the transformation treats it as the maximum number of records in a file. For example, if Partition By is set to 100 and the dataset has 1,000 records, then 10 files, each containing 100 records, will be created.
The created files will have the following names: original filename + _ + index + original file extension.
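The splitting rule can be illustrated outside of Etlworks with a short Python sketch. This is not the Etlworks implementation. The function name split_by_max_records and the file name orders.csv are hypothetical, and the sketch assumes that the index in the file name starts at 1, which this article does not state.

```python
from pathlib import Path


def split_by_max_records(records, max_records, filename):
    """Return {partition file name: records}, at most max_records per file."""
    if max_records < 1:
        raise ValueError("max_records must be a positive integer")
    path = Path(filename)
    partitions = {}
    for start in range(0, len(records), max_records):
        # Assumption: the index in the file name starts at 1.
        index = start // max_records + 1
        # original filename + _ + index + original file extension
        name = f"{path.stem}_{index}{path.suffix}"
        partitions[name] = records[start:start + max_records]
    return partitions


rows = [{"id": n} for n in range(1000)]
files = split_by_max_records(rows, 100, "orders.csv")
print(len(files))       # number of partitions
print(list(files)[:2])  # first two file names
```

If the number of records is not a multiple of the limit, the last partition in this sketch simply holds the remaining records.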
Split by unique values of partition-by fields
If you enter an alphanumeric value in the Partition By field, the transformation treats it as a comma-separated list of columns to group by. For example, if Partition By is set to last_name,first_name and the dataset contains several records with the same last and first names, then multiple files will be created, each holding the records that share one unique combination of last and first names.
The created files will have the following names: original filename + _ + value of the columns to partition by + original file extension.
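The grouping behavior can be sketched in Python as follows. This is an illustration only, not the Etlworks implementation. The function name split_by_columns, the sample data, and the file name people.csv are hypothetical. The sketch also assumes that the values of multiple columns are joined with an underscore in the file name, which this article does not specify.

```python
from collections import defaultdict
from pathlib import Path


def split_by_columns(records, partition_by, filename):
    """Group records by the unique combination of the partition-by columns."""
    columns = [name.strip() for name in partition_by.split(",")]
    path = Path(filename)
    partitions = defaultdict(list)
    for record in records:
        # Assumption: values of multiple columns are joined with "_".
        key = "_".join(str(record[column]) for column in columns)
        # original filename + _ + column values + original file extension
        partitions[f"{path.stem}_{key}{path.suffix}"].append(record)
    return dict(partitions)


people = [
    {"last_name": "Smith", "first_name": "Ann", "city": "Boston"},
    {"last_name": "Smith", "first_name": "Ann", "city": "Denver"},
    {"last_name": "Jones", "first_name": "Bob", "city": "Austin"},
]
by_name = split_by_columns(people, "last_name,first_name", "people.csv")
for name, group in by_name.items():
    print(name, len(group))
```

Both Smith, Ann records land in the same partition, while Jones, Bob gets a partition of its own.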
Ignore the original filename
This transformation creates files with names that include the original destination filename, for example, order_1234.csv and order_4567.csv, where 1234 and 4567 are the unique values of the orderId column set in Partition By. You can configure the transformation to ignore the original filename by enabling the property Transformation / MAPPING / Complex Transformations / Ignore Original File Name.
Using the example above, files with the following names will be created: _1234.csv and _4567.csv.
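The effect of Ignore Original File Name on the generated names can be summarized with a small Python sketch. The function partition_filename and its ignore_original parameter are hypothetical and only mirror the naming rule described in this article. The original destination filename order.csv is an assumption inferred from the example names.

```python
from pathlib import Path


def partition_filename(filename, value, ignore_original=False):
    """Build a partition file name, optionally dropping the original name."""
    path = Path(filename)
    # When the original name is ignored, only "_" + value + extension remain.
    prefix = "" if ignore_original else path.stem
    return f"{prefix}_{value}{path.suffix}"


for order_id in (1234, 4567):
    print(partition_filename("order.csv", order_id))
    print(partition_filename("order.csv", order_id, ignore_original=True))
```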