Overview
Automatic partitioning lets a flow extract a large source table in parallel chunks. You define Partition SQL — a query that returns one row per partition with the parameters Etlworks should use for that partition — and Etlworks generates a separate transformation per row, then runs them in parallel.
What is automatic partitioning?
Partition SQL is a SQL query that defines the conditions for partitioning data. When Partition SQL is set, the flow generates one ETL transformation per partition returned by the query and runs them concurrently.
Partition SQL does not have to use actual database partitions. It can be any SQL that produces start / end conditions (for example, date or numeric ranges) for extracting data in chunks. The flow processes each chunk in parallel without being limited to predefined database partitions.
Why use partitioning?
- Parallel processing — chunks are processed concurrently, which speeds up extraction.
- Handles large datasets — the flow works on smaller, manageable pieces instead of one giant table.
- Flexible partitioning strategy — custom partitioning conditions through Partition SQL.
- Better resource utilization — workload is distributed across multiple threads or servers.
Which flows support automatic partitioning?
Any ETL flow where the source is a relational database.
How do I set up automatic partitioning?
- Create a flow where the source is a relational database and add a source-to-destination transformation.
- Click Configure, select the Parameters tab, and enter Partition SQL.
- Enable Use Parallel Threads when processing sources by a wildcard for faster processing.
Configure other parameters such as Action and Lookup fields.
- Select the Mapping tab and enter Source query.
Why is the Source query required?
The Source query is a template that adapts to the parameters returned by the Partition SQL for each partition.
- Partition-specific extraction. Each partition represents a distinct segment of the dataset (date range, numeric range, partition number, …). The Source query is parameterized with values from the Partition SQL so each transformation extracts only its slice of the data.
- Handles arbitrary partitioning strategies. Because Partition SQL doesn't have to use physical partitions, the Source query must adapt — for example by substituting dynamic start / end dates.
- Drives parallel execution. One transformation per partition; each targets its own subset. Without a partition-specific Source query, partitions could overlap or miss data.
- Ensures coverage and consistency. Partition tokens like {PartitionNumber}, {StartDate}, {EndDate} are substituted into the Source query so each row is processed exactly once.
Partition-specific Source query examples
Based on partition number
For databases that use partition numbers, use the $PARTITION function to target a single partition:
SELECT * FROM Orders
WHERE $PARTITION.OrderDateRangePF(OrderDate) = {PartitionNumber};
Based on date ranges
SELECT * FROM Orders
WHERE OrderDate >= '{StartDate}' AND OrderDate < '{EndDate}';
Partitioning examples by database
The Partition SQL returns partition-specific parameters that are passed to the Source query for data extraction. The examples below cover both partitioning styles:
- Physical partitions are database-managed segments defined at the schema level (range, list, hash). Queries can target a specific partition for better performance.
- Logical partitioning divides the data based on user-defined conditions (date ranges, numeric ranges, …) and is implemented at the ETL level. It does not require any database-level partitioning and is the most flexible option.
Oracle — physical partitions
Partition SQL — retrieves partition information for a table partitioned by date:
SELECT partition_name AS PartitionName, high_value AS HighValue FROM user_tab_partitions WHERE table_name = 'ORDERS';
Source query — extracts data from the named partition:
SELECT * FROM Orders PARTITION ({PartitionName});
SQL Server — physical partitions
Partition SQL:
SELECT p.partition_number AS PartitionNumber, prv.value AS HighBoundaryValue FROM sys.partitions p JOIN sys.partition_range_values prv ON p.partition_number = prv.boundary_id JOIN sys.tables t ON p.object_id = t.object_id WHERE t.name = 'Orders' AND p.index_id IN (0, 1);
Source query — builds a dynamic query that filters by partition number:
DECLARE @PartitionNumber INT = {PartitionNumber};
DECLARE @TableName NVARCHAR(128) = 'Orders';
DECLARE @PartitionFunction NVARCHAR(128) = 'OrderDateRangePF';
DECLARE @PartitionColumn NVARCHAR(128);
DECLARE @SQL NVARCHAR(MAX);
SELECT TOP 1 @PartitionColumn = c.name
FROM sys.index_columns ic
JOIN sys.columns c
ON ic.object_id = c.object_id AND ic.column_id = c.column_id
WHERE ic.partition_ordinal > 0
AND OBJECT_NAME(ic.object_id) = @TableName;
IF @PartitionColumn IS NULL
BEGIN
RAISERROR ('No partition column found for the specified table.', 16, 1);
RETURN;
END
SET @SQL = N'SELECT * FROM ' + QUOTENAME(@TableName) +
N' WHERE $PARTITION.' + QUOTENAME(@PartitionFunction) +
N'(' + QUOTENAME(@PartitionColumn) + N') = ' +
CAST(@PartitionNumber AS NVARCHAR(10));
EXEC sp_executesql @SQL;
MySQL — physical partitions
Partition SQL:
SELECT PARTITION_NAME AS PartitionName FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME = 'Orders';
Source query:
SELECT * FROM Orders PARTITION ({PartitionName});
PostgreSQL — physical partitions
Partition SQL — lists child partitions of a range-partitioned table:
SELECT inhrelid::regclass AS PartitionTableName FROM pg_inherits WHERE inhparent = 'Orders'::regclass;
Source query — queries the child partition directly:
SELECT * FROM {PartitionTableName};
DB2 — physical partitions
SELECT PARTITION_NUMBER AS PartitionNumber, LIMITKEY AS HighValue FROM SYSCAT.DATAPARTITIONS WHERE TABNAME = 'ORDERS';
SELECT * FROM Orders WHERE MOD(OrderID, {PartitionNumber}) = 0;
AS400 — physical partitions
SELECT PARTITION_NUMBER AS PartitionNumber, PARTITION_NAME AS PartitionName FROM QSYS2.SYSTABLEPARTITIONS WHERE TABLE_NAME = 'ORDERS';
SELECT * FROM Orders WHERE PartitionColumn = {PartitionNumber};
Informix — physical partitions
SELECT partnum AS PartitionNumber FROM sysfragments WHERE tabname = 'Orders';
SELECT * FROM Orders WHERE PARTITION BY (OrderID) = {PartitionNumber};
Logical partitioning examples
Oracle — logical partitioning
SELECT TO_DATE('2022-01-01', 'YYYY-MM-DD') AS start_date,
TO_DATE('2022-02-01', 'YYYY-MM-DD') AS end_date
FROM DUAL
UNION ALL
SELECT TO_DATE('2022-02-01', 'YYYY-MM-DD'),
TO_DATE('2022-03-01', 'YYYY-MM-DD')
FROM DUAL
UNION ALL
SELECT TO_DATE('2022-03-01', 'YYYY-MM-DD'),
TO_DATE('2022-04-01', 'YYYY-MM-DD')
FROM DUAL;
SELECT * FROM Orders
WHERE OrderDate >= {start_date} AND OrderDate < {end_date};
SQL Server — logical partitioning
WITH NumberRanges AS ( SELECT 1 AS start_id, 1000 AS end_id UNION ALL SELECT 1001, 2000 UNION ALL SELECT 2001, 3000 ) SELECT start_id, end_id FROM NumberRanges;
SELECT * FROM Orders WHERE OrderID BETWEEN {start_id} AND {end_id};
MySQL — logical partitioning
SELECT '2022-01-01' AS start_date, '2022-01-31' AS end_date UNION ALL SELECT '2022-02-01', '2022-02-28' UNION ALL SELECT '2022-03-01', '2022-03-31';
SELECT * FROM Orders
WHERE OrderDate >= '{start_date}' AND OrderDate < '{end_date}';
PostgreSQL — logical partitioning
SELECT generate_series('2022-01-01'::date, '2022-03-01'::date, '1 month') AS start_date,
(generate_series + '1 month'::interval - '1 day'::interval)::date AS end_date;
SELECT * FROM Orders
WHERE OrderDate >= '{start_date}' AND OrderDate < '{end_date}';
DB2 — logical partitioning
SELECT '2022-01-01' AS start_date, '2022-01-31' AS end_date FROM SYSIBM.SYSDUMMY1 UNION ALL SELECT '2022-02-01', '2022-02-28' FROM SYSIBM.SYSDUMMY1;
SELECT * FROM Orders WHERE OrderDate BETWEEN '{start_date}' AND '{end_date}';
AS400 — logical partitioning
SELECT DATE('2022-01-01') AS start_date, DATE('2022-01-31') AS end_date FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT DATE('2022-02-01'), DATE('2022-02-28') FROM SYSIBM.SYSDUMMY1;
SELECT * FROM Orders WHERE OrderDate BETWEEN '{start_date}' AND '{end_date}';
Informix — logical partitioning
SELECT 1 AS start_id, 1000 AS end_id FROM systables WHERE tabid = 1 UNION ALL SELECT 1001, 2000 FROM systables WHERE tabid = 1;
SELECT * FROM Orders WHERE OrderID BETWEEN {start_id} AND {end_id};
Using partition parameters in source queries and transformations
Columns returned by the Partition SQL can be referenced anywhere in the source query or in the TO part of the source-to-destination transformation using {ColumnName}. For example, if Partition SQL returns StartDate and EndDate:
SELECT * FROM Orders
WHERE OrderDate >= '{StartDate}' AND OrderDate < '{EndDate}';
You can also use these tokens to name output files dynamically:
orders_{EndDate}.csv
The partition index is available as {PartitionId} — a 1-based number representing the row returned by the Partition SQL. Useful for uniquely identifying or naming partitions.