Overview
In Etlworks, automatic partitioning allows you to configure flows to execute multiple data extraction and loading processes in parallel, based on dynamically generated partition conditions. This technique is especially useful for handling large datasets efficiently by dividing data into smaller, manageable chunks that can be processed concurrently.
What is Automatic Partitioning?
When configuring ETL flows, Etlworks enables the use of “Partition SQL.” Partition SQL is an SQL query that defines the conditions for partitioning data. When Partition SQL is configured, the flow automatically generates separate ETL transformations for each partition returned by the query. These separate transformations can then be executed in parallel, significantly improving processing efficiency.
Partition SQL does not need to use actual database partitions. It can be any SQL that generates start and end conditions (for example, start and end dates) for extracting data in chunks. This allows ETL operations to process data in parallel for each chunk, without being limited to predefined database partitions.
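To make this concrete, here is a minimal sketch (in Python, outside Etlworks itself) of what the engine conceptually does with a Partition SQL result: each returned row becomes a set of parameters substituted into a source-query template. The table, columns, and dates below are hypothetical examples.

```python
# Conceptual sketch: each row returned by the Partition SQL becomes one set
# of parameters substituted into the source-query template via {ColumnName}
# placeholders (hypothetical table and column names).
partitions = [
    {"StartDate": "2022-01-01", "EndDate": "2022-02-01"},
    {"StartDate": "2022-02-01", "EndDate": "2022-03-01"},
]

source_query_template = (
    "SELECT * FROM Orders "
    "WHERE OrderDate >= '{StartDate}' AND OrderDate < '{EndDate}'"
)

def build_partition_queries(template, partitions):
    """Substitute each partition's parameters into the template."""
    return [template.format(**p) for p in partitions]

queries = build_partition_queries(source_query_template, partitions)
for q in queries:
    print(q)
```

Each generated query then runs as its own extraction, one per partition row.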
Why Use Partitioning?
Automatic partitioning offers several benefits:
Enhanced Parallel Processing: By breaking the data into smaller chunks, each chunk can be processed simultaneously, speeding up the ETL operations.
Efficient Handling of Large Datasets: Instead of processing an entire large table at once, partitioning enables handling smaller, more manageable pieces.
Flexible Partitioning Strategy: The ability to define custom partition conditions through Partition SQL provides flexibility in how data is divided, allowing for more dynamic data extraction strategies.
Optimized Resource Utilization: Parallel execution of partitions can better utilize system resources by distributing workload across multiple threads or servers.
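The parallelism benefit can be sketched as follows. This is not Etlworks code; `extract_chunk` is a hypothetical stand-in for one partition-specific ETL transformation, and the thread pool plays the role of the parallel-execution option in the flow.

```python
# Sketch of parallel chunk processing: each partition's transformation
# runs concurrently on its own worker thread.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical numeric-range partitions (start_id, end_id)
partitions = [(1, 1000), (1001, 2000), (2001, 3000)]

def extract_chunk(bounds):
    start_id, end_id = bounds
    # A real transformation would execute something like:
    #   SELECT * FROM Orders WHERE OrderID BETWEEN {start_id} AND {end_id}
    return f"processed rows {start_id}..{end_id}"

# map() preserves partition order while the chunks run in parallel
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(extract_chunk, partitions))

print(results)
```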
Which Flows Support Automatic Partitioning?
Any ETL flow where the source is a relational database supports automatic partitioning.
How to Set Up Automatic Partitioning
Setting up automatic partitioning in Etlworks involves the following steps:
Step 1. Create a flow where the source is a relational database, then add a source-to-destination transformation.
Step 2. Click Configure, select the Parameters tab, and enter the Partition SQL.
Step 3. For faster processing, enable Use Parallel Threads when processing sources by a wildcard.
Configure other parameters, such as Action, Lookup fields, etc.
Step 4. Select the Mapping tab and enter the Source query.
Importance of the Source Query
When setting up automatic partitioning in an ETL flow, entering a source query for extracting data from each partition is required because it allows the system to dynamically adapt to the different conditions associated with each partition. Here’s why this step is necessary:
1. Dynamic Partition-Specific Data Extraction
Each partition represents a distinct segment of the dataset, defined by unique conditions (e.g., date ranges, numeric ranges, or partition numbers). The source query serves as a template that can be adjusted based on the partition parameters generated by the Partition SQL. For example, in a scenario where partitions are based on date ranges, the source query can be dynamically modified to only extract data within the specified date range for each partition. This ensures that only relevant data for each partition is processed, improving performance and efficiency.
2. Handling Different Partitioning Strategies
Since Partition SQL is flexible and does not have to use actual database partitions, the source query must adapt to different partitioning strategies. For instance, if the partitions are based on dynamically generated start and end dates (chunks), the source query needs to incorporate these values to filter the data correctly. By providing a source query, users can customize the data extraction logic to suit their partitioning approach, whether it’s based on database partitions or custom ranges.
3. Executing ETL Flows in Parallel
Automatic partitioning generates separate ETL transformations for each partition. The source query allows each transformation to target its specific subset of data, as defined by the partition parameters. Without a partition-specific source query, there would be no way to distinguish which data should be processed by each parallel execution, potentially leading to overlapping or missing data.
4. Ensuring Data Consistency and Accuracy
By using a source query tailored to each partition, the ETL process ensures that all data is covered without duplication or gaps. The partition parameters (e.g., {PartitionNumber}, {StartDate}, {EndDate}) can be substituted in the source query to filter data for each partition correctly, avoiding issues where data is unintentionally processed multiple times or not at all.
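The "no duplication or gaps" property is easy to check when partitions are half-open [start, end) ranges: each chunk must start exactly where the previous one ended. A small sketch of that check, with hypothetical date ranges:

```python
# Sketch: verify that half-open [start, end) partitions are contiguous,
# i.e. no gaps and no overlaps, assuming they are sorted by start date.
from datetime import date

partitions = [
    (date(2022, 1, 1), date(2022, 2, 1)),
    (date(2022, 2, 1), date(2022, 3, 1)),
    (date(2022, 3, 1), date(2022, 4, 1)),
]

def is_gapless(parts):
    """Each chunk must start exactly where the previous one ended."""
    return all(prev_end == next_start
               for (_, prev_end), (next_start, _) in zip(parts, parts[1:]))

print(is_gapless(partitions))  # prints True
```

The same idea applies to numeric ranges; with BETWEEN (closed intervals), the next chunk must start one unit after the previous end instead.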
Examples of Partition-Specific Source Queries
Based on Partition Number
For SQL Server tables partitioned with a partition function, the source query can filter data using the $PARTITION function, targeting only the relevant partition.
SELECT * FROM Orders WHERE $PARTITION.OrderDateRangePF(OrderDate) = {PartitionNumber};
Based on Date Ranges
If partitions are defined by date ranges, the source query can use these ranges to filter the data:
SELECT * FROM Orders WHERE OrderDate >= '{StartDate}' AND OrderDate < '{EndDate}';
Partitioning Examples
When using automatic partitioning in ETL flows, the “Partition SQL” generates partition-specific parameters that are passed to the source query for data extraction. Below are examples of how Partition SQL and source queries can be configured for different databases.
Before diving into the examples, it is important to understand the difference between physical partitions and logical partitioning:
Physical partitions refer to database-managed segments of a table that are divided based on a specified partitioning strategy, such as range, list, or hash. These partitions are physically separated within the database storage (e.g., different filegroups in SQL Server) and are managed by the database system itself to optimize performance and data management. When working with physical partitions, queries can explicitly target specific partitions, leading to improved efficiency, parallel processing, and simplified maintenance tasks such as archiving or data purging. Physical partitioning is defined at the schema level of the database and usually involves partitioning a table based on a specific column, like a date, ID, or other criteria.
Logical partitioning is a method of dividing data into smaller, manageable chunks based on user-defined conditions rather than the database’s built-in partitioning capabilities. These conditions can be date ranges, numeric ranges, or other criteria used to specify how the data should be split. Logical partitioning is implemented at the ETL (Extract, Transform, Load) level, allowing for custom partitioning strategies that do not rely on the underlying database’s partitioning features. This type of partitioning is ideal for scenarios where data needs to be processed in parallel or when partitioning criteria may vary. Unlike physical partitions, logical partitions do not affect the database’s storage configuration, offering flexibility in the partitioning approach.
Examples Using Physical Partitions
Oracle using Physical Partitions
Partition SQL: Retrieves partition information for a table partitioned by date.
SELECT partition_name AS PartitionName, high_value AS HighValue
FROM user_tab_partitions
WHERE table_name = 'ORDERS';
Source Query: Uses the partition name to extract data from the specified partition.
SELECT * FROM Orders PARTITION ({PartitionName});
SQL Server using Physical Partitions
Partition SQL: Retrieves the partition numbers and high boundary values for the table. The join goes through the table's partition scheme and function so that only this table's boundary values are returned.
SELECT
p.partition_number AS PartitionNumber,
prv.value AS HighBoundaryValue
FROM
sys.partitions p
JOIN
sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id
JOIN
sys.partition_schemes ps ON i.data_space_id = ps.data_space_id
JOIN
sys.partition_range_values prv ON ps.function_id = prv.function_id
AND p.partition_number = prv.boundary_id
JOIN
sys.tables t ON p.object_id = t.object_id
WHERE
t.name = 'Orders'
AND p.index_id IN (0, 1);
Source Query
DECLARE @PartitionNumber INT = {PartitionNumber};
DECLARE @TableName NVARCHAR(128) = 'Orders';
DECLARE @PartitionFunction NVARCHAR(128) = 'OrderDateRangePF';
DECLARE @PartitionColumn NVARCHAR(128);
DECLARE @SQL NVARCHAR(MAX);
-- Identify the partitioning column
SELECT TOP 1
@PartitionColumn = c.name
FROM
sys.index_columns ic
JOIN
sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
WHERE
ic.partition_ordinal > 0
AND OBJECT_NAME(ic.object_id) = @TableName;
-- Stop if no partition column is found
IF @PartitionColumn IS NULL
BEGIN
RAISERROR ('No partition column found for the specified table.', 16, 1);
RETURN;
END
-- Build the dynamic query
SET @SQL = N'SELECT * FROM ' + QUOTENAME(@TableName) +
N' WHERE $PARTITION.' + QUOTENAME(@PartitionFunction) +
N'(' + QUOTENAME(@PartitionColumn) + N') = ' + CAST(@PartitionNumber AS NVARCHAR(10));
-- Execute the dynamic SQL
EXEC sp_executesql @SQL;
MySQL using Physical Partitions
Partition SQL: Queries metadata about table partitions.
SELECT PARTITION_NAME AS PartitionName
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 'Orders';
Source Query: Uses partition-specific conditions to retrieve data.
SELECT * FROM Orders PARTITION ({PartitionName});
PostgreSQL using Physical Partitions
Partition SQL: Lists partitions for a range-partitioned table.
SELECT inhrelid::regclass AS PartitionTableName
FROM pg_inherits
WHERE inhparent = 'Orders'::regclass;
Source Query: Queries a partition table directly.
SELECT * FROM {PartitionTableName};
DB2 using Physical Partitions
Partition SQL: Retrieves the sequence number and high boundary of each data partition in a range-partitioned table (DB2 LUW catalog).
SELECT SEQNO AS PartitionNumber, HIGHVALUE AS HighValue
FROM SYSCAT.DATAPARTITIONS
WHERE TABNAME = 'ORDERS';
Source Query: Uses the DATAPARTITIONNUM function to select only the rows stored in the given data partition (any column of the table can be passed to the function).
SELECT * FROM Orders WHERE DATAPARTITIONNUM(OrderID) = {PartitionNumber};
AS400 using Physical Partitions
Partition SQL: Retrieves partition information for a table.
SELECT PARTITION_NUMBER AS PartitionNumber, PARTITION_NAME AS PartitionName
FROM QSYS2.SYSTABLEPARTITIONS
WHERE TABLE_NAME = 'ORDERS';
Source Query: Filters on the table's partitioning column (PartitionColumn below is a placeholder for the actual column name) using the value returned by the Partition SQL.
SELECT * FROM Orders WHERE PartitionColumn = {PartitionNumber};
Informix using Physical Partitions
Partition SQL: Lists the fragments of a fragmented table together with the expression that defines each fragment. sysfragments stores the table id, so it is joined with systables.
SELECT f.partnum AS PartitionNumber, f.exprtext AS FragmentExpression
FROM sysfragments f
JOIN systables t ON f.tabid = t.tabid
WHERE t.tabname = 'Orders' AND f.fragtype = 'T';
Source Query: Informix SQL cannot address a fragment directly, so the fragment's defining expression is reused as the filter (this works for expression-based fragmentation).
SELECT * FROM Orders WHERE {FragmentExpression};
Examples Using Logical Partitioning
Oracle using Logical Partitioning
Partition SQL: Generates date ranges to partition the data.
SELECT TO_DATE('2022-01-01', 'YYYY-MM-DD') AS start_date, TO_DATE('2022-02-01', 'YYYY-MM-DD') AS end_date FROM DUAL
UNION ALL
SELECT TO_DATE('2022-02-01', 'YYYY-MM-DD'), TO_DATE('2022-03-01', 'YYYY-MM-DD') FROM DUAL
UNION ALL
SELECT TO_DATE('2022-03-01', 'YYYY-MM-DD'), TO_DATE('2022-04-01', 'YYYY-MM-DD') FROM DUAL;
Source Query: Uses the partition parameters to extract data for the specified date range.
SELECT * FROM Orders WHERE OrderDate >= {start_date} AND OrderDate < {end_date};
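Hand-writing UNION ALL rows works for a few chunks, but for many months the same half-open ranges can be generated programmatically. A sketch (Python, outside Etlworks) producing month ranges equivalent to the Partition SQL above:

```python
# Generate half-open monthly [start, end) ranges like the UNION ALL rows above.
from datetime import date

def month_ranges(start, months):
    """Return (first_of_month, first_of_next_month) pairs."""
    ranges = []
    y, m = start.year, start.month
    for _ in range(months):
        begin = date(y, m, 1)
        m += 1
        if m == 13:                 # roll over into the next year
            y, m = y + 1, 1
        ranges.append((begin, date(y, m, 1)))
    return ranges

print(month_ranges(date(2022, 1, 1), 3))
```

Half-open ranges pair naturally with the >= start / < end filter in the source query, so month boundaries are never double-counted.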
SQL Server using Logical Partitioning
Partition SQL: Generates partitions by dividing the data based on numeric ranges.
WITH NumberRanges AS (
SELECT 1 AS start_id, 1000 AS end_id
UNION ALL
SELECT 1001, 2000
UNION ALL
SELECT 2001, 3000
)
SELECT start_id, end_id FROM NumberRanges;
Source Query: Extracts data based on the range of IDs for each partition.
SELECT * FROM Orders WHERE OrderID BETWEEN {start_id} AND {end_id};
MySQL using Logical Partitioning
Partition SQL: Generates date ranges to partition the data.
SELECT '2022-01-01' AS start_date, '2022-01-31' AS end_date
UNION ALL
SELECT '2022-02-01', '2022-02-28'
UNION ALL
SELECT '2022-03-01', '2022-03-31';
Source Query: Filters the data for the specified date range.
SELECT * FROM Orders WHERE OrderDate >= '{start_date}' AND OrderDate < '{end_date}';
PostgreSQL using Logical Partitioning
Partition SQL: Uses generate_series to create date ranges.
SELECT d::date AS start_date,
(d + interval '1 month')::date AS end_date
FROM generate_series('2022-01-01'::date, '2022-03-01'::date, interval '1 month') AS g(d);
Source Query: Retrieves data for each partitioned date range.
SELECT * FROM Orders WHERE OrderDate >= '{start_date}' AND OrderDate < '{end_date}';
DB2 using Logical Partitioning
Partition SQL: Generates month-based ranges using a dummy table.
SELECT '2022-01-01' AS start_date, '2022-01-31' AS end_date FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT '2022-02-01', '2022-02-28' FROM SYSIBM.SYSDUMMY1;
Source Query: Filters the data according to the specified range.
SELECT * FROM Orders WHERE OrderDate BETWEEN '{start_date}' AND '{end_date}';
AS400 using Logical Partitioning
Partition SQL: Uses dates to partition the data.
SELECT DATE('2022-01-01') AS start_date, DATE('2022-01-31') AS end_date FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT DATE('2022-02-01'), DATE('2022-02-28') FROM SYSIBM.SYSDUMMY1;
Source Query: Extracts records from the table for each specified date range.
SELECT * FROM Orders WHERE OrderDate BETWEEN '{start_date}' AND '{end_date}';
Informix using Logical Partitioning
Partition SQL: Generates numeric ranges using a system table.
SELECT 1 AS start_id, 1000 AS end_id FROM systables WHERE tabid = 1
UNION ALL
SELECT 1001, 2000 FROM systables WHERE tabid = 1;
Source Query: Retrieves data for each numeric range.
SELECT * FROM Orders WHERE OrderID BETWEEN {start_id} AND {end_id};
Using Partition Parameters in Source Queries and Transformations
When setting up automatic partitioning, users can utilize the values of columns returned by the Partition SQL anywhere in the source query or in the “To” part of the source-to-destination transformation. These columns can be referenced using the syntax {ColumnName}. For example, if the Partition SQL returns columns named StartDate and EndDate, these can be referenced in the source query to filter data based on the partition-specific date range:
SELECT * FROM Orders WHERE OrderDate >= '{StartDate}' AND OrderDate < '{EndDate}';
Additionally, users can use these partition-specific parameters in other areas of the transformation, such as naming output files dynamically:
orders_{EndDate}.csv
Users can also reference the partition index as {PartitionId}. The partition index is a 1-based number representing the row returned by the Partition SQL, making it useful for uniquely identifying or naming partitions, for example:
orders_{PartitionId}.csv
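The substitution mechanics can be sketched as follows (Python, outside Etlworks): every column returned by the Partition SQL plus the 1-based {PartitionId} is available to both the source query and the destination name. The column names and templates are hypothetical.

```python
# Sketch of {ColumnName} / {PartitionId} substitution in a source query
# and a destination file name (hypothetical columns and templates).
partition_rows = [
    {"StartDate": "2022-01-01", "EndDate": "2022-02-01"},
    {"StartDate": "2022-02-01", "EndDate": "2022-03-01"},
]

query_template = ("SELECT * FROM Orders WHERE OrderDate >= '{StartDate}' "
                  "AND OrderDate < '{EndDate}'")
file_template = "orders_{PartitionId}.csv"

rendered = []
for idx, row in enumerate(partition_rows, start=1):  # {PartitionId} is 1-based
    params = {**row, "PartitionId": idx}
    rendered.append((query_template.format(**params),
                     file_template.format(**params)))

print(rendered[0][1])  # prints orders_1.csv
```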
This flexibility allows for dynamic and customized data processing based on partition-specific criteria, optimizing ETL workflows.
Conclusion
Automatic partitioning with Etlworks simplifies the process of managing large datasets by breaking down data processing tasks into smaller, concurrent ETL flows. By using flexible Partition SQL, users can define custom partitioning strategies that suit their data management needs. Whether extracting data in chunks, loading data into specific segments, or archiving records, automatic partitioning enhances performance and scalability.