Once data is loaded into Databricks, the heavy transformation work usually runs inside Databricks itself — Spark SQL is fast, Delta tables support efficient UPDATEs and MERGEs, and the data never leaves the lakehouse. Etlworks supports two ELT patterns: Before / After SQL on a regular load flow, and a dedicated SQL flow.
ELT using Before and After SQL
- Create a flow to load data into Databricks. Let Etlworks auto-create the staging Delta table.
-
On the transformation, use Before COPY SQL and After COPY SQL to run Spark SQL in the destination Databricks workspace. Multiple ;-separated statements are supported.
Typical post-load steps:
OPTIMIZE analytics.gold.orders ZORDER BY (customer_id); VACUUM analytics.gold.orders RETAIN 168 HOURS; ANALYZE TABLE analytics.gold.orders COMPUTE STATISTICS FOR ALL COLUMNS; - Optionally configure the flow to ignore errors when running Before / After SQL.
- Schedule the flow to run periodically.
ELT using an SQL flow
For multi-step transformations that don't belong to a single load flow, build a dedicated SQL flow targeting the Databricks connection. Multiple ;-separated statements are supported. Combine the load flow and the SQL flow into a single nested flow.
This pattern works well for medallion architectures (bronze → silver → gold) where each stage is a separate flow chained in order.