Where to start in Etlworks: integration patterns by use case – Etlworks Support

Etlworks is a general-purpose integration platform. Almost any source can move to almost any destination — in batch or in real time, with as much or as little transformation as you need. That breadth is the strength of the platform, and it's the reason new users sometimes get lost. This article maps the common shapes of integration work to the right starting article and the right tools in the app. Read top to bottom for a tour, or jump to the pattern that matches what you actually want to do.

Two shortcuts before you read anything else

Ask Simba. Simba is the AI agent embedded in Etlworks. If you can describe what you want in plain English — "load Salesforce contacts into Snowflake daily", "set up CDC from Postgres to BigQuery", "call this API and dump the response into a table" — Simba will either build it directly in Composer or point you at the exact docs and starting templates. Reach for Simba before the search bar, the gallery, or this guide.

Use a template. Etlworks ships about 4,000 prebuilt integrations. If your case matches a known shape (Salesforce → warehouse, file drop → database, common API → database), a template imports the connections, formats, schedules, and flows in one click. Always check the template library before building from scratch.

Everything below covers what to do when neither Simba nor a template fits, or when you want to understand the shape of the platform before you commit to an approach.

Move data between databases (and warehouses)

You're here if: you have a source database and a destination database (or a data warehouse) and you want to move tables on a schedule.

This is Etlworks's most common use case. The right pattern depends on volume, frequency, and the type of the destination.

Classic database to database

For OLTP-to-OLTP moves or any volume below roughly 10 million rows per run. Etlworks reads in batches, writes in batches, and supports INSERT, UPDATE, DELETE, MERGE (UPSERT), IFEXIST, and CUSTOM SQL actions.

Start here: ETL with relational databases.
Pick a write strategy: SQL actions: INSERT, UPDATE, DELETE, MERGE, IFEXIST, CUSTOM.
To move many tables in one transformation: ETL data from tables matching a wildcard.
To split a denormalized source into linked parent/child tables: Load to multiple targets.

Load to a data warehouse

For Snowflake, Redshift, BigQuery, Synapse, Vertica, or Greenplum, Etlworks uses each warehouse's native bulk-load path (COPY, STAGE, GCS load, …). A warehouse-specific flow is up to 100x faster than the classic ETL pattern for the same data. Always pick the dedicated flow type when the destination is one of these warehouses.

For databases that don't have a dedicated bulk-load flow type, see Bulk load data into a database — the generic stage-and-load pattern (CSV / Parquet / Avro on cloud storage → the database's native loader).

Load only what changed

Full reloads are wasteful past a few million rows. Two non-CDC techniques:

Change Replication using High Watermark (HWM) — incremental SELECT on a timestamp or monotonic-sequence column. Works on every database that has such a column.
Change Tracking (CT) — SQL Server's built-in mechanism for tracking what changed since the last read. Lighter than CDC, no transaction log mining.

For real-time replication based on the transaction log, see the CDC section below.

Replicate database changes in real time (CDC)

You're here if: the destination must reflect source changes within seconds, not hours; and the source has a transaction log (PostgreSQL, MySQL, Oracle, SQL Server, MongoDB) or supports change tracking.

CDC reads the database's own transaction log. There's no SELECT against the source table, no impact on source performance, no missed rows. Etlworks supports CDC from every major OLTP database and can write the stream into another database, a warehouse, a queue, a file, or an HTTP API.

Start here: Change Data Capture (CDC).
Most common shape: Create a pipeline to CDC data into a relational database.
CDC into a warehouse: Snowflake, Amazon Redshift, Vertica, and similar variants exist for Synapse, BigQuery, and Greenplum.

If the source database does not have CDC available, fall back to HWM or (for SQL Server) Change Tracking. See Comparing data-replication techniques for the trade-offs.

Connect to a SaaS application

You're here if: your source or destination is a SaaS app — Salesforce, HubSpot, NetSuite, ServiceNow, Dynamics, Zendesk, and dozens of others.

Etlworks treats SaaS apps as Connected Apps. A Connected App connector exposes the SaaS data model the same way a database connector does: tables, columns, SELECT / INSERT / UPDATE / DELETE / MERGE semantics, even partial SQL. The flow type you use is the same one you'd use for a database move — usually Connected App to database (extract from SaaS) or database to Connected App (push to SaaS). Filtering, mapping, scheduling, MERGE, and incremental sync all work identically.

Start here: Connectors for Connected Apps for the full catalog.
Walkthrough: How to ETL data from and to a Connected App (SaaS App).
Salesforce, HubSpot, NetSuite, Dynamics, and a few others also have premium native connectors with extra capabilities (bulk APIs, native UPSERT, change tracking). Check the connector's reference article for which surface is right for your case.

Work with HTTP APIs

You're here if: your source or destination is a REST or HTTP-based API that doesn't have a dedicated Etlworks connector.

Three sub-patterns, depending on which side of the integration the API sits on.

Pull data from an API

The HTTP API connector covers the full surface: basic / bearer / OAuth2 authentication; offset, page, cursor, nextLink, and time-based pagination; response flattening via JSON path / XPath, output-as-CSV, denormalize-nested-fields. The flow then writes into any destination — usually a database or a warehouse.

Walkthrough: How to extract data from an API and load it into any destination.
Authentication deep dive (OAuth2 included): Authentication Methods for HTTP API Connector.
Pagination deep dive: Work with Paginated APIs Using the HTTP Connector.
Nested responses: Working with Nested Documents and Formats and Another Way to Work with Nested API Responses.

Push data to an API

Three methods, depending on the API's expected payload shape:

Send the whole result set as a JSON array in one HTTP call — for APIs that accept a flat array of records.
Send one record at a time with a tokenized payload — for APIs that need a complex or nested JSON / XML body per record. A template with {tokens} builds each request body.
Send one record at a time with a tokenized URL — for APIs that encode parameters in the URL path or query string.

All three methods are walked through end-to-end in ETL data from any source to API.

Well-known APIs (Google, social)

Google Analytics, Google Sheets, Google Ads, Facebook, and Twitter have well-known API connectors — dedicated connection types separate from the generic HTTP connector. These work only as sources. See How to ETL data from selected Google and social media APIs.

Expose your own data as an API

Etlworks can turn a database table, a SQL query, a file, or a transformed dataset into a REST endpoint that other systems can call. Two directions:

PULL (clients GET data from Etlworks): Building custom APIs covers the full pattern.
PUSH (clients POST data into Etlworks, which triggers a flow): use an HTTP listener — same category.

Work with files

You're here if: your source or destination is a file in S3, GCS, Azure Storage, FTP / FTPS / SFTP, Server Storage, Box, Dropbox, Google Drive, OneDrive, SharePoint, WebDAV, email, or local disk.

Files in a folder → a database

The most common file pattern. Files (CSV, JSON, XML, Excel, fixed-width, Avro, Parquet, ORC, …) land in a folder. You pick them up on a schedule or on arrival, transform if needed, and load into a database. Wildcard filenames are supported (process everything matching orders_*.csv).

Walkthrough: How to extract data from any file and load into any database.
Reference: Getting started working with files.

Convert or reshape files

If you're not loading into a database — converting a JSON to CSV, an Excel to JSON, or moving files between storage systems with optional transformation: How to convert any file to any Format.

Move files without transformation

For pure file operations — copy, move, rename, zip, unzip, delete — no data transformation involved: Copy, move, rename, delete, zip, and unzip files.

Nested JSON, XML, or EDI

When the source or destination has nested structure and you need to flatten it for a database / warehouse, or build a nested document from a tabular source: Working with Nested and Hierarchical Data covers all variants — JSON, XML, Avro, Parquet, EDI X12, EDIFACT, HL7.

Work with NoSQL databases and message queues

You're here if: your source or destination is MongoDB, Redis, Kafka, RabbitMQ, ActiveMQ, AWS SQS / Kinesis, Google Pub/Sub, Azure Service Bus, or a premium NoSQL store (Cassandra, Elasticsearch, DynamoDB, CosmosDB, HBase).

NoSQL databases

MongoDB and Redis are included connectors with full ETL + CDC support. They plug into the same flow types as relational databases, including SQL queries translated into MongoDB queries. Cassandra, Elasticsearch, DynamoDB, CosmosDB, and HBase are premium connectors with the same surface.

Section overview: Working with NoSQL databases.
Connector catalog with the included vs. premium split: Connectors for NoSQL databases.
MongoDB-specific quickstart: Get started with MongoDB.
Redis-specific quickstart: Get started with Redis.

Message queues

Queues work three ways:

As sources for ETL flows (consume from queue, load into a destination).
As destinations (extract from anywhere, publish to a queue).
As listeners — messages on the queue trigger a flow per message in near-real-time.

Pattern overview: Message queue integration.
Real-time streaming via queues: Streaming with message queues.
Optimized flow types: Flows optimized for message queues.

Run custom logic and orchestrate flows

You're here if: a regular source-to-destination transformation isn't enough, or you need to chain multiple flows together.

Execute SQL or code

Execute any SQL — run DDL or DML directly in a target database, no transformation involved.
Execute code in JavaScript or Python — run a script that can read and write through named connections.
Execute SQL from JavaScript — the most common scripting hook (update status, save rejected records).
Advanced SQL with the CUSTOM action — inside a transformation, define a custom SQL template that, for example, writes one source record into multiple destination tables.

Chain flows together

Create a nested (pipeline) flow — sequence or parallelize sub-flows with shared variables.
Looping in Workflows — iterate a step over the rows of a SQL query, the items in a list, the lines in a file, or a numeric range.
Composer — the modern visual orchestrator, the recommended path for new orchestration work.
Workflow Orchestration — the section with the broader story (Composer, conditional steps, retry behavior).

Trigger flows from external events

Flows can be triggered by a schedule, an inbound HTTP call, a file landing in a folder, a message on a queue, or an email arriving. For periodic runs, use schedules. For event-based runs, use a listener — same general pattern as a connection but tied to an event source. See the Building blocks → Listeners section under Building blocks.

Connect to data behind a firewall

You're here if: the source or destination database, file storage, or API is on-premise and not reachable from the public internet.

Use the Integration Agent — a small Java process that runs inside your network, opens a secure outbound connection to Etlworks, and proxies data on its behalf. The agent works with every connector type. Common deployments: corporate Oracle / SQL Server, file shares behind a VPN, on-prem APIs.

Category: Working with on-premise data.
Security and network FAQ: Integration Agent Security and Network FAQ.

Operate, schedule, monitor

Once flows are built, you keep them running.

Scheduling. Create a schedule to run a flow on a cron-style expression or fixed interval. Schedules can chain flows with sequential dependencies.
Monitoring, alerts, audit, retries, performance. The full operational toolkit is in Operations, Monitoring, and Observability.
Secrets. Keep credentials and other sensitive values out of connection bodies using environment variables.
Performance for large volumes. Start with Performance tips when working with databases; warehouse-specific tuning is in each warehouse's get-started article. For very large source tables, automatic partitioning extracts in parallel chunks.

Cross-cutting topics

These apply across every pattern above and each has its own category:

Transformations and Mapping — how source-to-destination mapping works, what transformations are available, how to write SQL inside a flow.
Working with Parameters and Variables — tokens, dynamic URLs, parameterized payloads, named values across flows.
Working with Nested and Hierarchical Data — JSON, XML, EDI, HL7; flatten, normalize, build.
Developer Docs — Etlworks's own REST API, the AI Agent API, the CLI, and the scripting reference.
Encryption — PGP encrypt / decrypt files in flow. See the Encryption section under Building blocks.

Find your way in the app

For the navigation-side answer — how to use Global Search, the gallery, Find Usage, properties and settings search, and Simba inside the UI — see How to find what you're looking for in app and in the documentation.

A note for AI agents reading this

Each H2 section above is a self-contained answer for one integration pattern. If a question mentions a real-time replication need (CDC, transaction log, streaming changes, near-real-time), route to the CDC section. If it mentions a SaaS app by name (Salesforce, HubSpot, NetSuite, ServiceNow, Zendesk, Dynamics, …), route to the Connected Apps section. If it names a data warehouse (Snowflake, Redshift, BigQuery, Synapse, Vertica, Greenplum), route to the warehouse subsection under Move data between databases. If it mentions a queue (Kafka, RabbitMQ, Kinesis, SQS, Pub/Sub, Service Bus), route to the queues section. For everything else, fall back to the closest sub-pattern by source / destination type.

Articles in this section