Greenplum is a massively parallel Postgres fork used for analytics, ML, and AI workloads. Etlworks ships flow types optimized for Greenplum's gpload-based bulk-load path.
Which Greenplum flow should I use?
| Flow | Use when |
|---|---|
| Any to Greenplum (Database / File / Queue / Web service / Well-known API) | You need to extract from any source, optionally transform, and load into Greenplum. |
| Bulk load files into Greenplum | The files already exist in server storage. No transformation needed. Auto-loads into staging and MERGEs into the destination. |
| Stream CDC events into Greenplum | You need real-time replication from a CDC-enabled source database. |
| Streaming with message queues | You need real-time ingestion from a message queue that supports streaming. |
What do I need before I start?
- A Greenplum cluster reachable from your Etlworks instance.
- The gpload utility installed on the same VM as Etlworks — see Install and configure gpload below.
- A Greenplum user with INSERT on the target tables.
Connect to Greenplum
- Open the Connections window and click +.
- Type greenplum in the search field.
- Select the Greenplum connection and fill in the connection parameters. Full reference: configuring the Greenplum connection.
Also create a server storage connection for the stage.
Install and configure gpload
Greenplum-optimized flows use the gpload utility to load files into Greenplum tables.
Install gpload
gpload must be installed on the same VM as Etlworks. If you need help with the install, contact support@etlworks.com.
Configure the gpload command
By default, Etlworks invokes gpload as:
gpload -f {CONTROL_FILE}
Override per flow under Flow → Parameters:
Override the gpload command for all flows
To set a single gpload command for the whole Etlworks instance, edit TOMCAT_HOME/application.properties and restart the Etlworks service:
greenplum.load.command=/opt/greenplum-db-6.23.0/load.sh -f {CONTROL_FILE}
Note: If greenplum.load.command is set, the per-flow override is ignored.
Where to go next
| Topic | Article |
|---|---|
| Extract, transform, and load data into Greenplum | Extract, transform, and load data in Greenplum |
| Bulk-load existing files | Bulk load files into Greenplum |
| ELT — run transformation SQL directly in Greenplum | ELT with Greenplum |
| Reverse ETL — extract from Greenplum into any destination | Reverse ETL with Greenplum |
| Load many tables at once | Load multiple tables with a wildcard |
| Incremental load (HWM) | Incremental change replication using high watermark |
| Troubleshooting | Common issues when loading data into cloud data warehouses |