Get started with Greenplum – Etlworks Support

Greenplum is a massively parallel Postgres fork used for analytics, ML, and AI workloads. Etlworks ships flow types optimized for Greenplum's gpload-based bulk-load path.

Which Greenplum flow should I use?

Flow	Use when
Any to Greenplum (Database / File / Queue / Web service / Well-known API)	You need to extract from any source, optionally transform, and load into Greenplum.
Bulk load files into Greenplum	The files already exist in server storage. No transformation needed. Auto-loads into staging and MERGEs into the destination.
Stream CDC events into Greenplum	You need real-time replication from a CDC-enabled source database.
Streaming with message queues	You need real-time ingestion from a message queue that supports streaming.

What do I need before I start?

A Greenplum cluster reachable from your Etlworks instance.
The gpload utility installed on the same VM as Etlworks — see Install and configure gpload below.
A Greenplum user with INSERT on the target tables.

Connect to Greenplum

Open the Connections window and click +.
Type greenplum in the search field.
Select the Greenplum connection and fill in the connection parameters. Full reference: configuring the Greenplum connection.

Greenplum connection gallery

Also create a server storage connection for the stage.

Install and configure gpload

Greenplum-optimized flows use the gpload utility to load files into Greenplum tables.

Install gpload

gpload must be installed on the same VM as Etlworks. If you need help with the install, contact support@etlworks.com.

Configure the gpload command

By default, Etlworks invokes gpload as:

gpload -f {CONTROL_FILE}

Override per flow under Flow → Parameters:

gpload command per-flow override

Override the gpload command for all flows

To set a single gpload command for the whole Etlworks instance, edit TOMCAT_HOME/application.properties and restart the Etlworks service:

greenplum.load.command=/opt/greenplum-db-6.23.0/load.sh -f {CONTROL_FILE}

Note: If greenplum.load.command is set, the per-flow override is ignored.

Where to go next

Topic	Article
Extract, transform, and load data into Greenplum	Extract, transform, and load data in Greenplum
Bulk-load existing files	Bulk load files into Greenplum
ELT — run transformation SQL directly in Greenplum	ELT with Greenplum
Reverse ETL — extract from Greenplum into any destination	Reverse ETL with Greenplum
Load many tables at once	Load multiple tables with a wildcard
Incremental load (HWM)	Incremental change replication using high watermark
Troubleshooting	Common issues when loading data into cloud data warehouses

Articles in this section