Overview
Greenplum is an open-source massively parallel data platform for analytics, machine learning, and AI. The Etlworks includes several pre-built Flows optimized for Greenplum.
Flows optimized for Greenplum
Flow type | When to use |
|
When you need to extract data from any source, transform it and load it into Greenplum. |
Bulk load files into Greenplum | When you need to bulk-load files that already exist in server storage without applying any transformations. The flow automatically loads data into staging tables and MERGEs data into the destination. |
Stream CDC events into Greenplum | When you need to stream updates from the database which supports Change Data Capture (CDC) into Greenplum in real-time. |
Stream messages from a queue into Greenplum | When you need to stream messages from the message queue which supports streaming into Greenplum in real time. |
Related resources
ELT with Greenplum Etlworks supports executing complex ELT scripts directly in the Greenplum database which greatly improves the performance and reliability of the data ingestion. |
Reverse ETL with Greenplum You can use any |
Load multiple tables by a wildcard name You can ETL data from multiple database objects (tables and views) into Greenplum by a wildcard name, without creating individual source-to-destination transformations.
|
Setup incremental replication using high watermark (HWM) Using HWM replication you can load only new and updated records into Greenplum.
|
Connect to the Greenplum database
Here's how you can connect to the Greenplum database:
Step 1. Open the Connections
window and click +
.
Step 2. Type greenplum
into the search field.
Step 3. Select Greenplum
Connection and continue by defining the Connection parameters.
Read more about configuring Greenplum Connection.
Install and configure gpload
Etlwors flows optimized for loading data into Greenplum use Greenplum gpload utility to load data into the Greenplum tables.
Install gpload
The gpload must be installed on the same VM as Etlworks. Contact Etlworks support atsupport@etlworks.com
if you need assistance installing the gpload.
Configure gpload command
By default, Etlworks flows use the following command to execute the gpload utility:
gpload -f {CONTROL_FILE}
You can override the gpload command
under the Flow->Parameters tab:
Override gpload command for all flows
It is possible to override the gpload command for all flows by modifying the TOMCAT_HOME/application.properties file and restarting Etlworks service.
Here is an example:
greenplum.load.command=/opt/greenplum-db-6.23.0/load.sh -f {CONTROL_FILE}
If the property greenplum.load.command
is set, the manual per-flow configuration is ignored.
Comments
0 comments
Please sign in to leave a comment.