Overview
Etlworks is a modern, vertically and horizontally scalable web application that can be deployed to a single node (any VM, for example, an EC2 instance or a physical box) or to multiple nodes in a cluster behind a load balancer. The infrastructure can be configured to have a fixed number of nodes or to scale up and down depending on the load.
Deployment
We give our customers the flexibility to run Etlworks on any platform and operating system, in the cloud or on-premise.
The following deployment options are available:
- Shared cloud instances, owned, managed, and operated by Etlworks. Our shared instances operate in two AWS regions:
  - us-east-2 (Ohio)
  - eu-west-1 (Ireland)
- Dedicated cloud instances owned, managed, and operated by Etlworks. Dedicated instances are available for customers on Enterprise plans. We support all major cloud providers: AWS, Azure, Google Cloud, Oracle Cloud, and IBM Cloud, as well as all available regions, including GovCloud.
- Dedicated cloud instances which are owned, managed, and operated by the customer but provisioned and upgraded by Etlworks. We push updates from our centralized build server. Etlworks must have SSH access to the instance.
- Dedicated cloud or on-premise instances which are owned, managed, and operated by the customer when Etlworks has no access to the instance. Etlworks provides a fully automated installer for Ubuntu, Amazon Linux, CentOS, Red Hat, Windows Server, all editions of Windows 10 and Windows 11, and Docker. The same installer can be used to automatically update the instance to the latest version of Etlworks.
Architecture and components
Etlworks includes two macroservices deployed to a Tomcat container: Engine and App. The connectors are deployed separately but are part of the Engine. End users and external systems can access Etlworks via the front end and the open API.
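Programmatic access to the open API is typically token-based. The sketch below shows what such a call can look like from a script; the base URL, endpoint path, header, and response shape are hypothetical placeholders used for illustration only, not the documented Etlworks API.

```python
# Minimal sketch of calling a token-protected REST API from a script.
# The host, path, and token header are hypothetical placeholders, not the
# documented Etlworks API; consult the API reference for real endpoints.
import json
import urllib.request

BASE_URL = "https://app.etlworks.example.com"   # placeholder host
API_TOKEN = "YOUR_API_TOKEN"                    # placeholder credential

def list_flows():
    # Hypothetical endpoint that returns integration flows as JSON.
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/flows",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    for flow in list_flows():
        print(flow)
```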
Single node and multi-node setups
Etlworks can be deployed to a single VM or to a multi-node symmetrical cluster. Each node in the cluster is a VM or a physical box.
In a multi-node setup, ETL jobs are distributed across all active nodes in the symmetrical cluster. The load balancer chooses the node that runs a job using one of the configurable load-balancing algorithms; the default is round-robin.
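Round-robin simply cycles through the active nodes in order, so consecutive jobs land on different nodes. The sketch below illustrates the idea in Python; it is a conceptual example of the algorithm, not the load balancer's actual implementation, and the node addresses are placeholders.

```python
import itertools

# Conceptual round-robin selection over active cluster nodes.
# Node addresses are placeholders; a real load balancer also tracks health.
ACTIVE_NODES = ["node-1:8080", "node-2:8080", "node-3:8080"]
_rotation = itertools.cycle(ACTIVE_NODES)

def pick_node():
    """Return the next node in the rotation to run an ETL job."""
    return next(_rotation)

# Jobs submitted in sequence land on node-1, node-2, node-3, node-1, ...
for job_id in range(5):
    print(job_id, pick_node())
```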
The scheduler always runs on a single node and automatically migrates to the next available node if the current node becomes unavailable.
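In other words, exactly one node hosts the scheduler at any time, and scheduling fails over rather than being load-balanced. A simple way to picture this is choosing the first healthy node from an ordered list, as in the sketch below; this is a conceptual illustration with a placeholder health check, not the actual failover mechanism.

```python
# Conceptual illustration of scheduler placement: the scheduler runs on one
# node at a time; if that node goes down, it migrates to the next healthy one.
NODES = ["node-1", "node-2", "node-3"]   # ordered list of cluster nodes
DOWN_NODES = {"node-1"}                  # pretend node-1 has failed

def is_healthy(node: str) -> bool:
    # Placeholder health check; a real cluster would use heartbeats or similar.
    return node not in DOWN_NODES

def scheduler_node() -> str:
    """Return the node that should currently host the scheduler."""
    for node in NODES:
        if is_healthy(node):
            return node
    raise RuntimeError("no healthy node available for the scheduler")

print(scheduler_node())   # -> "node-2" because node-1 is down
```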
All nodes in a cluster must share the same server storage (for example, EFS on AWS), Redis (for example, ElastiCache for Redis on AWS), and Postgres (for example, RDS Postgres on AWS).
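Because every node must see the same shared storage, Redis, and Postgres, each node is pointed at the same external endpoints. The sketch below shows one way to express and sanity-check such a configuration; the setting names, mount path, and endpoints are hypothetical placeholders, not Etlworks configuration keys.

```python
# Hypothetical per-node configuration: every node in the cluster must point
# at the SAME shared storage mount, Redis endpoint, and Postgres instance.
SHARED_CONFIG = {
    "storage_mount": "/mnt/efs/etlworks",                   # e.g. EFS on AWS
    "redis_url": "redis://cache.example.internal:6379/0",   # e.g. ElastiCache
    "postgres_dsn": "postgresql://etl@db.example.internal:5432/etlworks",  # e.g. RDS
}

def assert_same_config(node_configs: list[dict]) -> None:
    """Fail fast if any node points at different shared resources."""
    for cfg in node_configs:
        for key, expected in SHARED_CONFIG.items():
            if cfg.get(key) != expected:
                raise ValueError(f"node config mismatch for {key!r}: {cfg.get(key)}")

# Two nodes with identical shared-resource settings pass the check.
assert_same_config([SHARED_CONFIG, dict(SHARED_CONFIG)])
```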
AWS multi-node deployment diagram
AWS multi-node setup details can be found here.
Supported topologies
Cloud, with optional SSH tunnel
Read more about using SSH tunnel to connect to the on-premise database.
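With this topology, the cloud instance reaches an on-premise database through an SSH tunnel: the database client connects to a locally forwarded port, and the tunnel carries the traffic to the database host behind the firewall. Below is a minimal sketch of the general technique using the third-party sshtunnel package; the hosts, ports, and credentials are placeholders, and this illustrates the concept rather than Etlworks' built-in tunnel support.

```python
# Minimal SSH tunnel sketch using the third-party "sshtunnel" package
# (pip install sshtunnel). All hosts, ports, and credentials are placeholders.
from sshtunnel import SSHTunnelForwarder

with SSHTunnelForwarder(
    ("bastion.example.com", 22),               # SSH host reachable from the cloud
    ssh_username="tunnel-user",
    ssh_pkey="/path/to/private_key",
    remote_bind_address=("10.0.0.15", 5432),   # on-premise database host:port
    local_bind_address=("127.0.0.1", 15432),   # local forwarded port
) as tunnel:
    # Any database client can now connect to 127.0.0.1:15432 as if it were
    # the on-premise Postgres instance behind the firewall.
    print("tunnel open on local port", tunnel.local_bind_port)
```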
Hybrid-cloud with data integration agents
Read more about data integration agents.