Overview
Etlworks can be deployed to a symmetrical, horizontally scalable cluster.
Deployment steps
1. Application EC2s
1.1. Create two to N EC2 instances from the m5 family running Amazon Linux 2 or Ubuntu 20.04.
1.2. Security group should allow port 8080 access from HaProxy EC2 instance (Step 5) and your incoming traffic load balancer.
2. EFS shared storage
2.1. Create an EFS file system that will be shared between Application EC2s.
2.2. Security group should allow TCP NFS port 2049 access from Application EC2s security group (Step 1.2).
2.3. Mount EFS on all Application EC2s (Step 1) at /opt/app-data. Replace {efs_url} with the actual EFS URL:
sudo yum install -y amazon-efs-utils
sudo mkdir -p /opt/app-data
sudo mount -t efs -o tls {efs_url}:/ /opt/app-data
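A mount made with the mount command does not survive a reboot. A minimal sketch of an /etc/fstab entry for the same mount follows, assuming amazon-efs-utils is installed. The helper name efs_fstab_line and the file system ID fs-12345678 are placeholders for illustration, not part of Etlworks or AWS tooling.

```shell
# Print an /etc/fstab line for an EFS mount that uses the "efs" mount type
# provided by amazon-efs-utils. efs_fstab_line is a hypothetical helper.
efs_fstab_line() {
    efs_id="$1"        # value used for {efs_url} in the mount command
    mount_point="$2"   # local mount point, /opt/app-data in this guide
    printf '%s:/ %s efs _netdev,tls 0 0\n' "$efs_id" "$mount_point"
}

# Preview the line (fs-12345678 is a placeholder). After reviewing it,
# it could be appended with:
#   efs_fstab_line fs-12345678 /opt/app-data | sudo tee -a /etc/fstab
efs_fstab_line fs-12345678 /opt/app-data
```

The _netdev option tells the OS to wait for networking before attempting the mount, and tls matches the encryption option used in the mount command.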
3. RDS PostgreSQL
3.1. Create RDS PostgreSQL (db.m6i.large should be sufficient for most installations).
3.2. Enable password authentication (note the password for the application configuration in Step 6).
3.3. Set "Initial database name" to integrator.
3.4. Set network to allow port 5432 access from Application EC2s security group (Step 1.2).
4. ElastiCache Redis
4.1. Create ElastiCache Redis (cache.t4g.small should be sufficient for most installations).
4.2. Set network to allow port 6379 access from Application EC2s security group (Step 1.2).
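The ingress rules from Steps 2.2, 3.4, and 4.2 all follow the same pattern: one TCP port opened to the Application EC2s security group. A rough sketch with the AWS CLI is shown below as a dry run that only prints the commands. The security group IDs are placeholders and print_ingress_rule is a hypothetical helper; your actual group IDs will differ.

```shell
# Print (dry run) the AWS CLI command that opens one TCP port on a target
# security group to members of a source security group.
print_ingress_rule() {
    target_sg="$1"   # group attached to EFS, RDS, or ElastiCache
    port="$2"        # TCP port to open
    source_sg="$3"   # Application EC2s security group (Step 1.2)
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --source-group %s\n' \
        "$target_sg" "$port" "$source_sg"
}

APP_SG="sg-app-placeholder"   # assumed ID, replace with your own

print_ingress_rule "sg-efs-placeholder"   2049 "$APP_SG"   # NFS (Step 2.2)
print_ingress_rule "sg-rds-placeholder"   5432 "$APP_SG"   # PostgreSQL (Step 3.4)
print_ingress_rule "sg-redis-placeholder" 6379 "$APP_SG"   # Redis (Step 4.2)
```

Printing the commands first makes it easy to review the rules before running them against a real account.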
5. HaProxy EC2
5.1. Create a t2.small EC2 instance with Amazon Linux 2.
5.2. Security group should allow port 80 access from Application EC2s security group (Step 1.2).
5.3. Install HaProxy:
sudo yum install haproxy -y
5.4. Configure HaProxy by replacing /etc/haproxy/haproxy.cfg with:
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 10000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 10
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 5m
    timeout server 0
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 10000

frontend http-in
    bind *:80
    default_backend servers

backend servers
    balance roundrobin
    option httpchk GET /etl/rest/v1/health
    http-check expect status 200
    server tomcat1 {node-1-ip}:8080 check
    server tomcatN {node-N-ip}:8080 check
Note: The default recommended setting for timeout server is 0 (infinite). This allows streaming flows (such as CDC) to run continuously. Setting it to a value other than 0 (for example, 43200m, which is 30 days) can cause the HTTP connection between the scheduler and the ETL process to time out. When the connection times out, the server kills the ETL process.
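The backend section needs one server line per Application EC2. As a small sketch, the lines could be generated from a list of node IPs as shown below. The helper name render_backend_servers and the example IPs are assumptions for illustration. The tomcat1..tomcatN naming follows the configuration in Step 5.4.

```shell
# Print one HAProxy "server" line per application node IP, numbered
# tomcat1..tomcatN. HAProxy ignores leading whitespace, so the
# indentation is cosmetic.
render_backend_servers() {
    i=1
    for ip in "$@"; do
        printf '    server tomcat%d %s:8080 check\n' "$i" "$ip"
        i=$((i + 1))
    done
}

# Example with placeholder private IPs (replace with your node addresses):
render_backend_servers 10.0.1.11 10.0.1.12 10.0.1.13

# After editing /etc/haproxy/haproxy.cfg, the file can be validated and the
# service started, for example:
#   sudo haproxy -c -f /etc/haproxy/haproxy.cfg
#   sudo systemctl enable --now haproxy
```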
6. Etlworks Installation
sudo tar -zxf etlworks-installer.tar.gz
sudo ./etlworks-cli.sh install -s -u <user> --external-db \
--conf-postgres-url jdbc:postgresql://<rds_postgres_host>:5432/integrator \
--conf-postgres-username <rds_postgres_username> \
--conf-postgres-password <rds_postgres_password> \
--conf-redis-host <elasti_cache_redis_host> \
--conf-redis-port 6379 \
--conf-app-data /opt/app-data \
--conf-jwt-secret <jwt_16_character_alphanumeric_secret>
Replace the placeholders as follows:
- <user> - with ec2-user if Amazon Linux 2, ubuntu if Ubuntu 20.04.
- <rds_postgres_host> - with RDS PostgreSQL URL (Step 3).
- <rds_postgres_username> - with RDS PostgreSQL username (Step 3), by default postgres if not changed during initial setup.
- <rds_postgres_password> - with RDS PostgreSQL password (Step 3).
- <elasti_cache_redis_host> - with ElastiCache Redis URL (Step 4).
  - If a password was set for ElastiCache Redis, then also add --conf-redis-password <elasti_cache_redis_password> to the list of install command parameters, where <elasti_cache_redis_password> is the actual ElastiCache Redis password.
  - If encryption in transit is enabled for ElastiCache Redis, then also add --conf-redis-ssl true to the list of install command parameters.
- <jwt_16_character_alphanumeric_secret> - with a random 16-character alphanumeric string. Note: this value has to be the same on all nodes.
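One way to produce the 16-character alphanumeric JWT secret is sketched below. It assumes /dev/urandom, tr, and head are available, as they are on Amazon Linux 2 and Ubuntu 20.04. The function name generate_jwt_secret is hypothetical and not part of the Etlworks installer.

```shell
# Generate a random 16-character alphanumeric string from /dev/urandom.
# LC_ALL=C makes tr treat the input as raw bytes.
generate_jwt_secret() {
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 16
    echo
}

# Generate the secret once and reuse the same value on every node.
generate_jwt_secret
```

Run this on a single machine and pass the resulting value to --conf-jwt-secret on every node, since each call produces a different string.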
7. Incoming traffic load balancer
7.1. Configure your incoming traffic load balancer to proxy requests to port 8080 of Application EC2s.
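Before routing traffic through the load balancer, it can help to confirm that each node answers on the health endpoint that HaProxy polls (Step 5.4). The sketch below assumes curl is installed and is run from a host that can reach port 8080 on the nodes. The function names and example IPs are illustrative only.

```shell
# Build the health-check URL for a node, using the path from Step 5.4.
health_url() {
    printf 'http://%s:8080/etl/rest/v1/health\n' "$1"
}

# Print the HTTP status returned by each node. 200 means the node passes
# the same check HaProxy performs. curl reports 000 when a node cannot be
# reached within 5 seconds.
check_nodes() {
    for ip in "$@"; do
        status=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$(health_url "$ip")")
        printf '%s %s\n' "$ip" "$status"
    done
}

# Example (placeholder IPs, replace with your node addresses):
#   check_nodes 10.0.1.11 10.0.1.12
```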
8. Global application settings
After the installation has been completed, log in to Etlworks with the default super admin credentials admin:admin1 and navigate to Settings.
8.1. Under General, set Home URL to the load balancer URL, including the protocol part.
8.2. Under Email set email configuration that will be used by the system to send notifications. This is required for adding new users, resetting passwords, and sending email notifications.
8.3. Under Network set ETL Engine Proxy URL to HaProxy URL (Step 5).