When to use a database connection
- In a source-to-destination transformation that reads data from or writes data into a database
- In flows optimized for Snowflake, Redshift, and Azure Synapse Analytics
- In flows used to bulk load data into databases
- To execute any SQL
- In change-data-capture (CDC) flows
- In flows that use premium connectors, such as HubSpot, Marketo, etc.
Read how to work with databases in Etlworks.
Prerequisites
Etlworks Integrator must be able to connect to the database. If the database is behind a firewall and not exposed to the Internet, there are several options:
- Use an SSH tunnel.
- Whitelist the IP address of the Etlworks Integrator instance. The IP address of the host app.etlworks.com is 18.188.180.116.
- Use a data integration agent running behind the firewall.
Available connectors
The following connectors are available out-of-the-box:
- Oracle
- MS SQL Server
- IBM DB2
- IBM AS400
- Informix
- PostgreSQL
- MySQL
- MariaDB
- Sybase ASE
- Snowflake
- Azure Synapse Analytics
- Amazon RDS
- Amazon Aurora PostgreSQL
- Amazon Aurora MySQL
- Amazon Redshift
- Amazon Athena
- Microsoft Access
- Teradata
- Google BigQuery
- Apache Hive
- Derby (JavaDB)
- SQLite
- Neo4j
- Elasticsearch - free and premium
- SAP HANA
- DynamoDB - premium
- HBase - premium
- CosmosDB - premium
- Cassandra - premium
- Btrieve - premium
- Couchbase - premium
Creating a database connection
To create a database connection, open the Connections window, click the +
button, and select Database in the left navigation bar. Select the database from the gallery. Enter the database connection parameters.
The URL
- Driver (optional) - the JDBC driver class name. When one of the predefined connection types (for example, Oracle) is selected, the driver is preset.
- URL (optional) - the connection URL. You will need to replace tokens, such as <host>, <port>, and <database>, with the actual host name, port, and database. Some tokens include default values. For example:
before: jdbc:postgresql://<host>:<port=5432>/<database>
after: jdbc:postgresql://localhost:5432/mydb
If needed, you can use the tokens {user} and {password} to pass non-standard credentials for the username and password. For example, to connect to Google BigQuery you will need to provide a service account email and the path to a secret key file. Example: OAuthServiceAcctEmail={user};OAuthPvtKeyPath={password}. A short sketch of the token substitution follows this section.
The URL can include advanced options, such as enabling SSL, etc. Please visit the database vendor web site for more information.
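As referenced above, here is a minimal Java sketch of what the driver class, URL tokens, and credentials amount to at the JDBC level. The host, database, and credentials are placeholder values, and it assumes the PostgreSQL driver is on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;

public class ConnectExample {
    public static void main(String[] args) throws Exception {
        // Driver class preset by the connector (here, PostgreSQL)
        Class.forName("org.postgresql.Driver");
        // Replace the <host>, <port=5432>, and <database> tokens with actual values
        String url = "jdbc:postgresql://<host>:<port=5432>/<database>"
                .replace("<host>", "localhost")
                .replace("<port=5432>", "5432")
                .replace("<database>", "mydb");
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            System.out.println("Connected to " + conn.getMetaData().getDatabaseProductName());
        }
    }
}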
Alternative to the URL
- Host - the database host or IP address
- Port - the port
- Database - the database name
Authentication
- User - username.
- Password - password.
Other parameters
- Auto Commit - connections with Auto Commit set to Yes commit the transaction after each executed SQL statement. Set this flag to Yes if you expect a very high volume of transactions or are planning to execute DDL statements.
- Run at Connect - the SQL statements to run each time a connection to the database is established. You can run multiple ;-separated SQL statements (see the sketch after this list).
- Other Parameters - enter optional parameters as key=value pairs. Refer to the database vendor documentation for more information.
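The following is a minimal sketch of how the Auto Commit and Run at Connect settings behave at the JDBC level. The connection details are placeholders, and the simple split on ; is an illustration, not necessarily how the engine parses statements:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ConnectionSettingsExample {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
        // Auto Commit = Yes: each executed statement is committed immediately
        conn.setAutoCommit(true);
        // Run at Connect: multiple ;-separated statements executed after connecting
        String runAtConnect = "SET search_path TO staging; SET statement_timeout = 60000";
        try (Statement stmt = conn.createStatement()) {
            for (String sql : runAtConnect.split(";")) {
                if (!sql.trim().isEmpty()) {
                    stmt.execute(sql.trim());
                }
            }
        }
        conn.close();
    }
}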
Handling Metadata
- Always enclose table/column names in double quotes - if this option is enabled, the system will always enclose table and column names in double quotes. Otherwise, only names which contain spaces or start with numbers will be enclosed. Read how to handle table and column names with spaces and special characters.
- Varchar Field Size Multiplier - the value of this field, if greater than 1, will be used to multiply the default size of VARCHAR-like columns (except CHAR) when automatically creating the destination table.
- Current Database and Schema only - if this option is enabled, the system will populate tables and views for the current database and schema only. For databases with many tables and views, this can greatly reduce the time needed to show the objects in Explorer and when configuring the mapping.
Fetch Size, Max Records, Max Field Size
Our ETL engine is written in Java and uses the JDBC protocol to connect to databases. The parameters below can be used to fine-tune the performance of the database driver; a short JDBC sketch follows the list.
- Fetch Size
In JDBC, the setFetchSize(int) method is very important for performance and memory management within the JVM, as it controls the number of network calls from the JVM to the database and, correspondingly, the amount of RAM used for ResultSet processing. Many JDBC drivers (for example, Oracle) enforce the fetch size by default, while others (for example, MySQL and Redshift) do not. The Fetch Size field can be used to set the maximum number of records retrieved in one database call, thus limiting memory consumption within the JVM.
You can override the Fetch Size set at the connection level by setting it at the transformation level.
- Max Records
By default, our ETL engine extracts all the records from the source, limited only by the source query or filter conditions. You can use this field to set the upper limit on the total number of records which can be extracted from the source.
You can override the Max Records set at the connection level by setting it at the transformation level.
- Max Field Size
This parameter sets the limit for the maximum number of bytes that can be returned for character and binary column values in a ResultSet object produced by this Statement object. This limit applies only to BINARY, VARBINARY, LONGVARBINARY, CHAR, VARCHAR, NCHAR, NVARCHAR, LONGNVARCHAR and LONGVARCHAR fields. If the limit is exceeded, the excess data is silently discarded. For maximum portability, use values greater than 256.
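As referenced above, here is a rough JDBC-level sketch of the calls these three settings correspond to. The query, connection details, and limits are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DriverTuningExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
             Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(1000);      // Fetch Size: records per network round-trip
            stmt.setMaxRows(100000);      // Max Records: upper limit on rows extracted
            stmt.setMaxFieldSize(65536);  // Max Field Size: max bytes per character/binary column
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM source_table")) {
                while (rs.next()) {
                    // process each record here
                }
            }
        }
    }
}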
Connecting to a database over SSH tunnel
A database that sits behind a firewall cannot be accessed directly from a client on the other side of the firewall, but it can often be accessed through an SSH tunnel. The firewall must be configured to accept SSH connections and you also need to have an account on the SSH host for this to work.
Use an actual database hostname and port when configuring the URL for the database connection. Etlworks Integrator will automatically replace them with the localhost and the available port.
If you need to access a database that can only be accessed via an SSH tunnel, you need to specify additional information in the "Connect over SSH tunnel" section of the database connection screen.
- SSH Host - the name or IP address for the host accepting SSH connections.
- SSH Port - the port accepting SSH connections. The default value is 22.
- SSH User - the user name
- SSH Password - the optional password
- Private Key File - the private key file in the pem or ppk format used for SSH authentication. Click the "Upload/Select ssh key" button to manage SSH keys using the GUI. You can also upload the private key file manually and use the token {app.data} as a part of the filename, for example, {app.data}/keys/secret.pem. This parameter is optional.
- SSH Passphrase - the optional passphrase (password) used together with a private key file.
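For reference, the sketch below shows what the tunnel amounts to, using the open-source JSch SSH library: a local port is forwarded through the SSH host to the database, and the JDBC client connects to localhost on the forwarded port. All host names, ports, and credentials are placeholders, and Integrator handles this wiring automatically:

import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;
import java.sql.Connection;
import java.sql.DriverManager;

public class SshTunnelExample {
    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        jsch.addIdentity("/path/to/secret.pem");           // optional private key file
        Session session = jsch.getSession("sshUser", "ssh.example.com", 22);
        session.setConfig("StrictHostKeyChecking", "no");  // for illustration only
        session.connect();
        // Forward a free local port to the database host/port visible from the SSH host
        int localPort = session.setPortForwardingL(0, "db.internal", 5432);
        // The JDBC URL now points at localhost and the forwarded port
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:" + localPort + "/mydb", "user", "password")) {
            System.out.println("Connected through the tunnel: " + conn.isValid(5));
        }
        session.disconnect();
    }
}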
Connecting to Oracle
To create a connection to an Oracle database, open the Connections window, click the +
button, and type in oracle
. Choose a connection type from the list of available Oracle connections.
Enter the connection parameters.
The following link details the properties that are available for an Oracle connection's URL.
Connecting to MS SQL Server
To create a connection to a Microsoft SQL Server database, open the Connections window, click the +
button, and type in sql server
. Select Microsoft SQL Server.
Enter the connection parameters.
The following link details the properties that are available for a Microsoft SQL Server connection's URL.
It is also possible to connect to MS SQL Server using the popular jTDS JDBC driver.
To create a connection to a Microsoft SQL Server database, open the Connections window, click the +
button, and type in sql server
. Select Microsoft SQL Server using JTDS.
Connecting to SQL Server With Windows Authentication From A Linux Machine
Step 1. Create a new connection to the SQL Server using the jTDS driver, as explained above.
Step 2. Enter the URL as follows: jdbc:jtds:sqlserver://host:1433/database;domain=domain_name;useNTLMv2=true;
For example: jdbc:jtds:sqlserver://10.0.2.2:1433/Dev;domain=MYDOMAIN;useNTLMv2=true;
Connecting to IBM DB2
To create a connection to an IBM DB2 database, open the Connections window, click the +
button, and type in db2
. Select IBM DB2.
Enter the connection parameters.
The following link details the properties that are available for an IBM DB2 connection's URL.
Connecting to IBM AS400
To create a connection to an IBM AS400 database, open the Connections window, click the +
button, and type in as400
. Select IBM AS400.
Enter the connection parameters.
The following link details the properties that are available for an IBM AS400 connection's URL.
Connecting to PostgreSQL
To create a connection to a PostgreSQL database, open the Connections window, click the +
button, and type in postgres
. Select PostgreSQL.
Enter the connection parameters.
The following link details the properties that are available for a PostgreSQL connection's URL.
Secure PostgreSQL Connections with SSL
To connect to the PostgreSQL database with SSL, you will need a server certificate, a client certificate, and a private key for the client certificate. In most cases they will be provided as pem files:
- server-ca.pem - server certificate
- client-cert.pem - client certificate
- client-key.pem - private key
The server and client certificates can be used as is (as pem files), but the private key must be PKCS8 and stored in DER format.
You can convert client-key.pem to the PKCS8 with openssl:
openssl pkcs8 -topk8 -inform PEM -outform DER -in client-key.pem -out postgresql.pk8 -nocrypt
Once you have the certificates and the private key you can upload them into the Server Storage.
It is recommended to create a folder under the Server Storage, for example postgres-certificates
, and upload files into this folder.
Once files have been uploaded you can modify the JDBC URL for the PostgreSQL connection as below:
jdbc:postgresql://host:5432/database?ssl=true&sslfactory=org.postgresql.ssl.jdbc4.LibPQFactory&sslmode=require&sslrootcert={app.data}postgres-certificates/server-ca.pem&sslcert={app.data}postgres-certificates/client-cert.pem&sslkey={app.data}postgres-certificates/postgresql.pk8
The host, port, database name, names of the certificates and the private key file, as well as the name of the folder with the certificates, must be replaced with actual values.
Connecting to MySQL
To create a connection to a MySQL database, open the Connections window, click the +
button, and type in mysql
. Select MySQL.
Enter the connection parameters.
The following link details the properties that are available for a MySQL connection's URL.
Secure MySQL Connections with SSL
To connect to the MySQL database with SSL, you will need a server certificate, a client certificate, and a private key for the client certificate. In most cases they will be provided as pem files:
- server-ca.pem - server certificate
- client-cert.pem - client certificate
- client-key.pem - private key
Step 1. Import the certificate authority (server) certificate into a Java keystore (JDK is required to use keytool).
keytool -import -file server-ca.pem -keystore truststore -storepass {password}
Step 2. Create a PKCS12 file from an existing signed client certificate and its private key with openssl.
openssl pkcs12 -export -in client-cert.pem -inkey client-key.pem -out client.p12 -CAfile server-ca.pem -password pass:{password}
Step 3. Import the combined certificate and private key into the keystore.
keytool -importkeystore -deststorepass {password} -destkeystore keystore -srckeystore client.p12 -srcstoretype PKCS12 -srcstorepass {password}
Once you have the truststore and keystore files, you can upload them into the Server Storage.
It is recommended to create a folder under the Server Storage, for example mysql-certificates
, and upload files into this folder.
Once files have been uploaded, you can modify the JDBC URL for the MySQL connection as below:
jdbc:mysql://host:3306/database?useSSL=true&requireSSL=true&verifyServerCertificate=true&clientCertificateKeyStorePassword=password&clientCertificateKeyStoreType=JKS&clientCertificateKeyStoreUrl=file://{app.data}mysql-certificates/keystore.jks&trustCertificateKeyStorePassword=password&trustCertificateKeyStoreType=JKS&trustCertificateKeyStoreUrl=file://{app.data}mysql-certificates/truststore.jks
The host, port, database name, names of the truststore and keystore files, the password, as well as the name of the folder with the certificates, must be replaced with actual values.
Connecting to MariaDB
To create a connection to a MariaDB database, open the Connections window, click the +
button, and type in mariadb
. Select MariaDB.
Enter the connection parameters.
The following link details the properties that are available for a MariaDB connection's URL.
Connecting to Informix
To create a connection to an Informix database, open the Connections window, click the +
button, and type in informix
. Select one of the available Informix connection types.
Enter the connection parameters.
If Informix is accessed using the DB2 driver and that connection type is selected, the following link details the properties that are available for an Informix connection's URL.
Otherwise, the following link details the URL properties that are available for the native Informix driver.
Connecting to Sybase ASE
To create a connection to a Sybase ASE database, open the Connections window, click the +
button, and type in sybase
. Select Sybase ASE.
Enter the connection parameters.
The following link details the properties that are available for a Sybase ASE connection's URL.
Connecting to Snowflake
To create a connection to a Snowflake database, open the Connections window, click the +
button, and type in snowflake
. Select Snowflake.
Enter the connection parameters.
Optionally, enter the Stage name. The stage is the location where your data files are stored for loading into Snowflake. A stage name can also be configured at the flow level.
The following link details the properties that are available for a Snowflake connection's URL.
Read how to work with Snowflake in Etlworks Integrator.
Connecting to Azure Synapse Analytics
To create a connection to an Azure Synapse Analytics database, open the Connections window, click the +
button, and type in synapse
. Select Azure Synapse Analytics.
Enter the connection parameters.
The following link details the properties that are available for a connection's URL.
Read how to work with Azure Synapse Analytics in Etlworks Integrator.
Connecting to Amazon RDS
Amazon RDS is a fully-managed database service. The following RDS connectors are available in Etlworks Integrator:
Connecting to Amazon Aurora PostgreSQL
Amazon Aurora PostgreSQL is essentially a fully-managed PostgreSQL cluster that is available on AWS. Use the following guidelines to configure an Aurora PostgreSQL cluster.
To create a connection to an Amazon Aurora PostgreSQL database, open the Connections window, click the +
button, and type in aurora
. Select Amazon Aurora PostgreSQL.
Enter the connection parameters.
The following link details the properties that are available for an Amazon Aurora PostgreSQL connection's URL.
Connecting to Amazon Aurora MySQL
Amazon Aurora MySQL is essentially a fully-managed MySQL cluster that is available on AWS. Use the following guidelines to configure an Aurora MySQL cluster.
To create a connection to an Amazon Aurora MySQL database, open the Connections window, click the +
button, and type in aurora
. Select Amazon Aurora MySQL.
Enter the connection parameters.
The following link details the properties that are available for an Amazon Aurora MySQL connection's URL.
Connecting to Amazon Redshift
Amazon Redshift is a fully-managed data warehouse that is available on AWS.
To create a connection to an Amazon Redshift database, open the Connections window, click the +
button, and type in redshift
. Select Amazon Redshift.
Enter the connection parameters.
The following link details the properties that are available for an Amazon Redshift connection's URL.
Read how to work with Amazon Redshift in Etlworks Integrator.
Connecting to Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL. With a few actions in the AWS Management Console, you can point Athena at your data stored in Amazon S3 and begin using standard SQL to run ad-hoc queries and get results in seconds.
Read about Amazon Athena.
To create a connection to an Amazon Athena database, open the Connections window, click the +
button, and type in athena
. Select Amazon Athena.
Enter the connection parameters.
The following link details the properties that are available for an Amazon Athena connection's URL.
Important
To ensure that you can connect to Amazon Athena from Etlworks Integrator, add the athena:GetQueryResultsStream policy action to the list of policies for Athena. For an example policy, see the AWSQuicksightAthenaAccess managed policy. Additionally, ensure that port 444 is open to outbound traffic.
Read how to create objects and query data from Amazon Athena.
Connecting to Microsoft Access
To create a connection to a Microsoft Access database, open the Connections window, click the +
button, and type in access
. Select Microsoft Access.
Enter the connection parameters.
The following link details the properties that are available for a Microsoft Access connection's URL.
Connecting to Teradata
To create a connection to a Teradata database, open the Connections window, click the +
button, and type in teradata
. Select Teradata.
Enter the connection parameters.
The following link details the properties that are available for a Teradata connection's URL.
Connecting to Apache Hive
The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. A JDBC driver is provided to connect users to Hive.
Enter the connection parameters.
The following link details the properties that are available for an Apache Hive connection's URL.
Connecting to Google BigQuery
Google BigQuery is not a typical relational database (it is actually a service), but Google has partnered with third-party companies to provide a JDBC driver that leverages the power of BigQuery's Standard SQL.
To create a connection to Google BigQuery, open the Connections window, click the +
button, and type in bigquery
. Select Google BigQuery.
Enter the connection parameters.
The following link details the properties that are available for a Google BigQuery connection's URL.
At this time a Google service account is required to connect to BigQuery.
This guide explains how to create a new Google service account.
When creating a new service account, two parameters are important:
- The service account email.
- The secret key associated with the account. The key can be downloaded in the JSON or P12 format.
Use the following URL to connect to BigQuery using a Google service account:
jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<Project>;OAuthType=0;OAuthServiceAcctEmail={user};OAuthPvtKeyPath={password}
Enter a Google service account email into the User field.
Upload the key, which is in the JSON or P12 format, into server storage.
Enter the path to the key file into the Password field. Use the token {app.data} to define the path to the key. For example: {app.data}/key.json.
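For illustration, a fully resolved URL with the tokens substituted would look like the following; the project ID, service account email, and key file name are hypothetical placeholders:
jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=myproject;OAuthType=0;OAuthServiceAcctEmail=etl@myproject.iam.gserviceaccount.com;OAuthPvtKeyPath={app.data}/key.json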
Connecting to Derby
Derby (Java DB) is a compact and embeddable database implemented entirely in Java and available under the Apache License, Version 2.0.
To create a connection to a Derby database, open the Connections window, click the +
button, and type in derby
. Select Client Java DB (Derby).
Enter the connection parameters.
The following link details the properties that are available for a Derby connection's URL.
Connecting to SQLite
SQLite is available in Etlworks Integrator as a temporary staging database.
In a dedicated instance, it is also possible to create a connection to an actual (persistent, not in-memory) SQLite database.
Step 1. Open the Connections window, click the +
button, and type in sqlite
. Select SQLite.
Step 2. Enter the connection URL as described here.
Step 3. If needed, enter a user name and password.
Connecting to Neo4j
Neo4j is a graph database management system developed by Neo4j, Inc. Described by its developers as an ACID-compliant transactional database with native graph storage and processing, it is available in a GPL3-licensed open-source "community edition", with online backup and high availability extensions licensed under a closed-source commercial license. Neo4j, Inc. also licenses Neo4j with these extensions under closed-source commercial terms.
To create a connection to a Neo4j database, open the Connections window, click the +
button, and type in neo4j. Select Neo4j.
Enter the connection parameters.
The following link details the properties that are available for a Neo4j connection's URL.
Connecting to Elasticsearch
Elasticsearch provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
To create a connection to an Elasticsearch, open the Connections window, click the +
button, and type in elasticsearch
. Select one of the available Elasticsearch connectors:
Elasticsearch Legacy
This is a connector for older versions of Elasticsearch (up to 6.x).
The following link details the properties that are available for an Elasticsearch Legacy connection's URL.
Read how to configure the connection.
Read how to insert and query data from Elasticsearch.
Elasticsearch SQL
This is a connector from the creators of Elasticsearch.
The Elasticsearch SQL connector requires a platinum subscription to the Elastic Stack.
Read about available URL parameters and configuration properties.
This connector only supports SQL commands available as a part of the Elasticsearch SQL API: SELECT, DESCRIBE TABLE, SHOW COLUMNS, SHOW TABLES, SHOW FUNCTIONS.
Elasticsearch Premium
Etlworks Elasticsearch premium connector allows fast, real-time access to Elasticsearch data.
Etlworks partnered with CData to provide access to the Elasticsearch API using industry-standard JDBC protocol. Read about the CData Elasticsearch JDBC connector.
Contact support@etlworks.com to enable a connector for your account.
Read about the connection URL.
Read about available additional parameters.
Read about SQL compliance.
Connecting to SAP HANA
SAP HANA is a high-performance in-memory database that speeds data-driven, real-time decisions and actions. As the data foundation of SAP’s Business Technology Platform, it provides advanced analytics on multimodel data, on premise and in the cloud.
To create a connection to a SAP HANA database, open the Connections window, click the +
button, and type in hana. Select SAP HANA.
Read about the connection parameters.
Connecting to DynamoDB
Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It's a fully managed, multiregion, multimaster, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications. DynamoDB can handle more than 10 trillion requests per day and can support peaks of more than 20 million requests per second.
Etlworks DynamoDB premium connector allows fast, real-time access to DynamoDB data.
Etlworks partnered with CData to provide access to DynamoDB using industry-standard JDBC protocol. Read about the CData DynamoDB JDBC connector.
To create a connection to a DynamoDB database, open the Connections window, click the +
button, and type in dynamodb. Select DynamoDB.
Read about the connection URL.
Read about available additional parameters.
Read about SQL compliance.
Connecting to HBase
Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
Etlworks HBase premium connector allows fast, real-time access to HBase data.
Etlworks partnered with CData to provide access to HBase using industry-standard JDBC protocol. Read about the CData HBase JDBC connector.
To create a connection to a HBase database, open the Connections window, click the +
button, and type in hbase. Select HBase.
Read about the connection URL.
Read about available additional parameters.
Read about SQL compliance.
Connecting to Cosmos DB
Azure Cosmos DB is a fully managed NoSQL database service for modern app development. Get guaranteed single-digit millisecond response times and 99.999-percent availability, backed by SLAs, automatic and instant scalability, and open-source APIs for MongoDB and Cassandra. Enjoy fast writes and reads anywhere in the world with turnkey multi-master global distribution.
Etlworks Cosmos DB premium connector allows fast, real-time access to Cosmos DB data.
Etlworks partnered with CData to provide access to Cosmos DB using industry-standard JDBC protocol. Read about the CData Cosmos DB JDBC connector.
To create a connection to a Cosmos DB database, open the Connections window, click the +
button, and type in cosmos. Select Cosmos DB.
Read about the connection URL.
Read about available additional parameters.
Read about SQL compliance.
Connecting to Cassandra
The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
Etlworks Cassandra premium connector allows fast, real-time access to Cassandra data.
Etlworks partnered with CData to provide access to Cassandra using industry-standard JDBC protocol. Read about the CData Cassandra JDBC connector.
To create a connection to a Cassandra database, open the Connections window, click the +
button, and type in cassandra. Select Cassandra.
Read about the connection URL.
Read about available additional parameters.
Read about SQL compliance.
Connecting to Btrieve
Btrieve is a transactional database (navigational database) software product. It is based on Indexed Sequential Access Method (ISAM), which is a way of storing data for fast retrieval.
Etlworks Btrieve premium connector allows fast, real-time access to Btrieve data.
Etlworks partnered with CData to provide access to Btrieve using industry-standard JDBC protocol. Read about the CData Btrieve JDBC connector.
To create a connection to a Btrieve database, open the Connections window, click the +
button, and type in btrieve. Select Btrieve.
Read about the connection URL.
Read about available additional parameters.
Connecting to Couchbase
Couchbase is an award-winning distributed NoSQL cloud database. It delivers unmatched versatility, performance, scalability, and financial value across cloud, on-premises, hybrid, distributed cloud, and edge computing deployments.
Etlworks Couchbase premium connector allows fast, real-time access to Couchbase data.
Etlworks partnered with CData to provide access to Couchbase using industry-standard JDBC protocol. Read about the CData Couchbase JDBC connector.
To create a connection to a Couchbase database, open the Connections window, click the +
button, and type in couchbase. Select Couchbase.
Read about the connection URL.
Read about available additional parameters.
Read about SQL compliance.
Testing a database connection
Click the Test Connection
button on the connection screen to test the database connection.
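For reference, a successful test is roughly equivalent to the following JDBC-level check. The connection details are placeholders, and this is a sketch, not necessarily how Integrator implements the test:

import java.sql.Connection;
import java.sql.DriverManager;

public class TestConnectionExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {
            // isValid sends a lightweight probe; the argument is a timeout in seconds
            System.out.println(conn.isValid(10) ? "Connection OK" : "Connection failed");
        }
    }
}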
Connecting to a database if it is not on the list
If your database is not on the list of supported databases, you can still connect to it:
Step 1. Contact Etlworks Support so we can add a JDBC driver for your database. Or, if you are running Integrator on your own premises, simply drop the JDBC driver into the application server lib folder and restart the server.
Step 2. Select the Generic JDBC connection type.
Step 3. Specify the connection parameters.