Continuous delivery model
Starting November 2021, we are switching to the continuous delivery model. With this model, bug fixes, new features, and enhancements are released as soon as they are ready. The updates are automatically deployed to the individual Etlworks instances on a rolling schedule. Read the full announcement.
This is a required update.
We have fixed the memory leak in the Amazon S3 SDK connector. We also fixed a similar memory leak in AWS-specific connectors (Kinesis, SQS, RabbitMQ, ActiveMQ) which use IAM role authentication. All instances managed by Etlworks have been upgraded. Self-hosted customers are strongly advised to upgrade as soon as possible. Customers who have Integration Agents are encouraged to update the agents as well.
It is now possible to manage the Etlworks billing account and subscriptions from the Etlworks app. Read more.
It is now possible to access this changelog from the Etlworks app. Read more.
It is now possible to search for information in the Documentation and submit support requests from the Etlworks app. Read more.
It is now possible to resize all split panels (such as in Explorer, Connections, etc.) further to the right. It allows users to see long(er) filenames, connection names, and other objects.
We have added links to the running flows in the Suspend flow executions window.
We now display a warning message when a user is trying to create a non-optimized flow when the destination is Snowflake, Amazon Redshift, Synapse Analytics, Google BigQuery, or Greenplum. The warning message includes a link to the relevant article in the documentation.
We have added a new flow type that can be used to create dynamic workflows which change based on user-provided parameters. Read more.
It is now possible to split CSV files using a user-defined delimiter or regular expression. Read more.
CDC flows no longer stop automatically: they keep running until they are stopped manually or fail. Read more. Note that the behavior of previously created CDC flows has not changed.
We have improved the algorithm that CDC flows use to create transaction markers. The flows now use the actual start/commit/rollback events emitted by the source database. Previously, a change of the transaction ID was used as the trigger, so the flow had to wait for a new transaction to start before it could create the "end of previous transaction" event.
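The difference between the two approaches can be illustrated with a minimal sketch. It is not the actual implementation: the event shape and the function names `markers_by_txid_change` and `markers_by_events` are hypothetical and exist only for this illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical event shape: (kind, tx_id), where kind is
# "BEGIN", "CHANGE", "COMMIT", or "ROLLBACK".
Event = Tuple[str, int]


def markers_by_txid_change(events: Iterable[Event]) -> List[str]:
    """Old approach: close a transaction only when a different tx_id shows up."""
    markers: List[str] = []
    current = None
    for kind, tx_id in events:
        if kind != "CHANGE":
            continue  # this approach only looks at data changes
        if tx_id != current:
            if current is not None:
                markers.append(f"END {current}")
            markers.append(f"BEGIN {tx_id}")
            current = tx_id
    return markers


def markers_by_events(events: Iterable[Event]) -> List[str]:
    """New approach: rely on the BEGIN/COMMIT/ROLLBACK events themselves."""
    markers: List[str] = []
    for kind, tx_id in events:
        if kind == "BEGIN":
            markers.append(f"BEGIN {tx_id}")
        elif kind in ("COMMIT", "ROLLBACK"):
            markers.append(f"END {tx_id}")
    return markers


stream = [
    ("BEGIN", 1), ("CHANGE", 1), ("CHANGE", 1), ("COMMIT", 1),
    ("BEGIN", 2), ("CHANGE", 2), ("COMMIT", 2),
]

print(markers_by_txid_change(stream))  # transaction 2 is never closed
print(markers_by_events(stream))       # every transaction is closed on commit
```

With the first function, the end marker for transaction 2 would only appear once a third transaction starts, which is the waiting situation described above.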
We have updated the HubSpot connector, which now supports the new authorization scopes introduced by HubSpot in August.
Various bug fixes and performance improvements under the hood.
We have added new tutorials for creating CDC pipelines for loading data into Snowflake, Amazon Redshift, Azure Synapse Analytics, Google BigQuery and Greenplum. Read more.
Our redesigned main website (https://etlworks.com) went live.
MySQL CDC connector now supports reading data from the compressed binlog. Read more.
It is now possible to disable flashback queries when configuring the Oracle CDC connection. This could greatly improve the performance of the snapshot in some environments. Read more.
CDC connectors can now be configured to capture NOT NULL constraints. Read more.
Legacy S3 and Azure Storage connectors have been deprecated. The existing legacy connections will continue to work indefinitely but new connections can only be created using S3 SDK and Azure Storage SDK connectors.
Bulk load flows now ignore empty data files.
Bulk load flows which are loading data from Azure Storage now support traversing all subfolders under the root folder.
User request. It is now possible to add record headers when configuring Kafka and Azure Event Hubs connections. Record headers are key-value pairs that let you attach metadata to a record without adding any extra information to the record itself.
The BigQuery connector now maps the ARRAY data type in the source database (for example Postgres) to STRING in BigQuery.
Fixed bug which was causing a recoverable NullPointerException (NPE) when saving flow execution metrics.
Various bug fixes and performance improvements under the hood.
Single Sign On (SSO) is now available to all Etlworks Enterprise and On-Premise customers. Read more.
We have added a bulk load flow for loading CSV and Parquet files in Azure Storage into Azure Synapse Analytics. It provides the most efficient way of loading files into Synapse Analytics. Read more.
We have optimized loading data from MongoDB into relational databases and data warehouses such as Snowflake, Amazon Redshift, and Azure Synapse Analytics. It is now possible to preserve the nested nodes in the documents stored in MongoDB in the stringified JSON format. Read more.
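As a rough illustration of what "preserving nested nodes as stringified JSON" means, the sketch below turns a MongoDB-style document into a flat row. The function name `flatten_document` and the sample document are hypothetical, and the exact output format produced by Etlworks may differ.

```python
import json
from typing import Any, Dict


def flatten_document(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Keep scalar fields as-is and store nested nodes as JSON strings."""
    row: Dict[str, Any] = {}
    for key, value in doc.items():
        if isinstance(value, (dict, list)):
            # Nested object or array: keep it intact, but as a single string column.
            row[key] = json.dumps(value, sort_keys=True)
        else:
            row[key] = value
    return row


document = {
    "_id": 1,
    "name": "Ann",
    "address": {"city": "Oslo", "zip": "0150"},
    "tags": ["a", "b"],
}

print(flatten_document(document))
```

Each nested node ends up in one string column, so the row fits a relational table while the original structure can still be parsed back from the string.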
Bulk load files into Snowflake now supports loading data in JSON, Parquet, and Avro files directly into the Variant column in Snowflake. Read more.
It is now possible to connect to Snowflake using External OAuth with Azure Active Directory. Read more.
The Azure Event Hubs connector now supports compression. Read more.
The Flows Executions Dashboard now displays the aggregated number of records processed by the specific flow on the selected day. This is useful when monitoring the number of records processed by a CDC pipeline, which typically includes 2 independent flows (each with its own record tracking capabilities): extract and load.
We have improved the Flow which creates staging tables or flat files for each dimension of the nested dataset. It is now possible to alter the staging tables on the fly to compensate for the variable number of columns in the source. We have also added the ability to add a column to each staging table/file that contains the parent node name. Read more.
It is now possible to authenticate with SAS token and Client Secret when connecting to Azure Storage using the new Azure Storage SDK connector. Note that the legacy Azure Storage connector also supports authentication with SAS token but does not support Client Secret.
We have updated the Sybase JDBC driver to the latest version.
We have fixed the soft deletes with CDC. This functionality was broken in one of the previous builds.
It is now possible to filter flows associated with the Agent by name, description, and tags.
It is now possible to configure and send email notifications from the Etlworks instance for flows executed by the Agent.
It is now possible to bulk load CSV files into Snowflake from the server (local) storage. Previously it was only possible to bulk load files into Snowflake from S3, Azure Blob, or Google Cloud storage. The flow Load files in cloud storage into Snowflake was renamed to Bulk load files into Snowflake. Note that it was always possible to ETL files into Snowflake from server (local) storage.
The flow Bulk load CSV files into Snowflake now supports loading files by a wildcard pattern in COPY INTO and can handle explicit CDC updates when the CDC stream includes only the updated columns.
MySQL CDC connector now supports the useCursorFetch property. When this property is enabled, the connector uses a cursor-based result set when performing the initial snapshot. The property is disabled by default.
All CDC connectors now test the destination cloud storage connection before attempting to stream the data. If the connection is not properly configured the CDC flow stops with an error.
Debezium has been upgraded to the latest 1.9 release.
We have added a new premium Box API connector.
Snowflake, DB2, and AS400 JDBC drivers have been updated to the latest versions.
We introduced two major improvements for Change Data Capture (CDC) flows. The previously available mechanism for the ad-hoc snapshots using a read/write signal table in the monitored schema has been completely rewritten.
- It is now possible to add new tables to monitor and snapshot by simply modifying the list of the included tables. Read more.
- It is now possible to trigger the ad-hoc snapshot at runtime using a table in any database (including a database completely different from the one monitored by the CDC flow) or a file in any of the supported file storage systems: local, remote, and cloud.
Webhooks now support custom payload templates. The templates can be used to configure integration with many third-party systems, for example, Slack.
The S3 SDK connector now supports automatic pagination when reading file names by a wildcard.
Magento connector now supports authentication with an access token.
Etlworks Integrator now supports Randomization and Anonymization for various domains, such as names, addresses, Internet (including email), IDs, and many others.
We added a new flow type: Bulk load files in S3 into Redshift. Use the Bulk Load Flow when you need to load files in S3 directly into Redshift. This Flow is extremely fast as it does not transform the data.
The Redshift driver now automatically maps columns with a TINYINT data type to INTEGER. It fixes the issue when Redshift is unable to load data into the SMALLINT column if the value is larger than
CSV connector can now read gzipped files. It works in Explorer as well.
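For readers who want to check what a gzipped CSV file contains outside of Etlworks, here is a minimal sketch using only the Python standard library. The function name `read_gzipped_csv` and the sample payload are hypothetical; this is not how the connector itself is implemented.

```python
import csv
import gzip
import io
from typing import List


def read_gzipped_csv(payload: bytes) -> List[List[str]]:
    """Decompress a gzipped CSV payload and parse it into rows."""
    with gzip.open(io.BytesIO(payload), mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))


# Build a small gzipped CSV in memory for the demonstration.
sample = gzip.compress(b"id,name\n1,Ann\n2,Bob\n")
print(read_gzipped_csv(sample))
```

To read from disk instead, pass a file path to `gzip.open` in place of the `io.BytesIO` wrapper.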
It is now possible to override the default key used for encryption and decryption of the export files.
Users with the operator role can now browse data and files in Explorer.
It is now possible to override the storage type, the location, the format, whether the files should be gzipped, and the CDC Key set in the CDC connection using TO-parameters in source-to-destination transformation. Read more.
We added a new S3 connector created using the latest AWS SDK. It is now the recommended connector for S3. The old S3 connector was renamed to Legacy. We will keep the Legacy connector forever for backward compatibility reasons.
It is now possible to see and cancel actions triggered by the end-user to be executed in the Integration Agent. When the user triggers any action, such as Run Flow, Stop Flow, or Stop Agent, the action is added to the queue. The actions in the queue are executed in order on the next communication session between the Agent and the Etlworks Integrator. Read more.
We added an SMB Share connector. Among other things, it supports connecting to the network share over the SSH tunnel.
Google Sheets connector now supports configurable timeout (the default is 3 minutes) and auto-retries when reading data.
The Flow types Extract nested dataset and create staging files and Extract nested dataset and create staging tables, which are used to normalize nested datasets into a relational data model, now support message queues as a source.
Legacy MySQL CDC connector now provides information about the current and previous log readers. It is specifically useful when the connector is configured to automatically snapshot new tables added to the Include Tables list.
The Integration Agent is a zero-maintenance, easy-to-configure, fully autonomous ETL engine which runs as a background service behind the company’s firewall. It can be installed on Windows and Linux. The Remote Integration Agent is now fully integrated with the cloud Etlworks instance. You can monitor the Agent in real-time, schedule, run, stop and monitor flows running on-premise. Read how to install, configure, and monitor the new Integration Agent. Read about configuring flows to run in the Integration Agent.
We added a new flow type: Bulk load files into the database without transformation. Use the Bulk Load Flow when you need to load files in the local or cloud storage directly into the database which supports a bulk load. This Flow does not transform the data. Read how to ETL data into databases using bulk load.
It is now possible to create a new Google Sheets spreadsheet if it does not exist. Read more.
Bulk load flows now support splitting large datasets into smaller chunks and loading chunks in parallel threads. Read more about performance optimizations when loading large datasets using bulk load flows.
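The general idea of splitting a dataset into chunks and loading the chunks in parallel threads can be sketched as follows. This is a simplified illustration, not the flow's implementation: `split_into_chunks`, `load_in_parallel`, and the `load_chunk` callback are hypothetical names, and a real loader would write each chunk to the destination instead of counting rows.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_into_chunks(rows: Sequence, chunk_size: int) -> List[Sequence]:
    """Cut the dataset into consecutive chunks of at most chunk_size rows."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def load_in_parallel(
    rows: Sequence,
    chunk_size: int,
    load_chunk: Callable[[Sequence], int],
    max_threads: int = 4,
) -> int:
    """Load every chunk in a worker thread and return the total rows loaded."""
    chunks = split_into_chunks(rows, chunk_size)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # load_chunk is expected to return the number of rows it loaded.
        return sum(pool.map(load_chunk, chunks))


dataset = list(range(10))
print(split_into_chunks(dataset, 4))
print(load_in_parallel(dataset, 4, load_chunk=len))
```

Smaller chunks increase parallelism but also the per-chunk overhead, so the chunk size and thread count are usually tuned together.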
It is now possible to configure the CSV format to enclose the columns in the header row in double quotes. Previously only values could be enclosed.
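To show the difference in the resulting file, here is a small sketch using Python's `csv` module. The function `write_csv` and its `quote_header` flag are hypothetical stand-ins for the format setting; they are not Etlworks parameter names.

```python
import csv
import io
from typing import Iterable, Sequence


def write_csv(header: Sequence[str], rows: Iterable[Sequence], quote_header: bool = True) -> str:
    """Write values enclosed in double quotes; optionally enclose the header too."""
    buffer = io.StringIO()
    header_quoting = csv.QUOTE_ALL if quote_header else csv.QUOTE_MINIMAL
    csv.writer(buffer, quoting=header_quoting, lineterminator="\n").writerow(header)
    csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


print(write_csv(["id", "name"], [[1, "Ann"]], quote_header=True))
print(write_csv(["id", "name"], [[1, "Ann"]], quote_header=False))
```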
Added new connectors for message queues:
Fixed an error in the Snowflake bulk load flow when the schema name starts with non-SQL characters, for example
The POST/PUT listeners can now be configured to not enforce the strict UTF-8 encoding.
The flow can now be configured to fail if the source field in mapping does not actually exist in the source.
Fixed an error when the field in mapping contains trailing or leading spaces.
Added the new Flow type: bulk email reader. This flow reads the email messages (including attachments) from the inbound email connection and saves them as files into the designated folder in the server storage. Use this flow type when you need to read hundreds or thousands of emails as fast as possible from the relatively slow inbound email connection.
Added new collision policy when importing previously exported flows: Keep all, replaces Flows and Macros. Use this policy if you are migrating flows from one environment to another and prefer to keep existing connections and formats in the destination environment.
MongoDB CDC connector now supports MongoDB change streams.
Added the ability to configure a flow so that it cannot be executed manually. Enable it if the Flow is a part of the nested Flow and is not meant to be executed independently.
Fixed an issue causing intermittent errors when sending and receiving emails from/to servers with TLS 1.1 enabled.
Added the ability to configure the type of SQL executed when merging data using Snowflake and Bulk Load flows. The default option is DELETE/INSERT. It deletes all records in the actual table that also exist in the temp table, then inserts all records from the temp table into the actual table. If this parameter is set to MERGE the flow executes native MERGE SQL.
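The DELETE/INSERT pattern described above can be sketched as follows. This is an illustration under stated assumptions: SQLite stands in for Snowflake so the example runs anywhere, the `orders` and `orders_temp` tables and the `id` key are hypothetical, and the SQL generated by the flow may differ. The MERGE statement is shown only for comparison and is not executed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE orders_temp (id INTEGER PRIMARY KEY, status TEXT);
INSERT INTO orders VALUES (1, 'new'), (2, 'new');
INSERT INTO orders_temp VALUES (2, 'shipped'), (3, 'new');
""")


def delete_insert_merge(connection: sqlite3.Connection) -> None:
    """Delete rows that also exist in the temp table, then insert all temp rows."""
    with connection:  # both statements run in one transaction
        connection.execute("DELETE FROM orders WHERE id IN (SELECT id FROM orders_temp)")
        connection.execute("INSERT INTO orders SELECT id, status FROM orders_temp")


# Roughly equivalent native MERGE in Snowflake syntax (for comparison only).
NATIVE_MERGE_SQL = """
MERGE INTO orders t USING orders_temp s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET status = s.status
WHEN NOT MATCHED THEN INSERT (id, status) VALUES (s.id, s.status)
"""

delete_insert_merge(conn)
rows = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
print(rows)
```

Both options lead to the same final table contents in this example: row 2 is updated, row 3 is added, and row 1 is left untouched.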
Added binary format selector for Snowflake flows. The value of this parameter defines the encoding format for binary input or output. This option only applies when loading data into binary columns in a table. Default: HEX.
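As an illustration of what the encoding format changes, the sketch below encodes the same bytes in different ways. The changelog only names HEX as the default; the BASE64 and UTF8 branches are an assumption based on the values accepted by Snowflake's BINARY_FORMAT file format option, and the function name `encode_binary` is hypothetical.

```python
import base64


def encode_binary(value: bytes, binary_format: str = "HEX") -> str:
    """Render binary data as text in the chosen encoding format."""
    fmt = binary_format.upper()
    if fmt == "HEX":
        return value.hex().upper()
    if fmt == "BASE64":  # assumed option, not named in the changelog
        return base64.b64encode(value).decode("ascii")
    if fmt == "UTF8":  # assumed option, not named in the changelog
        return value.decode("utf-8")
    raise ValueError(f"Unsupported binary format: {binary_format}")


print(encode_binary(b"Hi"))            # hexadecimal text
print(encode_binary(b"Hi", "BASE64"))  # Base64 text
```

The data file must use the same encoding as the configured format, otherwise the values loaded into binary columns would be decoded incorrectly.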
Added the ability to send a test email and set the FROM when configuring the SMTP for sending email notifications.
Added the ability to configure the number of records when sampling nested data structures. It allows the Explorer and the Mapping to more accurately set the column's data types when working with non-relational datasets, for example, JSON and XML files.
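Why the sample size matters can be seen in a small sketch of type inference over sampled records. This is not the algorithm used by Explorer or Mapping: the function `infer_column_types`, the widening rules, and the sample records are all hypothetical.

```python
from typing import Any, Dict, Iterable


def infer_column_types(records: Iterable[Dict[str, Any]], sample_size: int) -> Dict[str, str]:
    """Guess a type for each column from the first sample_size records."""
    types: Dict[str, str] = {}
    for index, record in enumerate(records):
        if index >= sample_size:
            break
        for column, value in record.items():
            if value is None:
                continue  # nulls carry no type information
            observed = type(value).__name__
            previous = types.get(column)
            if previous is None or previous == observed:
                types[column] = observed
            elif {previous, observed} == {"int", "float"}:
                types[column] = "float"  # widen integers to floats
            else:
                types[column] = "str"  # fall back to text on any other conflict
    return types


records = [
    {"id": 1, "price": 10},
    {"id": 2, "price": 9.5},
    {"id": "A3", "price": 1},
]

print(infer_column_types(records, sample_size=1))
print(infer_column_types(records, sample_size=3))
```

With a sample of one record both columns look like integers; sampling all three records reveals that `id` must be text and `price` must be a floating-point number.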
Parquet and Avro connectors now support automatic schema generation. Prior to this update, the developer had to provide a schema in order to create a Parquet or Avro document.
Added new authentication types (including interactive authentication with Azure Active Directory) for SQL Server connector.
All bulk load flows (Snowflake, Redshift, BigQuery, Greenplum, Azure Synapse, generic Bulk load) now support extra debug logging. When the option Log each executed SQL statement is enabled, the flow logs each executed SQL statement, including Before SQL, After SQL, CREATE and ALTER TABLE, COPY INTO, MERGE, DELETE, etc.
Added Max Latency Date for the CDC metrics.
Added new webhook types:
- Webhook for the event when the flow is stopped manually or by API call.
- Webhook for the event when the flow is stopped by the scheduler because it has been running for too long.
- Webhook for the event when the flow is running too long.
- Webhook for the event when the maintenance task is not able to refresh the access token for the connection configured with Interactive Azure Active Directory authentication.
- Webhook for the event when the flow generates a warning, for example, "source table has more columns than destination table".
TABLE_DB flow variables for flows which load data from multiple tables matching the wildcard name. These variables can be used in Before and After SQL, as well as Source Query.
Added the ability to create all columns as nullable when creating a database table.
Added the ability to retry failed transformations on the next run.
Added the ability to use S3 authentication with IAM role or default profile.
Added password authentication for Redis.
Added the ability to execute CDC snapshots in parallel threads.
Fixed an edge case when MySQL CDC connection configured with SSH tunnel was not closing the SSH connection when switching from snapshot reader to the binlog reader.
Added support for shared Google Drive. If this option is enabled (default) the connector supports MyDrives and Shared drives. When it is disabled it only supports MyDrives.
Added the ability to configure a database connection to open when needed.
The option to modify the Create table SQL is now available for all bulk load flows.
Added new flow type: Load files in cloud storage into Snowflake. This flow is the most efficient way of loading data into Snowflake when you already have CSV files in the cloud storage (Amazon S3, Google Cloud Storage, Azure Blob) and don't need to transform the data.