Continuous delivery model
Starting in November 2021, we are switching to a continuous delivery model. With this model, bug fixes, new features, and enhancements are released as soon as they are ready. Updates are automatically deployed to individual Etlworks instances on a rolling schedule. Read the full announcement.
Etlworks Integrator changelog
Single Sign On (SSO) is now available to all Etlworks Enterprise and On-Premise customers. Read more.
We have added a bulk load flow for loading CSV and Parquet files in Azure Storage into Azure Synapse Analytics. It provides the most efficient way of loading files into Synapse Analytics. Read more.
We have optimized loading data from MongoDB into relational databases and data warehouses such as Snowflake, Amazon Redshift, and Azure Synapse Analytics. It is now possible to preserve nested nodes in MongoDB documents as stringified JSON. Read more.
The Flow Bulk load files into Snowflake now supports loading data in JSON, Parquet, and Avro files directly into the Variant column in Snowflake. Read more.
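For context, here is a minimal Snowflake sketch of what loading semi-structured files into a Variant column looks like; the table, stage, and path names are illustrative, not the SQL the flow generates:

```sql
-- Illustrative only: a VARIANT column can hold an entire JSON document per row
CREATE TABLE IF NOT EXISTS raw_events (payload VARIANT);

-- Load JSON files from a named stage; Parquet and Avro work the same way
-- with TYPE = PARQUET or TYPE = AVRO
COPY INTO raw_events
  FROM @my_stage/events/
  FILE_FORMAT = (TYPE = JSON);
```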
The Override CREATE TABLE using JavaScript feature now supports ALTER TABLE as well. Read more.
It is now possible to connect to Snowflake using External OAuth with Azure Active Directory. Read more.
The Azure Event Hubs connector now supports compression. Read more.
The Flows Executions Dashboard now displays the aggregated number of records processed by a specific flow on the selected day. This is useful when monitoring the number of records processed by a CDC pipeline, which typically includes two independent flows (each with its own record-tracking capabilities): extract and load.
We have improved the Flow which creates staging tables or flat files for each dimension of the nested dataset. It is now possible to alter the staging tables on the fly to compensate for the variable number of columns in the source. We have also added the ability to add a column to each staging table/file that contains the parent node name. Read more.
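As a hypothetical illustration (the table and column names below are ours, not generated by the flow), the on-the-fly adjustment amounts to statements like:

```sql
-- Hypothetical example: the staging table gains a column that appeared in the source,
-- plus a column holding the parent node name of the nested dimension
ALTER TABLE staging_order_items ADD COLUMN discount_code VARCHAR(255);
ALTER TABLE staging_order_items ADD COLUMN parent_node VARCHAR(255);
```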
It is now possible to authenticate with a SAS token or Client Secret when connecting to Azure Storage using the new Azure Storage SDK connector. Note that the legacy Azure Storage connector also supports authentication with a SAS token but does not support Client Secret.
We have updated the Sybase JDBC driver to the latest version.
It is now possible to use global variables when configuring parameters for split file flows.
We have fixed soft deletes with CDC; this functionality was broken in one of the previous builds.
User request. It is now possible to override the default Create Table SQL generated by the flow.
User request. The Flow Executions dashboard under the Account dashboard now includes stats for flows executed by the Integration Agent.
User request. It is now possible to use global and flow variables in the native SQL used to calculate the field's value in the mapping.
It is now possible to filter flows associated with the Agent by name, description, and tags.
It is now possible to configure and send email notifications from the Etlworks instance for flows executed by the Agent.
It is now possible to bulk load CSV files into Snowflake from the server (local) storage. Previously it was only possible to bulk load files into Snowflake from S3, Azure Blob, or Google Cloud Storage. The flow Load files in cloud storage into Snowflake was renamed to Bulk load files into Snowflake. Note that it was always possible to ETL files into Snowflake from the server (local) storage.
The flow Bulk load CSV files into Snowflake now supports loading files by a wildcard pattern in COPY INTO and the ability to handle explicit CDC updates when the CDC stream includes only updated columns.
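For reference, a wildcard load in Snowflake's COPY INTO uses the PATTERN clause; a minimal sketch with illustrative table, stage, and pattern names (not the SQL the flow generates):

```sql
-- Illustrative only: load every staged CSV file whose name matches the pattern
COPY INTO orders
  FROM @etl_stage/orders/
  PATTERN = '.*orders_.*[.]csv'
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```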
The MySQL CDC connector now supports the useCursorFetch property. When this property is enabled, the connector uses a cursor-based result set when performing the initial snapshot. The property is disabled by default.
All CDC connectors now test the destination cloud storage connection before attempting to stream the data. If the connection is not properly configured, the CDC flow stops with an error.
Debezium has been upgraded to the latest 1.9 release.
It is now possible to add a description and flow variables to the flows scheduled to run by the Integration Agent. Read about parameterization of the flows executed by Integration Agent.
We have added a new premium Box API connector.
The Snowflake, DB2, and AS400 JDBC drivers have been updated to the latest versions.
We introduced two major improvements for Change Data Capture (CDC) flows. The previously available mechanism for ad-hoc snapshots, which used a read/write signal table in the monitored schema, has been completely rewritten:
- It is now possible to add new tables to monitor and snapshot by simply modifying the list of the included tables. Read more.
- It is now possible to trigger an ad-hoc snapshot at runtime using a table in any database (including a database other than the one monitored by the CDC flow) or a file in any of the supported file storage systems: local, remote, and cloud.
Webhooks now support custom payload templates. The templates can be used to configure integration with many third-party systems, for example, Slack.
We added a ready-to-use integration with Slack. It is now possible to send notifications about various Etlworks events such as flow executed, flow failed, etc., directly to the Slack channel.
The S3 SDK connector now supports automatic pagination when reading file names by a wildcard.
The Amazon Marketplace connector now supports Sign in with Amazon and the Selling Partner API (SP-API). The MWS API has been deprecated and is no longer available when creating a new connection.
The Magento connector now supports authentication with an Access token.
Etlworks Integrator now supports Randomization and Anonymization for various domains, such as names, addresses, Internet (including email), IDs, and many others.
We added a new flow type: Bulk load files in S3 into Redshift. Use the Bulk Load Flow when you need to load files in S3 directly into Redshift. This Flow is extremely fast as it does not transform the data.
The Redshift driver now automatically maps columns with the SMALLINT and TINYINT data types to INTEGER. This fixes the issue where Redshift is unable to load data into a SMALLINT column if the value is larger than 32767.
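A tiny illustration of the remapping (table and column names are ours): SMALLINT in Redshift tops out at 32767, so such columns are now created as INTEGER instead.

```sql
-- Hypothetical staging table created by the driver after the change
CREATE TABLE staging_orders (
    order_id BIGINT,
    quantity INTEGER  -- was SMALLINT/TINYINT in the source; INTEGER avoids the 32767 limit
);
```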
The CSV connector can now read gzipped files. This works in Explorer as well.
The connector for fixed-length format can now parse the header and set the length of each field in the file automatically. Read more.
It is now possible to override the default key used for encryption and decryption of the export files.
Users with the operator role can now browse data and files in Explorer.
It is now possible to override the storage type, the location, the format, whether the files should be gzipped, and the CDC Key set in the CDC connection using TO-parameters in source-to-destination transformation. Read more.
We added a new S3 connector created using the latest AWS SDK. It is now the recommended connector for S3. The old S3 connector was renamed to Legacy. We will keep the Legacy connector indefinitely for backward compatibility.
It is now possible to see and cancel actions triggered by the end-user to be executed in the Integration Agent. When the user triggers an action, such as Run Flow, Stop Flow, or Stop Agent, the action is added to a queue. The actions in the queue are executed in order during the next communication session between the Agent and the Etlworks Integrator. Read more.
We added an SMB Share connector. Among other things, it supports connecting to a network share over an SSH tunnel.
Google Sheets connector now supports configurable timeout (the default is 3 minutes) and auto-retries when reading data.
The Flow types Extract nested dataset and create staging files and Extract nested dataset and create staging tables, which are used to normalize nested datasets into a relational data model, now support message queues as a source.
It is now possible to configure the CDC connection to send records to the specific Kafka or Azure Event Hub partition. Read more.
The legacy MySQL CDC connector now provides information about the current and previous log readers. This is especially useful when the connector is configured to automatically snapshot new tables added to the Include Tables list.
The Integration Agent is a zero-maintenance, easy-to-configure, fully autonomous ETL engine which runs as a background service behind the company's firewall. It can be installed on Windows and Linux. The Remote Integration Agent is now fully integrated with the cloud Etlworks instance. You can monitor the Agent in real time and schedule, run, stop, and monitor flows running on-premises. Read how to install, configure, and monitor the new Integration Agent. Read about configuring flows to run in the Integration Agent.
We added a new flow type: Bulk load files into the database without transformation. Use the Bulk Load Flow when you need to load files in local or cloud storage directly into a database which supports bulk load. This Flow does not transform the data. Read how to ETL data into databases using bulk load.
Etlworks is now shipped with the latest stable Debezium release (1.8). We support all features introduced in 1.8 and much more. Read about creating Change Data Capture (CDC) flows in Etlworks.
Load files in cloud storage into Snowflake now supports creating all or selected columns as TEXT, which mitigates issues caused by source schema drift. Read more.
It is now possible to create a new Google Sheets spreadsheet if it does not exist. Read more.
Bulk load flows now support splitting large datasets into smaller chunks and loading chunks in parallel threads. Read more about performance optimizations when loading large datasets using bulk load flows.
It is now possible to configure the CSV format to enclose the columns in the header row in double quotes. Previously only values could be enclosed.
Added mapping to the Flow type Load files in cloud storage into Snowflake. It is now possible to globally rename and exclude columns for all tables.
Added new connectors for message queues:
Added the ability to convert nested objects to strings (stringify) when creating staging tables or files from the nested JSON and XML datasets.
Fixed an error in the Snowflake bulk load flow when the schema name starts with non-SQL characters, for example 123abc.
Added a JavaScript exception handler. It is now possible to execute a program in JavaScript in case of any error.
The POST/PUT listeners can now be configured not to enforce strict UTF-8 encoding.
A flow can now be configured to fail if a source field in the mapping does not actually exist in the source.
Fixed an error that occurred when a field in the mapping contains trailing or leading spaces.
Added the ability to send email notifications to configurable email addresses when a webhook is triggered by an event.
Added programmatic sequence generators which can be used from JavaScript and Python code.
Added a new Flow type: bulk email reader. This flow reads email messages (including attachments) from an inbound email connection and saves them as files into the designated folder in the server storage. Use this flow type when you need to read hundreds or thousands of emails as fast as possible from a relatively slow inbound email connection.
Added a new collision policy when importing previously exported flows: Keep all, replaces Flows and Macros. Use this policy if you are migrating flows from one environment to another and prefer to keep existing connections and formats in the destination environment.
The Server Storage connection now defaults the Directory to the Home folder (app.data).
MongoDB CDC connector now supports MongoDB change streams.
Added the ability to configure a flow so it cannot be executed manually. Enable this option if the Flow is part of a nested Flow and is not meant to be executed independently.
Added a transformation status column to the flow metrics. It works best together with the option to Retry failed transformations.
Fixed an issue causing intermittent errors when sending and receiving emails to and from servers with TLS 1.1 enabled.
Added the ability to configure the type of SQL executed when merging data using Snowflake and Bulk Load flows. The default option is DELETE/INSERT: it deletes all records in the actual table that also exist in the temp table, then inserts all records from the temp table into the actual table. If this parameter is set to MERGE, the flow executes native MERGE SQL.
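A simplified sketch of the two strategies; the table and key names are illustrative, not the exact SQL the flow generates:

```sql
-- DELETE/INSERT (default): remove overlapping rows, then insert everything from the temp table
DELETE FROM actual_table
 WHERE id IN (SELECT id FROM temp_table);
INSERT INTO actual_table
 SELECT * FROM temp_table;

-- MERGE: a single native MERGE statement instead
MERGE INTO actual_table t
USING temp_table s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.amount = s.amount
WHEN NOT MATCHED THEN INSERT (id, amount) VALUES (s.id, s.amount);
```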
Added binary format selector for Snowflake flows. The value of this parameter defines the encoding format for binary input or output. This option only applies when loading data into binary columns in a table. Default: HEX.
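This selector presumably maps to Snowflake's BINARY_FORMAT file format option; an illustrative COPY INTO (table and stage names are ours):

```sql
-- Illustrative only: binary values in the staged CSV are interpreted as HEX
COPY INTO documents
  FROM @etl_stage/documents/
  FILE_FORMAT = (TYPE = CSV BINARY_FORMAT = HEX);
```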
Added the ability to send a test email and set the FROM address when configuring SMTP for sending email notifications.
Added the ability to configure the number of records when sampling nested data structures. This allows the Explorer and the Mapping to set column data types more accurately when working with non-relational datasets, for example, JSON and XML files.
The Parquet and Avro connectors now support automatic schema generation. Prior to this update, the developer would need to provide a schema in order to create a Parquet or Avro document.
Added support for the following gpload flags used by the Flows optimized for Greenplum.
Added new authentication types (including interactive authentication with Azure Active Directory) for SQL Server connector.
All bulk load flows (Snowflake, Redshift, BigQuery, Greenplum, Azure Synapse, generic Bulk load) now support extra debug logging. When the option Log each executed SQL statement is enabled, the flow logs each executed SQL statement, including Before SQL, After SQL, CREATE and ALTER TABLE, COPY INTO, MERGE, DELETE, etc.
Added Max Latency Date for the CDC metrics.
Added new webhook types:
- Webhook for the event when the flow is stopped manually or by API call.
- Webhook for the event when the flow is stopped by the scheduler because it has been running for too long.
- Webhook for the event when the flow is running too long.
- Webhook for the event when the maintenance task is not able to refresh the access token for the connection configured with Interactive Azure Active Directory authentication.
- Webhook for the event when the flow generates a warning, for example, "source table has more columns than destination table".
Added TABLE_SCHEMA and TABLE_DB flow variables for flows which load data from multiple tables matching the wildcard name. These variables can be used in Before and After SQL, as well as in the Source Query.
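A hypothetical Source Query using these variables; this assumes flow variables are referenced with {VARIABLE}-style tokens, and the table name and filter column are illustrative:

```sql
-- Hypothetical example: qualify the source table with the database and schema
-- captured by the wildcard match
SELECT *
  FROM {TABLE_DB}.{TABLE_SCHEMA}.customers
 WHERE updated_at >= CURRENT_DATE - 1
```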
Added the ability to create all columns as nullable when creating a database table.
Added the ability to retry failed transformations on the next run.
Added the ability to use S3 authentication with IAM role or default profile.
Added password authentication for Redis.
Added the ability to execute CDC snapshots in parallel threads.
Fixed an edge case where a MySQL CDC connection configured with an SSH tunnel was not closing the SSH connection when switching from the snapshot reader to the binlog reader.
Added support for shared Google Drives. If this option is enabled (the default), the connector supports My Drives and Shared drives. When it is disabled, it only supports My Drives.
Added the ability to configure a database connection to open when needed.
The option to modify the Create table SQL is now available for all bulk load flows.
Added new flow type: Load files in cloud storage into Snowflake. This flow is the most efficient way of loading data into Snowflake when you already have CSV files in the cloud storage (Amazon S3, Google Cloud Storage, Azure Blob) and don't need to transform the data.
Previous numbered releases
Here are the previous numbered releases.