Continuous delivery model
Etlworks uses a continuous delivery model. With this model, bug fixes, new features, and enhancements are released as soon as they are ready. The updates are automatically deployed to the individual Etlworks instances on a rolling schedule.
What's New?
Version: 5.2.5
The built-in change data capture (CDC) engine was upgraded to the latest Debezium 2.4. All improvements and bug fixes implemented since Debezium 2.2 Final and 2.3 Final were ported to the Etlworks CDC engine.
Notable new features of the CDC engine:
- Parallel snapshots. Enabling parallel snapshots can improve the performance of the initial load and re-load by up to 10x.
- Ad-hoc snapshots switched to a blocking mechanism, which significantly improves performance.
- The MongoDB connector now uses the cluster URL instead of a list of individual replica sets or shards.
- The MongoDB connector now supports ad-hoc snapshots.
- User request. Added a CDC connector for AS400. The Etlworks AS400 CDC connector is based on the community-driven open-source Debezium connector for IBM i. It uses the IBM i journal as a source of CDC events. The connector is currently in beta.
We added a configurable Force execution option to the scheduler. Read more. The previous behavior was to always force execution, which is still the case for schedules created prior to this update. All schedules created after the update have the flag disabled by default.
Added file path modifiers for the file loop. Read more.
All CDC pipelines where the destination is a cloud data warehouse (such as Snowflake or Redshift) now support a Stage connection. The Stage connection is used to set the location of the files created by the connector and loaded by the pipeline into the cloud data warehouse. Previously, it was required to configure the storage and location in the CDC connector. Supported Stage locations:
Version: 5.1.1
This is a major release. All self-hosted customers are encouraged to update.
Improved upgrade process (Linux installer only)
In this update, we introduced the ability to upgrade to the latest (default) or a selected version. Read more.
We also added a new CLI command that allows users to check which version is currently installed. Read more.
Important changes under the hood
We have upgraded multiple internal Java libraries to versions that include important security fixes.
We have added the ability to enforce SSL encryption for Redis. This is especially important when configuring Etlworks to run in a multi-node AWS environment with AWS ElastiCache with in-transit encryption (TLS). Read more.
We have added the ability to decrypt signed PGP messages.
It is now possible to use importPackage without a performance penalty. We deprecated importPackage in the previous release but un-deprecated it in 5.1.1. It is now implemented as an inline JavaScript function instead of being part of a larger, dynamically loaded package (which was causing slowdowns).
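For example, a script that needs a Java class can import its package once and then use the class by its short name. This is a minimal illustration using standard java.text classes; it only shows the importPackage syntax, not any Etlworks-specific API:

```javascript
// importPackage exposes Java classes to the embedded JavaScript engine
// without fully qualified names (Mozilla Rhino / Nashorn compatibility syntax).
importPackage(java.text);

// Format the current date using java.text.SimpleDateFormat.
var formatter = new SimpleDateFormat('yyyy-MM-dd');
var today = formatter.format(new java.util.Date());
```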
New functionality
The Excel XLSX connector now supports updating existing spreadsheets. Read more.
We have added OAuth authentication for the Dropbox connector. Read more.
We have added the ability to use JavaScript to programmatically set variables for Before/After SQL in bulk load flows. Read more (the link is for the Snowflake bulk load flow, but it works identically for all other destinations).
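A sketch of the idea, assuming the script can register a value under a name that Before/After SQL then references; the etlConfig.setValue() helper and the {BATCH_ID} token are hypothetical placeholders for the API described in the linked documentation:

```javascript
// Hypothetical example: compute a batch ID at runtime and expose it as a
// variable that Before/After SQL can reference (for example as {BATCH_ID}).
// etlConfig.setValue() is a placeholder name, not a documented API.
var batchId = 'batch_' + java.lang.System.currentTimeMillis();
etlConfig.setValue('BATCH_ID', batchId);
```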
We have added Dimension and Metrics filters for the Google Analytics 4 connector. Read more.
We have added the ability to back up CDC history and offset files. Read more.
We have added OAuth authentication for the OData connector. Read more.
We have added the ability to troubleshoot sending and receiving emails using inbound and outbound email connections. Read more.
The built-in email sender used for sending notifications now supports authentication with Microsoft (Office 365) and Google (Gmail). Read more.
We have added the ability to programmatically handle exceptions in nested flows. Read more.
The source-to-destination transformation where the destination is a database now supports the MERGE on Error exception handler. It enables scenarios where the user wants to update a record if the insert failed.
Important bug fixes
We have fixed a bug that was causing duplicates when using ETL with bulk load flows configured to insert data into a table that does not exist.
We have fixed a bug that was preventing the system from recording flow and file metrics when executing highly nested flows.
We have fixed a bug that was causing the Snowflake bulk load flow not to alter the table under specific edge conditions.
We have fixed a bug in the inbound and outbound email connectors that was stripping the "=" character from the password.
We have fixed path calculation in the SMB Share connector.
We have fixed an error caused by using a string when updating a Postgres UUID field.
In this update, we introduced major changes to the embedded scripting engines (JavaScript and Python). Read more.
New functionality
MongoDB connectors now support SQL for extracting data. Read more.
It is now possible to configure email notifications when creating or modifying a schedule for a flow executed by the Integration Agent. The email notifications also trigger webhooks, if configured. The Agent must be updated to support this feature.
Connectors
In this update, we added HTTP connectors for Google and Microsoft services that require or support interactive OAuth2 authentication. Read more.
We added two new premium connectors:
- Google Ads
- Trello
Improvements under the hood
We modified the logic of the Snowflake bulk load flow when Direct Load is enabled and Use COPY INTO with a wildcard pattern is disabled. The change is designed to better handle the situation when the source and the destination schemas are different.
Connectors
We have added free inbound (IMAP) and outbound (SMTP) email connectors for Office 365 (Exchange Online) and Gmail. All connectors use OAuth2 for authentication.
Improvements
We have improved the global search based on feedback from users. It now uses an algorithm that produces more accurate results based on a sequence of words in a search string. We also now display the relevancy score in the search results.
MySQL, SQL Server, Oracle, and DB2 CDC connectors now automatically enable the property schema.history.internal.store.only.captured.tables.ddl. It reduces the startup time when the CDC connector needs to read a schema or database with a large number of tables (thousands). We were previously enabling the property database.history.store.only.monitored.tables.ddl, which has been removed in Debezium 2.0.
New functionality
All flows optimized for Amazon Redshift now support native MERGE, which is currently in preview.
Improvements under the hood
Flows optimized for Snowflake and BigQuery now use native MERGE when Action is set to MERGE or CDC MERGE. Before this update, How to Merge was set to DELETE/INSERT by default.
Flows optimized for Snowflake no longer query Snowflake to look up the primary key in the Snowflake table when Predict Lookup Fields is enabled. This should improve the performance of flows that execute MERGE when the source table doesn't have a primary key or a unique index.
The MongoDB extract flow now supports extracting by a wildcard.
Changes
The MongoDB connector has been renamed to MongoDB document. The MongoDB streaming connector has been renamed to MongoDB. Read more.
New functionality
We have significantly improved support for SQL Server Change Tracking. It is fully automated and no longer requires manually configuring SQL to capture changes. Read more.
The new configuration option String used to convert to SQL NULL has been added to all Redshift flows. It allows configuring a string that is used for SQL NULL values. Setting this option is highly recommended for flows that stream CDC events into Redshift.
UX improvements
User request. We have added a global search. Read more.
Searchable attributes
- Object title
- Description
- Tags
- All user-editable fields
- Code
- Macros
Objects included in the Search results
Improvements under the hood
We improved the handling of global variables referenced as {tokens} in connections and transformations. Specifically, it is now possible to use multiple global variables in the same attribute, for example {folder}/{filename}.
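Conceptually, token resolution is a simple substitution of {name} placeholders with the values of the matching variables. The sketch below illustrates the idea only; it is not the actual Etlworks implementation:

```javascript
// Generic illustration of {token} substitution; not the actual implementation.
function resolveTokens(attribute, variables) {
    // Replace every {name} placeholder with the value of the matching variable.
    return attribute.replace(/\{(\w+)\}/g, function (match, name) {
        return variables.hasOwnProperty(name) ? variables[name] : match;
    });
}

// Two tokens in the same attribute, as described above.
resolveTokens('{folder}/{filename}', { folder: '/data/in', filename: 'orders.csv' });
// => '/data/in/orders.csv'
```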
Security patch
We have fixed two security vulnerabilities found in third-party libraries used by Etlworks. Read more.
Platforms
Etlworks now runs in Docker. Here is a link to the Etlworks image in Docker Hub. Our Docker image natively supports Intel and ARM64 architectures, meaning Etlworks now officially runs on Macs with M1 and M2 processors and on Windows and Linux computers with ARM-based processors.
Our Windows installers for Etlworks and Integration Agent are now signed by the new software signing certificate, which should prevent the "Unverified vendor" warning when installing or updating the software.
Connectors
We have added a new read/write EDI connector that supports the majority of EDI dialects, including X12, EDIFACT, NCPDP, HL7, and VDA. Read more.
Our Excel XLS and XLSX connectors now support reading the worksheet names. It works in flows and Explorer. We also added the ability to read Excel worksheets that don't have a dedicated row for column names.
Important bug fixes and improvements under the hood
We have fixed a conversion issue when CDC connectors read zoned timestamps serialized as yyyy-MM-dd'T'HH:mm:ss.SS'Z'.
We have fixed an issue with multipart upload into S3 when the software runs on Mac silicon.
We have fixed an issue with the "Database to Snowflake" flow type, preventing loading data from the S3 stage with subfolders.
We have fixed an issue preventing a Dropbox connector from deleting files in folders other than the root folder.
We have improved error handling by CDC connectors.
February 28, 2023
Connectors
In this release, we have added two new connectors:
We also added the ability to connect to the sandbox account when using Salesforce with the OAuth2 connector.
Streaming from message queues
In this release, we have added the ability to stream real-time data from Kafka and Azure Event Hubs to practically any destination. Before this update, Etlworks only supported extracting data from queues in micro-batches.
New functionality
We significantly improved the streaming of the CDC events from the message queues to any supported destination:
- Streaming CDC events that were ingested by Etlworks CDC connector.
- Streaming CDC events that were ingested by standalone Debezium.
We have added preprocessors to the Kafka and Azure Event Hubs connectors (see the sketch after this list):
- Consumer preprocessor - use this preprocessor to change the message streamed from the queue.
- Producer preprocessor - use this preprocessor to modify the message added to the topic.
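A hypothetical sketch of a consumer preprocessor: the message variable holding the consumed payload and the convention of returning the modified value are assumptions made for illustration, not the documented contract (see the linked documentation for the actual variable names):

```javascript
// Hypothetical consumer preprocessor; 'message' and the return convention
// are assumptions made for illustration only.
var payload = JSON.parse(message);

// Example transformation: drop a noisy field and add a processing timestamp.
delete payload.debugInfo;
payload.processedAt = new Date().toISOString();

// Return the modified message to the flow.
JSON.stringify(payload);
```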
We have added the Postprocessor to the HTTP connector. The Postprocessor can be used to change the response content programmatically.
We have added a new configuration option to the CSV format, which allows reading CSV files with non-standard BOM characters.
We have added an option to automatically add an Excel worksheet name to the file name in Explorer and Mapping. It simplifies working with Excel files which include multiple worksheets. Read more.
We have added an option that allows the Excel connector to read data from worksheets that don't have a formal "columns" row.
We have added the ability to start the flow as a Daemon. Read more.
Important bug fixes and improvements under the hood
We improved the performance of the S3 SDK connector when reading the list of files available in the bucket.
We have fixed the edge case when the CDC Binlog reader was not disconnecting on error.
We have fixed the issue with using bind variables for columns with UUID data type.
Self-managed on-premises installers
Etlworks is a cloud-native application that works perfectly well when installed on-premises. Prior to this update, you would need to contact Etlworks support in order to receive a link to download an installer and a unique license generated for your organization.
In this update, we introduced a self-managed web flow that allows you to create a company account and download a fully automated installer for Linux and/or Windows. The installer includes a unique license generated for your organization. You can use the same installer to upgrade Etlworks Integrator to the latest version.
Supported operating systems are Amazon Linux 2, Ubuntu 18.04, Ubuntu 20.04, CentOS 7, Red Hat 7, Red Hat 9, Windows Server (2012-2022), and all editions of Windows 10 and Windows 11.
Windows installer
It was always possible to run Etlworks on Windows. Still, unlike running Etlworks on Linux, it required manual installation of all components needed to run Etlworks, such as Java, Tomcat, Postgres, and Redis. In this release, we have added official support for all modern server and desktop versions of Windows.
- Install Etlworks Integrator on Windows.
- Automatically update Etlworks Integrator installed on Windows.
Connectors
In this release, we have added eleven new connectors:
- Clickhouse.
- Excel as database (premium).
- Microsoft Dynamics 365 (premium). This connector supports the following Dynamics editions: Sales, Customer Service, Field Service, Fin Ops, Human Resources, Marketing, Project Operations.
- Oracle Cloud CSM (premium).
- Oracle Cloud ERP (premium).
- Oracle Cloud HCM (premium).
- Oracle Cloud Sales (premium).
- Monday (premium).
- JDBC-ODBC bridge (premium). This connector allows you to access ODBC data sources from Etlworks.
- FHIR as database (premium).
- GraphQL (premium).
We have also updated the following existing connectors:
- Upgraded Google BigQuery JDBC driver to the latest available from Google.
- Added the ability to log in with an Azure guest user to the SQL Server connector. Read more.
- Added API Token and Basic Authentication to premium Jira and Jira Service Desk connectors.
Upgraded CDC engine
We have upgraded our CDC engine (Debezium) from 1.9 to the latest 2.1.
New functionality
- User request. It is now possible to add named connections to all source-to-destination flows. Read more.
- It is now possible to override the command which executes the Greenplum gpload utility. Read more.
- It is now possible to connect to a read-only Oracle database when streaming data using CDC. Read more.
- We have added more configuration options for CSV and JSON files created by CDC flows. Read more.
- We have improved logging for CDC connectors when capturing transaction markers is enabled. Read more.
- We have improved logging for loops by adding begin/end markers.
Important bug fixes
- SSO JWT expiration is now the same as regular JWT expiration (which is configurable by end-users). Before this fix, customers with enabled SSO were experiencing frequent logouts under certain conditions.
- We fixed an issue with the FTPS connector, which was unable to connect if the FTPS server was running behind a load balancer or proxy.
- We fixed an edge case when AWS credentials were exposed in the flow log when the Snowflake flow failed to create the Snowflake stage automatically.
UX improvements
It is now possible to quickly create Connections, Listeners, Formats, Flows, Schedules, Agents, Users, Tenants, and Webhooks from anywhere within the Etlworks UI without switching to a different window. Read more.
New functionality
We significantly improved support for PGP encryption:
- It is now possible to generate a pair of PGP keys using a designated flow type. Read more.
- All Etlworks file storage connectors now support automatic decryption of the encrypted files during ETL operations. Read more.
We improved the mapping when working with nested datasets. It now supports the case when the source is a nested document, but you only need data from the specific dimension. Read more.
Connectors
We added OAuth authentication (Sign in with Microsoft) to our Sharepoint storage and OneDrive for Business connectors.
We added a Stripe premium connector.
We upgraded the built-in SQLite database from version 3.34.0.0 to the latest version 3.40.0.0. SQLite is used as a temporary staging db. Read more about SQLite releases.
Documentation
We completely rewrote a section of the documentation related to working with nested datasets.
UX improvements
It is now possible to create Connections, Formats, and Listeners right in the Flow editor without switching to the Connections window. Read more.
New functionality
Etlworks now supports Vertica as a first-class destination and as a source. Read more.
We have added point-to-point Change Data Capture (CDC) flows for multiple destinations. After this update, you can create a CDC pipeline using a single flow instead of separate extract and load flows.
- Change Data Capture (CDC) data into Snowflake.
- Change Data Capture (CDC) data into Amazon Redshift.
- Change Data Capture (CDC) data into BigQuery.
- Change Data Capture (CDC) data into Synapse Analytics.
- Change Data Capture (CDC) data into Vertica.
- Change Data Capture (CDC) data into Greenplum.
- Change Data Capture (CDC) data into any relational databases.
- Change Data Capture (CDC) data into any relational databases using bulk load.
We have added bulk load flows for several analytical databases:
- Bulk load files in Google Cloud Storage into BigQuery
- Bulk load files into Vertica
- Bulk load files in server storage into Greenplum
All flows optimized for Snowflake now support the automatic creation of the internal stage and external stage on AWS S3 and Azure Blob. Read more.
Changes under the hood
We improved the reliability of the message queue in the multi-node environment.
This is a required update.
We have fixed the memory leak in the Amazon S3 SDK connector. We also fixed a similar memory leak in AWS-specific connectors (Kinesis, SQS, RabbitMQ, ActiveMQ) that use IAM role authentication. All instances managed by Etlworks have been upgraded. Self-hosted customers are highly advised to upgrade as soon as possible. Customers that have Integration Agents are encouraged to update the agents as well.
UX improvements
It is now possible to manage the Etlworks billing account and subscriptions from the Etlworks app. Read more.
It is now possible to access this changelog from the Etlworks app. Read more.
It is now possible to search for information in the Documentation and submit support requests from the Etlworks app. Read more.
It is now possible to resize all split panels (such as in Explorer, Connections, etc.) further to the right. It allows users to see long(er) filenames, connection names, and other objects.
We have added links to the running flows in the Suspend flow executions window.
We now display a warning message when a user is trying to create a non-optimized flow when the destination is Snowflake, Amazon Redshift, Synapse Analytics, Google BigQuery, or Greenplum. The warning message includes a link to the relevant article in the documentation.
New functionality
We have added a new flow type that can be used to create dynamic workflows which change based on user-provided parameters. Read more.
It is now possible to enter secure parameters (passwords, auth tokens, etc.) when adding parameters for running flows manually, by the scheduler, and by Integration Agent.
It is now possible to split CSV files using a user-defined delimiter or regular expression. Read more.
We have added the following new configuration options for the SMB share connector: SMB Dialect, DFS Namespace, Multi Protocol Negotiation, and Signing Required. Read more.
CDC flows now never stop automatically unless they are stopped manually or fail. Read more. Note that the behavior of the previously created CDC flows did not change.
We have improved the algorithm for creating transaction markers in CDC flows. They now use the actual start/commit/rollback events emitted by the source database. Previously, we used a change of the transaction ID as a trigger, which created a situation where the flow was waiting for a new transaction to start before creating an "end of previous transaction" event.
It is now possible to use flow variables as {parameters} in transformations and connections. Previously only global variables could be used to parameterize transformations and connections.
It is now possible to change the automatically generated wildcard pattern when bulk-loading files by a wildcard into Snowflake and Synapse Analytics.
User request. A new option has been added to the Synapse Analytics bulk load flow which allows creating a new database connection for loading data into each Synapse table. Read more.
We have updated the HubSpot connector which now supports new authorization scopes introduced by HubSpot in August.
Various bug fixes and performance improvements under the hood.
Documentation
We have added new tutorials for creating CDC pipelines for loading data into Snowflake, Amazon Redshift, Azure Synapse Analytics, Google BigQuery and Greenplum. Read more.
Our redesigned main website (https://etlworks.com) went live.
MySQL CDC connector now supports reading data from the compressed binlog. Read more.
It is now possible to disable flashback queries when configuring the Oracle CDC connection. This could greatly improve the performance of the snapshot in some environments. Read more.
CDC connectors can now be configured to capture NOT NULL constraints. Read more.
Legacy S3 and Azure Storage connectors have been deprecated. The existing legacy connections will continue to work indefinitely but new connections can only be created using S3 SDK and Azure Storage SDK connectors.
Bulk load flows now ignore empty data files.
Bulk load flows that load data from Azure Storage now support traversing all subfolders under the root folder.
Flows that extract data from nested datasets and create staging tables or files can now be configured to not create tables for dimensions converted to strings. Read more.
User request. It is now possible to add record headers when configuring Kafka and Azure Events Hubs connections. Record headers are key-value pairs that give you the ability to add some metadata about the record, without adding any extra information to the record itself.
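For context, this is what record headers look like when producing to Kafka with a generic client such as kafkajs. The broker address, topic, and header names are made up for the illustration; this is not Etlworks' internal code:

```javascript
// Generic kafkajs example illustrating record headers: key-value metadata
// attached to a record without changing the record payload itself.
const { Kafka } = require('kafkajs');

const kafka = new Kafka({ brokers: ['localhost:9092'] });
const producer = kafka.producer();

async function send() {
  await producer.connect();
  await producer.send({
    topic: 'cdc-events',
    messages: [{
      key: 'orders:42',
      value: JSON.stringify({ id: 42, status: 'shipped' }),
      headers: { source: 'etlworks', table: 'orders' } // metadata only
    }]
  });
  await producer.disconnect();
}

send().catch(console.error);
```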
The BigQuery connector now maps the ARRAY data type in the source database (for example Postgres) to STRING in BigQuery.
Fixed a bug that was causing a recoverable NullPointerException (NPE) when saving flow execution metrics.
Various bug fixes and performance improvements under the hood.
Single Sign On (SSO) is now available to all Etlworks Enterprise and On-Premise customers. Read more.
We have added a bulk load flow for loading CSV and Parquet files in Azure Storage into Azure Synapse Analytics. It provides the most efficient way of loading files into Synapse Analytics. Read more.
We have optimized loading data from MongoDB into relational databases and data warehouses such as Snowflake, Amazon Redshift, and Azure Synapse Analytics. It is now possible to preserve the nested nodes in the documents stored in MongoDB in the stringified JSON format. Read more.
The Flow Bulk load files into Snowflake now supports loading data in JSON, Parquet, and Avro files directly into the Variant column in Snowflake. Read more.
The Override CREATE TABLE using JavaScript now supports ALTER TABLE as well. Read more.
It is now possible to connect to Snowflake using External OAuth with Azure Active Directory. Read more.
The Azure Events Hubs connector now supports compression. Read more.
The Flows Executions Dashboard now displays the aggregated number of records processed by the specific flow on the selected day. It can be useful when monitoring the number of records processed by a CDC pipeline, which typically includes two independent flows (each with its own record tracking capabilities): extract and load.
We have improved the Flow which creates staging tables or flat files for each dimension of the nested dataset. It is now possible to alter the staging tables on the fly to compensate for the variable number of columns in the source. We have also added the ability to add a column to each staging table/file that contains the parent node name. Read more.
It is now possible to authenticate with SAS token and Client Secret when connecting to Azure Storage using the new Azure Storage SDK connector. Note that the legacy Azure Storage connector also supports authentication with SAS token but does not support Client Secret.
We have updated the Sybase JDBC driver to the latest version.
It is now possible to use global variables when configuring parameters for split file flows.
We have fixed the soft deletes with CDC. This functionality was broken in one of the previous builds.
User request. It is now possible to override the default Create Table SQL generated by the flow.
User request. The Flow Executions dashboard under the Account dashboard now includes stats for flows executed by the Integration Agent.
User request. It is now possible to use global and flow variables in the native SQL used to calculate the field's value in the mapping.
It is now possible to filter flows associated with the Agent by name, description, and tags.
It is now possible to configure and send email notifications from the Etlworks instance for flows executed by the Agent.
It is now possible to bulk load CSV files into Snowflake from the server (local) storage. Previously, it was only possible to bulk load files into Snowflake from S3, Azure Blob, or Google Cloud storage. The flow Load files in cloud storage into Snowflake was renamed to Bulk load files into Snowflake. Note that it was always possible to ETL files into Snowflake from the server (local) storage.
The flow Bulk load CSV files into Snowflake now supports loading files by a wildcard pattern in COPY INTO and the ability to handle explicit CDC updates when the CDC stream includes only updated columns.
The MySQL CDC connector now supports the useCursorFetch property. When this property is enabled, the connector uses a cursor-based result set when performing the initial snapshot. The property is disabled by default.
All CDC connectors now test the destination cloud storage connection before attempting to stream the data. If the connection is not properly configured the CDC flow stops with an error.
Debezium has been upgraded to the latest 1.9 release.
It is now possible to add a description and flow variables to the flows scheduled to run by the Integration Agent. Read about parameterization of the flows executed by Integration Agent.
We have added a new premium Box API connector.
Snowflake, DB2, and AS400 JDBC drivers have been updated to the latest and greatest.
We introduced two major improvements for Change Data Capture (CDC) flows. The previously available mechanism for the ad-hoc snapshots using a read/write signal table in the monitored schema has been completely rewritten.
- It is now possible to add new tables to monitor and snapshot by simply modifying the list of the included tables. Read more.
- It is now possible to trigger the ad-hoc snapshot at runtime using a table in any database (including a completely different database than a database monitored by CDC flow) or a file in any of the supported file storage systems: local, remote, and cloud.
Webhooks now support custom payload templates. The templates can be used to configure integration with many third-party systems, for example, Slack.
We added a ready-to-use integration with Slack. It is now possible to send notifications about various Etlworks events such as flow executed, flow failed, etc., directly to the Slack channel.
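For reference, a custom payload template for a Slack incoming webhook follows Slack's simple JSON format. The sketch below shows the general shape of such a payload being posted; the webhook URL is a placeholder and the message text is illustrative, not the Etlworks template syntax:

```javascript
// Illustration of a Slack incoming-webhook payload; the URL is a placeholder.
const payload = {
  text: 'Etlworks flow "Load orders into Snowflake" failed at ' +
        new Date().toISOString()
};

fetch('https://hooks.slack.com/services/XXX/YYY/ZZZ', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(payload)
}).catch(console.error);
```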
The S3 SDK connector now supports automatic pagination when reading file names by a wildcard.
Amazon Marketplace connector now supports Sign in with Amazon and Selling Partner API (SP-API). MWS API has been deprecated and is no longer available when creating a new connection.
Magento connector now supports authentication with Access token.
Etlworks Integrator now supports Randomization and Anonymization for various domains, such as names, addresses, Internet (including email), IDs, and many others.
We added a new flow type: Bulk load files in S3 into Redshift. Use the Bulk Load Flow when you need to load files in S3 directly into Redshift. This Flow is extremely fast as it does not transform the data.
The Redshift driver now automatically maps columns with SMALLINT and TINYINT data types to INTEGER. It fixes the issue where Redshift is unable to load data into a SMALLINT column if the value is larger than 32767.
The CSV connector can now read gzipped files. It works in Explorer as well.
The connector for fixed-length format can now parse the header and set the length of each field in the file automatically. Read more.
It is now possible to override the default key used for encryption and decryption of the export files.
Users with the operator role can now browse data and files in Explorer.
It is now possible to override the storage type, the location, the format, whether the files should be gzipped, and the CDC Key set in the CDC connection using TO-parameters in source-to-destination transformation. Read more.
We added a new S3 connector created using the latest AWS SDK. It is now the recommended connector for S3. The old S3 connector was renamed to Legacy. We will keep the Legacy connector forever for backward compatibility reasons.
It is now possible to see and cancel actions triggered by the end user to be executed in the Integration Agent. When the user triggers any action, such as Run Flow, Stop Flow, or Stop Agent, the action is added to a queue. The actions in the queue are executed in order during the next communication session between the Agent and the Etlworks Integrator. Read more.
We added an SMB Share connector. Among other things, it supports connecting to the network share over the SSH tunnel.
Google Sheets connector now supports configurable timeout (the default is 3 minutes) and auto-retries when reading data.
The Flow types Extract nested dataset and create staging files and Extract nested dataset and create staging tables, which are used to normalize nested datasets into a relational data model, now support message queues as a source.
It is now possible to configure the CDC connection to send records to the specific Kafka or Azure Event Hub partition. Read more.
Legacy MySQL CDC connector now provides information about the current and previous log readers. It is specifically useful when the connector is configured to automatically snapshot new tables added to the Include Tables list.
The Integration Agent is a zero-maintenance, easy-to-configure, fully autonomous ETL engine which runs as a background service behind the company’s firewall. It can be installed on Windows and Linux. The Remote Integration Agent is now fully integrated with the cloud Etlworks instance. You can monitor the Agent in real-time, schedule, run, stop and monitor flows running on-premise. Read how to install, configure, and monitor the new Integration Agent. Read about configuring flows to run in the Integration Agent.
We added a new flow type: Bulk load files into the database without transformation. Use the Bulk Load Flow when you need to load files in the local or cloud storage directly into the database which supports a bulk load. This Flow does not transform the data. Read how to ETL data into databases using bulk load.
Etlworks is now shipped with the latest stable Debezium release (1.8). We support all features introduced in 1.8 and much more. Read about creating Change Data Capture (CDC) flows in Etlworks.
Load files in cloud storage into Snowflake now supports creating all or selected columns as TEXT, which mitigates issues caused by source schema drift. Read more.
It is now possible to create a new Google Sheets spreadsheet if it does not exist. Read more.
Bulk load flows now support splitting large datasets into smaller chunks and loading chunks in parallel threads. Read more about performance optimizations when loading large datasets using bulk load flows.
It is now possible to configure the CSV format to enclose the columns in the header row in double quotes. Previously only values could be enclosed.
Added mapping to the Flow type Load files in cloud storage into Snowflake. It is now possible to globally rename and exclude columns for all tables.
Added new connectors for message queues:
Added the ability to convert nested objects to strings (stringify) when creating staging tables or files from the nested JSON and XML datasets.
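Conceptually, "stringify" keeps a nested node as a single string column instead of splitting it into its own staging table, along these lines (a generic illustration, not the actual connector code):

```javascript
// Generic illustration of "stringify": the nested 'items' node is kept as a
// single JSON string column instead of becoming a separate staging table.
var record = {
  id: 1001,
  customer: 'Acme',
  items: [{ sku: 'A-1', qty: 2 }, { sku: 'B-7', qty: 1 }] // nested node
};

var stagingRow = {
  id: record.id,
  customer: record.customer,
  items: JSON.stringify(record.items) // '[{"sku":"A-1","qty":2},...]'
};
```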
Fixed an error in the Snowflake bulk load flow when the schema name starts with non-SQL characters, for example 123abc.
Added a JavaScript exception handler. It is now possible to execute a program in JavaScript in case of any error.
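A hypothetical sketch of such a handler; the error variable holding the caught exception is an assumption made for illustration, not a documented object name:

```javascript
// Hypothetical exception-handler script; 'error' is an assumed variable name.
var message = 'Flow failed: ' + (error && error.message ? error.message : error);

// Example reaction: write the problem to the log so it shows up in the flow log.
java.lang.System.err.println(message);
```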
The POST/PUT listeners can now be configured to not enforce strict UTF-8 encoding.
The flow can now be configured to fail if the source field in the mapping does not actually exist in the source.
Fixed an error when a field in the mapping contains trailing or leading spaces.
Added the ability to send email notifications to configurable email addresses when a webhook is triggered by an event.
Added programmatic sequence generators which can be used from JavaScript and Python code.
Added the new Flow type: bulk email reader. This flow reads the email messages (including attachments) from the inbound email connection and saves them as files into the designated folder in the server storage. Use this flow type when you need to read hundreds or thousands of emails as fast as possible from the relatively slow inbound email connection.
Added a new collision policy when importing previously exported flows: Keep all, replaces Flows and Macros. Use this policy if you are migrating flows from one environment to another and prefer to keep existing connections and formats in the destination environment.
The Server Storage connection now defaults the Directory to the Home folder (app.data).
MongoDB CDC connector now supports MongoDB change streams.
Added the ability to configure a flow so it cannot be executed manually. Enable it if the Flow is part of a nested Flow and is not meant to be executed independently.
Added a transformation status column to the flow metrics. It works best together with the option to Retry failed transformations.
Fixed an issue causing intermittent errors when sending and receiving emails from/to servers with TLS 1.1 enabled.
Added the ability to configure the type of SQL executed when merging data using Snowflake and Bulk Load flows. The default option is DELETE/INSERT. It deletes all records in the actual table that also exist in the temp table, then inserts all records from the temp table into the actual table. If this parameter is set to MERGE the flow executes native MERGE SQL.
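The two strategies roughly correspond to the following SQL, shown as JavaScript template strings for consistency with the other examples; the table and column names are illustrative, not the exact statements the flow generates:

```javascript
// Illustrative table and column names; not the exact SQL generated by the flow.
const actual = 'public.orders';
const temp = 'public.orders_temp';
const key = 'id';

// DELETE/INSERT: remove matching rows from the actual table, then insert
// everything from the temp table.
const deleteInsert = `
  DELETE FROM ${actual} WHERE ${key} IN (SELECT ${key} FROM ${temp});
  INSERT INTO ${actual} SELECT * FROM ${temp};`;

// MERGE: a single native MERGE statement.
const merge = `
  MERGE INTO ${actual} t USING ${temp} s ON t.${key} = s.${key}
  WHEN MATCHED THEN UPDATE SET status = s.status
  WHEN NOT MATCHED THEN INSERT (${key}, status) VALUES (s.${key}, s.status);`;
```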
Added binary format selector for Snowflake flows. The value of this parameter defines the encoding format for binary input or output. This option only applies when loading data into binary columns in a table. Default: HEX.
Added the ability to send a test email and set the FROM when configuring the SMTP for sending email notifications.
Added the ability to configure the number of records when sampling nested data structures. It allows the Explorer and the Mapping to more accurately set the column's data types when working with non-relational datasets, for example, JSON and XML files.
Parquet and Avro connectors now support automatic schema generation. Prior to this update, the developer would need to provide a schema in order to create a Parquet or Avro document.
Added support for the following gpload flags used by the Flows optimized for Greenplum.
Added new authentication types (including interactive authentication with Azure Active Directory) for SQL Server connector.
All bulk load flows (Snowflake, Redshift, BigQuery, Greenplum, Azure Synapse, generic Bulk load) now support extra debug logging. When the option Log each executed SQL statement is enabled, the flow will log each executed SQL statement, including Before SQL, After SQL, CREATE and ALTER TABLE, COPY INTO, MERGE, DELETE, etc.
Added Max Latency Date for the CDC metrics.
Added new webhook types:
- Webhook for the event when the flow is stopped manually or by API call.
- Webhook for the event when the flow is stopped by the scheduler because it has been running for too long.
- Webhook for the event when the flow is running too long.
- Webhook for the event when the maintenance task is not able to refresh the access token for the connection configured with Interactive Azure Active Directory authentication.
- Webhook for the event when the flow generates a warning, for example, "source table has more columns than destination table".
Added TABLE_SCHEMA and TABLE_DB flow variables for flows which load data from multiple tables matching the wildcard name. These variables can be used in Before and After SQL, as well as Source Query.
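For example, a Before SQL statement might reference these variables using the same {token} syntax used elsewhere in Etlworks (an assumption here); the statement and table name below are made up for illustration:

```javascript
// Illustrative Before SQL referencing the new flow variables; the audit table
// is made up, and the {token} syntax is assumed from other Etlworks examples.
var beforeSql = "DELETE FROM audit.load_log " +
                "WHERE db_name = '{TABLE_DB}' AND schema_name = '{TABLE_SCHEMA}'";
```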
Added the ability to create all columns as nullable when creating a database table.
Added the ability to retry failed transformations on the next run.
Added the ability to use S3 authentication with IAM role or default profile.
Added password authentication for Redis.
Added the ability to execute CDC snapshots in parallel threads.
Fixed an edge case where a MySQL CDC connection configured with an SSH tunnel was not closing the SSH connection when switching from the snapshot reader to the binlog reader.
Added support for shared Google Drive. If this option is enabled (default), the connector supports My Drives and Shared drives. When it is disabled, it only supports My Drives.
Added the ability to configure a database connection to open when needed.
The option to modify the Create table SQL is now available for all bulk load flows.
Added new flow type: Load files in cloud storage into Snowflake. This flow is the most efficient way of loading data into Snowflake when you already have CSV files in the cloud storage (Amazon S3, Google Cloud Storage, Azure Blob) and don't need to transform the data.