Version
New
1. Improvements for log-based change replication (CDC)
1.1 Upgraded Debezium to 1.4
Affected areas: CDC Flows, CDC Connections
We upgraded the change data capture engine (Debezium) to the latest pre-release version, 1.4 CR1. Previously, Etlworks Integrator used Debezium 1.3.
1.2 The ability to create files in the cloud storage
Affected area: CDC Flows
Prior to this update, the CDC Flows were creating files in the local (server) storage.
In this update, we added the ability to configure the CDC Connection to create files directly in any of the following cloud storage services:
- Amazon S3
- Azure Storage
- Google Cloud Storage
Creating files directly in the cloud opens the possibility of using serverless data pipelines for loading data into databases and data warehouses. For example, read about loading data continuously using Snowflake Snowpipe.
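As an illustration only, a minimal Snowpipe setup that continuously loads CSV files from an S3 bucket populated by a CDC Connection could look like the sketch below (the bucket, table, and credential values are hypothetical):
-- external stage pointing at the S3 location where the CDC Connection writes files
create or replace stage cdc_stage
  url = 's3://my-cdc-bucket/orders/'
  credentials = (aws_key_id = '...' aws_secret_key = '...')
  file_format = (type = csv skip_header = 1);
-- Snowpipe continuously loads new files from the stage into the destination table
create or replace pipe orders_pipe auto_ingest = true as
  copy into orders from @cdc_stage;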
The other advantage of creating files directly in the cloud is eliminating the step that copies files to the cloud in the first place. For example, Amazon Redshift (or any RDS database) can bulk-load files from S3, so when the files are already in S3, there is no need for an additional Flow that moves them from the local storage to S3. The same is true for Azure databases and data warehouses: Azure Synapse Analytics can bulk-load files from Azure Storage, so creating the files in Azure Storage makes the pipeline much faster by eliminating the step that moves the files from the local storage to Azure.
Read how to configure a storage other than the local (server) storage when creating a CDC Connection.
1.3 The ability to create files using the specified character encoding
Affected area: CDC Flows
If the source database has tables with data encoded using any character set other than ASCII (for example, UTF-8), you can configure the connector to create files using that specific encoding.
1.4 The ability to preserve the exact data types of the columns when creating a destination table
Affected area: CDC Flows
Etlworks Integrator can automatically create a destination table when loading CSV files. By default, the Flow will sample the file and set the data types of the columns in the destination table. This mechanism is not 100% accurate, so the destination table might end up with columns with different data types compared to the source table.
In this update, we added the ability to preserve the exact data types of the columns when creating a destination table. Read more.
In addition, when enabled, this feature allows the CDC load Flow to automatically set the Lookup Fields required for the CDC MERGE by extracting the information about the primary keys from the source database. It significantly enhances the accuracy of the Predict Lookup Fields feature, making it essentially bulletproof.
1.5 The ability to implement soft deletes with CDC Flows
Affected area: CDC Flows
By default, the CDC Flow creates, updates, and deletes records in the destination table based on the type of the CDC event: c for create, u for update, and d for delete.
In this update, we added the ability to implement soft deletes as well. Read more.
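For illustration only, a soft delete in a MERGE-based load could look like the sketch below, where the matched row is flagged instead of removed (the table, columns, and the is_deleted flag are hypothetical, not the exact statement the Flow generates):
merge into customers t
using customers_changes src on t.id = src.id
when matched and src.debezium_cdc_op = 'd'
    then update set is_deleted = true              -- soft delete: flag the row instead of removing it
when matched and src.debezium_cdc_op != 'd'
    then update set name = src.name, is_deleted = false
when not matched and src.debezium_cdc_op != 'd'
    then insert (id, name, is_deleted) values (src.id, src.name, false);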
1.6 The ability to add calculated columns to the CDC stream
Affected area: CDC Flows
In this update, we added new functions that can be used to add calculated columns to the CDC stream. Read more.
1.7 The Snowflake CDC MERGE now includes DELETE
Affected areas: CDC Flows, load data into Snowflake
Prior to this update, the Snowflake CDC MERGE executed the MERGE with INSERT and UPDATE in one statement and, separately, the DELETE. This caused issues when a record with the same ID was added, deleted, and added again in the same CDC batch.
In this update, we added DELETE to the MERGE, which now looks like this:
merge into {TABLE}
using (
    select *
    from (select *, row_number() over (partition by {KEY_FIELDS}
              order by {KEY_FIELDS}, debezium_cdc_timestamp desc) as row_num
          from {TEMP_TABLE})
    where row_num = 1) src on {MERGE_CONDITION}
when matched and src.debezium_cdc_op = 'd' then delete
when matched and src.debezium_cdc_op != 'd' then update set {UPDATE_FIELDS}
when not matched and src.debezium_cdc_op != 'd' then
    insert ({FIELDS}) values ({INSERT_FIELDS})
To keep it backward compatible with the previous implementation, we added a new configuration parameter, which is enabled by default.
2. Support for Azure Synapse Analytics
Affected areas: Working with Azure Synapse Analytics, CDC Flows
Azure Synapse Analytics is a limitless analytics service that brings together enterprise data warehousing and big data analytics.
In this update, we added support for Azure Synapse Analytics. It is now a first-class source and destination in Etlworks Integrator.
- Getting started with Azure Synapse Analytics
- Extract, transform and load data in Azure Synapse Analytics
- ELT with Synapse Analytics
- Working with Azure Synapse Analytics as a relational database
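As an illustration only, bulk-loading the files created by a CDC Connection in Azure Storage into a Synapse dedicated SQL pool typically comes down to a COPY statement similar to the sketch below (the account, container, and table names are hypothetical, and the exact statement issued by the Flow may differ):
-- Synapse COPY statement loading CSV files from Azure Storage
copy into dbo.orders
from 'https://myaccount.blob.core.windows.net/cdc/orders/'
with (
    file_type = 'CSV',
    firstrow = 2,                                           -- skip the header row
    credential = (identity = 'Storage Account Key', secret = '...')
);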
3. The ability to bulk-load data into any database
Affected area: bulk load data into a database
A bulk load is a process or method provided by a database management system to load multiple rows of data from a source file into a database table.
Usually, bulk operations are not logged, and transactional integrity is not enforced. Bulk operations often bypass triggers and integrity checks, which significantly improves the performance of loading large amounts of data.
Prior to this update, Etlworks Integrator supported bulk-loading data only into Snowflake and Amazon Redshift.
In this update, we added the ability to bulk-load data into any database that supports bulk-loading. Read more.
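To illustrate what a bulk load looks like at the database level, PostgreSQL, for example, exposes the COPY command. Below is a minimal sketch with hypothetical table and file names; the actual command issued by the Flow may differ:
-- loads an entire CSV file into the table in a single command,
-- instead of inserting rows one at a time
copy public.orders (order_id, customer_id, order_total)
from '/data/staging/orders.csv'
with (format csv, header true);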
4. The ability to preserve metadata when extracting data from a database and creating the CSV files
Affected area: CSV Format
When an Etlworks Integrator Flow creates a CSV file, the information about the columns' data types in the source database is lost by default. So if another Flow reads the same file, it assigns the data types to the columns by sampling the data, which is not always 100% accurate.
In this update, we added the ability to preserve the metadata, along with the data. Read more.
5. The ability to execute SQL scripts in Source query
When the source Connection is a relational database that supports stored procedures and anonymous SQL blocks, it is now possible to run SQL scripts (with variables, conditions, loops, etc.) as a part of the Source query. The only condition is that the script must return a cursor.
Enter the script in the Source query and enable the Execute as script parameter.
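A minimal sketch of such a script for SQL Server (the table and column names are hypothetical); the final SELECT is what returns the cursor:
declare @start_date date = dateadd(day, -7, getdate());
declare @status varchar(10) = 'ACTIVE';
-- adjust the window on Mondays to cover the weekend
if datename(weekday, getdate()) = 'Monday'
    set @start_date = dateadd(day, -9, getdate());
-- the script must end with a statement that returns a cursor (result set)
select order_id, customer_id, order_total
from orders
where order_date >= @start_date and status = @status;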
6. The ability to create a new source Connection for each source-to-destination pair
Affected area: source-to-destination transformation
Prior to this update, Etlworks Integrator Flows where the source is a database reused the same source Connection for each source-to-destination pair.
In this update, we added the ability to dynamically create a new physical database Connection for each pair, even when the same source Connection is configured. This includes loading data from multiple tables using a wildcard source name.
The new parameter can be found under transformation > MAPPING > Parameters. It is disabled by default.
7. The ability to use staging instead of temporary tables when loading and merging data into Snowflake
Affected area: load data into Snowflake
Prior to this update, the Flows that load data into Snowflake and MERGE it with the existing data used Snowflake temporary tables:
Step 1. Create a temp table.
Step 2. Execute COPY INTO the temp table.
Step 3. Generate and execute MERGE from the temp table into the actual table.
This approach requires creating a temp table on each run of the Flow that loads data into Snowflake.
In this update, we added the ability to configure Flow to use staging (persisted) tables instead of temp. This option is disabled by default.
If this option is enabled, the Flow will:
Step 1. Create the staging table if it does not exist; otherwise, TRUNCATE the staging table.
Step 2. Execute COPY INTO the staging table.
Step 3. Generate and execute MERGE from the staging table into the actual table.
This approach is not faster by any stretch, but it can save a few Snowflake cloud credits.
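A rough sketch of the SQL behind these steps (the stage, schema, and table names are hypothetical, and the actual statements generated by the Flow may differ):
-- Step 1: create the staging table once; truncate it on subsequent runs
create table if not exists staging.orders_stage like public.orders;
truncate table staging.orders_stage;
-- Step 2: bulk-load the extracted files into the staging table
copy into staging.orders_stage
from @etl_stage/orders/
file_format = (type = csv skip_header = 1);
-- Step 3: merge the staged rows into the actual table
merge into public.orders t
using staging.orders_stage src on t.order_id = src.order_id
when matched then update set order_total = src.order_total
when not matched then insert (order_id, order_total) values (src.order_id, src.order_total);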
8. Upgraded MongoDB client to 4.1.1
Affected areas: MongoDB connector, CDC Flows
In this update, we upgraded the MongoDB client to the latest and greatest stable version, 4.1.1. It affects both the regular and the CDC MongoDB connectors.
9. Added new Flow variables available in the Source query
In this update, we added new Flow variables and made them available in the Source query:
- TARGET_TABLE_NAME: the destination table name.
- TARGET_TABLE_NAME_NO_QUOTES: the destination table name without quotes.
- FILE_TO_LOAD: the file to load when processing files by a wildcard name.
These variables are only available when the source and the destination are configured using wildcards.
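For illustration only, and assuming the variables are referenced as {VARIABLE_NAME} tokens (as in the MERGE template above), a wildcard Source query could look like the sketch below (the schema and column names are hypothetical):
-- select only recent rows from the source table that matches
-- the current destination table in the wildcard pair
select *
from source_schema.{TARGET_TABLE_NAME_NO_QUOTES}
where updated_at >= current_date - 1;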
Fixed
1. HTTP Connection leak in Copy Files Flow
Affected areas: working with Web services, Copy Files Flow
We fixed the HTTP Connection leak that occurred when a Connection configured with the PUT / POST / DELETE / HEADER method was used as a destination in the Copy Files Flow and the endpoint returned a response with content.
2. Issue with the continuous run Schedules
Affected area: scheduling Flow to run continuously
Prior to this release, when a Flow was scheduled to run continuously and the Etlworks Integrator instance was rebooted, the Schedule started from scratch. For example, if the Flow was configured to run every 3 hours, the instance was bounced, and there were 30 minutes remaining until the Flow should resume, the scheduler lost this information and started the Flow 3 hours after the reboot, thus missing the next execution slot.
We fixed the issue in this release.
3. The full stack trace is now displayed in the Test transformation
Affected areas: Scripting, Testing transformation
Prior to this release, if there was a JavaScript or Python error when executing the Test transformation, the UI displayed only a part of the error, which in many cases didn't include any useful information.
Starting with this release, we display the full exception stack trace.