1. Flows optimized for loading data into Google BigQuery
Affected area: Google BigQuery
User request: Improve loading data into Google BigQuery
In this release, we added Flows optimized for loading data into Google BigQuery.
A typical Flow optimized for BigQuery performs the following operations:
- Extracts data from the source.
- Creates CSV files in Google Cloud Storage.
- Creates an external table using data in cloud storage.
- Loads data from the external table into the actual table using SQL.
- Cleans up the remaining files, if needed.
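For illustration, the external-table and load steps above roughly correspond to BigQuery SQL like the following; the dataset, table, column, and bucket names are hypothetical, and the exact statements generated by the Flow may differ:
-- define an external table over the CSV files staged in Google Cloud Storage
create or replace external table staging.orders_ext (
  order_id int64,
  status string,
  amount numeric
)
options (
  format = 'CSV',
  uris = ['gs://my-bucket/staging/orders_*.csv'],
  skip_leading_rows = 1
);
-- load the data from the external table into the actual table
insert into warehouse.orders (order_id, status, amount)
select order_id, status, amount from staging.orders_ext;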
2. Flows optimized for loading data into Greenplum database
Affected area: Greenplum
User request: Call Greenplum gpload from Etlworks
In this release, we added Flows optimized for loading data into the Greenplum Database.
A typical Flow optimized for Greenplum performs the following operations:
- Extracts data from the source.
- Creates CSV files in the server storage.
- Creates a control file and executes the Greenplum gpload utility to load data into the Greenplum table.
- Cleans up the remaining files, if needed.
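For reference, the gpload control file is a YAML document; a minimal sketch (the database, host, file path, and table names are hypothetical) might look like this:
VERSION: 1.0.0.1
DATABASE: warehouse
USER: gpadmin
HOST: mdw
PORT: 5432
GPLOAD:
  INPUT:
    - SOURCE:
        FILE:
          - /data/etlworks/staging/orders.csv
    - FORMAT: csv
    - DELIMITER: ','
    - HEADER: true
  OUTPUT:
    - TABLE: public.orders
    - MODE: insert
The utility is then invoked as gpload -f <control file>.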
3. Improvements for log-based change replication (CDC)
Affected areas: CDC Flows, CDC Connections
3.1 Upgraded Debezium to 1.6.Final
We upgraded the change data capture engine (Debezium) to the latest stable release, 1.6.Final. Previously, the Etlworks Integrator was using Debezium 1.4.Final.
3.2 Incremental ad-hoc snapshots
In this release, we added the ability to trigger incremental ad-hoc snapshots for existing and new tables.
It covers the following use-cases:
- A new table was added to the list of captured tables.
- Some of the topics were deleted and their content needs to be rebuilt.
- An existing table needs to be re-snapshotted.
- The data was corrupted due to a configuration error.
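Under the hood, Debezium triggers incremental ad-hoc snapshots through its signaling mechanism, which amounts to inserting a signal row into a configured signaling table. A purely illustrative example of such a signal for re-snapshotting a single table (the signaling table and captured table names are hypothetical):
-- ask Debezium to incrementally re-snapshot one table
insert into debezium_signal (id, type, data)
values ('adhoc-snapshot-1', 'execute-snapshot',
        '{"data-collections": ["inventory.orders"], "type": "incremental"}');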
3.3 Ability to monitor CDC Flows in real-time
In this release, we added the ability to monitor CDC Flows in real-time.
Our ETL engine records real-time metrics for each active CDC stream. The following metrics are available:
- Last Update (timezone): the timestamp, in the server timezone, of the last checkpoint. The metrics are updated every 60 seconds.
- Last Record Processed (time ago): the human-readable timestamp of the last processed CDC event.
- Last Record Latency: the last recorded latency. The latency is the difference between the timestamp of the change event in the source database and the timestamp of when the change event was processed by the CDC connector.
- Max Latency: the maximum recorded latency since the last restart of the CDC connector.
- Min Latency: the minimum recorded latency since the last restart of the CDC connector.
- Avg Latency: the average latency since the last restart of the CDC connector.
- Total Records Processed: the total number of change events processed since the last restart of the CDC connector.
- Records Processed Since Last Update: the number of change events processed since the last update. Note that the metrics are updated every 60 seconds.
3.4 Ability to include attributes available in the original CDC event emitted by Debezium
User request: Retain Debezium 'before' element
A typical CDC event emitted by Debezium contains the following elements:
- before: an optional field that specifies the state of the row before the event occurred.
- after: an optional field that specifies the state of the row after the event occurred.
- source: a mandatory field that describes the source metadata for the event.
- op: a mandatory string that describes the type of operation that caused the connector to generate the event.
- ts_ms: an optional field that displays the time at which the connector processed the event.
- transaction: transaction metadata.
- schema: the schema, which describes the structure of the payload.
- fields: the JDBC schema.
By default, the connector only retains the state of the event after the update and converts it to flat JSON. The flat JSON is then added to the message queue.
In this release, we added the ability to include attributes available in the original CDC event emitted by Debezium. When it is enabled, the downstream application can take advantage of the enriched message.
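For reference, a heavily abbreviated change event for an update might look like the JSON below; the values are illustrative only, and the schema and fields elements are omitted:
{
  "before": { "id": 1001, "status": "pending" },
  "after":  { "id": 1001, "status": "shipped" },
  "source": { "connector": "postgresql", "db": "inventory", "table": "orders" },
  "op": "u",
  "ts_ms": 1627902000000,
  "transaction": null
}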
3.5 Insert events from all tables into a single topic
User request: Insert all events into a single topic in the message queue
If you wish to insert all events from all tables into a single topic in a message queue, simply hardcode the CDC Key: the CDC Key is used as the topic name.
3.6. Ability to create subfolders when storing CDC events as files
User request: Create subfolders when saving CDC events in cloud storage
It is now possible to automatically create subfolders when storing CDC events as files.
Use the Location of the CDC events parameter to configure the location of the CDC events serialized as files. If this field is empty, the system will use the default location, which is {app.data}/debezium_data/events.
When configuring the location, you can use tokens, for example, {app.data}, as part of the field's value. You can also use any part of the CDC key, specifically [db]/[schema]/[table].
NOTE: This field is required when cloud storage is selected:
- Amazon S3: bucket/[optional folders].
- Azure Storage: container/[optional folders].
- Google Storage: bucket/[optional folders].
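For example, a hypothetical value for Amazon S3 that creates one subfolder per source table:
my-bucket/cdc-events/[db]/[schema]/[table]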
3.7. Database-specific tips and tricks
By default, our CDC connectors are configured to support the most common scenarios.
The less common use-cases include:
- Adding and removing tables.
- Handling binary and geometry data types.
- Handling large transactions.
- Starting streaming from a specific location.
- Edge cases when establishing a Connection for the source database.
Read more about database-specific tips and tricks for less common use cases.
4. Service status dashboard
In this update, we added the service status dashboard for all the Etlworks Integrator instances hosted by Etlworks.
End-users can subscribe to the notifications, including notifications for events when the instance is down or not responding.
5. Enhancements for HWM replication when the source is a wildcard
The Etlworks Integrator supports HWM replication when the source is configured with a wildcard. First, the Flow lists all the objects whose names match the wildcard. It then creates a separate source-to-destination transformation for each matching source object: a database table, a file, etc. It applies the same HWM rules to each source-to-destination pair.
In this update, we added the ability to generate the stable transformation name, which is used as a key for HWM values.
Click MAPPING and select the Parameters tab. Enter {source} in the Transformation Name field. The {source} token enables a mechanism that sets the transformation name to the fully qualified source table name. Without it, the transformation name will be generated dynamically, which has a negative impact on how the Flow locates the stored HWM value. The other available token is {destination}. You can use both tokens together: {source} to {destination}. We recommend {source}.
6. Connector for Google User Activity API
User request: Connector for Google User Activity API
In this update, we added a connector for Google User Activity API, which is a part of the Google Analytics Reporting API.
Read more about this connector.
The connector can be used by any Flow optimized for Google Analytics and any Flow where the source is a well-known API.
7. Improvements for nested loops
7.1 Execute loop SQL as script
Affected areas: database loops
User request: Execute as a script in nested SQL loop
It is now possible to execute loop SQL as a script, such as PL/SQL, T-SQL, etc. Note that the script must return a cursor.
7.2 Retry loop iteration in case of error
Affected areas: execute Flow in a loop
In this update, we added the ability to retry loop iteration in case of an error.
8. Execute SQL script in Explorer
Affected areas: Explorer
User request: Run as a script from Explorer
The SQL script is a DDL statement such as create procedure, create function, create trigger, etc., or an anonymous SQL block that contains one or more SQL statements. For example:
begin
dbms_output.put_line('Hello There');
end;
To execute an SQL script, use the hotkey ctrl+F3 or select Execute Script (ctrl+F3) from the dropdown menu.
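A DDL script, for example a stored procedure definition, is executed the same way. A minimal Oracle PL/SQL sketch (the procedure name is illustrative):
create or replace procedure log_message(p_text in varchar2)
as
begin
  dbms_output.put_line(p_text);
end;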
9. Execute Before/After SQL as script
Affected areas: source-to-destination transformation
User request: Allow Before/After SQL to be executed as a script
In this update, we added the ability to execute Before and After SQL as a script. The script can be any anonymous SQL block in the dialect supported by the target database, for example, PL/SQL.
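For example, a hypothetical After SQL script executed as an anonymous PL/SQL block could rebuild an index and record the load in an audit table; the table and index names below are illustrative:
begin
  execute immediate 'alter index orders_idx rebuild';
  insert into load_audit (table_name, loaded_on) values ('ORDERS', sysdate);
  commit;
end;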
10. Custom UPSERT
Affected areas: SQL actions
Some database connectors support the UPSERT SQL clause, which looks similar to INSERT: UPSERT INTO table (columns) VALUES (values). However, the actual syntax of UPSERT differs significantly depending on the driver.
With the CUSTOM action, the developer can provide a template for UPSERT, which will be used at runtime to generate the actual SQL statement.
Read how to use CUSTOM UPSERT action.
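As a purely illustrative sketch, a PostgreSQL-style template could map the generic UPSERT to INSERT ... ON CONFLICT. The placeholder names below are hypothetical, not the actual Etlworks template tokens, which are described in the linked article:
-- hypothetical template; placeholders are illustrative only
insert into {TABLE} ({COLUMNS})
values ({VALUES})
on conflict ({KEY_COLUMNS})
do update set {UPDATE_COLUMNS};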
11. Ability to upload X12 and EDIFACT schemas
Affected areas: X12 and EDIFACT Formats
The schemas are used to assign human-readable names to the EDI segments. Etlworks EDI connectors load schema files from the local storage.
- The default schema for X12 is RSSBus_00401.json. Learn more about X12 schemas at http://cdn.nsoftware.com/help/BMF/java/X12Translator_p_SchemaFormat.htm.
- The default schema for EDIFACT is RSSBus_D97A.json. Learn more about EDIFACT schemas at http://cdn.nsoftware.com/help/BNF/java/EDIFACTTranslator_p_SchemaFormat.htm.
12. Automatic PGP encryption - ASCII armored files
ETL Flows that create files in the server storage can automatically encrypt all created files using the PGP algorithm.
When configuring the Connection to automatically encrypt files, you can enable (default) or disable the ASCII Armored property. Encrypted files may be saved in a Format referred to as ASCII Armored. This Format is an encrypted representation of a file consisting entirely of printable ASCII (text-mode only) characters. Files in this Format contain no binary values, and therefore may be easily sent as part of e-mail messages and visually examined. Files saved in this Format are approximately 30% larger than their non-armored counterparts. When decrypted, both armored and non-armored files reproduce an identical original.
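For reference, an ASCII-armored file is plain text framed by PGP markers, roughly like the sketch below; the payload is shortened and illustrative:
-----BEGIN PGP MESSAGE-----

hQEMA5x1bGUgb2Yg...   (base64-encoded, printable-ASCII payload)
=njUN
-----END PGP MESSAGE-----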
13. UI/UX improvements
13.1 Ability to filter events in the Flow execution history
Affected areas: Flow statistics
User request: Add ability to filter Flow executions by status
In this update, we added the ability to filter Flow execution history by:
- Date range.
- Status: success, error, warning, etc.
- Trigger: manual, scheduler, API.
To filter Flow execution history, click the filter icon, select or enter the filter's parameters, and click Apply.
13.2 Ability to find where the Flow is included in the nested Flow
Affected areas: nested Flow, create and manage Flows
User request: Add an 'included in nested Flow' icon in the Flow list
In the Etlworks Integrator, any Flow can be added as a step to a nested Flow. To find all nested Flows that include this Flow, click Find Usage at the bottom of the Flow editor screen.
The system will display a popup with a list of nested Flows which include this specific Flow. You can click on a link to open the nested Flow editor.
If the Flow is included in the nested Flow, we display an icon next to the Flow name which indicates that the Flow has a parent nested Flow. Click on the icon to open the nested Flow editor.
Other improvements
- Other parameters for SFTP Connection.
- Other parameters for S3 Connection.
- Other parameters for Google Storage Connection.
- The ability to set the maximum number of parallel threads and Flow variables without using nested Flows.
- The scheduler now sends an email if the Schedule was automatically disabled due to an error.
- The HTTP connector now supports non-JSON responses from the OAuth2 authentication endpoint.
- The SFTP connector now supports connecting to the remote host without a username and password.
- The SSH tunnel now sends a keep-alive message to the SSH host every 60 seconds.