1. Improvements for log-based change replication (CDC)
New MySQL binlog client
Affected areas: CDC flows, MySQL CDC connector
We replaced the binlog client used by the MySQL CDC connector.
Prior to this release, we used https://github.com/shyiko/mysql-binlog-connector-java. That project has been unmaintained for a year now and is officially labeled as deprecated. The recommended fork, https://github.com/osheroff/mysql-binlog-connector-java, is maintained by Zendesk and is a drop-in replacement.
Improved handling of the update events in MongoDB
Affected areas: CDC flows, MongoDB CDC connector
Update events in MongoDB's oplog do not include the before or after state of the changed document; as a result, the CDC events received from the oplog contain only the updated fields.
In this update, we added a new option to the MongoDB CDC connector: Reconstruct Record on Update. If it is enabled, the connector will reconstruct the entire record by querying MongoDB directly. Note that this option is disabled by default.
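Conceptually, the reconstruction works like the following Python sketch (pymongo-based; the connection string, collection, and event shape are illustrative, not the connector's actual internals):

# Conceptual sketch only: the connector's internal implementation may differ.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # illustrative connection string
collection = client["sales"]["orders"]             # illustrative database/collection

def handle_update_event(event):
    # A MongoDB oplog update event carries only the changed fields,
    # e.g. {"_id": ..., "updated_fields": {"status": "shipped"}}.
    partial = event["updated_fields"]

    # With "Reconstruct Record on Update" enabled, the connector queries
    # MongoDB for the full document identified by the event's _id ...
    full_record = collection.find_one({"_id": event["_id"]})

    # ... and emits the complete record instead of just the changed fields.
    return full_record if full_record is not None else partial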
Improved CDC events serialization
Affected areas: CDC flows, CDC connectors
In this update, we added new configuration options for CDC events serialization:
- Flatten CDC events encoded using Extended JSON format - If this option is enabled, the CDC connector will flatten CDC events encoded using the Extended JSON format (only the following data types are decoded: $oid, $symbol, $numberInt, $numberLong, $numberDouble, $numberDecimal, $code, $date, $minKey, $maxKey, $undefined). See the sketch after this list.
- Convert JSON array to comma-separated list - If this option is enabled, the CDC connector will convert JSON arrays to a comma-separated list of strings: [1,2,3] -> 1,2,3
- Columns in CSV File - The columns specified in this field will be used to create the CSV file. Other columns, except 'extra columns' and debezium_cdc_timestamp/debezium_cdc_op, will be ignored. This field is case insensitive: you can enter column names in any case and the system will match them regardless.
- Serialize CDC events as JSON files - If this option is enabled, the CDC connector ignores the destination and dumps CDC events into the internal storage as database_table_cdc_stream_uuid.json files.
- Close CSV file if header is different - If this option is enabled, the CDC connector will close the current CSV file and create a new one when the new header is different from the current header for the file with the same CDC key. Enable it if you expect a varying number of columns in the CDC stream for a specific object (typically a MongoDB collection).
- Remove EOL characters - If this option is enabled, the system will remove end-of-line (EOL) characters from field values when creating CSV files.
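To illustrate the first two options, here is a minimal Python sketch (the decoding rules are simplified; the actual connector handles more cases):

# Simplified illustration of the first two options above; the actual
# connector supports more Extended JSON edge cases.

def flatten_extended_json(value):
    # Extended JSON wraps scalar values in single-key objects such as
    # {"$oid": "..."} or {"$numberLong": "42"}; flattening unwraps them.
    wrappers = {"$oid", "$symbol", "$numberInt", "$numberLong",
                "$numberDouble", "$numberDecimal", "$code", "$date",
                "$minKey", "$maxKey", "$undefined"}
    if isinstance(value, dict) and len(value) == 1:
        key, inner = next(iter(value.items()))
        if key in wrappers:
            return inner
    return value

def array_to_csv_list(value):
    # "Convert JSON array to comma-separated list": [1, 2, 3] -> "1,2,3"
    if isinstance(value, list):
        return ",".join(str(item) for item in value)
    return value

print(flatten_extended_json({"$numberLong": "42"}))  # prints: 42
print(array_to_csv_list([1, 2, 3]))                  # prints: 1,2,3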
Improved handling of the timestamp and time fields in CDC events
Affected areas: CDC flows
In this update, we improved the handling of the time and timestamp fields in CDC events that are serialized as microseconds (not milliseconds) since the epoch.
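For reference, converting a value serialized as microseconds since the epoch looks like this minimal Python sketch (not the flow's actual code):

from datetime import datetime, timezone

# A TIME/TIMESTAMP field serialized as microseconds since the epoch;
# dividing by 1_000 (as for milliseconds) would produce a wrong result.
value_us = 1_640_995_200_000_000  # 2022-01-01T00:00:00Z in microseconds

ts = datetime.fromtimestamp(value_us / 1_000_000, tz=timezone.utc)
print(ts)  # prints: 2022-01-01 00:00:00+00:00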
2. The ability to recreate the target table in the database to database flow
Affected areas: source-to-destination transformation when the destination is a database
User suggestion: Add recreate target table to database to database flow
In this update, we added a new parameter to the database-to-database flow: MAPPING -> Parameters -> Alter target table if the source has columns that the target table doesn't have.
If this parameter is enabled (it is disabled by default) and the Create target table... parameter is also enabled, the system will automatically drop and recreate the destination table (if it exists) when the source table has changed.
This parameter is not available if High Watermark Change replication is enabled.
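In practice, "drop and recreate" is roughly equivalent to the following sequence (illustrative Python sketch only; the actual DDL is generated by the system, is dialect-specific, and the table definition here is hypothetical):

# Illustrative only: the actual DDL is generated by the system and is
# dialect-specific; the table and columns below are hypothetical.
recreate_statements = [
    "DROP TABLE IF EXISTS target_table",
    # recreated from the source table's current schema, including new columns
    "CREATE TABLE target_table (id INT, name VARCHAR(255), new_column VARCHAR(255))",
]
for statement in recreate_statements:
    print(statement)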
3. The ability to preserve EOL characters when creating CSV files
Affected areas: CSV format, Fixed Length text format
Prior to this update, the system always removed the end-of-line (EOL) characters (\n, \r) when creating CSV and fixed-length files.
In this update, we added a new configuration option for the CSV and Fixed-length text formats: Remove EOL characters. If this option is disabled, the system will preserve the EOL characters in the file. Note that it is enabled by default.
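The difference between the two modes can be illustrated with a minimal Python sketch (not the system's actual implementation):

import csv
import io

row = {"id": 1, "note": "line one\nline two"}

def write_csv(r):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "note"])
    writer.writeheader()
    writer.writerow(r)
    return buf.getvalue()

# "Remove EOL characters" enabled (the default): EOLs are stripped from
# field values before the row is written.
cleaned = {k: v.replace("\r", " ").replace("\n", " ") if isinstance(v, str) else v
           for k, v in row.items()}
print(write_csv(cleaned))  # note field becomes: line one line two

# Option disabled: the value is preserved and the CSV writer quotes the
# field so the embedded newline survives.
print(write_csv(row))      # note field keeps the newline inside quotes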
4. Improvements for run-flow-by-name API
New parameters for run-flow-by-name API to enable console log capturing
Affected areas: run flow by name API
User suggestion: Add extra log step parameter to run flow by name API
In this update, we added new parameters that control console log capturing.
They are equivalent to configuring the Extra Log Step when executing the flow manually or by the scheduler.
New parameters for run-flow-by-name API to enable email notifications
Affected areas: run flow by name API
User suggestion: Add run schedule api endpoint
In this update, we added new parameters that allow configuring email notifications when executing a flow by name using the API.
They are equivalent to configuring email notifications when creating a schedule.
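A call combining both sets of parameters might look like the hypothetical Python sketch below. The endpoint path and every parameter name shown are placeholders, not the documented API; consult the API documentation for the actual names:

import requests

# Hypothetical sketch only: the URL and all parameter names are placeholders.
response = requests.post(
    "https://example.com/rest/flows/run-by-name",   # placeholder URL
    params={
        "flowName": "daily-load",                # the flow to execute
        "captureConsoleLog": "true",             # hypothetical: Extra Log Step equivalent
        "notifyOnSuccess": "admin@example.com",  # hypothetical: email on success
        "notifyOnFailure": "admin@example.com",  # hypothetical: email on failure
    },
    headers={"Authorization": "Bearer <token>"},
)
response.raise_for_status()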
5. Improvements for loading data into Snowflake and Redshift
MERGE SQL and CDC MERGE SQL
Affected areas: Snowflake ETL, Amazon Redshift ETL
In this update, we added 2 new parameters to the Snowflake- and Redshift-optimized flows that allow a user to configure custom SQL for merging data and merging CDC events (see the sketch after the list):
- MERGE SQL - a user-defined SQL statement that will be used instead of the default when the action is set to MERGE. If nothing is entered in this field, the default MERGE SQL will be executed. The following parameters are automatically populated and can be referenced as {TOKEN} in the SQL:
  - {TABLE} - the table to MERGE data into,
  - {TEMP_TABLE} - the table to MERGE data from,
  - {KEY_FIELDS} - the fields uniquely identifying the record in both tables,
  - {FIELDS} - the fields to INSERT/UPDATE in the table to MERGE data into.
- CDC MERGE SQL - a user-defined SQL statement that will be used instead of the default when the action is set to CDC MERGE. If nothing is entered in this field, the default CDC MERGE SQL will be executed. The following parameters are automatically populated and can be referenced as {TOKEN} in the SQL:
  - {TABLE} - the table to MERGE data into,
  - {TEMP_TABLE} - the table to MERGE data from,
  - {KEY_FIELDS} - the fields uniquely identifying the record in both tables,
  - {FIELDS} - the fields to INSERT (Snowflake only),
  - {INSERT_FIELDS} - the values of the fields to INSERT,
  - {UPDATE_FIELDS} - the fields and values to UPDATE in the format field=value,field=value,
  - {MERGE_CONDITION} - the condition used to MATCH records between the table to MERGE data into and the table to MERGE data from, in the format table.key=temp_table.key (Snowflake only),
  - {UPDATE_CONDITIONS} - the WHERE clause used to update existing records (Redshift only).
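To illustrate how the {TOKEN} placeholders are substituted, here is a minimal Python sketch (the template and values are examples, not output captured from an actual flow):

# Illustration of how the {TOKEN} placeholders are substituted; the template
# and the values below are examples only.
merge_sql = (
    "MERGE INTO {TABLE} USING {TEMP_TABLE} src ON {MERGE_CONDITION} "
    "WHEN MATCHED THEN UPDATE SET {UPDATE_FIELDS} "
    "WHEN NOT MATCHED THEN INSERT ({FIELDS}) VALUES ({INSERT_FIELDS})"
)
tokens = {
    "TABLE": "sales.orders",                      # the table to MERGE data into
    "TEMP_TABLE": "sales.orders_cdc_stream",      # the table to MERGE data from
    "MERGE_CONDITION": "sales.orders.id=src.id",  # match condition
    "UPDATE_FIELDS": "status=src.status,amount=src.amount",
    "FIELDS": "id,status,amount",
    "INSERT_FIELDS": "src.id,src.status,src.amount",
}
print(merge_sql.format(**tokens))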
Additionally, for Snowflake only, we added the option to execute an alternative CDC MERGE SQL. By default, when the CDC MERGE action is enabled, the flow generates and executes a MERGE SQL statement that looks like the following:
MERGE INTO {TABLE} USING
  (SELECT * FROM
    (SELECT *, ROW_NUMBER() OVER (PARTITION BY {KEY_FIELDS}
       ORDER BY {KEY_FIELDS}, debezium_cdc_timestamp DESC) CDC_ROW_NUM
     FROM {TEMP_TABLE}
     WHERE debezium_cdc_op != 'd')
   WHERE CDC_ROW_NUM = 1) src
ON {MERGE_CONDITION}
WHEN MATCHED THEN UPDATE SET {UPDATE_FIELDS}
WHEN NOT MATCHED THEN INSERT ({FIELDS}) VALUES ({INSERT_FIELDS})
When Use INSERT/DELETE instead of MERGE for CDC MERGE action is enabled (it is disabled by default) and the CDC MERGE action is enabled, the system will generate and execute the following SQL statements (sketched after the list):
- DELETE all rows from the main table which are in the temp CDC stream table;
- INSERT all latest INSERTs and UPDATEs from the temp CDC stream table into the main table;
- DELETE all rows in the main table which are marked for deletion in the temp CDC stream table.
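The following Python sketch illustrates the three-step alternative (table and column names are examples, the actual statements are generated by the flow, and deduplication of the CDC stream by key is omitted for brevity):

# Illustrative sketch of the three steps; not the flow's actual generated SQL.
steps = [
    # 1. DELETE all rows from the main table which are in the temp CDC stream table
    "DELETE FROM orders USING orders_cdc_stream s WHERE orders.id = s.id",
    # 2. INSERT the latest INSERTs and UPDATEs from the temp CDC stream table
    "INSERT INTO orders (id, status, amount) "
    "SELECT id, status, amount FROM orders_cdc_stream WHERE debezium_cdc_op != 'd'",
    # 3. DELETE rows marked for deletion in the temp CDC stream table
    "DELETE FROM orders USING orders_cdc_stream s "
    "WHERE orders.id = s.id AND s.debezium_cdc_op = 'd'",
]
for sql in steps:
    print(sql)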
We added this option after careful A/B testing. In most cases, the performance of the default MERGE is more than adequate. We discovered that when the destination table is extremely large (billions of rows) the alternative can be faster. Please do your own testing before settling on the default or alternative CDC MERGE SQL.
New configuration options for loading semi-structured data into Snowflake
Affected areas: Snowflake ETL
In this update, we added the following configuration options for the Snowflake-optimized flows. These options allow more flexibility when loading semi-structured data (JSON, XML, AVRO). See the sketch after the list.
- Strip Outer Array or Element - For JSON: a boolean that instructs the JSON parser to remove the outer brackets. For XML: a boolean that specifies whether the XML parser strips out the outer XML element, exposing second-level elements as separate documents.
- Match By Column Name - A string that specifies whether to load semi-structured data (JSON, XML, AVRO) into columns in the target table that match corresponding columns represented in the data. For a column to match, the following criteria must be true:
  1) The column represented in the data must have the exact same name as the column in the table. The copy option supports case sensitivity for column names. Column order does not matter.
  2) The column in the table must have a data type that is compatible with the values in the column represented in the data. For example, string, number, and Boolean values can all be loaded into a variant column.
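These options appear to correspond to Snowflake COPY options, so the flow presumably generates a statement similar to the sketch below (stage and table names are examples, and the exact generated SQL may differ):

# Assumed mapping onto Snowflake's STRIP_OUTER_ARRAY and MATCH_BY_COLUMN_NAME
# COPY options; names below are illustrative.
copy_sql = """
COPY INTO my_table
FROM @my_stage/events.json
FILE_FORMAT = (TYPE = 'JSON' STRIP_OUTER_ARRAY = TRUE)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
"""
print(copy_sql)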
6. Improvements for Unzip flow
The ability to unzip GZip archives
Affected areas: Unzip flow
In this update, we added the ability to unzip files in the GZip format.
To unzip files in the GZip format, simply select the UnGZip or UnGZip and Delete action when creating the Unzip files flow.
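A minimal Python sketch of what the UnGZip action does (file names are illustrative):

# Unlike ZIP, a GZip archive holds a single compressed file and no folder
# structure.
import gzip
import shutil

with gzip.open("data.csv.gz", "rb") as src, open("data.csv", "wb") as dst:
    shutil.copyfileobj(src, dst)
# The "UnGZip and Delete" action additionally removes data.csv.gz afterwards.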
The ability to disable creating subfolders when unzipping ZIP files with folders
Affected areas: Unzip flow
In this update, we added the ability to disable creating subfolders when unzipping ZIP files with folders.
Note that the default behavior is to create the nested folders.
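A minimal Python sketch of what "unzip without creating subfolders" means: every entry is extracted into one flat directory, discarding its path inside the archive (file names are illustrative):

import os
import zipfile

os.makedirs("out", exist_ok=True)
with zipfile.ZipFile("archive.zip") as zf:
    for info in zf.infolist():
        if info.is_dir():
            continue  # skip directory entries entirely
        flat_name = os.path.basename(info.filename)  # drop the folder prefix
        with zf.open(info) as src, open(os.path.join("out", flat_name), "wb") as dst:
            dst.write(src.read())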
7. New Connectors
Salesforce Pardot connector
In this update, we added a new premium connector for the Salesforce Pardot application.
8. Premium connections now prepopulate the Connect On Open property
Affected areas: premium connectors
All premium CData connections now prepopulate the Connect On Open property.
When set to true, a connection will be made to the app when the connection is opened. This property enables the Test Connection feature.
9. Usability improvements
Buttons for Login with Google, Salesforce and Facebook now follow UX guidelines
In this update, we redesigned authentication screens for Google, Salesforce and Facebook connectors. They now comply with the official UX guidelines from the vendors. We also added the read-only scopes to all connected apps.