In Etlworks, it is possible to ETL data from any source into Oracle using the SQL*Loader utility. This method can provide significantly better performance compared to the conventional ETL process.
When to use this Flow type
Use this Flow type when you want to ETL data from any source into Oracle, leveraging the fast bulk-load capabilities of the Oracle SQL*Loader utility.
Also use it when you need to bulk-load files that already exist in server storage without applying any transformations. The Flow automatically loads data into staging tables and MERGEs data into the destination.
How it works
A typical bulk load ETL Flow performs the following operations:
It checks whether the destination table exists and, if it does not, creates the table using metadata from the source.
The Flow creates a control file and executes the Oracle SQL*Loader utility to load data into a staging table in Oracle, one staging table per actual destination table (see the command-line sketch after this list).
The Flow merges data in the staging tables with data in the actual tables.
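Conceptually, the load step is equivalent to running SQL*Loader from the command line against the generated control file. A minimal sketch, with hypothetical file, user, and service names:

sqlldr userid=etl_user/secret@ORCL control=orders_stage.ctl log=orders_stage.log bad=orders_stage.bad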
The SQL*Loader utility must be installed on the same VM as Etlworks. Contact Etlworks support at support@etlworks.com if you need assistance installing SQL*Loader.
Install and configure SQL*Loader
Read how to install and configure the Oracle client, which includes the SQL*Loader utility.
Step-by-step process
Step 1. Create all required Connections
You will need a source Connection, a Server Storage Connection used as a stage for the files to load, and an Oracle Connection. The Oracle Connection will be used as the destination. We recommend using the Oracle Service Name connector.
To configure the final destination, click the Connections tab, and select the available Oracle Connection.
Step 5. Set the source and the destination
Depending upon the Flow type, you can select one of the following sources (FROM) for the Oracle Flow:
Database: use the database table or view name as the source (FROM) name. Wildcard names, for example public.*, are supported.
Well-known API: use the endpoint name or the file name as the source (FROM) name.
Web Service: use the endpoint name as the source (FROM) name.
File: use the source file name or a wildcard file name as the source (FROM) name.
Queue: use the queue name as the source (FROM) name.
For all Oracle ETL bulk load flows, the destination Connection is going to be a Server Storage Connection. Etlworks uses the destination Connection as a stage.
Select or enter the fully qualified or wildcard Oracle table name as a destination (TO). If the source (FROM) is entered using a wildcard template, set the destination (TO) to a wildcard as well.
Step 6. Set the destination CSV Format configured in step 2
Step 7. Set the required and optional transformation parameters
Click Configure in the transformation row and select the Parameters tab.
Load parameters
Action: can be INSERT/APPEND/REPLACE/TRUNCATE, which inserts records from the file(s) into the Oracle table; MERGE, which merges records from the file with the records in the Oracle table; or CDC MERGE, which loads CDC events (INSERT/UPDATE/DELETE) in the same order as they originated in the source database. MERGE and CDC MERGE require configuring the Lookup Fields and/or enabling the Predict Lookup Fields option.
How to MERGE: the type of SQL to execute when merging data. The following options are available: DELETE/INSERT, which deletes all records in the actual table that also exist in the temp table and then inserts all records from the temp table into the actual table; and Native MERGE, which executes the native Oracle MERGE SQL.
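For illustration, with a hypothetical orders table and its staging counterpart, the DELETE/INSERT option conceptually runs SQL similar to:

DELETE FROM orders o
WHERE EXISTS (SELECT 1 FROM orders_stage s WHERE s.order_id = o.order_id);
INSERT INTO orders (order_id, status, amount)
SELECT order_id, status, amount FROM orders_stage;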
User-defined MERGE SQL: this SQL is used to MERGE new records and updates into the existing table. The following parameters are automatically populated and can be referenced as {TOKEN} in the SQL:
• {TABLE}: the table to merge data into.
• {TEMP_TABLE}: the table to merge data from.
• {KEY_FIELDS}: the fields uniquely identifying the records in both tables.
• {FIELDS}: the fields to insert/update in the table to merge data into.
• {INSERT_FIELDS}: the values of the fields to insert.
• {UPDATE_FIELDS}: the fields and values to update in the format field=value,field=value.
• {UPDATE_CONDITIONS}: the condition for matching records in the actual and temp/staging tables.
If nothing is entered in this field, the default automatically generated MERGE SQL will be used.
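For example, a user-defined MERGE SQL built from the tokens above could look like the following sketch (the actual auto-generated SQL may differ):

MERGE INTO {TABLE} t
USING {TEMP_TABLE} s
ON ({UPDATE_CONDITIONS})
WHEN MATCHED THEN UPDATE SET {UPDATE_FIELDS}
WHEN NOT MATCHED THEN INSERT ({FIELDS}) VALUES ({INSERT_FIELDS})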
Lookup Fields: the comma-separated list of fields that uniquely identify the record in the target Oracle table.
Predict Lookup Fields: if this option is enabled and Lookup Fields is empty, the system will try to predict the columns that uniquely identify the record in the target Oracle table.
Hardcoded SQL*Loader parameters
Content of the Control File: optional content of the SQL*Loader control file. By default, the Flow will generate a control file automatically. The following tokens can be used as a part of the control file (see the sample template after the list):
{FILE_TO_LOAD}: the data file name
{TABLE}: the destination table name
{MODE}: the load mode (INSERT/APPEND/REPLACE/TRUNCATE)
{COLUMNS}: the destination table columns
{DELIMITER}: the column delimiter in the CSV file
{QUOTE}: the character used as the quote in the CSV file
{UPDATE_COLUMNS}: the columns to update
{DIRECT}: DIRECT or CONVENTIONAL load
{SKIP}: the number of rows to skip (0 if the file has no column header, otherwise 1)
{ERRORS}: the number of errors to tolerate
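For example, a custom control file that uses these tokens might follow the standard SQL*Loader layout below (a sketch; the automatically generated file may differ):

OPTIONS (SKIP={SKIP}, ERRORS={ERRORS})
LOAD DATA
INFILE '{FILE_TO_LOAD}'
{MODE}
INTO TABLE {TABLE}
FIELDS TERMINATED BY '{DELIMITER}' OPTIONALLY ENCLOSED BY '{QUOTE}'
TRAILING NULLCOLS
({COLUMNS})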
Parameters for the automatically generated SQL*Loader control file
Load Path: Direct or Conventional. Conventional Path loading uses the SQL INSERT statement to load data into the database. It processes data through the SQL engine and follows the usual data validation, trigger, and constraint mechanisms. Direct Path loading bypasses the SQL engine and loads data directly into the Oracle data files. This method is designed for high-speed data loading and is especially useful for large data sets.
Max # of Errors to Ignore: the maximum number of errors to ignore during the load.
NULLIF clause: in SQL*Loader, the NULLIF clause is used to specify that a field should be treated as NULL if it matches a certain condition. This is particularly useful for loading data where specific values (such as 'null' or empty strings) should be interpreted as NULL in the database. The format for this parameter is column=condition;column=condition, for example first_name='null';last_name='null'. Column names can be enclosed in double quotes.
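In the generated control file, this parameter translates into standard SQL*Loader NULLIF clauses on the column definitions, conceptually like this (hypothetical columns):

first_name CHAR NULLIF first_name='null',
last_name CHAR NULLIF last_name='null'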
ELT
Before LOAD SQL: this SQL will be executed on the Oracle Connection before loading data. You can configure Flow to ignore errors when executing this SQL and to execute SQL as a script.
Ignore errors when executing Before LOAD SQL: if this option is enabled, and there is an error when Before LOAD SQL is executed, the error will be ignored.
Before LOAD SQL is a script: if this option is enabled, Before LOAD SQL will be executed as a script.
After LOAD SQL: this SQL will be executed on the Oracle Connection after loading data. You can configure Flow to ignore errors when executing this SQL and to execute SQL as a script.
Ignore errors when executing After LOAD SQL: if this option is enabled, and there is an error when After LOAD SQL is executed, the error will be ignored.
After LOAD SQL is a script: if this option is enabled, After LOAD SQL will be executed as a script.
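For example, you could use these parameters to speed up a large load and keep statistics fresh (illustrative statements; the index and table names are hypothetical):

Before LOAD SQL:
ALTER INDEX orders_ix UNUSABLE

After LOAD SQL:
ALTER INDEX orders_ix REBUILD;
BEGIN DBMS_STATS.GATHER_TABLE_STATS(USER, 'ORDERS'); END;

Since the After LOAD SQL above contains multiple statements, it requires enabling After LOAD SQL is a script.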
Parameters specific for different source types
Depending on the Flow type, other Flow parameters can be added, as explained below:
Ignore Transformations: if this parameter is enabled, the system will copy files directly into the server storage before loading data into the actual table. This greatly improves the performance of the load but automatically disables any transformations.
Delete loaded source files: if this parameter is enabled, the system will automatically delete loaded files from the source.
Delete source files on error: if this parameter is enabled and there is an error during the load, the system will automatically delete loaded files from the source.
Step 8. Configure how to handle source and destination schema changes
It is typical when the source (for example, the table in the OLTP database) and the destination (Oracle table) have a different number of columns. It is also entirely possible that the order of columns in the source is different from that in the destination.
When setting the source-to-destination transformation, it is possible to configure it to automatically handle schema changes. The following configuration options are available:
Reorder CSV Columns During Extract: if this option is enabled (it is disabled by default), the Flow reorders columns in the data file to match the order of columns in the target table. This option is ignored if the target table does not exist.
Ignore fields in the source that do not exist in the target table: if this option is enabled (it is disabled by default) and the source has more fields than the target table, the system will automatically exclude the extra fields. Options 2, 3, and 4 are mutually exclusive.
Alter target table if the source has columns that the target table doesn't have: if this option is enabled (it is disabled by default) and the source has different fields than the target table, the system will add extra fields to the target table. Options 2, 3, and 4 are mutually exclusive.
Recreate target table if the source has columns that the target table doesn't have: if this option is enabled (it is disabled by default) and the source has different fields than the target table, the system will automatically drop and recreate the target table. Options 2, 3, and 4 are mutually exclusive.
Insert null into the fields in the target table that do not exist in the source: if this option is enabled (it is disabled by default) and the target table has different fields than the source, the system will automatically insert NULL values into these fields in the target.
Create table SQL: by default, the system automatically generates the SQL for creating a new table if it does not exist. You can override the default SQL by providing a template or JavaScript code. Read more about how to override the create table SQL.
Configure notifications if source and destination have different columns
It is possible to send an email notification if the source and destination have different columns. To configure Flow to send a notification when either source has more columns than the destination or the destination has more columns than the source, use the technique explained in this article.
The only difference is that you don't need to configure the exception handler. If the system detects that the source and the destination have a different number of columns, it will add an exception to the log, which can be retrieved and sent in the email body, as explained above.
Step 9. Optionally configure Mapping
If necessary, you can create a Mapping between the source and destination (Oracle) fields.
You create the Mapping between the source and destination just like you usually do for any other Flow type.
Configure change replication
When change replication is enabled, only the changed records will be loaded into Oracle.
Basically, all you need to do is set the High Watermark Field and Enable Change Replication for the transformation. Additionally, for better reliability, you can set the calculated High Watermark Field Value.
Load multiple tables by a wildcard name
To load multiple database objects (tables and views) by a wildcard name (without creating individual source-to-destination transformations), follow the same steps as above, except:
FROM
Create a single source-to-destination transformation and set the FROM to a fully qualified source database object name with wildcard characters (* and ?), for example public.*
By default, all tables and all views that match the wildcard name in FROM will be included. If you want to exclude specific database objects, enter a comma-separated list of the database objects to exclude in the Exclude Objects field. If you want to include only specific database objects, enter a comma-separated list of the database objects to include in the Include Objects field.
TO
Set the TO to db.schema.*, for example, dbo.history.*.
Optionally, configure the Source query. You can use token {TABLE} in the Source query.
BEFORE/AFTER SQL
If you want to execute any code before and/or after LOAD, use the token {TABLE} in the Before LOAD SQL and After LOAD SQL.
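For example, an After LOAD SQL that records each loaded table in a hypothetical load_audit table:

INSERT INTO load_audit (table_name, loaded_at) VALUES ('{TABLE}', SYSDATE)

The token {TABLE} is replaced with the destination table name for each object matched by the wildcard.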
MERGE
If you are configuring the MERGE action, do not enter anything in the Lookup Fields. Enable Predict Lookup Fields instead.
An alternative to enabling the Predict Lookup Fields option (which is not always accurate) is to specify a list of table=fields pairs in the Lookup Fields. Use fully qualified table names and ; as a separator between the table=fields pairs.
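For example (hypothetical table names):

dbo.history.customers=customer_id;dbo.history.orders=order_id,order_date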
If you are setting up the high watermark change replication with a calculated HWM field, use the token {TABLE} in the High Watermark Field Value.
Flow variables
To add user-defined Flow variables, open the Parameters tab and add the variables as key-value pairs. Use uppercase characters and no spaces for variable names.
Maximum number of parallel threads
It is possible to execute multiple source-to-destination transformations (extract-load) in parallel threads.
By default, Etlworks limits the number of threads to 5. To change the limit, open the Parameters tab and set the maximum number of threads in Maximum Number of Parallel Threads. This value cannot be greater than 99.