Overview
Lookups are a fundamental part of ETL operations. They allow you to retrieve specific values or entire datasets based on input parameters.
For example, if you have a user_id, a lookup can return all associated phone numbers for that user by querying a database or another data source.
In Etlworks, you can perform lookups in multiple ways:
- Extract and Lookup – Query a data source and retrieve values dynamically.
- Lookup Using SQL on an Existing Dataset – Perform lookups on pre-loaded datasets using SQL.
- Lookup Using Filters on an Existing Dataset – Apply filter conditions to in-memory datasets for faster lookups.
Extract and Lookup
Use this method if you need to extract data from a database, apply SQL, and return either a specific field’s value or an entire dataset.
When to Use?
✅ Your lookup data is stored in a relational database.
✅ You need to retrieve data dynamically while processing records.
✅ The lookup results are used in calculations or data enrichment.
Example Scenario
Imagine two tables in different databases:
- movies (contains director_id)
- directors (stores director details like first_name, last_name)
Your goal is to combine these tables and load them into a single table in a data warehouse.
How to Implement?
Step 1. Use movies as the source table.
Step 2. Add calculated fields (e.g., director_first_name, director_last_name).
Step 3. Use a lookup method to retrieve the director’s details from the directors table.
Lookup Methods Available
- Calculate the Field’s Value Using SQL: Directly apply SQL to calculate a field’s value.
- Lookup Returning a Field’s Value: Retrieve a single field from a lookup.
- Lookup Returning a Dataset: Fetch multiple rows matching the lookup criteria.
Calculate the Field’s Value Using SQL
This approach allows you to dynamically calculate a field’s value using SQL.
Looking Up Data using Source connection
If each record contains a director_id, use SQL to fetch the corresponding first name:
Step 1. Add a Calculated Field named first_name.
Step 2. Enter SQL Query in the Field Value Function:
SELECT first_name director_first_name FROM directors WHERE id = {director_id}
🔹 {director_id} is dynamically replaced with the value of the column director_id in the source dataset.
Looking Up Data using Destination connection
If the lookup data is in the destination connection, use prefix destination:
destination.SELECT first_name director_first_name FROM directors WHERE id = {director_id}
Looking Up Data using connection other than Source and Destination
If the lookup data is in connection other then source and destination, use prefix connection name:
temp.SELECT first_name director_first_name FROM directors WHERE id = {director_id}
Lookup Returning a Field’s Value
This method extracts data from a specific connection, runs a query, and returns a single field’s value.
Example: Fetch Director’s First Name from a Different Database
value = com.toolsverse.etl.core.engine.Extractor.lookup(
etlConfig, scenario,
"connection_for_directors_db", /* name of the connection for lookup */
"directors", /* name of the dataset, cab be any */
"first_name", /* name of the column */
"SELECT first_name FROM directors WHERE id = {ID}", /* sql, can be null */
new com.toolsverse.util.TypedKeyValue("ID",
dataSet.getFieldValue(currentRow, 'director_id') /* optional parameters */
)
);
🔹 {ID} is dynamically replaced with the value of the column director_id in the source dataset.
Notes
- SQL can be null
- You can pass any numbers of parameters using new com.toolsverse.util.TypedKeyValue(key, value), including no parameters at all
Handling Cached and Non-Cached Lookups which return Field's value
- Use Extractor.lookupNoCache() if data changes between calls.
- Use Extractor.lookup() for performance optimization with cached results.
Using Callable SQL with lookup which returns Field's value
A callable SQL statement for executing blocks of code refers to executing multi-statement procedural logic directly in the database. These blocks typically support control flow constructs (e.g., loops, conditionals), variable declarations, error handling, and dynamic SQL execution.
- Use: Extractor.lookupWithCallable() for executing callable SQL statements during lookup.
Example: Lookup with callable SQL statement
var sql = 'begin\n' +
' select country from country where country_id = {country_id}\n' +
'end';
value = com.toolsverse.etl.core.engine.Extractor.lookupWithCallable(
etlConfig, scenario,
'SQL Server',
'country',
'country',
sql,
new com.toolsverse.util.TypedKeyValue('country_id',
dataSet.getFieldValue(currentRow, 'country_id')));
Lookup Returning a Dataset
This method retrieves an entire dataset instead of just one field.
Example: Fetch All Images Linked to a Property
var images = com.toolsverse.etl.core.engine.Extractor.lookup(
etlConfig, scenario,
"connection for lookup", /* name of the connection for lookup */
"image", /* name of the dataset, can be any */
"SELECT filename AS url FROM property_pictures WHERE id = {ID}", /* sql, can be null */
new com.toolsverse.util.TypedKeyValue("ID", dataSet.getFieldValue(currentRow,
'picture_id') /* optional parameters */)
);
🔹 {ID} is dynamically replaced with the value of the column picture_id in the source dataset.
Notes
- SQL can be null
- You can pass any numbers of parameters using new com.toolsverse.util.TypedKeyValue(key, value), including no parameters at all
Handling Cached and Non-Cached Lookups which return entire Dataset
- Use Extractor.lookupNoCache() if data changes between calls.
- Use Extractor.lookup() for performance optimization with cached results.
Using Callable SQL with lookup which returns entire Dataset
A callable SQL statement for executing blocks of code refers to executing multi-statement procedural logic directly in the database. These blocks typically support control flow constructs (e.g., loops, conditionals), variable declarations, error handling, and dynamic SQL execution.
- Use:Extractor.lookupWithCallable() for executing callable SQL statements during lookup.
Example: Lookup with callable SQL statement
var sql = 'begin\n' +
' select * from country where country_id = {country_id}\n' +
'end';
var country = com.toolsverse.etl.core.engine.Extractor.lookupWithCallable(
etlConfig, scenario,
'SQL Server',
'country',
sql,
new com.toolsverse.util.TypedKeyValue('country_id',
dataSet.getFieldValue(currentRow, 'country_id')));
Lookup Using SQL on an Existing Dataset
Use this method to run SQL queries on an in-memory dataset, instead of querying an external database.
When to Use?
✅ Lookup data stored in a nested JSON, XML, or API response.
✅ Improve performance by avoiding repeated external queries.
Example: Lookup in a JSON Dataset which returns a Field's value
value = com.toolsverse.etl.common.CommonEtlUtils.getFieldValue(
scenario.getSources().get("DIRECTORS").getDataSet(), /* dataset to lookup in */
"select last_name where id = {ID}", /* SQL, can be null */
"last_name", /* field's name */
new com.toolsverse.util.TypedKeyValue("ID", /* optional parameters */
dataSet.getFieldValue(currentRow, 'director_id')));
🔹 {ID} is dynamically replaced with the value of the column director_id in the source dataset.
Notes
- SQL can be null
- You can pass any numbers of parameters using new com.toolsverse.util.TypedKeyValue(key, value), including no parameters at all
Example: Lookup in a JSON Dataset which returns entire dataset
var phones = com.toolsverse.etl.common.CommonEtlUtils.executeSql(
scenario.getSources().get("PHONES").getDataSet(), /* dataset to lookup in */
"select * where area_code=412" /* SQL, can be null */
);
Lookup Using Filters on an Existing Dataset
Instead of using SQL, this method applies filter conditions to an in-memory dataset.
When to Use?
✅ Lookup data stored in CSV or Excel files.
✅ Faster than SQL-based lookups but works only on flat datasets.
Example: Lookup in a CSV Dataset which returns Field's value
value = com.toolsverse.etl.common.CommonEtlUtils.lookup(
scenario.getSources().get("DIRECTORS").getDataSet(), /* dataset to lookup in */
"id=" + {director_id}, /* filter */
"last_name" /* field's name */
);
🔹 {director_id} is replaced dynamically at runtime with the value of the column director_id in the source dataset.
Example: Lookup in a CSV to Retrieve All Phone Numbers for an Area Code
var phones = com.toolsverse.etl.common.CommonEtlUtils.filter(
scenario.getSources().get("PHONES").getDataSet(), /* dataset to lookup in */
"area_code=412" /* filter */
);
Comments
0 comments
Please sign in to leave a comment.