In Etlworks Integrator, you can use SQL when extracting data from files, API endpoints, social websites, as well as (obviously) relational databases.
When creating a source-destination transformation, enter a query into the Source query field on the Mapping tab.
Writing SQL queries when working with relational databases
Use native SQL syntax of the source relational database. In addition to SELECT SQL statements, you can execute stored procedures and functions as long as they return a cursor.
Writing SQL queries when working with files and API endpoints
data streaming will be automatically disabled if you use SQL queries together with data objects (as opposed to relational databases). Therefore, Integrator must read the whole dataset into memory before it can execute the query.
Writing SQL queries which will be executed on a dataset of any type is not much different from writing queries which will be executed on a relational database:
- Each query starts with the select keyword, followed by a comma-separated list of fields to include in the query.
- It is possible to select all fields by using
*
, as inselect *
. - Fields with spaces must be surrounded by double quotes:
"first name"
. - It is possible to use aliases for fields, such as
"first name" first_name
- It is possible to use mathematical expressions to calculate field values, such as
salary + bonus compensation
. - Typically there is no from (expect "from" to be used when selecting data from a nested data object). The following query will work just fine:
select first, last
. - String literals must be enclosed in single quotes:
'john'
. - Use the where condition for filtering out records in a dataset:
select * where salary > 50000
. - Use
field != null
to compare a field value with null. - It is possible to use JavaScript functions in expressions, for calculating field values and in the where clause:
select * where first_name.toLowerCase().trim() = 'john'
- It is possible to use aggregation functions, such as SUM, AVG, MIN, MAX, COUNT to calculate field values. Using aggregation functions requires using the group by expression.
- Use order by fields for ordering datasets. The
asc and desc
modifiers are allowed:select * where salary > 50000 order by salary desc
. - It is important to understand an order of operations:
- First, the where clause conditions are applied.
- Then group by is applied.
- Then fields will be calculated.
- Finally, order by is applied.
Using aggregation functions
Suppose that the source data object is an Excel file, containing order items:
The goal is to read a source Excel file and load it into two database tables:
- Order -
OrderNum
,OrderDate
,OrderAmount
. WhereAmount
is calculated as aSUM(amount)
for all items in the order. - Item -
OrderNum
,OrderDate
,Supplier
,SKU
,Amount
.
To split a single source dataset into two, we will be using the technique described here.
To create the first transformation, which populates the Order table, simply write a Source SQL query which includes an aggregation function, as in the example below:
select "Order Number" OrderNum, "Order Date" as OrderDate, sum(amount) OrderAmount
group by "Order Number"
we are using the group by expression together with SUM(amount).
The following aggregation functions are supported:
SUM(field)
- sum of all aggregated values of the field.MIN(field)
- minimum value of the field.MAX(field)
- maximum value of the field.AVG(field)
- average value of the field.COUNT(*)
- number of records.
The group by expression can include one or more comma-separated field name(s) to group the dataset.
expressions are not supported as parameters for aggregation functions.
Using set operations
The following set operations are supported when writing SQL extracting data from files, API endpoints, and social websites:
- UNION - the UNION operator is used to combine the result sets of 2 or more SELECT statements. Unlike UNION operator used when running a query against relational databases, it does not remove duplicate rows between the various SELECT statements, so it works similarly to UNION ALL.
- INTERSECT - the INTERSECT operator is used to return the results of 2 or more SELECT statements. However, it only returns the rows selected by all queries or data sets. If a record exists in one query and not in the other, it will be omitted from the INTERSECT results.
- MINUS - return all rows in the first SELECT statement that are not returned by the second SELECT statement.
Writing queries for nested data objects.
This is an example where the source is a nested data object returned by a web service.
{
"requestId":"185d6#14b51985ff0",
"success":true,
"result":[
{
"id":"433455454",
"name":"all fields",
"searchableFields":[
{
"name":"field1",
"displayName":"Just field 1",
"dataType":"string",
"length":36,
"updateable":false
},
{
"name":"field2",
"displayName":"Just field 2",
"dataType":"string",
"length":50,
"updateable":false
},
],
"fields":[
{
"name":"leadId",
"displayName":"Lead Id",
"dataType":"integer",
"updateable":false
},
{
"name":"role",
"displayName":"Role",
"dataType":"string",
"length":50,
"updateable":false
},
{
"name":"isPrimary",
"displayName":"Is Primary",
"dataType":"boolean",
"updateable":true
},
{
"name":"externalCreatedDate",
"displayName":"External Created Date",
"dataType":"datetime",
"updateable":true
}
]
}
]
}
The goal is to extract all searchableFields
and fields
from the nested data object and create a flat CSV file.
This can easily be accomplished by using the following SQL query, which is applied to the nested source data object:
select name, displayName, 'searchable' as fieldType from result.searchableFields union select name, displayName, 'regular' as fieldType from result.fields
The additional from clause is of primary importance. When writing a query to be executed on a nested data object, you can specify a level from which the data will be extracted. Use .
as a delimiter to drill down into the nested hierarchy of objects.
If you want to include some of the fields from the parent
object, as well as fields from the nested object
, use the owner.
qualifier:
select result.name as apiName, searchableFields.name, searchableFields.displayName,
'searchable' as fieldType
from result.searchableFields
union
select result.name as apiName, fields.name, fields.displayName,
'regular' as fieldType
from result.fields
Comments
0 comments
Please sign in to leave a comment.