Overview
Every day, logistics companies receive Excel spreadsheets from their vendors. The typical spreadsheet has an account number, other fields, and the invoiced numbers that are spread across the rolling dates.
The rolling dates get shifted to the right the next day:
Mon Nov 1 10:21:38 -> Tus Nov 2 10:21:38
Tus Nov 2 10:21:38 -> Wen Nov 3 10:21:38
...
etc.
There are always seven rolling dates (a week).
The worksheet needs to be transformed into the normalized Format with 3 columns: ACCOUNT NO
, Period
, and Value
, where the Period
is a header of the rolling date column and the Value
is a value of the cell:
The normalized dataset needs to be stored in a different Excel file.
Solution
The customer creates two Flows: the first one loads the source Excel into the temporary staging table and renames columns so they always have the same names. The second Flow uses SQL to create a normalized view from the de-normalized database table, created by the first Flow, and stores it in the Excel worksheet.
Create a Flow which loads denormalized Excel into the temporary database table
Step 1. Create an SFTP Connection for the source and destination Excel files.
Step 2. Create an Excel XLSX Format and enter 1
in the Worksheet Name or 1-based index
field. We are always going to be loading data from the first worksheet.
Step 3. Create a Connection to the temporary database.
Step 4. Create a new Flow by selecting the Flow type File to database
from the gallery.
Step 5. Add a new source-to-destination transformation where:
CONNECTION
(FROM
): is a Connection created in Step 1.FORMAT
(FROM
): is a Format created in Step 2.FROM
: is a source file name or a wildcard file name, for example,invoice*.xlsx
.CONNECTION
(TO
): is a Connection created in Step 3.TO
: is the name of the temporary table.
Step 6. Click MAPPING
and enter the following Source query
:
SELECT "ACCOUNT NO" AS account_no,
com.toolsverse.util.DateUtil.parse(dataSet.getFieldDef(26).getName()) AS p1_date,
dataSet.getFieldValue(currentRecord, 26) AS p1_value,
com.toolsverse.util.DateUtil.parse(dataSet.getFieldDef(27).getName()) AS p2_date,
dataSet.getFieldValue(currentRecord, 27) AS p2_value,
com.toolsverse.util.DateUtil.parse(dataSet.getFieldDef(28).getName()) AS p3_date,
dataSet.getFieldValue(currentRecord, 28) AS p3_value,
com.toolsverse.util.DateUtil.parse(dataSet.getFieldDef(29).getName()) AS p4_date,
dataSet.getFieldValue(currentRecord, 29) AS p4_value,
com.toolsverse.util.DateUtil.parse(dataSet.getFieldDef(30).getName()) AS p5_date,
dataSet.getFieldValue(currentRecord, 30) AS p5_value,
com.toolsverse.util.DateUtil.parse(dataSet.getFieldDef(31).getName()) AS p6_date,
dataSet.getFieldValue(currentRecord, 31) AS p6_value,
com.toolsverse.util.DateUtil.parse(dataSet.getFieldDef(32).getName()) AS p7_date,
dataSet.getFieldValue(currentRecord, 32) AS p7_value
where "ACCOUNT NO" <> null
the rolling dates
fields are addressed by the index and renamed to the pn_date
:
com.toolsverse.util.DateUtil.parse(dataSet.getFieldDef(26).getName()) AS p1_date
Step 7. Name and save the Flow.
Create a Flow which creates a normalized Excel spreadsheet
Step 1. Create a new Flow by selecting a Flow type Database to file
from the gallery.
Step 2. Add a new source-to-destination transformation where:
CONNECTION
(FROM
): is a Connection to the temporary database.FROM
: is the name of the temporary table created by the previous Flow.CONNECTION
(TO
): is an SFTP Connection.FORMAT
(TO
): is an Excel Format.TO
: is the name of the destination (transformed and normalized) Excel file.
Step 3. Click MAPPING
and enter the following Source query
:
select account_no,
p1_date period,
p1_value value
from payments_staging
union all
select account_no,
p2_date period,
p2_value value
from payments_staging
union all
select account_no,
p3_date period,
p3_value value
from payments_staging
union all
select account_no,
p4_date period,
p4_value value
from payments_staging
union all
select account_no,
p5_date period,
p5_value value
from payments_staging
union all
select account_no,
p6_date period,
p6_value value
from payments_staging
union all
select account_no,
p7_date period,
p7_value value
from payments_staging
union all
is used to normalize the dataset.
select account_no,
p1_date period,
p1_value value
from payments_staging
...
union all
select account_no,
p2_date period,
p2_value value
from payments_staging
Step 4. Name and save the Flow.
Create a Flow which combines both Flows
Step 1. Add a new nested Flow.
Step 2. Add Flows created in the previous steps to the nested Flow in the following order:
- The Flow which loads data from the source Excel into the temporary table.
- The Flow which loads data from the temporary table into normalized destination Excel.
Step 3. Save and Schedule the Flow to run once a day.
Comments
0 comments
Please sign in to leave a comment.