Problem
A marketing company needs to collect XML documents from various sources (web services, SFTP, etc.), transform them into a unified XML format, and send them as a payload to a web service that loads the data into a relational database. Some of the XML files are up to 1 GB in size, which is too large for the web service to handle, so the large files need to be split into smaller ones before they are sent to the web service.
Requirements
- The solution must be able to copy files from the various sources into the server storage.
- The solution must be able to transform XML documents in the server storage using XSLT.
- The solution must be able to split large XML documents into smaller files.
- The solution must be able to send an XML document as a payload to the web service.
- The flow will be executed a few times a day.
Solution
The solution developed by the team uses built-in capabilities of the Etlworks Integrator to transform XML using XSLT and split XML.
The Flow which copies XML to the local storage
In this tutorial, we will create a Flow that copies files from an FTP server to the server storage.
Step 1. Create an FTP Connection.
Step 2. Create a server storage Connection.
Step 3. Create a Copy Files or Move Files Flow (depending on whether you wish to delete the files in the source).
Step 4. Add a new transformation with the following parameters:
- Connection (from): a Connection created in step 1.
- From: a filename or a wildcard filename to copy or move.
- To: * (same names as in the source) or a new destination filename.
- Connection (to): a Connection created in step 2.
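To make the From / To semantics concrete, here is a minimal local sketch in plain Python. It is not how Etlworks implements the Flow: it copies between two local folders rather than from FTP, and the function name `copy_matching` and its parameters are hypothetical. It only illustrates a wildcard source with "same names as in the source" on the destination side, plus the copy-versus-move choice from step 3.

```python
import fnmatch
import shutil
from pathlib import Path


def copy_matching(src_dir, dst_dir, pattern="*.xml", move=False):
    """Copy (or move) files whose names match `pattern`, keeping the same names."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    transferred = []
    for path in sorted(src_dir.iterdir()):
        # Case-sensitive wildcard match on the file name only.
        if path.is_file() and fnmatch.fnmatchcase(path.name, pattern):
            target = dst_dir / path.name
            if move:
                shutil.move(str(path), str(target))  # source file is removed
            else:
                shutil.copy2(str(path), str(target))  # source file is kept
            transferred.append(target)
    return transferred
```

With `move=False` the source files stay in place, which corresponds to choosing Copy Files; `move=True` corresponds to Move Files.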
The Flow which transforms XML using XSLT
Step 1. Create a new Apply XSL style sheet to XML files Flow.
Step 2. Add a new transformation with the following parameters:
- Connection (from): a server storage Connection created in the previous section.
- From: a filename or a wildcard filename of the file(s) to transform using XSLT.
- To: leave it empty or enter a new destination filename.
- Connection (to): a server storage Connection created in the previous section. Alternatively, you can use a different server storage Connection, for example, one that points to a different folder.
Step 3. Click MAPPING, select Parameters, and enter the XSLT.
The Flow which splits XML
The Etlworks Integrator can split XML files if they have repeatable segments. Read about splitting XML files.
Step 1. Create a new Split XML files Flow.
Step 2. Add a new transformation with the following parameters:
- Connection (from): the server storage Connection used as a Connection (to) in the Flow above.
- From: the filename or a wildcard filename of the XML file(s) to split.
- Connection (to): the server storage Connection used as a Connection (to) in the Flow above.
Step 3. Click MAPPING, select Parameters, and define the Paths for the repeating segments in XML.
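To illustrate what splitting on a repeating segment means, here is a minimal sketch in plain Python. It is a conceptual example under stated assumptions, not the Etlworks implementation: the function `split_xml` and its parameters `record_tag` and `records_per_file` are hypothetical, the repeating elements are assumed to be direct children of the root, and the document is assumed to use no namespaces. It streams the input so that a very large file does not have to fit in memory at once.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def split_xml(source, out_dir, record_tag, records_per_file):
    """Write chunks of `records_per_file` repeating elements to separate files."""
    source, out_dir = Path(source), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written, batch = [], []
    root_tag, root_elem = None, None

    def flush():
        # Wrap the pending records in a root with the original root tag.
        if not batch:
            return
        root = ET.Element(root_tag)
        root.extend(batch)
        target = out_dir / f"{source.stem}_{len(written) + 1}.xml"
        ET.ElementTree(root).write(str(target), encoding="utf-8", xml_declaration=True)
        written.append(target)
        batch.clear()

    for event, elem in ET.iterparse(str(source), events=("start", "end")):
        if event == "start":
            if root_tag is None:
                # The first start event belongs to the document root.
                root_tag, root_elem = elem.tag, elem
            continue
        if elem.tag == record_tag:
            batch.append(elem)
            if len(batch) == records_per_file:
                flush()
                root_elem.clear()  # drop already-written records from memory
    flush()  # write the final, possibly smaller, chunk
    return written
```

Each output file keeps the original root element name, so every chunk remains a well-formed document that the web service can accept in place of the original.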
The Flow which sends the XML to the HTTP endpoint
Step 1. Create a destination HTTP Connection.
Step 2. Create a new Move Files Flow.
Step 3. Add a new transformation with the following parameters:
- Connection (from): the Connection used as a Connection (to) in the Flow above.
- From: a filename or a wildcard filename of the XML file(s) to be sent as a payload to the web service.
- Connection (to): the HTTP Connection created in step 1.
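Conceptually, moving a file to an HTTP Connection means sending the file content as the request body. The sketch below shows that idea in plain Python. It is an illustration only: the POST method, the `application/xml` content type, and the names `build_xml_request` and `send_xml_files` are assumptions, since in Etlworks the method and headers come from the HTTP Connection settings.

```python
import urllib.request
from pathlib import Path


def build_xml_request(endpoint_url, xml_path):
    """Build a request that carries the XML file content as its payload."""
    payload = Path(xml_path).read_bytes()
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        method="POST",  # assumption: the web service accepts POST
        headers={"Content-Type": "application/xml"},
    )


def send_xml_files(endpoint_url, folder, pattern="*.xml", timeout=60):
    """Send every matching file, one request per file; return HTTP status codes."""
    statuses = {}
    for path in sorted(Path(folder).glob(pattern)):
        request = build_xml_request(endpoint_url, path)
        # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            statuses[path.name] = response.status
    return statuses
```

Sending one request per file is what makes the earlier split step useful: each request stays small enough for the web service to handle.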
The Flow which cleans up files in the server storage
The previous Flows create files in the server storage, so it is a good idea to delete them after processing.
Step 1. Create a new Delete Files Flow.
Step 2. Add as many independent transformations as needed (one for each combination of a folder and a wildcard filename). Each transformation has the following parameters:
- Connection (from): a Connection that points to the folder with files to delete.
- From: a filename or a wildcard filename to delete.
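The "one transformation per folder and wildcard" idea can be sketched in plain Python as a list of (folder, wildcard) pairs. This is an illustration under assumptions, not the Etlworks implementation; the function name `delete_matching` is hypothetical.

```python
from pathlib import Path


def delete_matching(targets):
    """Delete files for each (folder, wildcard) pair; return the deleted paths."""
    deleted = []
    for folder, pattern in targets:
        for path in sorted(Path(folder).glob(pattern)):
            if path.is_file():  # leave subfolders untouched
                path.unlink()
                deleted.append(path)
    return deleted
```

Keeping the pairs separate makes it easy to clean the folder with the original files and the folder with the split files using different wildcards.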
All together now
Step 1. Add a new nested Flow.
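Step 2. Add the Flows created in the previous sections in the following order:
- The Flow which copies XML to the local storage
- The Flow which transforms XML using XSLT
- The Flow which splits XML
- The Flow which sends the XML to the HTTP endpoint
- The Flow which cleans up files in the server storage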
Step 3. Save the Flow and schedule it to run a few times a day.
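The order matters because each Flow consumes the files produced by the previous one, and the cleanup must not run before the files have been sent. The sketch below shows sequential execution in plain Python. It is a conceptual illustration only: the name `run_in_order` is hypothetical, and stopping at the first failure is an assumption about the desired behavior, not a statement about how Etlworks handles errors in nested Flows.

```python
def run_in_order(steps):
    """Run (name, callable) pairs one after another; stop at the first failure."""
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception as error:
            # Later steps (for example, cleanup) are skipped, so the files
            # remain available for a retry.
            return completed, (name, error)
        completed.append(name)
    return completed, None
```

The function returns the names of the steps that finished and, if something failed, the name of the failing step together with its exception.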