Overview
A consulting company was hired by a government agency to develop a data intelligence solution based on the Tableau platform. The data comes from multiple sources, from a locally hosted SQL Server database to Google Analytics, Facebook, and third-party web services. Tableau includes basic data integration capabilities, but in this particular case they are not enough to build a data ingestion pipeline that can load data from all the sources.
Requirements
- The solution must be able to connect to all data sources, including the MS SQL Server database, Google Analytics, Facebook, and third-party web services.
- The solution must be able to transform extracted data into a format understood by Tableau.
- The solution must be able to expose transformed data as REST API endpoints which can be called from Tableau on demand.
- There will be multiple Flows (one per data item), together extracting a few million records every day.
Solution
The solution was developed in Etlworks using a feature called user-defined PULL APIs. In this tutorial, we will cover two data items exposed as API endpoints:
A Flow which exposes a table in a SQL Server database as an API endpoint
Step 1. Create a database Connection to the SQL Server database. In this particular case, since the SQL Server is running behind a firewall, the IP address of the Etlworks instance was whitelisted so the SQL Server could accept inbound requests from Etlworks.
Step 2. Create an HTTP Listener with the following parameters:
- URL Pattern: /data/environmental. Read about URL Patterns.
- Method: GET
- Auth Type: Basic
Step 3. Create a CSV Format with the Delimiter set to the pipe character (|).
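The pipe-delimited output configured in Step 3 can be illustrated with a short sketch using Python's standard csv module. The column names and rows here are hypothetical stand-ins; the real columns come from the SQL Server source table.

```python
import csv
import io

# Hypothetical sample rows standing in for the source table;
# the actual columns depend on the SQL Server schema.
rows = [
    {"site": "Plant A", "co2_tons": 12.4},
    {"site": "Plant B", "co2_tons": 7.9},
]

buffer = io.StringIO()
# delimiter="|" mirrors the pipe Delimiter configured in the CSV Format
writer = csv.DictWriter(buffer, fieldnames=["site", "co2_tons"], delimiter="|")
writer.writeheader()
writer.writerows(rows)

print(buffer.getvalue())
```

The first line of the output is the pipe-separated header (site|co2_tons), followed by one pipe-separated row per record.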
Step 4. Create a new Flow by selecting Database to Web Service in the gallery.
Step 5. Create a new source-to-destination transformation with the following parameters:
- Connection (FROM): the database Connection created in Step 1.
- From: EnvironmentalImpact (the name of the source table).
- To: environmental.csv.
- Connection (TO): the HTTP Listener created in Step 2.
- Format (TO): the CSV Format created in Step 3.
Step 6. Click MAPPING and set the Source query, if needed.
Step 7. Save the Flow and schedule it as an event-driven Flow.
Calling the Flow from Tableau
The Flow can now be called from Tableau (or any other third-party application capable of calling API endpoints) using the following parameters:
- URL: https://etlworks-host/plugins/schedules/rest/v1/httplistener/data/environmental
- HTTP Method: GET
- Authentication: Basic
- Username and password: the username and password of any Etlworks user who can run the Flow, or of the API user.
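Any HTTP client can call the endpoint with these parameters. A minimal sketch using Python's standard library is shown below; the host name and credentials are hypothetical placeholders, and the actual request line is left commented out so the sketch runs without network access.

```python
import base64
import urllib.request

# Hypothetical host and credentials; substitute your Etlworks instance
# and the credentials of a user permitted to run the Flow.
url = ("https://etlworks-host/plugins/schedules/rest/v1"
       "/httplistener/data/environmental")
username, password = "tableau_user", "secret"

# Build the Basic auth header: base64("username:password")
token = base64.b64encode(f"{username}:{password}".encode()).decode()
request = urllib.request.Request(url, method="GET")
request.add_header("Authorization", f"Basic {token}")

# urllib.request.urlopen(request) would trigger the Flow and return
# the pipe-delimited CSV payload produced in the previous steps.
```

Tableau's Web Data Connector (or any similar client) sends the equivalent request; the same Basic credentials are entered in its connection dialog.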
A Flow which exposes an extract from Google Analytics as an API endpoint
Step 1. Create a Google Analytics Connection.
Step 2. Create an HTTP Listener with the following parameters:
- URL Pattern: /data/GoogleAPI. Read about URL Patterns.
- Method: GET
- Auth Type: Basic
Step 3. Create a JSON Format with all default settings.
Step 4. Create a new Flow by selecting Google Analytics to Web Service in the gallery.
Step 5. Create a new source-to-destination transformation with the following parameters:
- Connection (FROM): the Google Analytics Connection created in Step 1.
- From: report.
- To: report.json.
- Connection (TO): the HTTP Listener created in Step 2.
- Format (TO): the JSON Format created in Step 3.
Step 6. Save the Flow and schedule it as an event-driven Flow.
Calling the Flow from Tableau
The Flow can now be called from Tableau (or any other third-party application capable of calling API endpoints) using the following parameters:
- URL: https://etlworks-host/plugins/schedules/rest/v1/httplistener/data/GoogleAPI
- HTTP Method: GET
- Authentication: Basic
- Username and password: the username and password of any Etlworks user who can run the Flow, or of the API user.
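Since this endpoint serves the report.json destination, the caller receives JSON rather than CSV. The sketch below shows how a client might consume such a response; the payload and field names are hypothetical, as the real fields depend on the Google Analytics dimensions and metrics configured in the Connection.

```python
import json

# Hypothetical JSON payload illustrating the shape of a response from the
# report.json endpoint; actual fields depend on the GA report configuration.
payload = '''[
    {"date": "2023-01-01", "sessions": 120},
    {"date": "2023-01-02", "sessions": 98}
]'''

records = json.loads(payload)
total_sessions = sum(r["sessions"] for r in records)
print(f"{len(records)} rows, {total_sessions} sessions")
```

Tableau parses the same JSON structure automatically when the endpoint is registered as a data source.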