Overview
A SaaS startup uses Atlassian Jira to track development tasks, sprints, bugs, and change requests. The DI team developed a data warehouse that includes data loaded (using Etlworks) from vendors and various production systems. The team then decided to load data from Jira as well, so that the warehouse can serve as a single source of truth for reports, dashboards, and other analytics.
Requirements
- The ETL Flow must be able to extract data from the Jira REST API.
- The Flow must be able to parse the response (a nested JSON document), transform it into a flat dataset, and load it into the PostgreSQL database used for the data warehouse.
- The Flow must be able to do a full refresh of the Jira-related tables in the data warehouse.
- The Flow is executed every hour, extracting and loading a few thousand records on each run.
Solution
The actual Flow developed by the team extracts and loads various types of information, such as:
- Issues in the current Sprint
- Issues in Epics
- Known bugs
- Issues in the latest release
- Issues in the next Sprint
- Recently resolved bugs
- All releases
- All Sprints in the release
In this tutorial, we will only cover a single use case: loading issues in the current Sprint. The other Flows are very similar and use the exact same techniques.
Step 1. Create an HTTP Connection to the Jira API. Since we are extracting data from the current Sprint, the URL will look like the following:
https://company.atlassian.net/rest/agile/latest/sprint/{sprint_id_current_sprint}/issue
Notice company, which is part of the base URL, and the token {sprint_id_current_sprint}. We will set this token dynamically.
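Conceptually, the token substitution can be sketched in plain JavaScript. This is an illustration only; the resolveUrl helper below is hypothetical, not an Etlworks API:

```javascript
// Hypothetical sketch of {token} substitution in a URL template.
// resolveUrl and the props object are illustrative, not Etlworks APIs.
var urlTemplate =
  "https://company.atlassian.net/rest/agile/latest/sprint/{sprint_id_current_sprint}/issue";

function resolveUrl(template, props) {
  // Replace every {name} placeholder with the matching property value;
  // leave the placeholder intact if no value is defined.
  return template.replace(/\{(\w+)\}/g, function (match, name) {
    return props.hasOwnProperty(name) ? String(props[name]) : match;
  });
}

var url = resolveUrl(urlTemplate, { sprint_id_current_sprint: 42 });
// url now points at .../sprint/42/issue
```

In the actual Flow, the value of the token comes from a global property set by the JavaScript Flow created in Step 4 below.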
The Jira API uses basic authentication, so:
- Set the Authentication parameter to basic preemptive.
- Set User and Password to a valid Jira username and password.
Step 2. Create a JSON Format with all default settings.
Step 3. Create a database Connection to the PostgreSQL database.
Step 4. Create a new JavaScript Flow. We will use this Flow to set the token {sprint_id_current_sprint}. Enter the following code:
var javaImports = new JavaImporter(com.toolsverse.config,
    com.toolsverse.etl.core.engine, com.toolsverse.util);

with (javaImports) {
    // Read the ID of the active Sprint from the data warehouse.
    var metric = Extractor.lookup(etlConfig, scenario, "DW", "metric",
        "select sprint_id from jira.sprints where state = 'active' limit 1");

    var sprint_id_current_sprint = null;

    if (metric != null && metric.getRecordCount() > 0) {
        var id = metric.getFieldValue(0, "metric");

        if (!Utils.isNothing(id)) {
            sprint_id_current_sprint = new java.lang.String(id);
        }
    }

    etlConfig.log("sprint_id_current_sprint: " + sprint_id_current_sprint);

    // Store the Sprint ID as a global property so the HTTP Connection can
    // substitute it for the {sprint_id_current_sprint} token. "-1" is used
    // when no active Sprint is found.
    SystemConfig.instance().getProperties().put("sprint_id_current_sprint",
        sprint_id_current_sprint != null ? sprint_id_current_sprint :
            new java.lang.String("-1"));
}
This code queries the data warehouse for the current Sprint and sets the value of the global property sprint_id_current_sprint. The current Sprint is stored in the database by the Flow Get all Sprints in release, which, as mentioned above, is not covered in this tutorial but is very similar to the other Jira Flows.
Step 5. Create a new Flow by selecting Web Service to the database in the gallery.
Step 6. Add a new source-to-destination transformation and set the following parameters:
- Connection (FROM): the HTTP Connection created in Step 1.
- Format (FROM): the JSON Format created in Step 2.
- FROM: issues.
- TO: jira.issues_current_sprint, which is the fully qualified name of the destination table.
- Connection (TO): the database Connection created in Step 3.
Step 7. Click MAPPING and type the following Source query:
select key,
com.toolsverse.util.Utils.str2Date(
dataSet.getFieldValueWithNested(currentRecord, 'created.name').getValue(), null,
"yyyy-MM-dd'T'HH:mm:ss.SSSz")
issue_created_date,
com.toolsverse.util.Utils.str2Date(
dataSet.getFieldValueWithNested(currentRecord, 'updated.name').getValue(), null,
"yyyy-MM-dd'T'HH:mm:ss.SSSz")
issue_updated_date,
dataSet.getFieldValueWithNested(currentRecord, 'issuetype.name').getValue()
issue_type,
dataSet.getFieldValueWithNested(currentRecord, 'priority.name').getValue()
issue_priority,
dataSet.getFieldValueWithNested(currentRecord, 'status.name').getValue()
issue_status,
dataSet.getFieldValueWithNested(currentRecord, 'summary.name').getValue()
issue_summary,
dataSet.getFieldValueWithNested(currentRecord, 'labels.name').getValue()
issue_labels,
dataSet.getFieldValueWithNested(currentRecord, 'reporter.name').getValue()
issue_reporter,
dataSet.getFieldValueWithNested(currentRecord, 'assignee.name').getValue()
issue_assignee,
dataSet.getFieldValueWithNested(currentRecord, 'creator.name').getValue()
issue_creator,
com.toolsverse.util.Utils.str2Date(
dataSet.getFieldValueWithNested(currentRecord, 'resolutiondate.name').getValue(), null,
"yyyy-MM-dd'T'HH:mm:ss.SSSz")
issue_resolution_date,
dataSet.getFieldValueWithNested(currentRecord, 'customfield_10016.name').getValue()
issue_epic
from issues.fields
Most likely, your actual SQL will be different since this one uses custom fields, but it demonstrates how to transform the nested JSON document returned by the Jira API into a flat dataset that can be loaded into a database table.
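The mechanics behind a nested lookup such as 'issuetype.name' can be illustrated in plain JavaScript. The getNested helper and the sample payload below are illustrative; Etlworks' actual implementation may differ:

```javascript
// Sketch: walking a dotted path through a parsed Jira issue record.
// getNested and the sample record are illustrative, not Etlworks code.
function getNested(record, path) {
  var parts = path.split(".");
  var node = record;
  for (var i = 0; i < parts.length; i++) {
    if (node == null) return null; // missing intermediate node -> null
    node = node[parts[i]];
  }
  return node === undefined ? null : node;
}

// A trimmed-down "fields" object from a Jira issue payload.
var fields = {
  issuetype: { name: "Bug" },
  status: { name: "In Progress" },
  assignee: null, // unassigned issue
  created: "2024-01-15T10:30:00.000+0000"
};

var issueType = getNested(fields, "issuetype.name"); // "Bug"
var assignee = getNested(fields, "assignee.name");   // null, no error
```

Note how a null intermediate node (an unassigned issue) resolves to null instead of raising an error. The date pattern yyyy-MM-dd'T'HH:mm:ss.SSSz used in the query matches timestamps shaped like the created value above.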
Step 8. If needed, change the Mapping.
Step 9. Configure the SQL that will be executed on the destination Connection before the transformation runs. Go to Parameters / Before SQL and enter the following SQL:
truncate table jira.issues_current_sprint
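This Before SQL is what makes each run a full refresh: the table is emptied immediately before the fresh extract is loaded. The pattern, reduced to its essentials (an in-memory array stands in for jira.issues_current_sprint):

```javascript
// Sketch: full refresh = truncate, then load the new extract.
// The array stands in for the jira.issues_current_sprint table.
var table = [{ key: "PROJ-1" }, { key: "PROJ-2" }];   // previous run's rows
var extracted = [{ key: "PROJ-3" }, { key: "PROJ-4" }, { key: "PROJ-5" }];

table.length = 0;                             // "truncate table ..."
Array.prototype.push.apply(table, extracted); // load the new rows

// table now contains only the rows from the latest extract
```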
Step 10. Save the Flow.
Step 11. Combine the JavaScript Flow created in step 4 and the transformation Flow created in Steps 5 to 10 in a nested Flow by adding Flows in the following order:
- JavaScript Flow.
- Transformation Flow.
Step 12. Save and Schedule the Flow to run every hour.