Overview
Most (but not all) APIs limit the amount of data that can be returned or uploaded in a single request. They provide a pagination mechanism: the response includes information on how to access the next “page” (hence the name pagination).
In Etlworks, retrieving data from a paginated API can be implemented using a nested Flow with a database loop, together with a technique called dynamic URLs.
Common pagination patterns
- APIs where the amount of data is controlled by parameters in the request, and the loop stops when there is no more data to return. In this tutorial, we will use the Jira API to illustrate this pattern.
- APIs where the server controls the amount of data by providing a next marker in the response, so the loop stops when there is no next marker. We will use the API provided by an organization called UKRI Gateway to illustrate this pattern.
Process
API with the page number
The URL for a typical Jira API endpoint looks like this:
https://company.atlassian.net/rest/api/3/search?jql=query
&startAt=pointer&maxResults=10
When called, it returns the following JSON:
{"expand": "schema,names", "startAt": 0, "maxResults": 10, "total": 2823, "results": [...]}
As you can see, the request contains the starting pointer and the maximum number of results. The response includes the total number of results.
So, the algorithm to iterate through the pages looks like this:
- Make the first call with startAt=0 and maxResults=1.
- Extract total from the response and calculate the total number of calls that need to be made to the endpoint with maxResults=100 (100 is the maximum allowed number of results).
- Create a loop to iterate through the pages. The loop ends when the number of calls made to the endpoint reaches the total number of calls calculated in step 2.
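The steps above can be sketched in plain JavaScript. This is an illustration only: the hypothetical fetchPage() function stands in for the real HTTP call that Etlworks makes to the Jira endpoint, and the mock response simply mirrors the JSON shape shown earlier.

```javascript
// Hypothetical stand-in for the HTTP call to the Jira search endpoint.
// It returns a mock response shaped like the JSON shown above.
function fetchPage(startAt, maxResults) {
  var TOTAL = 2823; // pretend the endpoint has 2823 matching results
  var results = [];
  for (var i = startAt; i < Math.min(startAt + maxResults, TOTAL); i++) {
    results.push({ id: i });
  }
  return { startAt: startAt, maxResults: maxResults, total: TOTAL, results: results };
}

var pageSize = 100;

// Step 1: probe call with maxResults=1 just to read the total.
var total = fetchPage(0, 1).total;

// Step 2: number of calls needed at 100 results per call.
var totalCalls = Math.ceil(total / pageSize);

// Step 3: loop through the pages until all calls have been made.
var fetched = 0;
for (var page = 0; page < totalCalls; page++) {
  fetched += fetchPage(page * pageSize, pageSize).results.length;
}
// After the loop, fetched equals total
```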
Now, let’s put it all together.
Step 1. Create Connections for the example
- HTTP Connection for the Jira API.
- Database Connection to the PostgreSQL database.
The URL for the HTTP Connection, which returns the Jira tickets for the project TEST, will look like the one below (we are using the Jira Cloud version of the API, so substitute company in the URL with your actual company name). Note that the Jira API uses preemptive basic authentication.
https://company.atlassian.net/rest/api/3/search?jql=project%20%3D%20TEST
&startAt={startAt}&maxResults=100
Notice that the value of the startAt parameter in the URL includes the token {startAt}. The idea is that, in the loop, we will replace the token {startAt} with the actual values: 0, 100, 200, 300, etc.
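As an illustration of what the token substitution produces, here is a small sketch (not Etlworks internals; the real substitution is performed by the dynamic URLs feature):

```javascript
// The Connection URL with the {startAt} token, as configured above.
var template = "https://company.atlassian.net/rest/api/3/search?jql=project%20%3D%20TEST" +
  "&startAt={startAt}&maxResults=100";

// Each loop iteration replaces the token with the next offset.
var urls = [];
for (var startAt = 0; startAt <= 300; startAt += 100) {
  urls.push(template.replace("{startAt}", String(startAt)));
}
// urls now holds the concrete URLs for startAt=0, 100, 200, 300
```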
For this example, we will be using a database loop, so we will need one more Connection to the temporary database, which will be used to drive the loop.
Step 2. Create the Flows
2.1 Create a SQL Flow that creates a temporary loop table using the SQL below:
create table pages (startAt integer)
2.2 Create a JavaScript Flow to populate the loop table:
var javaImports = new JavaImporter(com.toolsverse.etl.core.engine,
    com.toolsverse.util, com.toolsverse.config, com.toolsverse.etl.sql.util,
    java.util);

with (javaImports) {
    var pageSize = 100;

    // Read the total number of records from the response of the probe call
    var params = Extractor.lookup(etlConfig, scenario, "First", "params",
        "select total");
    var totalNumberOfRecords = params.getFieldValue(params.getRecord(0), "total");

    // Calculate how many pages (calls) are needed
    var numberOfPages = Math.floor(totalNumberOfRecords / pageSize);
    var remainder = totalNumberOfRecords % pageSize;
    numberOfPages = totalNumberOfRecords == 0 ? 0 :
        (numberOfPages == 0 ? 1 :
            (remainder > 0 ? numberOfPages + 1 : numberOfPages));

    // Insert one row per page into the loop table
    var connection = etlConfig.getConnectionFactory().
        getConnection("Temporary database");
    var sql = "insert into pages (startAt) values (?)";

    var startAt = 0;
    for (var page = 0; page < numberOfPages; page++) {
        SqlUtils.executeSql(connection, sql, startAt);
        startAt = startAt + pageSize;
    }
}
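The page-count arithmetic in the script above can be isolated into a small function to see what it produces (a sketch for illustration; the function name is made up):

```javascript
// Same calculation as in the Flow script: 0 records -> 0 pages,
// a partial last page -> one extra call.
function numberOfPages(totalRecords, pageSize) {
  var pages = Math.floor(totalRecords / pageSize);
  var remainder = totalRecords % pageSize;
  return totalRecords === 0 ? 0 :
    (pages === 0 ? 1 :
      (remainder > 0 ? pages + 1 : pages));
}
// For total=2823 and pageSize=100 this yields 29 pages, so the loop
// table ends up with startAt values 0, 100, ..., 2800.
```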
2.3 Create a Flow Extract data from the Web Service and load into database
where the source is a web service (Jira endpoint above), and the destination is a PostgreSQL database.
2.4 Create a nested Flow that combines Flows 1, 2, and 3.
2.5 Modify step 3 by configuring a database loop where the Connection is a Temporary database, and the driving SQL is:
select startAt as "startAt" from pages
Download the Jira example from etlworks.com. You can import the example into your Etlworks account and try it out. Don’t forget to edit the PostgreSQL and Jira Connections before executing the Flow.
Download a fully functional example for another paginated API with a page number query parameter from etlworks.com. You can import the example into your Etlworks account and try it out without changing anything. This Flow pulls the data from the paginated API and loads it into the shared Google Sheet.
API with the next marker
For this example, we will be using the UKRI Gateway API endpoint with the following URL:
https://www.ebi.ac.uk/europepmc/webservices/rest/search
?query=italy&cursorMark={NEXT_MARKER}
If the value of the cursorMark query parameter is * (the star character), the API returns the first page, which contains data and, possibly, the next value for cursorMark:
<response>
<nextCursorMark>535543435GUI999==</nextCursorMark>
<data>
...
</data>
</response>
So, the algorithm to iterate through the pages looks like this:
- Make the first call with cursorMark=*.
- Get the value for cursorMark from the response by extracting the value of the nextCursorMark field.
- Create a loop to iterate through the pages. The loop ends when the response doesn’t have the nextCursorMark.
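The next-marker loop can be sketched in plain JavaScript. This is an illustration only: the hypothetical fetchPage() stands in for the real HTTP call, the mock marker values are made up, and the real Flow keeps the marker in system properties rather than a local variable.

```javascript
// Hypothetical stand-in for the HTTP call; returns mock pages where
// the last response carries no nextCursorMark.
function fetchPage(cursorMark) {
  var pages = {
    "*":   { data: [1, 2], nextCursorMark: "AAA" },
    "AAA": { data: [3, 4], nextCursorMark: "BBB" },
    "BBB": { data: [5] } // no next marker: the loop ends here
  };
  return pages[cursorMark];
}

var marker = "*"; // the first call always uses the star character
var records = [];
while (marker) {
  var response = fetchPage(marker);
  records = records.concat(response.data);
  marker = response.nextCursorMark; // undefined on the last page
}
// records now holds the data from all three pages
```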
Now, let’s put it all together.
Step 1. Create Connections for the example
- HTTP Connection for the UKRI Gateway API endpoint above.
- Connection to the shared Google Sheet.
Notice that the value of the cursorMark query parameter is a token, {NEXT_MARKER}. The idea is to replace the token with the next marker, make the API call, and repeat until there is no next marker anymore.
We will be loading data into the shared Google Sheet by calling the API endpoint multiple times, so we will need a Connection to the temporary staging database to dump the data from the API before loading the whole thing into the Google Sheet.
Step 2. Create the Flows
2.1 Create a JavaScript Flow to set the original value of the {NEXT_MARKER} token to * (the star character):
com.toolsverse.config.SystemConfig.instance().getProperties().put("NEXT_MARKER", "*");
2.2 Create a Flow Extract data from the Web Service and load into database
where the source is a web service and the destination is a temporary database.
2.3 Click MAPPING > Additional Transformations and add the following JavaScript as the value for the After Extract transformation:
// Read nextCursorMark from the first record of the extracted data set
var nextCursorMark = dataSet.getFieldValue(dataSet.getRecord(0),
    "nextCursorMark");

if (com.toolsverse.util.Utils.isNothing(nextCursorMark)) {
    nextCursorMark = "";
}

// Store the marker so the next iteration of the loop can use it
com.toolsverse.config.SystemConfig.instance().getProperties().put("NEXT_MARKER",
    nextCursorMark);
2.4 Create Flow Database to well-known API
where the source is a temporary database and the destination is a shared Google Sheet.
2.5 Create a nested Flow that combines Flows 1, 2, 3, and 4.
Modify step 2 by configuring a JavaScript loop, where the JavaScript for the loop is:
// Read the marker stored by the After Extract transformation
var nextCursorMark = com.toolsverse.config.SystemConfig.instance().getProperties().
    get("NEXT_MARKER");

// The loop continues while value is not null
value = !com.toolsverse.util.Utils.isNothing(nextCursorMark) ? nextCursorMark : null;
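To see how the loop condition behaves, here is a plain-JavaScript stand-in (a sketch; isNothing here only approximates the real com.toolsverse.util.Utils.isNothing, and loopValue is a made-up wrapper around the loop script's assignment):

```javascript
// Approximation of Utils.isNothing: true for null, undefined, or blank strings.
function isNothing(s) {
  return s === null || s === undefined || String(s).trim() === "";
}

// Mirrors the loop script: a non-empty marker keeps the loop running,
// anything else returns null, which ends the loop.
function loopValue(nextCursorMark) {
  return !isNothing(nextCursorMark) ? nextCursorMark : null;
}
// loopValue("535543435GUI999==") keeps iterating; loopValue("") stops
```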
Download a fully functional example for the paginated API with a next marker from etlworks.com. You can import the example into your Etlworks account and try it out without changing anything. This Flow pulls the data from the paginated API and loads it into the shared Google Sheet.