
Retry with Exponential Backoff

Gracefully handle transient API failures by retrying with increasing delays. Essential for any workflow that calls external services.

Visual Flow

[BPMN 2.0 diagram of the retry flow]

When to Use This Pattern

Use retry with backoff when:

  • Your workflow calls external APIs that occasionally fail (timeouts, rate limits, 500 errors)
  • The failure is likely transient — retrying after a short delay will succeed
  • You want your workflow to be self-healing rather than requiring manual intervention
  • You're integrating with services that have rate limits (e.g., 100 requests/minute)
Note

This pattern only helps with transient failures. If the API requires different credentials or the endpoint URL is wrong, retrying won't help. Always check the error type before retrying.

How It Works

On failure, wait an increasing amount of time before each retry. This prevents hammering a struggling service and respects rate limits.

Attempt | Wait Time | Cumulative Time | Action
1       | 0s        | 0s              | Initial request
2       | 2s        | 2s              | First retry
3       | 4s        | 6s              | Second retry
4       | 8s        | 14s             | Third retry
5       | 16s       | 30s             | Fourth retry
        |           | 30s             | Give up → error handling

The formula: wait = base_delay × 2^(attempt - 1)

Add jitter (random ±20%) to prevent multiple workflows from retrying at the exact same moment (thundering herd problem).
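The schedule above can be sketched in a few lines of Python. The function name `backoff_delay` is hypothetical, chosen for illustration; attempts are numbered from 1 to match the table, and jitter is applied as a ±20% multiplier:

```python
import random

def backoff_delay(attempt, base_delay=2.0, jitter=0.2):
    """Delay before retry number `attempt` (1-based), with ±20% jitter."""
    delay = base_delay * 2 ** (attempt - 1)
    return delay * random.uniform(1 - jitter, 1 + jitter)

# With jitter disabled, the schedule matches the table: 2s, 4s, 8s, 16s.
print([backoff_delay(n, jitter=0.0) for n in range(1, 5)])  # [2.0, 4.0, 8.0, 16.0]
```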

Implementation Guide

Step 1: Wrap the API Call in a Loop
max_retries = 4
base_delay = 2 seconds
attempt = 0

LOOP while attempt < max_retries:
    result = call_api()
    
    IF result.success:
        BREAK → continue workflow
    
    IF result.error is NOT retryable:
        BREAK → go to error handler
    
    wait_time = base_delay × 2^attempt × random(0.8, 1.2)   # ±20% jitter
    PAUSE for wait_time
    attempt = attempt + 1

IF attempt >= max_retries:
    → Circuit breaker / error notification
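The pseudocode above translates directly to Python. This is a minimal sketch, assuming `call_api` returns a `(status_code, body)` pair; the wrapper name `call_with_retry` and the exception messages are illustrative, not part of any library:

```python
import random
import time

RETRYABLE_STATUSES = {408, 429, 500, 502, 503}

def call_with_retry(call_api, max_retries=4, base_delay=2.0):
    """Call `call_api` (returns (status_code, body)), retrying transient
    failures with exponential backoff and ±20% jitter."""
    for attempt in range(max_retries):
        status, body = call_api()
        if status < 400:
            return body                       # success: continue the workflow
        if status not in RETRYABLE_STATUSES:
            raise RuntimeError(f"non-retryable error {status}")  # → error handler
        wait = base_delay * 2 ** attempt      # 2s, 4s, 8s, 16s with base_delay=2
        wait *= random.uniform(0.8, 1.2)      # jitter avoids the thundering herd
        time.sleep(wait)
    raise RuntimeError("retries exhausted")   # → circuit breaker / notification
```

A flaky endpoint that fails twice with 503 and then returns 200 would succeed on the third call without any manual intervention.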
Step 2: Classify Errors

Not all errors are worth retrying:

Error Type                   | Retryable? | Example
Timeout (408, ETIMEDOUT)     | Yes        | Server was busy
Rate limited (429)           | Yes        | Too many requests — respect Retry-After header
Server error (500, 502, 503) | Yes        | Transient server issue
Bad request (400)            | No         | Your data is wrong — fix it
Unauthorized (401, 403)      | No         | Credentials are wrong
Not found (404)              | No         | Endpoint or resource doesn't exist
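The table above reduces to one small predicate. The function name `is_retryable` is an assumption for illustration; `ETIMEDOUT` is the usual socket-level error code for a timeout with no HTTP status:

```python
RETRYABLE_STATUSES = {408, 429, 500, 502, 503}

def is_retryable(status_code=None, error_code=None):
    """True if a failure is likely transient and worth retrying,
    per the error-classification table above."""
    if error_code == "ETIMEDOUT":     # network timeout, no HTTP status
        return True
    return status_code in RETRYABLE_STATUSES
```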
Step 3: Implement in Nintex

Workflow Cloud approach:

  1. Use a Loop action with a counter variable
  2. Inside the loop: call the API with Call web service
  3. Check the response status code
  4. If retryable error: increment counter, use Pause for duration
  5. If success: set a flag variable and exit loop

RPA approach:

  1. Wrap the action in a Try-Catch
  2. In the Catch block: increment retry counter, wait, loop back
  3. Use a loop condition: retryCount < maxRetries AND NOT success
Step 4: Add a Circuit Breaker

After max retries are exhausted:

  1. Log the failure with full error details
  2. Notify an admin via email or Teams
  3. Save the payload for manual retry later
  4. Mark the workflow item as failed (don't silently drop it)
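The four steps above can be sketched as a single fallback handler. Everything here is a hypothetical example: the function name, the JSONL dead-letter file, and the commented-out notification hook stand in for whatever logging, storage, and alerting your environment provides:

```python
import json
import logging

log = logging.getLogger("workflow.retry")

def handle_exhausted_retries(payload, last_error,
                             dead_letter_path="failed_items.jsonl"):
    """Circuit-breaker fallback: log the failure, persist the payload for
    manual replay, and mark the item failed instead of silently dropping it."""
    log.error("retries exhausted: %s", last_error)          # 1. log full details
    record = {"payload": payload,
              "error": str(last_error),
              "status": "failed"}                           # 4. explicit failed state
    with open(dead_letter_path, "a") as f:                  # 3. save for manual retry
        f.write(json.dumps(record) + "\n")
    # notify_admin(record) would go here (email/Teams hook)  # 2. alert an admin
    return record
```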

Tips & Best Practices

Warning

Never retry without a delay. Immediate retries create a tight loop that can overwhelm the target service and get your IP blocked.

  • Respect Retry-After headers. If the API sends a 429 with Retry-After: 60, wait 60 seconds regardless of your backoff schedule.
  • Set a total timeout. Even with backoff, cap the total wait at a reasonable limit (e.g., 5 minutes). Don't let a workflow hang for hours.
  • Log every retry attempt. When debugging, you need to know how many retries happened and what errors were returned.
  • Use idempotency keys. If the API supports them, send an idempotency key so that accidental duplicate requests don't create duplicate records.
  • Monitor retry rates. If retries are happening frequently, the problem might be systemic (undersized API, quota too low) rather than transient.
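The first two tips combine into one delay calculation. This is a sketch under stated assumptions: `next_wait` is a hypothetical helper, `retry_after` is the parsed value of a Retry-After header (seconds), and `max_total` caps the cumulative wait at five minutes:

```python
def next_wait(attempt, retry_after=None, base_delay=2.0,
              elapsed=0.0, max_total=300.0):
    """Seconds to wait before retry; honors Retry-After over the backoff
    schedule. Returns None once the total-time budget is spent."""
    if retry_after is not None:
        wait = retry_after                    # server's word beats our schedule
    else:
        wait = base_delay * 2 ** attempt      # normal exponential backoff
    if elapsed + wait > max_total:
        return None                           # give up; don't hang for hours
    return wait
```

Returning `None` signals the caller to stop retrying and hand off to the circuit-breaker path.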
