Retry with Exponential Backoff
Gracefully handle transient API failures by retrying with increasing delays. Essential for any workflow that calls external services.
When to Use This Pattern
Use retry with backoff when:
- Your workflow calls external APIs that occasionally fail (timeouts, rate limits, 500 errors)
- The failure is likely transient — retrying after a short delay will succeed
- You want your workflow to be self-healing rather than requiring manual intervention
- You're integrating with services that have rate limits (e.g., 100 requests/minute)
This pattern only helps with transient failures. If the API requires different credentials or the endpoint URL is wrong, retrying won't help. Always check the error type before retrying.
How It Works
On failure, wait an increasing amount of time before each retry. This prevents hammering a struggling service and respects rate limits.
| Attempt | Wait Time | Cumulative Time | Action |
|---|---|---|---|
| 1 | 0s | 0s | Initial request |
| 2 | 2s | 2s | First retry |
| 3 | 4s | 6s | Second retry |
| 4 | 8s | 14s | Third retry |
| 5 | 16s | 30s | Fourth retry |
| — | — | 30s | Give up → error handling |
The formula: wait = base_delay × 2^(retry − 1), where retry is the retry number (the first retry waits base_delay, and each subsequent retry doubles the wait).
Add jitter (a random ±20% factor) so that multiple workflows don't all retry at the exact same moment (the thundering herd problem).
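As a quick illustration, here is a minimal Python sketch of this schedule (the function name and defaults are ours, not from any particular library):

```python
import random

def backoff_schedule(base_delay=2.0, max_retries=4, jitter=0.2):
    """Yield the wait before each retry: base_delay * 2^(retry - 1), with jitter."""
    for retry in range(1, max_retries + 1):
        wait = base_delay * 2 ** (retry - 1)
        # Scale by a random factor in [0.8, 1.2] so concurrent workflows
        # don't all wake up and retry at the same instant.
        yield wait * random.uniform(1 - jitter, 1 + jitter)

# Without jitter the waits are 2, 4, 8, 16 seconds, matching the table above.
print([round(w, 1) for w in backoff_schedule()])
```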
Implementation Guide
Step 1: Wrap the API Call in a Loop
max_retries = 4          # retries after the initial attempt (5 calls total)
base_delay = 2 seconds
attempt = 0              # attempt 0 = the initial request

LOOP while attempt <= max_retries:
    result = call_api()
    IF result.success:
        BREAK → continue workflow
    IF result.error is NOT retryable:
        BREAK → go to error handler
    IF attempt == max_retries:
        BREAK → circuit breaker / error notification
    wait_time = base_delay × 2^attempt × random(0.8, 1.2)
    PAUSE for wait_time
    attempt = attempt + 1
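The same loop in runnable form, as a minimal Python sketch. Here call_api and is_retryable are placeholders for your own request function and error classifier (see Step 2 below):

```python
import random
import time

MAX_RETRIES = 4   # retries after the initial attempt (5 calls total)
BASE_DELAY = 2.0  # seconds

def call_with_backoff(call_api, is_retryable):
    """Call call_api() with exponential backoff; re-raise if retries are exhausted."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            return call_api()  # success: continue the workflow
        except Exception as err:
            if not is_retryable(err):
                raise  # permanent error: hand off to the error handler
            if attempt == MAX_RETRIES:
                raise  # retries exhausted: circuit breaker / notification (Step 4)
            # Exponential backoff with +/-20% jitter.
            wait = BASE_DELAY * 2 ** attempt * random.uniform(0.8, 1.2)
            time.sleep(wait)
```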
Step 2: Classify Errors
Not all errors are worth retrying:
| Error Type | Retryable? | Example |
|---|---|---|
| Timeout (408, ETIMEDOUT) | Yes | Server was busy |
| Rate limited (429) | Yes | Too many requests — respect Retry-After header |
| Server error (500, 502, 503) | Yes | Transient server issue |
| Bad request (400) | No | Your data is wrong — fix it |
| Unauthorized (401, 403) | No | Credentials are wrong |
| Not found (404) | No | Endpoint or resource doesn't exist |
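In code, the table reduces to a small predicate. A sketch assuming you can read an HTTP status code off the error (how you extract it depends on your client library):

```python
RETRYABLE_STATUS = {408, 429, 500, 502, 503}

def is_retryable(status_code: int) -> bool:
    """Transient errors (timeouts, rate limits, server errors) are worth retrying."""
    return status_code in RETRYABLE_STATUS

is_retryable(429)  # True: back off and retry
is_retryable(401)  # False: fix the credentials instead
```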
Step 3: Implement in Nintex
Workflow Cloud approach:
- Use a Loop action with a counter variable
- Inside the loop, call the API with the Call web service action
- Check the response status code
- If the error is retryable: increment the counter, then use the Pause for duration action before the next iteration
- If success: set a flag variable and exit the loop
RPA approach:
- Wrap the action in a Try-Catch
- In the Catch block: increment retry counter, wait, loop back
- Use a loop condition:
retryCount < maxRetries AND NOT success
Step 4: Add a Circuit Breaker
After max retries are exhausted:
- Log the failure with full error details
- Notify an admin via email or Teams
- Save the payload for manual retry later
- Mark the workflow item as failed (don't silently drop it)
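Sketched in Python, with notify_admin and dead_letters as hypothetical stand-ins for your alerting channel and storage:

```python
import json
import logging

def on_retries_exhausted(item_id, payload, last_error, notify_admin, dead_letters):
    """Give-up path: log, alert, park the payload for replay, mark the item failed."""
    logging.error("Item %s failed after all retries: %s", item_id, last_error)
    notify_admin(f"Workflow item {item_id} needs attention: {last_error}")
    dead_letters.save(item_id, json.dumps(payload))  # keep it for manual retry
    return "failed"  # surface the failure; never silently drop the item
```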
Tips & Best Practices
- Never retry without a delay. Immediate retries create a tight loop that can overwhelm the target service and get your IP blocked.
- Respect Retry-After headers. If the API sends a 429 with Retry-After: 60, wait 60 seconds regardless of your backoff schedule (see the sketch after this list).
- Set a total timeout. Even with backoff, cap the total wait at a reasonable limit (e.g., 5 minutes). Don't let a workflow hang for hours.
- Log every retry attempt. When debugging, you need to know how many retries happened and what errors were returned.
- Use idempotency keys. If the API supports them, send an idempotency key so that accidental duplicate requests don't create duplicate records.
- Monitor retry rates. If retries are happening frequently, the problem might be systemic (undersized API, quota too low) rather than transient.
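To make the Retry-After tip concrete, a minimal sketch assuming a response object with a status_code and a headers mapping:

```python
def next_wait(response, attempt, base_delay=2.0):
    """Prefer the server's Retry-After header over the computed backoff."""
    retry_after = response.headers.get("Retry-After")
    if response.status_code == 429 and retry_after is not None:
        # Retry-After can also be an HTTP date; this sketch handles the
        # delay-in-seconds form only.
        return float(retry_after)
    return base_delay * 2 ** attempt  # fall back to exponential backoff
```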
Related Patterns
Circuit Breaker
Stop calling a failing dependency before it drags you down with it. After a threshold of errors, open the circuit, fail fast for a cooldown period, then cautiously let traffic back in.
Dead Letter Queue
Capture workflow items that fail processing after all retry attempts are exhausted. Store them safely for investigation, manual correction, and replay — never silently lose data.