Retry with Exponential Backoff
Gracefully handle transient API failures by retrying with increasing delays. Essential for any workflow that calls external services.
When to Use This Pattern
Use retry with backoff when:
- Your workflow calls external APIs that occasionally fail (timeouts, rate limits, 500 errors)
- The failure is likely transient — retrying after a short delay will succeed
- You want your workflow to be self-healing rather than requiring manual intervention
- You're integrating with services that have rate limits (e.g., 100 requests/minute)
This pattern only helps with transient failures. If the API requires different credentials or the endpoint URL is wrong, retrying won't help. Always check the error type before retrying.
How It Works
On failure, wait an increasing amount of time before each retry. This prevents hammering a struggling service and respects rate limits.
| Attempt | Wait Time | Cumulative Time | Action |
|---|---|---|---|
| 1 | 0s | 0s | Initial request |
| 2 | 2s | 2s | First retry |
| 3 | 4s | 6s | Second retry |
| 4 | 8s | 14s | Third retry |
| 5 | 16s | 30s | Fourth retry |
| — | — | 30s | Give up → error handling |
The formula: wait = base_delay × 2^(retry − 1), where retry is the retry number (the first retry waits base_delay, and each subsequent retry doubles the wait).
Add jitter (a random ±20% factor) so that multiple workflows don't all retry at the exact same moment (the thundering herd problem).
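As a quick illustration, here is a minimal Python sketch of this schedule (the function name and defaults are ours, not from any particular library):

```python
import random

def backoff_schedule(base_delay=2.0, max_retries=4, jitter=0.2):
    """Yield the wait before each retry: base_delay * 2^(retry - 1), with jitter."""
    for retry in range(1, max_retries + 1):
        wait = base_delay * 2 ** (retry - 1)
        # Scale by a random factor in [0.8, 1.2] so concurrent workflows
        # don't all wake up and retry at the same instant.
        yield wait * random.uniform(1 - jitter, 1 + jitter)

# Without jitter the waits are 2, 4, 8, 16 seconds, matching the table above.
print([round(w, 1) for w in backoff_schedule()])
```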
Implementation Guide
Step 1: Wrap the API Call in a Loop
max_retries = 4          # retries after the initial attempt (5 calls total)
base_delay = 2 seconds
attempt = 0              # attempt 0 = the initial request

LOOP while attempt <= max_retries:
    result = call_api()
    IF result.success:
        BREAK → continue workflow
    IF result.error is NOT retryable:
        BREAK → go to error handler
    IF attempt == max_retries:
        BREAK → circuit breaker / error notification
    wait_time = base_delay × 2^attempt × random(0.8, 1.2)
    PAUSE for wait_time
    attempt = attempt + 1
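The same loop in runnable form, as a minimal Python sketch. Here call_api and is_retryable are placeholders for your own request function and error classifier (see Step 2 below):

```python
import random
import time

MAX_RETRIES = 4   # retries after the initial attempt (5 calls total)
BASE_DELAY = 2.0  # seconds

def call_with_backoff(call_api, is_retryable):
    """Call call_api() with exponential backoff; re-raise if retries are exhausted."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            return call_api()  # success: continue the workflow
        except Exception as err:
            if not is_retryable(err):
                raise  # permanent error: hand off to the error handler
            if attempt == MAX_RETRIES:
                raise  # retries exhausted: circuit breaker / notification (Step 4)
            # Exponential backoff with +/-20% jitter.
            wait = BASE_DELAY * 2 ** attempt * random.uniform(0.8, 1.2)
            time.sleep(wait)
```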
Step 2: Classify Errors
Not all errors are worth retrying:
| Error Type | Retryable? | Example |
|---|---|---|
| Timeout (408, ETIMEDOUT) | Yes | Server was busy |
| Rate limited (429) | Yes | Too many requests — respect Retry-After header |
| Server error (500, 502, 503) | Yes | Transient server issue |
| Bad request (400) | No | Your data is wrong — fix it |
| Unauthorized (401, 403) | No | Credentials are wrong |
| Not found (404) | No | Endpoint or resource doesn't exist |
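In code, the table reduces to a small predicate. A sketch assuming you can read an HTTP status code off the error (how you extract it depends on your client library):

```python
RETRYABLE_STATUS = {408, 429, 500, 502, 503}

def is_retryable(status_code: int) -> bool:
    """Transient errors (timeouts, rate limits, server errors) are worth retrying."""
    return status_code in RETRYABLE_STATUS

is_retryable(429)  # True: back off and retry
is_retryable(401)  # False: fix the credentials instead
```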
Step 3: Implement in Nintex
Workflow Cloud approach:
- Use a Loop action with a counter variable
- Inside the loop, call the API with the Call web service action
- Check the response status code
- If the error is retryable: increment the counter, then use the Pause for duration action before the next iteration
- If success: set a flag variable and exit the loop
RPA approach:
- Wrap the action in a Try-Catch
- In the Catch block: increment retry counter, wait, loop back
- Use a loop condition:
retryCount < maxRetries AND NOT success
Step 4: Add a Circuit Breaker
After max retries are exhausted:
- Log the failure with full error details
- Notify an admin via email or Teams
- Save the payload for manual retry later
- Mark the workflow item as failed (don't silently drop it)
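Sketched in Python, with notify_admin and dead_letters as hypothetical stand-ins for your alerting channel and storage:

```python
import json
import logging

def on_retries_exhausted(item_id, payload, last_error, notify_admin, dead_letters):
    """Give-up path: log, alert, park the payload for replay, mark the item failed."""
    logging.error("Item %s failed after all retries: %s", item_id, last_error)
    notify_admin(f"Workflow item {item_id} needs attention: {last_error}")
    dead_letters.save(item_id, json.dumps(payload))  # keep it for manual retry
    return "failed"  # surface the failure; never silently drop the item
```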
Tips & Best Practices
- Never retry without a delay. Immediate retries create a tight loop that can overwhelm the target service and get your IP blocked.
- Respect Retry-After headers. If the API sends a 429 with Retry-After: 60, wait 60 seconds regardless of your backoff schedule (see the sketch after this list).
- Set a total timeout. Even with backoff, cap the total wait at a reasonable limit (e.g., 5 minutes). Don't let a workflow hang for hours.
- Log every retry attempt. When debugging, you need to know how many retries happened and what errors were returned.
- Use idempotency keys. If the API supports them, send an idempotency key so that accidental duplicate requests don't create duplicate records.
- Monitor retry rates. If retries are happening frequently, the problem might be systemic (undersized API, quota too low) rather than transient.
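To make the Retry-After tip concrete, a minimal sketch assuming a response object with a status_code and a headers mapping:

```python
def next_wait(response, attempt, base_delay=2.0):
    """Prefer the server's Retry-After header over the computed backoff."""
    retry_after = response.headers.get("Retry-After")
    if response.status_code == 429 and retry_after is not None:
        # Retry-After can also be an HTTP date; this sketch handles the
        # delay-in-seconds form only.
        return float(retry_after)
    return base_delay * 2 ** attempt  # fall back to exponential backoff
```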
Related Patterns
Circuit Breaker
Stop calling a failing dependency before it drags you down with it. After a threshold of errors, open the circuit, fail fast for a cooldown period, then cautiously let traffic back in.
Dead Letter Queue
Capture workflow items that fail processing after all retry attempts are exhausted. Store them safely for investigation, manual correction, and replay — never silently lose data.