Scheduled Batch Processor
Process a queue of items on a recurring schedule — daily reconciliation, weekly reports, hourly sync jobs. The workhorse pattern for routine data processing.
When to Use This Pattern
Use a scheduled batch processor when:
- You need to process items on a recurring schedule (hourly, daily, weekly)
- Items accumulate in a queue and should be processed in bulk (not one-by-one as they arrive)
- The processing involves external systems that prefer batch API calls over individual requests
- You run regular jobs like reconciliation, report generation, or data cleanup
How It Works
| Phase | Description |
|---|---|
| Trigger | Schedule fires (e.g., daily at 6 AM) |
| Fetch | Query for unprocessed items (status = "pending") |
| Process | Loop through items, applying business logic to each |
| Update | Mark each item as processed/failed |
| Report | Generate a summary of what was processed |
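The five phases above can be condensed into a single run function. A minimal in-memory sketch (the item fields, status values, and `process_item` callback are illustrative, not tied to any framework):

```python
def run_batch(items, process_item, batch_size=100):
    """One scheduled run: fetch pending items, process each, report a summary."""
    # Fetch: up to batch_size items still marked pending.
    batch = [i for i in items if i["status"] == "pending"][:batch_size]
    summary = {"processed": 0, "succeeded": 0, "failed": 0}
    for item in batch:
        item["status"] = "processing"          # lock before doing any work
        try:
            process_item(item)                 # business logic lives here
            item["status"] = "completed"
            summary["succeeded"] += 1
        except Exception as exc:
            item["status"] = "failed"          # keep the error for later review
            item["error"] = str(exc)
            summary["failed"] += 1
        summary["processed"] += 1
    return summary
```

The trigger itself comes from whatever scheduler you already run (cron, a workflow engine, a cloud scheduler); this function is what it invokes.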
Implementation Guide
Step 1: Define the Schedule and Scope
| Parameter | Example | Notes |
|---|---|---|
| Frequency | Daily at 6:00 AM UTC | Use UTC to avoid DST issues |
| Batch size | 100 items per run | Prevents timeout on large queues |
| Scope query | WHERE status = 'pending' AND created_on < NOW() - INTERVAL 1 HOUR | Only process items at least 1 hour old, so the batch never races with writers still populating a new row |
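These parameters translate directly into bind values for the scope query. A sketch, assuming a parameterized query; the constant names are illustrative:

```python
from datetime import datetime, timedelta, timezone

BATCH_SIZE = 100              # items per run
MIN_AGE = timedelta(hours=1)  # skip items newer than this

def scope_params(now=None):
    """Bind values for: WHERE status = 'pending' AND created_on < :cutoff LIMIT :limit."""
    now = now or datetime.now(timezone.utc)  # always UTC, never local time
    return {"cutoff": now - MIN_AGE, "limit": BATCH_SIZE}
```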
Step 2: Fetch the Batch
Query your data source for items to process:
- Filter by status (pending, ready, queued)
- Order by priority, then by age (oldest first)
- Limit to batch size
- Lock the items — update their status to "processing" immediately to prevent another workflow instance from picking them up
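A sketch of fetch-and-lock over an in-memory list (field names are illustrative). Against a real database, do the same thing in one statement, e.g. Postgres-style `UPDATE … WHERE id IN (SELECT … FOR UPDATE SKIP LOCKED)`, so concurrent instances take disjoint batches:

```python
def fetch_batch(items, batch_size):
    """Pick pending items, highest priority first then oldest first, and lock them."""
    pending = [i for i in items if i["status"] == "pending"]
    pending.sort(key=lambda i: (-i["priority"], i["created_on"]))
    batch = pending[:batch_size]
    for item in batch:
        item["status"] = "processing"  # lock immediately, before any work
    return batch
```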
Step 3: Process Each Item
For each item in the batch:
- Validate the data
- Apply business logic (calculations, lookups, transformations)
- Call external APIs if needed
- On success: update status to "completed"
- On failure: update status to "failed", log the error
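Per item, the key move is catching failures so one bad record never aborts the whole batch. A minimal sketch (the validation rule and the `* 1.2` transformation are placeholders for real business logic):

```python
def process_item(item):
    """Validate and transform one item, recording success or failure in place."""
    try:
        if "amount" not in item:              # validate the data
            raise ValueError("missing amount")
        item["total"] = item["amount"] * 1.2  # business logic (illustrative)
        item["status"] = "completed"
    except Exception as exc:
        item["status"] = "failed"
        item["error"] = str(exc)              # logged for Step 4
```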
Step 4: Handle Stragglers
After the loop completes:
- Failed items: Log them for investigation. After N failures, flag for manual review.
- Stuck items: If an item has been in "processing" for too long (>1 hour), it probably failed silently. Reset to "pending" for the next batch.
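Both rules can run as one sweep at the end of the batch. A sketch with illustrative thresholds, statuses, and field names:

```python
from datetime import datetime, timedelta, timezone

STUCK_AFTER = timedelta(hours=1)  # "processing" longer than this = likely silent failure
MAX_FAILURES = 3                  # after this many attempts, stop retrying

def handle_stragglers(items, now=None):
    """Reset stuck items for the next run; flag repeat failures for a human."""
    now = now or datetime.now(timezone.utc)
    for item in items:
        if item["status"] == "processing" and now - item["locked_at"] > STUCK_AFTER:
            item["status"] = "pending"       # retry on the next batch
        elif item["status"] == "failed" and item.get("failures", 0) >= MAX_FAILURES:
            item["status"] = "needs_review"  # manual investigation
```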
Step 5: Generate a Summary
At the end of each run, produce a report:
| Metric | Value |
|---|---|
| Total processed | 95 |
| Successful | 90 |
| Failed | 3 |
| Skipped (already done) | 2 |
| Duration | 4m 32s |
| Next scheduled run | Tomorrow 6:00 AM |
Send this summary to the operations team or write it to a log.
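A sketch of turning the run's counters into that report (the field names mirror the table above; the plain-text formatting is illustrative):

```python
def format_summary(stats):
    """Render the end-of-run report as plain text, one metric per line."""
    lines = [
        f"Total processed: {stats['processed']}",
        f"Successful: {stats['succeeded']}",
        f"Failed: {stats['failed']}",
        f"Skipped (already done): {stats['skipped']}",
        f"Duration: {stats['duration']}",
        f"Next scheduled run: {stats['next_run']}",
    ]
    return "\n".join(lines)
```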
Tips & Best Practices
Never let two instances of the same batch job run simultaneously. Use a lock mechanism: check for a running instance before starting, and always release the lock when done (even on failure).
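One portable way to sketch that lock is an exclusive-create file: `os.O_CREAT | os.O_EXCL` makes create-if-absent atomic, so only one instance wins. The path handling is illustrative, and note that a crash can leave a stale lock file behind, which a real job should detect and expire:

```python
import os

def acquire_lock(path):
    """Return True if we created the lock file (we run), False if it exists (skip)."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False

def release_lock(path):
    os.remove(path)  # always call from a finally block, even on failure
```

Wrap the batch body in `try/finally` so `release_lock` runs no matter how the run ends.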
- Idempotency is critical. If the workflow crashes mid-batch and restarts, processing the same item twice should not create duplicate data.
- Stagger large batches. If you have 10,000 items, process them in chunks of 100 with a short pause between chunks to avoid overloading systems.
- Monitor no-ops. If the batch runs and processes 0 items repeatedly, it might mean your queue is empty (good) or your filter query is wrong (bad). Alert if appropriate.
- Log timing data. Track how long each batch run takes. If it's creeping upward, you'll spot capacity issues before they become outages.
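The staggering tip above can be sketched as chunked processing with a pause between chunks (the chunk size and pause are illustrative tuning knobs):

```python
import time

def chunks(seq, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

def process_staggered(items, process, chunk_size=100, pause_s=1.0):
    """Process in chunks, pausing between them to avoid overloading downstream systems."""
    for i, chunk in enumerate(chunks(items, chunk_size)):
        if i > 0:
            time.sleep(pause_s)  # brief breather between chunks
        for item in chunk:
            process(item)
```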
Related patterns
Saga with Compensating Transactions
Coordinate a multi-step business transaction that spans several systems by pairing each step with a rollback action. If a later step fails, run the rollbacks in reverse to restore a consistent state.
State Machine Workflow
Model a business process as a set of defined states with explicit transitions between them. Unlike linear workflows, items can move forwards, backwards, and loop — matching how real business processes actually behave.
Fan-Out / Fan-In
Split a workload into parallel branches, process them simultaneously, then aggregate the results. Dramatically reduces processing time for batch operations.