Email Parsing Pipeline
Monitor an inbox, extract structured data from incoming emails and attachments, validate it, and feed it into downstream workflows. Turn unstructured email into actionable data.
When to Use This Pattern
Use email parsing when:
- Business processes are triggered by incoming emails (invoices, orders, support requests)
- The emails follow a predictable format (system-generated, templated, or form-based)
- You want to eliminate manual copy-paste from emails into business systems
- External parties can't or won't use your web forms or portal
How It Works
| Stage | Action | Output |
|---|---|---|
| 1. Monitor | Watch a shared mailbox or specific inbox | New email detected |
| 2. Classify | Determine the email type (invoice, request, notification) | Email category |
| 3. Extract | Pull structured data from subject, body, and attachments | Key-value pairs |
| 4. Validate | Check extracted data for completeness and correctness | Validated data |
| 5. Route | Feed the data into the appropriate workflow | Business process started |
| 6. Archive | Move the processed email to a "Processed" folder | Clean inbox |
Implementation Guide
Step 1: Set Up the Monitoring
Shared mailbox approach:
- Create a dedicated mailbox: invoices@company.com or requests@company.com
- Configure the workflow to check for new emails every 5–15 minutes
- Process only unread emails in the Inbox folder
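As a rough illustration of the shared mailbox approach, the sketch below polls for unread messages over IMAP using Python's standard library. It assumes the mailbox is reachable over IMAP at all (many platforms use a connector instead), and the function names `connect` and `fetch_unread` are hypothetical. Scheduling the call every 5–15 minutes is left to whatever scheduler you already use.

```python
import email
import imaplib
from email import policy


def connect(host, user, password):
    """Open an IMAP-over-TLS session (host and credentials are placeholders)."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    return conn


def fetch_unread(conn, folder="INBOX"):
    """Return (message_number, parsed_message) pairs for unread mail in `folder`."""
    conn.select(folder)
    status, data = conn.search(None, "UNSEEN")
    if status != "OK":
        return []
    results = []
    for num in data[0].split():
        # Fetching RFC822 normally marks the message as read on the server.
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue
        raw = parts[0][1]
        # policy.default yields an EmailMessage with the modern API.
        results.append((num, email.message_from_bytes(raw, policy=policy.default)))
    return results
```

Because `fetch_unread` only needs an object with `select`, `search`, and `fetch`, it can be exercised against a stub connection before pointing it at a real mailbox.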
Folder-based approach:
- External systems drop emails/files into a monitored SharePoint folder or SFTP
- The workflow triggers on new items in the folder
Step 2: Classify the Email
Before parsing, determine what type of email you're dealing with:
| Signal | Classification Rule |
|---|---|
| Subject contains "INV-" | Invoice |
| From address is noreply@vendor.com | Automated vendor notification |
| Has PDF attachment | Likely an invoice or report |
| Subject contains "RE:" or "FW:" | Reply/forward — may need different handling |
| Body contains "unsubscribe" | Marketing — skip |
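The rules in the table can be sketched as a small first-match-wins function. The `EmailSummary` type, the category names, and in particular the order in which the rules are checked are assumptions made for this example; the table itself does not define a priority.

```python
import re
from dataclasses import dataclass, field
from typing import List

REPLY_OR_FORWARD = re.compile(r"\b(?:RE|FW):", re.IGNORECASE)


@dataclass
class EmailSummary:
    """Hypothetical minimal view of an email used for classification."""
    subject: str
    sender: str
    body: str
    attachment_names: List[str] = field(default_factory=list)


def classify(msg):
    """Apply the signals from the table; the first matching rule wins."""
    if "unsubscribe" in msg.body.lower():
        return "marketing"  # skipped downstream
    if REPLY_OR_FORWARD.search(msg.subject):
        return "reply_or_forward"
    if "INV-" in msg.subject:
        return "invoice"
    if msg.sender.lower() == "noreply@vendor.com":
        return "vendor_notification"
    if any(name.lower().endswith(".pdf") for name in msg.attachment_names):
        return "likely_invoice_or_report"
    return "unclassified"
```

Checking the reply/forward rule before the invoice rule means a reply quoting "INV-" in its subject is not treated as a new invoice; swap the order if your process needs the opposite.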
Step 3: Extract Data
From the email itself:
- Subject line parsing — regex for order numbers, reference IDs, amounts
- Body parsing — look for labeled fields ("Order Number: 12345") or HTML table structures
- Sender info — email address, display name, domain
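A minimal sketch of these three extraction sources follows, using only regular expressions and the standard library. The patterns, the `INV-` followed by digits format, and the output key names are assumptions for illustration; real templates will need their own patterns.

```python
import re
from email.utils import parseaddr

# "Label: value" on its own line, e.g. "Order Number: 12345".
LABELED_FIELD = re.compile(
    r"^\s*([A-Za-z][A-Za-z #]*?)\s*:\s*(.+?)\s*$", re.MULTILINE
)
# Assumed reference format: "INV-" followed by digits.
INVOICE_ID = re.compile(r"\bINV-\d+\b")


def extract_fields(subject, body, from_header):
    """Collect key-value pairs from the subject, body, and From header."""
    fields = {}
    for label, value in LABELED_FIELD.findall(body):
        key = label.strip().lower().replace(" ", "_")
        fields[key] = value
    match = INVOICE_ID.search(subject)
    if match:
        fields["invoice_id"] = match.group(0)
    display_name, address = parseaddr(from_header)
    fields["sender_name"] = display_name
    fields["sender_address"] = address
    fields["sender_domain"] = address.rpartition("@")[2].lower()
    return fields
```

The labeled-field pattern is deliberately loose, so it will also pick up lines such as URLs that happen to contain a colon; values extracted this way should still go through validation.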
From attachments:
- PDF invoices — use OCR (Nintex AI, Azure Form Recognizer) to extract fields
- Excel files — parse rows and columns programmatically
- CSV files — straightforward column mapping
- Images — OCR for receipts, business cards
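Of the attachment types above, CSV is the simplest to show without an external service. The sketch below reads every CSV attachment from a parsed message; it assumes the message is an `EmailMessage` (parsed with `policy.default`) and that the files are UTF-8 encoded with a header row. The function name is hypothetical, and PDF or image OCR is out of scope here because it depends on the service you choose.

```python
import csv
import io


def csv_rows_from_attachments(msg):
    """Yield (filename, row_dict) for each data row of each CSV attachment."""
    for part in msg.iter_attachments():
        filename = part.get_filename() or ""
        if not filename.lower().endswith(".csv"):
            continue  # PDFs, images, and Excel files need other handlers
        payload = part.get_payload(decode=True)  # bytes after transfer decoding
        text = payload.decode("utf-8-sig")  # assumption: UTF-8, optional BOM
        for row in csv.DictReader(io.StringIO(text)):
            yield filename, row
```

Each row comes back as a dictionary keyed by the CSV header, which maps directly onto the column mapping step.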
Step 4: Validate and Enrich
| Validation | Action if Failed |
|---|---|
| Required fields present | Flag for manual review |
| Amount is a valid number | Attempt cleanup, flag if ambiguous |
| Vendor exists in system | Create new vendor or flag |
| No duplicates (same invoice number) | Skip and log |
| Date is reasonable | Flag dates in the future or more than 90 days old |
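The checks in the table could be expressed roughly as follows. The record layout, the required field names, and the accepted amount formats are assumptions for this sketch. Amount cleanup only accepts plain or comma-grouped numbers with up to two decimals, so an ambiguous value such as "1.250,00" is flagged rather than guessed.

```python
import re
from datetime import date, timedelta
from decimal import Decimal

REQUIRED = ("invoice_id", "vendor", "amount", "invoice_date")
# Accepts "1250", "1250.5", "1,250.00"; anything else is treated as ambiguous.
AMOUNT = re.compile(r"^\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?$|^\d+(?:\.\d{1,2})?$")


def parse_amount(raw):
    """Return a Decimal, or None when the text cannot be cleaned up safely."""
    cleaned = raw.strip().lstrip("$€£").strip()
    if not AMOUNT.match(cleaned):
        return None
    return Decimal(cleaned.replace(",", ""))


def validate(record, known_vendors, seen_invoice_ids, today):
    """Return a list of issues; an empty list means the record passed."""
    issues = []
    for name in REQUIRED:
        if not record.get(name):
            issues.append(f"missing {name}")
    if record.get("amount") and parse_amount(record["amount"]) is None:
        issues.append("amount not parseable")
    if record.get("vendor") and record["vendor"] not in known_vendors:
        issues.append("unknown vendor")
    if record.get("invoice_id") in seen_invoice_ids:
        issues.append("duplicate invoice")
    invoice_date = record.get("invoice_date")  # expected to be a datetime.date
    if invoice_date and (
        invoice_date > today or today - invoice_date > timedelta(days=90)
    ):
        issues.append("date out of range")
    return issues
```

Passing `today` in explicitly, instead of calling `date.today()` inside the function, keeps the date check easy to test.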
Step 5: Route to Downstream Workflow
Based on the classification and extracted data:
- Invoice → Invoice Processing workflow (match to PO, route for approval)
- Support request → IT Help Desk (create ticket, assign to team)
- Customer enquiry → CRM (create lead or case)
- Report → Archive to document library with metadata
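One simple way to implement this routing is a lookup table from category to target workflow. In the sketch below, the category keys and the `start_workflow` callback are hypothetical stand-ins for whatever your workflow platform provides; anything without a route falls back to manual review.

```python
# Category -> downstream workflow, mirroring the list above.
ROUTES = {
    "invoice": "Invoice Processing",
    "support_request": "IT Help Desk",
    "customer_enquiry": "CRM",
    "report": "Document Library",
}


def route(category, data, start_workflow):
    """Start the workflow mapped to `category` and return its name."""
    target = ROUTES.get(category, "Manual Review")
    start_workflow(target, data)  # caller-supplied function that triggers the workflow
    return target
```

Keeping the mapping in data rather than in branching code means a new email type only needs a new entry in `ROUTES`.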
Step 6: Handle Failures
| Failure Type | Action |
|---|---|
| Can't classify | Move to "Manual Review" folder, notify team |
| Extraction confidence low | Route to Human-in-the-Loop Review |
| Validation errors | Reply to sender with specific issues (if appropriate) |
| Duplicate detected | Log and archive — don't process twice |
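The failure table can be approximated with one exception type per failure and a wrapper that decides where the email ends up. All class names, folder names, and the `process` and `move_to_folder` callbacks are assumptions for this sketch. Team notifications and replies to the sender are omitted; any unexpected exception is treated as a crash and the email is quarantined.

```python
import logging

logger = logging.getLogger("email_pipeline")


class Unclassifiable(Exception):
    """The classifier could not assign a category."""


class LowConfidence(Exception):
    """Extraction finished but confidence was below the threshold."""


class ValidationFailed(Exception):
    """Extracted data failed one or more validation checks."""


class Duplicate(Exception):
    """The same invoice or reference was already processed."""


# Expected failure -> destination folder (folder names are placeholders).
FAILURE_FOLDERS = {
    Unclassifiable: "Manual Review",
    LowConfidence: "Human Review",
    ValidationFailed: "Validation Errors",
    Duplicate: "Processed",  # log and archive, do not process twice
}


def handle_email(message_id, process, move_to_folder):
    """Run `process`, then file the email according to the outcome."""
    try:
        process(message_id)
        folder = "Processed"
    except tuple(FAILURE_FOLDERS) as exc:
        folder = FAILURE_FOLDERS[type(exc)]
        logger.warning("%s: %s (%s)", message_id, type(exc).__name__, exc)
    except Exception:
        # Unexpected crash: keep the traceback and isolate the email.
        logger.exception("%s: parser crashed, quarantining", message_id)
        folder = "Quarantine"
    move_to_folder(message_id, folder)
    return folder
```

Because the email is always moved out of the Inbox, a message that crashes the parser is not picked up again on the next polling cycle.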
Tips & Best Practices
- Never process emails from the Sent or Deleted folders. Only process from Inbox, and move processed emails to an "Archive" or "Processed" subfolder immediately to prevent re-processing.
- Use AI for unstructured emails. For free-text emails that don't follow a template, use Nintex AI or OpenAI to classify and extract intent, entities, and urgency.
- Set up a "poison pill" handler. Some emails will crash your parser (huge attachments, malformed HTML, password-protected PDFs). Catch these errors and quarantine the email.
- Log everything. Keep a processing log: email received at, classified as, fields extracted, routed to, processing time. This is essential for debugging and compliance.
- Reply with confirmation. For externally-submitted emails, send an auto-reply confirming receipt with a reference number. Senders need to know their email was received.
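For the processing log, one option is a JSON line per email carrying the fields listed in the logging tip. The record type and field names below are assumptions for this sketch, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ProcessingRecord:
    """One log entry per processed email (field names are illustrative)."""
    message_id: str
    received_at: str  # ISO 8601 timestamp
    classified_as: str
    fields_extracted: dict
    routed_to: str
    processing_seconds: float


def to_log_line(record):
    """Serialize a record as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

One JSON object per line keeps the log easy to append to and to query with standard tooling. If the log is kept for compliance, consider recording only the names of the extracted fields, not their values, when those values are sensitive.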
Related patterns
API Polling with Change Detection
Periodically check an external system for changes and trigger workflows when new or updated records are detected. The reliable alternative when webhooks aren't available.
Change Data Capture Stream
Stream row-level changes out of a database in near real-time using the transaction log. No polling, no app changes — downstream systems get inserts, updates, and deletes as they happen.
Reverse ETL
Push modelled data from your warehouse back into the SaaS tools that business teams use every day — CRM, marketing, support — so they can act on analytics without a BI detour.