Email Parsing Pipeline

Monitor an inbox, extract structured data from incoming emails and attachments, validate it, and feed it into downstream workflows. Turn unstructured email into actionable data.

When to Use This Pattern

Use email parsing when:

  • Business processes are triggered by incoming emails (invoices, orders, support requests)
  • The emails follow a predictable format (system-generated, templated, or form-based)
  • You want to eliminate manual copy-paste from emails into business systems
  • External parties can't or won't use your web forms or portal

How It Works

| Stage | Action | Output |
|---|---|---|
| 1. Monitor | Watch a shared mailbox or specific inbox | New email detected |
| 2. Classify | Determine the email type (invoice, request, notification) | Email category |
| 3. Extract | Pull structured data from subject, body, and attachments | Key-value pairs |
| 4. Validate | Check extracted data for completeness and correctness | Validated data |
| 5. Route | Feed the data into the appropriate workflow | Business process started |
| 6. Archive | Move the processed email to a "Processed" folder | Clean inbox |

Implementation Guide

Step 1: Set Up the Monitoring

Shared mailbox approach:

  • Create a dedicated mailbox: invoices@company.com or requests@company.com
  • Configure the workflow to check for new emails every 5–15 minutes
  • Process only unread emails in the Inbox folder
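
As a rough sketch of the shared-mailbox approach, the function below reads unread Inbox messages over IMAP with Python's standard `imaplib`. IMAP access, the host, and the credentials are all assumptions; the pattern does not say how the mailbox is reached, and a workflow platform's built-in mail connector may replace this entirely.

```python
import email
import imaplib
from email import policy
from email.message import EmailMessage


def fetch_unread(conn) -> list[tuple[bytes, EmailMessage]]:
    """Return (message number, parsed message) for each unread Inbox email."""
    conn.select("INBOX")
    status, data = conn.search(None, "UNSEEN")
    if status != "OK":
        return []
    results = []
    for num in data[0].split():
        # Fetching the full RFC822 body normally marks the message as read.
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue
        raw = parts[0][1]  # parts[0] is (fetch metadata, raw message bytes)
        results.append((num, email.message_from_bytes(raw, policy=policy.default)))
    return results


def poll_once(host: str, user: str, password: str):
    # Hypothetical connection details; run this from a scheduler
    # at whatever interval suits the mailbox (e.g. every 5-15 minutes).
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        return fetch_unread(conn)
```

Keeping `fetch_unread` separate from the connection setup makes it possible to exercise the parsing logic against a stub connection without a live mail server.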

Folder-based approach:

  • External systems drop emails/files into a monitored SharePoint folder or SFTP directory
  • The workflow triggers on new items in the folder

Step 2: Classify the Email

Before parsing, determine what type of email you're dealing with:

| Signal | Classification Rule |
|---|---|
| Subject contains "INV-" | Invoice |
| From address is noreply@vendor.com | Automated vendor notification |
| Has PDF attachment | Likely an invoice or report |
| Subject contains "RE:" or "FW:" | Reply/forward; may need different handling |
| Body contains "unsubscribe" | Marketing; skip |
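
The rules in the table could be expressed as a small rule-based classifier, sketched below. The order in which the rules are checked, the category names, and the treatment of "RE:"/"FW:" as subject prefixes are assumptions; the table does not say which signal wins when several match.

```python
import re
from email.message import EmailMessage

REPLY_PREFIX = re.compile(r"\s*(re|fw|fwd)\s*:", re.IGNORECASE)


def classify(msg: EmailMessage) -> str:
    """Apply the classification signals in a fixed (assumed) priority order."""
    subject = str(msg["Subject"] or "")
    sender = str(msg["From"] or "").lower()
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content().lower() if body_part else ""
    has_pdf = any(
        part.get_content_type() == "application/pdf"
        for part in msg.iter_attachments()
    )

    if REPLY_PREFIX.match(subject):
        return "reply_or_forward"
    if "unsubscribe" in body:
        return "marketing"
    if "INV-" in subject:
        return "invoice"
    if "noreply@vendor.com" in sender:
        return "vendor_notification"
    if has_pdf:
        return "likely_invoice_or_report"
    return "unclassified"
```

Returning `"unclassified"` instead of guessing leaves room for the manual-review path described under failure handling.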

Step 3: Extract Data

From the email itself:

  • Subject line parsing — regex for order numbers, reference IDs, amounts
  • Body parsing — look for labeled fields ("Order Number: 12345") or HTML table structures
  • Sender info — email address, display name, domain
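
A minimal sketch of extraction from the email itself, using regular expressions for the subject and for labeled body fields. The `INV-` reference format and the resulting key names are illustrative assumptions; real templates will need their own patterns, and HTML tables would need an HTML parser that is not shown here.

```python
import re
from email.utils import parseaddr

# Assumed reference format, e.g. "INV-2041".
REFERENCE_RE = re.compile(r"\b(INV-\d+)\b")
# Lines shaped like "Label: value", one per line.
LABELED_FIELD_RE = re.compile(
    r"^[ \t]*([A-Za-z][A-Za-z #]*?)[ \t]*:[ \t]*(.+?)[ \t]*$", re.MULTILINE
)


def extract_fields(subject: str, body: str, sender: str) -> dict[str, str]:
    fields: dict[str, str] = {}

    match = REFERENCE_RE.search(subject)
    if match:
        fields["reference"] = match.group(1)

    # "Order Number: 12345" becomes {"order_number": "12345"}.
    for label, value in LABELED_FIELD_RE.findall(body):
        key = label.strip().lower().replace(" ", "_")
        fields.setdefault(key, value)  # keep the first occurrence of a label

    name, address = parseaddr(sender)
    fields["sender_name"] = name
    fields["sender_email"] = address
    fields["sender_domain"] = address.rpartition("@")[2].lower()
    return fields
```

The labeled-field pattern is deliberately loose, so any line containing a colon after a word can match; downstream validation is expected to catch the false positives.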

From attachments:

  • PDF invoices — use OCR (Nintex AI, Azure Form Recognizer) to extract fields
  • Excel files — parse rows and columns programmatically
  • CSV files — straightforward column mapping
  • Images — OCR for receipts, business cards
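
For the CSV case, column mapping might look like the sketch below. The source column names and target field names in `COLUMN_MAP` are hypothetical. PDF and image OCR are left out because they depend on whichever external service is in use.

```python
import csv
import io

# Hypothetical mapping from the sender's column headers to internal field names.
COLUMN_MAP = {
    "Invoice No": "invoice_number",
    "Amount": "amount",
    "Date": "invoice_date",
}


def parse_csv_attachment(payload: bytes, column_map=COLUMN_MAP) -> list[dict[str, str]]:
    """Decode a CSV attachment and rename its columns to internal field names."""
    text = payload.decode("utf-8-sig")  # tolerate the BOM that some exports add
    reader = csv.DictReader(io.StringIO(text, newline=""))
    rows = []
    for row in reader:
        rows.append(
            {target: (row.get(source) or "").strip()
             for source, target in column_map.items()}
        )
    return rows
```

Missing columns come back as empty strings rather than raising, so the required-field check in the validation step can report them.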

Step 4: Validate and Enrich

| Validation | Action if Failed |
|---|---|
| Required fields present | Flag for manual review |
| Amount is a valid number | Attempt cleanup, flag if ambiguous |
| Vendor exists in system | Create new vendor or flag |
| No duplicates (same invoice#) | Skip and log |
| Date is reasonable | Flag future dates or >90 days old |
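
The checks in the table could be sketched as a single validation function that returns a list of issues. The required field names, the ISO date format, and the rule for what counts as an "ambiguous" amount are assumptions made for illustration.

```python
import re
from datetime import date, timedelta
from decimal import Decimal, InvalidOperation
from typing import Optional

REQUIRED = ("invoice_number", "vendor", "amount", "invoice_date")  # assumed names


def clean_amount(raw: str) -> Optional[Decimal]:
    """Strip currency symbols; return None when the number is invalid or ambiguous."""
    cleaned = re.sub(r"[^\d.,-]", "", raw)
    if re.fullmatch(r"-?\d{1,3}(,\d{3})+(\.\d+)?", cleaned):
        cleaned = cleaned.replace(",", "")  # "1,250.00" -> "1250.00"
    elif "," in cleaned:
        return None  # e.g. "1.250,00" or "12,5": separator meaning is unclear
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def validate(fields: dict, known_vendors: set, seen_invoices: set,
             today: Optional[date] = None) -> list[str]:
    today = today or date.today()
    issues = []
    missing = [name for name in REQUIRED if not fields.get(name)]
    if missing:
        issues.append("missing required fields: " + ", ".join(missing))
    if fields.get("amount") and clean_amount(fields["amount"]) is None:
        issues.append("amount is not a valid, unambiguous number")
    if fields.get("vendor") and fields["vendor"] not in known_vendors:
        issues.append("unknown vendor")
    if fields.get("invoice_number") in seen_invoices:
        issues.append("duplicate invoice number")
    if fields.get("invoice_date"):
        try:
            parsed = date.fromisoformat(fields["invoice_date"])
        except ValueError:
            issues.append("unparseable date")
        else:
            if parsed > today or today - parsed > timedelta(days=90):
                issues.append("date is in the future or more than 90 days old")
    return issues
```

An empty list means the record passed every check; anything else can be attached to the manual-review item or quoted back to the sender.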

Step 5: Route to Downstream Workflow

Based on the classification and extracted data:

  • Invoice → Invoice Processing workflow (match to PO, route for approval)
  • Support request → IT Help Desk (create ticket, assign to team)
  • Customer enquiry → CRM (create lead or case)
  • Report → Archive to document library with metadata
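
One way to express this routing is a dispatch table keyed by category, as in the sketch below. The handler functions are placeholders that only return the name of the target workflow; the category keys and the fallback to manual review are assumptions.

```python
from typing import Callable


def start_invoice_processing(data: dict) -> str:
    # Placeholder: would match to a PO and route for approval.
    return "Invoice Processing"


def create_help_desk_ticket(data: dict) -> str:
    # Placeholder: would create a ticket and assign it to a team.
    return "IT Help Desk"


def create_crm_record(data: dict) -> str:
    # Placeholder: would create a lead or case.
    return "CRM"


def archive_report(data: dict) -> str:
    # Placeholder: would store the document with metadata.
    return "Document Library"


ROUTES: dict[str, Callable[[dict], str]] = {
    "invoice": start_invoice_processing,
    "support_request": create_help_desk_ticket,
    "customer_enquiry": create_crm_record,
    "report": archive_report,
}


def route(category: str, data: dict) -> str:
    handler = ROUTES.get(category)
    if handler is None:
        return "Manual Review"  # assumed fallback for unknown categories
    return handler(data)
```

Adding a new email type then means registering one more entry in `ROUTES` without touching the routing function.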

Step 6: Handle Failures

| Failure Type | Action |
|---|---|
| Can't classify | Move to "Manual Review" folder, notify team |
| Extraction confidence low | Route to Human-in-the-Loop Review |
| Validation errors | Reply to sender with specific issues (if appropriate) |
| Duplicate detected | Log and archive; don't process twice |
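
The failure table can be read as a decision function, sketched below. The order of the checks, the action names, and the 0.8 confidence threshold are assumptions; the table does not specify them.

```python
from typing import Optional


def failure_action(
    category: Optional[str],
    confidence: float,
    issues: list[str],
    is_duplicate: bool,
    min_confidence: float = 0.8,  # assumed threshold
) -> str:
    """Pick the follow-up action for one processed email."""
    if is_duplicate:
        return "log_and_archive"
    if category is None:
        return "move_to_manual_review_and_notify"
    if confidence < min_confidence:
        return "human_in_the_loop_review"
    if issues:
        return "reply_to_sender_with_issues"
    return "continue"
```

Checking for duplicates first means a repeated email is never sent to reviewers or answered a second time.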

Tips & Best Practices

Important

Never process emails from the Sent or Deleted folders. Only process from Inbox, and move processed emails to an "Archive" or "Processed" subfolder immediately to prevent re-processing.

  • Use AI for unstructured emails. For free-text emails that don't follow a template, use Nintex AI or OpenAI to classify and extract intent, entities, and urgency.
  • Set up a "poison pill" handler. Some emails will crash your parser (huge attachments, malformed HTML, password-protected PDFs). Catch these errors and quarantine the email.
  • Log everything. Keep a processing log: email received at, classified as, fields extracted, routed to, processing time. This is essential for debugging and compliance.
  • Reply with confirmation. For externally-submitted emails, send an auto-reply confirming receipt with a reference number. Senders need to know their email was received.
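
The "poison pill" and logging tips can be combined in one wrapper, sketched here. The `parse` callable, the size limit, and the log entry's field names are hypothetical; a real pipeline would also move the quarantined email to its own folder.

```python
import time
from datetime import datetime, timezone
from typing import Callable

MAX_BYTES = 25 * 1024 * 1024  # hypothetical size limit for a single email


def process_safely(message_id: str, raw: bytes,
                   parse: Callable[[bytes], dict], log: list) -> dict:
    """Run the parser, quarantine on any error, and always write a log entry."""
    started = time.monotonic()
    entry = {
        "message_id": message_id,
        "processed_at": datetime.now(timezone.utc).isoformat(),
        "status": "processed",
    }
    try:
        if len(raw) > MAX_BYTES:
            raise ValueError("message exceeds size limit")
        # parse() is expected to return keys such as classified_as,
        # fields, and routed_to.
        entry.update(parse(raw))
    except Exception as exc:  # deliberately broad: one bad email must not stop the run
        entry["status"] = "quarantined"
        entry["error"] = f"{type(exc).__name__}: {exc}"
    entry["processing_seconds"] = round(time.monotonic() - started, 3)
    log.append(entry)
    return entry
```

Because the log entry is written in both the success and failure paths, every email leaves a trace even when the parser crashes.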
