Data Enrichment Pipeline

Take a basic record and enhance it with data from external APIs: geocoding addresses, resolving company info, validating phone numbers, scoring leads. The goal is to transform minimal input into rich, actionable data.

When to Use This Pattern

Use data enrichment when:

  • Your records have minimal information (for example, just a name and email) and you need more
  • You want to reduce manual data entry by auto-populating fields from external sources
  • Data quality is important for downstream processes (analytics, marketing, compliance)
  • You're enriching leads, contacts, companies, or addresses

Implementation Guide

Step 1: Define the Enrichment Sources

Map which fields come from which APIs:

| Field Needed | Source API | Input Required | Cost |
|---|---|---|---|
| Company info | Clearbit, ZoomInfo | Email domain | $$ |
| Address geocoding | Google Maps, Azure Maps | Street address | $ |
| Address validation | getAddress.io, SmartyStreets | Postal code | $ |
| Phone validation | Twilio, Mailboxlayer | Phone number | $ |
| Email validation | ZeroBounce, Mailboxlayer | Email address | $ |
| Country info | REST Countries | Country code | Free |
| Currency rates | Fixer.io, XE.com | Currency code | Free/$ |
| Weather | OpenWeather, AccuWeather | Location | Free/$ |
| Sanctions check | OFAC API | Name + country | $$ |
| Translation | Google Translate | Text string | $ |
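
A mapping like the one above can be kept as data, so the pipeline knows which input each source needs and roughly what it costs. The following is a minimal sketch, not a definitive implementation: `EnrichmentSource`, `SOURCES`, `applicable_sources`, the record keys, and the numeric cost tiers are all hypothetical names chosen for illustration, and no real provider API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnrichmentSource:
    field_needed: str    # what the source fills in
    providers: tuple     # candidate APIs (names only, taken from the table)
    input_required: str  # record key the source needs as input (assumed key names)
    cost_tier: int       # 0 = free, 1 = "$", 2 = "$$" (illustrative scale)


# A subset of the table, expressed as data.
SOURCES = [
    EnrichmentSource("company_info", ("Clearbit", "ZoomInfo"), "email_domain", 2),
    EnrichmentSource("geocode", ("Google Maps", "Azure Maps"), "street_address", 1),
    EnrichmentSource("country_info", ("REST Countries",), "country_code", 0),
]


def applicable_sources(record, max_cost_tier=2):
    """Return sources whose required input is present, cheapest first."""
    usable = [
        s for s in SOURCES
        if record.get(s.input_required) and s.cost_tier <= max_cost_tier
    ]
    return sorted(usable, key=lambda s: s.cost_tier)


record = {"email_domain": "example.com", "country_code": "GB"}
print([s.field_needed for s in applicable_sources(record)])
# ['country_info', 'company_info']
```

Keeping the cost tier next to the source makes it easy to cap spending per record, for example by passing `max_cost_tier=1` for records that have not yet passed basic quality checks.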

Step 2: Design the Pipeline

Process enrichment sources in order of value and dependency:

Stage 1 (No dependencies):
  - Validate email format
  - Validate phone format
  - Validate postal code

Stage 2 (After validation):
  - Lookup company by email domain
  - Geocode address
  - Lookup location by postal code

Stage 3 (After company lookup):
  - Enrich company details (industry, size, revenue)
  - Find social profiles
  - Calculate risk score
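
The staging above can be sketched as a list of stages, where steps inside a stage run concurrently and each stage waits for the previous one. This is an illustrative sketch using `asyncio`; the enricher functions are hypothetical stubs that stand in for real API calls, and the field names (`email_valid`, `company_domain`, `industry`) are assumptions.

```python
import asyncio


# Hypothetical stand-ins for real API calls; each returns the fields it adds.
async def validate_email(record):
    return {"email_valid": "@" in record.get("email", "")}


async def lookup_company(record):
    if not record.get("email_valid"):
        return {}  # dependency not met, contribute nothing
    return {"company_domain": record["email"].split("@", 1)[1]}


async def enrich_company_details(record):
    if "company_domain" not in record:
        return {}
    return {"industry": "unknown"}  # placeholder value, not real data


# Each inner list is one stage; later stages may read fields set by earlier ones.
STAGES = [
    [validate_email],
    [lookup_company],
    [enrich_company_details],
]


async def run_pipeline(record):
    enriched = dict(record)
    for stage in STAGES:
        # Steps within a stage do not depend on each other, so run them together.
        results = await asyncio.gather(*(step(enriched) for step in stage))
        for fields in results:
            enriched.update(fields)
    return enriched


result = asyncio.run(run_pipeline({"email": "ada@example.com"}))
print(result)
```

Adding another independent Stage 2 lookup (geocoding, say) would mean appending one more function to the second inner list, without touching the loop.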

Step 3: Handle API Failures Gracefully

Not every API call will succeed:

| Scenario | Handling |
|---|---|
| API returns no data | Keep the field empty, mark as "not enriched" |
| API returns 429 (rate limit) | Queue for retry using Retry with Exponential Backoff |
| API returns error | Log the error, skip this enrichment source, continue with others |
| API returns low-confidence match | Store the data but flag it for review |

Rule of thumb: never let an enrichment failure block the record. The record should pass through even if some enrichments fail.
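
One way to apply this rule is to wrap every enrichment call so that it returns a status instead of raising. The sketch below is an assumption-heavy illustration: `RateLimited`, `safe_enrich`, the status strings, the `needs_review` field, and the 0.7 confidence threshold are all hypothetical, and the four sample sources are stubs that simulate each scenario.

```python
import logging

logger = logging.getLogger("enrichment")


class RateLimited(Exception):
    """Hypothetical error a source wrapper raises on HTTP 429."""


def safe_enrich(record, source_name, call, retry_queue, min_confidence=0.7):
    """Run one enrichment call; never raise, always return a status string."""
    try:
        # Assumed result shape: {"data": {...}, "confidence": float} or None.
        result = call(record)
    except RateLimited:
        retry_queue.append((source_name, record["id"]))  # retry later with backoff
        return "queued_for_retry"
    except Exception:
        logger.exception("enrichment source %s failed", source_name)
        return "skipped"
    if not result or not result.get("data"):
        return "not_enriched"
    record.update(result["data"])
    if result.get("confidence", 1.0) < min_confidence:
        # Keep the data, but mark which source needs a human check.
        record.setdefault("needs_review", []).append(source_name)
        return "flagged_for_review"
    return "enriched"


# Stub sources that simulate each scenario.
def rate_limited(record):
    raise RateLimited()


def broken(record):
    raise ValueError("boom")


def empty(record):
    return None


def weak_match(record):
    return {"data": {"city": "Leeds"}, "confidence": 0.4}


retry_queue = []
record = {"id": 1}
sources = [("phone", rate_limited), ("company", broken),
           ("social", empty), ("geocode", weak_match)]
statuses = {name: safe_enrich(record, name, fn, retry_queue) for name, fn in sources}
print(statuses)
print(record)
```

Every source produces a status, and the record still comes out the other end with whatever data was available.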

Step 4: Merge and Deduplicate

When enrichment data conflicts with existing data:

| Strategy | When |
|---|---|
| Enrichment wins | The field was empty; filling it in is always good |
| Existing wins | The field already has user-provided data; don't overwrite |
| Highest confidence wins | Compare confidence scores from the API |
| Flag for review | When both have data but they disagree |
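
The strategies above can be combined into a single per-field merge function. The ordering below is one possible interpretation, not the only valid one: it assumes each value carries a confidence score, and it uses a hypothetical `margin` of 0.2 to decide when one side is confident enough to win outright rather than be flagged. `merge_field` and its parameters are illustrative names.

```python
def merge_field(existing, incoming, user_provided=False, margin=0.2):
    """Resolve one field.

    `existing` and `incoming` are (value, confidence) tuples or None.
    Returns (winning tuple or None, needs_review flag).
    """
    if incoming is None:
        return existing, False
    if existing is None:
        return incoming, False   # enrichment wins: the field was empty
    if user_provided:
        return existing, False   # existing wins: never overwrite user data
    if existing[0] == incoming[0]:
        return existing, False   # no conflict to resolve
    if abs(existing[1] - incoming[1]) >= margin:
        # Highest confidence wins when the gap is clear.
        return max(existing, incoming, key=lambda pair: pair[1]), False
    return existing, True        # values disagree with similar confidence: flag


print(merge_field(None, ("Acme", 0.8)))
print(merge_field(("Acme Ltd", 1.0), ("Acme", 0.9), user_provided=True))
print(merge_field(("Leeds", 0.5), ("London", 0.9)))
print(merge_field(("Leeds", 0.8), ("London", 0.85)))
```

When a field is flagged, this sketch keeps the existing value in place so the record stays usable until someone reviews it.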

Step 5: Monitor and Optimise

Track enrichment effectiveness:

| Metric | Target |
|---|---|
| Enrichment rate | >80% of records have all key fields filled |
| API success rate | >95% per source |
| Enrichment cost per record | Under your budget threshold |
| Data freshness | Re-enrich every 6 months |
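
The first two metrics can be computed from the enriched records and a simple call log. This is a rough sketch under stated assumptions: `KEY_FIELDS` is a hypothetical choice of "key fields", and the call log is assumed to be a list of `(source_name, succeeded)` tuples; the sample data is invented for the demonstration.

```python
from collections import Counter

KEY_FIELDS = ("company", "country", "phone_valid")  # hypothetical key fields


def enrichment_rate(records):
    """Share of records in which every key field is filled."""
    if not records:
        return 0.0
    complete = sum(
        all(r.get(field) is not None for field in KEY_FIELDS) for r in records
    )
    return complete / len(records)


def success_rate_by_source(call_log):
    """call_log: iterable of (source_name, succeeded) tuples."""
    total, ok = Counter(), Counter()
    for source, succeeded in call_log:
        total[source] += 1
        ok[source] += int(succeeded)
    return {source: ok[source] / total[source] for source in total}


records = [
    {"company": "Acme", "country": "GB", "phone_valid": True},
    {"company": "Acme", "country": None, "phone_valid": True},
]
call_log = [("geocode", True), ("geocode", True),
            ("company", True), ("company", False)]

rate = enrichment_rate(records)
by_source = success_rate_by_source(call_log)
print(rate, by_source)

# Compare against the targets from the table.
below_target = [s for s, r in by_source.items() if r <= 0.95]
print(rate > 0.80, below_target)
```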

Tips & Best Practices

Warning: Be mindful of data privacy. Enrichment APIs process personal data. Ensure your use complies with GDPR, CCPA, and your privacy policy. Some APIs cache and share data you send them.

  • Cache API responses. If you're enriching 1,000 records from the same company, call the company lookup once and reuse the result. This saves money and speeds up processing.
  • Enrich new records on trigger; batch the backlog. For new records (form submissions, registrations), enrich immediately. For existing records, batch-enrich during off-hours.
  • Start with free/cheap APIs. REST Countries, phone format validation, and email format checks are free. Use expensive APIs (Clearbit, ZoomInfo) only when the record passes initial quality checks.
  • Audit your enrichment. Keep a log of which fields were enriched, from which source, and when. When data is questioned, you can trace its origin.
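
The caching and audit tips can be combined in a few lines. The sketch below uses `functools.lru_cache` as an in-process cache and a plain list as the audit log; both are stand-ins, since a production setup would more likely use a shared cache and a persistent store. `lookup_company`, `enrich_with_audit`, and the field names are hypothetical, and the lookup is a stub rather than a real API call.

```python
from datetime import datetime, timezone
from functools import lru_cache

api_calls = 0   # counts how often the stubbed paid lookup actually runs
audit_log = []  # which field was enriched, from which source, and when


@lru_cache(maxsize=10_000)
def lookup_company(domain):
    """Hypothetical company lookup; a real one would call a paid API."""
    global api_calls
    api_calls += 1
    return {"company_domain": domain}  # placeholder payload


def enrich_with_audit(record, source_name="company_lookup"):
    domain = record["email"].split("@", 1)[1].lower()
    for field, value in lookup_company(domain).items():
        if record.get(field) is None:  # only fill empty fields
            record[field] = value
            audit_log.append({
                "record_id": record["id"],
                "field": field,
                "source": source_name,
                "enriched_at": datetime.now(timezone.utc).isoformat(),
            })
    return record


# 1,000 records from the same company trigger a single lookup.
records = [{"id": i, "email": f"user{i}@example.com"} for i in range(1000)]
for r in records:
    enrich_with_audit(r)

print(api_calls, len(audit_log))  # 1 1000
```

Because the cached dictionary is shared between calls, the sketch only reads from it and copies values into the record; mutating the cached result would leak changes into later lookups.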
