Reverse ETL
Push modelled data from your warehouse back into the SaaS tools that business teams use every day — CRM, marketing, support — so they can act on analytics without a BI detour.
When to Use This Pattern
Use reverse ETL when the business team needs analytics-derived data inside the tools they already use:
- Sales team wants lead scores in Salesforce, not a BI dashboard
- Marketing wants churn_risk as a property in HubSpot
- Support wants customer_tier visible on every ticket
- Ads platforms need a curated audience list refreshed nightly
If your answer to "where should this number live?" is "in a report", consider reverse ETL instead. People don't live in reports.
How It Works
Your warehouse already has the source of truth. A reverse ETL job reads a modelled table or query result and writes it to a SaaS destination via that tool's API. Tools like Hightouch, Census, and Polytomic productise this; you can build it yourself with workflow code, though the edge cases bite.
Two common modes:
- Full sync — replace the entire target every run. Simple, safe, but slow and API-expensive.
- Delta sync — detect rows that changed since the last run and only push those. Faster but needs a reliable change signal.
Most SaaS destinations rate-limit aggressively and charge per API call. A reverse ETL job that pushes 10 million "updates" where only 400 actually changed is a great way to blow your quota.
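A minimal sketch of the two modes in Python, assuming rows come back from the warehouse as dicts with an "id" field and the destination client exposes an upsert(records) method (both are placeholders, not any vendor's real API):

```python
import hashlib
import json

def row_hash(row: dict) -> str:
    # Stable content hash; sorted keys so field order doesn't matter.
    payload = json.dumps(row, sort_keys=True, default=str).encode()
    return hashlib.sha256(payload).hexdigest()

def full_sync(destination, rows: list[dict]) -> None:
    # Mode 1: replace the whole target. Simple, but every row costs API calls.
    destination.upsert(rows)

def delta_sync(destination, rows: list[dict], last_hashes: dict) -> dict:
    # Mode 2: push only rows whose content actually changed since the last run.
    changed = [r for r in rows if row_hash(r) != last_hashes.get(r["id"])]
    destination.upsert(changed)
    # Return the new hash state; persist it somewhere durable for the next run.
    return {r["id"]: row_hash(r) for r in rows}
```

The delta variant is what saves the 10-million-row, 400-change run: the destination only ever sees the 400.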
Implementation Guide
Step 1: Model the data first
Do not sync raw tables. Build a customer_traits model in dbt (or your warehouse) that is the single source of truth for everything you'll push out. Sync that model, not the underlying joins.
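As a sketch of the principle, assuming a DB-API-style warehouse connection and a placeholder analytics.customer_traits model:

```python
# Sync the model, not the joins. When a trait's definition changes, it
# changes once in dbt and every destination picks it up on the next run.
SYNC_QUERY = "SELECT * FROM analytics.customer_traits"

def fetch_model_rows(conn) -> list[dict]:
    cur = conn.cursor()
    cur.execute(SYNC_QUERY)
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]
```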
Step 2: Pick the destination's primary key
Every sync needs a stable identifier on the destination side (salesforce.contact_id, hubspot.email). Map warehouse rows to that key. Mismatches cause duplicates on the destination that are hard to clean up.
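A hedged sketch of the mapping step, assuming email is the destination key (swap in salesforce.contact_id or whatever your destination treats as stable):

```python
def keyed_records(rows: list[dict], key_field: str = "email"):
    # Map each warehouse row to the destination's stable identifier.
    # Rows with a missing or duplicate key are quarantined instead of
    # pushed, because duplicates on the destination are hard to undo.
    keyed, rejected = {}, []
    for row in rows:
        key = str(row.get(key_field) or "").strip().lower()
        if not key or key in keyed:
            rejected.append(row)
        else:
            keyed[key] = row
    return keyed, rejected
```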
Step 3: Compute deltas cheaply
Option A: hash each row and compare to the last-synced hash. Option B: use a last_modified_at column from the source. Avoid diffing full tables — it scales badly.
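Option A is sketched in the delta_sync example above. Option B looks roughly like this, assuming the same DB-API-style connection and that the model carries a trustworthy last_modified_at column (the %s placeholder style varies by driver):

```python
from datetime import datetime

def rows_changed_since(conn, watermark: datetime) -> list[dict]:
    # Only rows touched since the previous run's high-water mark are
    # candidates to push; store the max last_modified_at you saw as
    # the next watermark.
    cur = conn.cursor()
    cur.execute(
        "SELECT * FROM analytics.customer_traits WHERE last_modified_at > %s",
        (watermark,),
    )
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]
```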
Step 4: Handle rate limits and backoff
Every destination has different limits. Implement exponential backoff, chunk the payload, and parallelise within the destination's allowed concurrency. Budget for "when Salesforce is having a day".
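A sketch of the retry loop, assuming the hypothetical destination client raises something like a RateLimitError on HTTP 429 (real clients differ; check yours):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for whatever your destination client raises on HTTP 429."""

def push_with_backoff(destination, records, chunk_size=200, max_retries=5):
    # Send in destination-sized chunks; on a rate-limit error, sleep with
    # exponential backoff plus jitter, then retry the same chunk.
    for i in range(0, len(records), chunk_size):
        chunk = records[i : i + chunk_size]
        for attempt in range(max_retries):
            try:
                destination.upsert(chunk)
                break
            except RateLimitError:
                time.sleep((2 ** attempt) + random.random())
        else:
            raise RuntimeError(f"chunk at offset {i} failed after {max_retries} retries")
```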
Step 5: Log what went where
Every sync should produce a per-row outcome: created, updated, skipped, failed. When a go-to-market person asks "why isn't this lead showing up", the log answers the question without a developer.
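One structured log line per row is enough. A minimal sketch using the standard logging module (field names are illustrative):

```python
import json
import logging

log = logging.getLogger("reverse_etl")

def record_outcome(run_id: str, key: str, outcome: str, detail: str = "") -> None:
    # outcome is one of: created, updated, skipped, failed.
    # Greppable by destination key, so "why isn't this lead showing up?"
    # is answered by the log rather than a developer.
    log.info(json.dumps({
        "run": run_id,
        "key": key,
        "outcome": outcome,
        "detail": detail,
    }))
```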
Tips & Best Practices
- Don't push sensitive fields unless strictly required. PII in a dozen SaaS tools is an audit and breach multiplier.
- Respect destination-side edits. Sales reps will edit fields. Decide: warehouse wins, destination wins, or last-write wins — per field.
- Version syncs via dbt tags or config. Breaking changes to a model should trigger a deliberate cutover.
- Track cost per destination. Reverse ETL is sneakily expensive; dashboards help.
- Keep a kill switch. A bad sync pushing wrong data to CRM needs a one-click stop.
Related Patterns
API Polling with Change Detection
Periodically check an external system for changes and trigger workflows when new or updated records are detected. The reliable alternative when webhooks aren't available.
Change Data Capture Stream
Stream row-level changes out of a database in near real-time using the transaction log. No polling, no app changes — downstream systems get inserts, updates, and deletes as they happen.
Data Sync Bridge
Keep records synchronized between two or more systems. Handles create/update/delete propagation with conflict resolution. Essential for multi-system environments.