
Reverse ETL

Push modelled data from your warehouse back into the SaaS tools that business teams use every day — CRM, marketing, support — so they can act on analytics without a BI detour.


When to Use This Pattern

Use reverse ETL when the business team needs analytics-derived data inside the tools they already use:

  • Sales team wants lead scores in Salesforce, not a BI dashboard
  • Marketing wants churn_risk as a property in HubSpot
  • Support wants customer_tier visible on every ticket
  • Ads platforms need a curated audience list refreshed nightly

If your answer to "where should this number live?" is "in a report", consider reverse ETL instead. People don't live in reports.

How It Works

Your warehouse already has the source of truth. A reverse ETL job reads a modelled table or query result and writes it to a SaaS destination via that tool's API. Tools like Hightouch, Census, and Polytomic productise this; you can build it yourself with workflow code, though the edge cases bite.

Two common modes:

  • Full sync — replace the entire target every run. Simple, safe, but slow and API-expensive.
  • Delta sync — detect rows that changed since the last run and only push those. Faster but needs a reliable change signal.
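The two modes above can be sketched as a pair of small driver functions. This is a minimal illustration, not a production sync engine: `push` stands in for whatever call sends one row to the destination's API, and the in-memory `last_state` dict stands in for a persisted sync-state table.

```python
from typing import Callable, Iterable

Row = dict  # a warehouse row keyed by column name


def full_sync(rows: Iterable[Row], push: Callable[[Row], None]) -> int:
    """Full sync: push every row and let the destination upsert by key."""
    count = 0
    for row in rows:
        push(row)
        count += 1
    return count


def delta_sync(rows: Iterable[Row], last_state: dict, key: str,
               push: Callable[[Row], None]) -> int:
    """Delta sync: push only rows whose content changed since the last run.

    `last_state` maps destination key -> last-pushed row fingerprint and
    would be persisted between runs in a real implementation.
    """
    pushed = 0
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if last_state.get(row[key]) != fingerprint:
            push(row)
            last_state[row[key]] = fingerprint
            pushed += 1
    return pushed
```

Run full sync for the first load or after a schema change; run delta sync on the schedule. The difference in API calls is exactly the difference between "every row" and "rows that changed".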
Tip

Most SaaS destinations rate-limit aggressively and charge per API call. A reverse ETL job that pushes 10 million "updates" where only 400 actually changed is a great way to blow your quota.

Implementation Guide

Step 1: Model the data first

Do not sync raw tables. Build a customer_traits model in dbt (or your warehouse) that is the single source of truth for everything you'll push out. Sync that model, not the underlying joins.

Step 2: Pick the destination's primary key

Every sync needs a stable identifier on the destination side (salesforce.contact_id, hubspot.email). Map warehouse rows to that key. Mismatches cause duplicates on the destination that are hard to clean up.
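A cheap way to enforce this is to key the payload before any API call is made, rejecting rows that would create keyless or duplicate records. A minimal sketch (`keyed_payload` and `key_column` are illustrative names, not a real library's API):

```python
def keyed_payload(rows: list[dict], key_column: str):
    """Map warehouse rows onto the destination's primary key.

    Returns (payload, errors): payload is {key: row} ready to push;
    errors collects rows that would produce keyless or duplicate
    records on the destination side.
    """
    payload, errors = {}, []
    for row in rows:
        key = row.get(key_column)
        if not key:
            errors.append(("missing_key", row))
        elif key in payload:
            errors.append(("duplicate_key", row))
        else:
            payload[key] = row
    return payload, errors
```

Failing these rows loudly before the sync runs is far cheaper than deduplicating contacts in Salesforce afterwards.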

Step 3: Compute deltas cheaply

Option A: hash each row and compare to the last-synced hash. Option B: use a last_modified_at column from the source. Avoid diffing full tables — it scales badly.
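Option A can be implemented in a few lines. The sketch below assumes the stored hashes live in an in-memory dict; in practice they would sit in a sync-state table next to the model.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Stable content hash: serialise with sorted keys so column
    order can't produce spurious diffs."""
    return hashlib.sha256(
        json.dumps(row, sort_keys=True, default=str).encode()
    ).hexdigest()


def changed_rows(rows, key: str, last_hashes: dict):
    """Yield only rows whose hash differs from the last synced hash,
    updating the stored hash as a side effect."""
    for row in rows:
        h = row_hash(row)
        if last_hashes.get(row[key]) != h:
            last_hashes[row[key]] = h
            yield row
```

Hashing costs one pass over the candidate rows; diffing full tables costs a join against the entire history, which is the part that scales badly.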

Step 4: Handle rate limits and backoff

Every destination has different limits. Implement exponential backoff, chunk the payload, and parallelise within the destination's allowed concurrency. Budget for "when Salesforce is having a day".
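A retry wrapper with exponential backoff and full jitter is the standard shape here. This is a sketch: `push` is any callable that raises the (hypothetical) `RateLimitError` when the destination returns HTTP 429, and the delays are placeholders to tune per destination.

```python
import random
import time


class RateLimitError(Exception):
    """Raised by `push` when the destination returns HTTP 429."""


def push_with_backoff(push, batch, max_retries: int = 5,
                      base_delay: float = 1.0):
    """Retry a rate-limited push with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return push(batch)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the failure
            # full jitter: sleep somewhere in [0, base * 2^attempt]
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
```

Jitter matters when several chunks retry in parallel: without it, they all hammer the API again at the same moment and trip the limit together.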

Step 5: Log what went where

Every sync should produce a per-row outcome: created, updated, skipped, failed. When a go-to-market person asks "why isn't this lead showing up", the log answers the question without a developer.
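The log can be as simple as one CSV row per record per run. A minimal sketch, assuming outcomes arrive as `(key, outcome, detail)` tuples from the sync loop:

```python
import csv
import datetime


def log_outcomes(outcomes, path: str) -> None:
    """Write one line per synced row: when, which key, what happened
    (created / updated / skipped / failed), and any error detail —
    enough for a go-to-market person to self-serve the answer."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["synced_at", "key", "outcome", "detail"])
        now = datetime.datetime.now(datetime.timezone.utc).isoformat()
        for key, outcome, detail in outcomes:
            writer.writerow([now, key, outcome, detail])
```

Point this at object storage or a warehouse table rather than a local file in production; the shape of the record is what matters.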

Tips & Best Practices

  • Don't push sensitive fields unless strictly required. PII in a dozen SaaS tools is an audit and breach multiplier.
  • Respect destination-side edits. Sales reps will edit fields. Decide: warehouse wins, destination wins, or last-write wins — per field.
  • Version syncs via dbt tags or config. Breaking changes to a model should trigger a deliberate cutover.
  • Track cost per destination. Reverse ETL is sneakily expensive; dashboards help.
  • Keep a kill switch. A bad sync pushing wrong data to CRM needs a one-click stop.
