
Reverse ETL

Push modelled data from your warehouse back into the SaaS tools that business teams use every day — CRM, marketing, support — so they can act on analytics without a BI detour.


When to Use This Pattern

Use reverse ETL when the business team needs analytics-derived data inside the tools they already use:

  • Sales team wants lead scores in Salesforce, not a BI dashboard
  • Marketing wants churn_risk as a property in HubSpot
  • Support wants customer_tier visible on every ticket
  • Ads platforms need a curated audience list refreshed nightly

If your answer to "where should this number live?" is "in a report", consider reverse ETL instead. People don't live in reports.

How It Works

Your warehouse already has the source of truth. A reverse ETL job reads a modelled table or query result and writes it to a SaaS destination via that tool's API. Tools like Hightouch, Census, and Polytomic productise this; you can build it yourself with workflow code, though the edge cases bite.

Two common modes:

  • Full sync — replace the entire target every run. Simple, safe, but slow and API-expensive.
  • Delta sync — detect rows that changed since the last run and only push those. Faster but needs a reliable change signal.
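The two modes above can be sketched as a pair of small driver functions. This is a minimal illustration, not a production sync engine: `push` stands in for whatever call sends one row to the destination's API, and the in-memory `last_state` dict stands in for a persisted sync-state table.

```python
from typing import Callable, Iterable

Row = dict  # a warehouse row keyed by column name


def full_sync(rows: Iterable[Row], push: Callable[[Row], None]) -> int:
    """Full sync: push every row and let the destination upsert by key."""
    count = 0
    for row in rows:
        push(row)
        count += 1
    return count


def delta_sync(rows: Iterable[Row], last_state: dict, key: str,
               push: Callable[[Row], None]) -> int:
    """Delta sync: push only rows whose content changed since the last run.

    `last_state` maps destination key -> last-pushed row fingerprint and
    would be persisted between runs in a real implementation.
    """
    pushed = 0
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if last_state.get(row[key]) != fingerprint:
            push(row)
            last_state[row[key]] = fingerprint
            pushed += 1
    return pushed
```

Run full sync for the first load or after a schema change; run delta sync on the schedule. The difference in API calls is exactly the difference between "every row" and "rows that changed".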
Tip

Most SaaS destinations rate-limit aggressively and charge per API call. A reverse ETL job that pushes 10 million "updates" where only 400 actually changed is a great way to blow your quota.

Implementation Guide

Step 1: Model the data first

Do not sync raw tables. Build a customer_traits model in dbt (or your warehouse) that is the single source of truth for everything you'll push out. Sync that model, not the underlying joins.

Step 2: Pick the destination's primary key

Every sync needs a stable identifier on the destination side (salesforce.contact_id, hubspot.email). Map warehouse rows to that key. Mismatches cause duplicates on the destination that are hard to clean up.
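A cheap way to enforce this is to key the payload before any API call is made, rejecting rows that would create keyless or duplicate records. A minimal sketch (`keyed_payload` and `key_column` are illustrative names, not a real library's API):

```python
def keyed_payload(rows: list[dict], key_column: str):
    """Map warehouse rows onto the destination's primary key.

    Returns (payload, errors): payload is {key: row} ready to push;
    errors collects rows that would produce keyless or duplicate
    records on the destination side.
    """
    payload, errors = {}, []
    for row in rows:
        key = row.get(key_column)
        if not key:
            errors.append(("missing_key", row))
        elif key in payload:
            errors.append(("duplicate_key", row))
        else:
            payload[key] = row
    return payload, errors
```

Failing these rows loudly before the sync runs is far cheaper than deduplicating contacts in Salesforce afterwards.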

Step 3: Compute deltas cheaply

Option A: hash each row and compare to the last-synced hash. Option B: use a last_modified_at column from the source. Avoid diffing full tables — it scales badly.
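Option A can be implemented in a few lines. The sketch below assumes the stored hashes live in an in-memory dict; in practice they would sit in a sync-state table next to the model.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Stable content hash: serialise with sorted keys so column
    order can't produce spurious diffs."""
    return hashlib.sha256(
        json.dumps(row, sort_keys=True, default=str).encode()
    ).hexdigest()


def changed_rows(rows, key: str, last_hashes: dict):
    """Yield only rows whose hash differs from the last synced hash,
    updating the stored hash as a side effect."""
    for row in rows:
        h = row_hash(row)
        if last_hashes.get(row[key]) != h:
            last_hashes[row[key]] = h
            yield row
```

Hashing costs one pass over the candidate rows; diffing full tables costs a join against the entire history, which is the part that scales badly.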

Step 4: Handle rate limits and backoff

Every destination has different limits. Implement exponential backoff, chunk the payload, and parallelise within the destination's allowed concurrency. Budget for "when Salesforce is having a day".
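A retry wrapper with exponential backoff and full jitter is the standard shape here. This is a sketch: `push` is any callable that raises the (hypothetical) `RateLimitError` when the destination returns HTTP 429, and the delays are placeholders to tune per destination.

```python
import random
import time


class RateLimitError(Exception):
    """Raised by `push` when the destination returns HTTP 429."""


def push_with_backoff(push, batch, max_retries: int = 5,
                      base_delay: float = 1.0):
    """Retry a rate-limited push with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return push(batch)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the failure
            # full jitter: sleep somewhere in [0, base * 2^attempt]
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
```

Jitter matters when several chunks retry in parallel: without it, they all hammer the API again at the same moment and trip the limit together.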

Step 5: Log what went where

Every sync should produce a per-row outcome: created, updated, skipped, failed. When a go-to-market person asks "why isn't this lead showing up", the log answers the question without a developer.
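The log can be as simple as one CSV row per record per run. A minimal sketch, assuming outcomes arrive as `(key, outcome, detail)` tuples from the sync loop:

```python
import csv
import datetime


def log_outcomes(outcomes, path: str) -> None:
    """Write one line per synced row: when, which key, what happened
    (created / updated / skipped / failed), and any error detail —
    enough for a go-to-market person to self-serve the answer."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["synced_at", "key", "outcome", "detail"])
        now = datetime.datetime.now(datetime.timezone.utc).isoformat()
        for key, outcome, detail in outcomes:
            writer.writerow([now, key, outcome, detail])
```

Point this at object storage or a warehouse table rather than a local file in production; the shape of the record is what matters.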

Tips & Best Practices

  • Don't push sensitive fields unless strictly required. PII in a dozen SaaS tools is an audit and breach multiplier.
  • Respect destination-side edits. Sales reps will edit fields. Decide: warehouse wins, destination wins, or last-write wins — per field.
  • Version syncs via dbt tags or config. Breaking changes to a model should trigger a deliberate cutover.
  • Track cost per destination. Reverse ETL is sneakily expensive; dashboards help.
  • Keep a kill switch. A bad sync pushing wrong data to CRM needs a one-click stop.
