
Change Data Capture Stream

Stream row-level changes out of a database in near real-time using the transaction log. No polling, no app changes — downstream systems get inserts, updates, and deletes as they happen.


When to Use This Pattern

Use CDC when you need to stream changes from a database to other systems without polling or modifying the application:

  • Replicating OLTP data to a warehouse for analytics
  • Keeping a search index (Elasticsearch, OpenSearch) in sync
  • Invalidating caches when source data changes
  • Building event streams from legacy systems you can't modify

CDC shines when the source database can't be changed, latency matters (sub-second to a few seconds), and you need every change, not just a snapshot.

How It Works

Databases write every change to a transaction log before applying it. CDC tools (Debezium, AWS DMS, Fivetran HVR, native CDC in SQL Server / Postgres logical replication) tail that log and emit a stream of change events, each describing:

  • The table and primary key
  • The change type: insert, update, or delete
  • The row's before and after state

The stream is typically published to Kafka or a similar event bus. Downstream consumers subscribe and apply changes to their own stores.
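
Concretely, a change event might look like the following sketch. The envelope is loosely modeled on Debezium's, but field names and nesting vary by tool and connector — treat it as illustrative, not any connector's exact schema:

```python
# Illustrative CDC change event. The envelope is loosely modeled on
# Debezium's; field names and nesting vary by tool and connector.
change_event = {
    "source": {"table": "customers", "lsn": 123456789},  # log position
    "op": "u",                       # "c" = insert, "u" = update, "d" = delete
    "key": {"id": 42},               # primary key of the affected row
    "before": {"id": 42, "email": "old@example.com"},  # row state before
    "after": {"id": 42, "email": "new@example.com"},   # row state after
    "ts_ms": 1700000000000,          # commit timestamp at the source
}

def apply_change(event: dict, store: dict) -> dict:
    """Apply one change event to a downstream key-value store."""
    key = event["key"]["id"]
    if event["op"] == "d":
        store.pop(key, None)          # delete: remove the row downstream
    else:
        store[key] = event["after"]   # insert/update: upsert the after-image
    return store
```

A consumer only needs the operation type and the key to stay in sync; the before-image is mainly useful for auditing and cache invalidation.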

Warning

CDC captures every change including sensitive ones. If your CDC stream flows to a less secure environment, you've just leaked PII. Plan encryption, masking, and access control before you connect anything.
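
A minimal sketch of pre-publish masking, assuming hypothetical sensitive field names (`email`, `ssn`) and a one-way hash so downstream joins on the masked value still work:

```python
import hashlib

SENSITIVE = {"email", "ssn"}  # hypothetical field names; adjust per schema

def mask(value: str) -> str:
    # One-way hash: values stay joinable downstream without exposing PII.
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def scrub(event: dict) -> dict:
    """Mask sensitive fields in both before- and after-images
    before the event leaves the secure environment."""
    out = dict(event)
    for image in ("before", "after"):
        row = event.get(image)
        if row:
            out[image] = {
                k: mask(v) if k in SENSITIVE else v for k, v in row.items()
            }
    return out
```

Run this transform in the secure environment, between the CDC connector and the bus — never in the consumers.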

Implementation Guide

Step 1: Enable the log feature on the source

Postgres: set wal_level = logical and create a publication. SQL Server: enable CDC on the database and on each table. MySQL: set binlog_format = ROW. Each has knock-on effects on disk usage and backups — read the docs.
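
The Postgres prerequisites can be sanity-checked before connecting a CDC tool. In this sketch the `settings` dict stands in for the values you'd read with `SHOW` on the source:

```python
def check_postgres_cdc_prereqs(settings: dict) -> list[str]:
    # `settings` stands in for the output of `SHOW wal_level;` etc.,
    # gathered from the source database by your deployment tooling.
    problems = []
    if settings.get("wal_level") != "logical":
        problems.append("set wal_level = logical (requires a restart)")
    if int(settings.get("max_replication_slots", 0)) < 1:
        problems.append("allow at least one replication slot")
    if int(settings.get("max_wal_senders", 0)) < 1:
        problems.append("allow at least one WAL sender process")
    return problems  # empty list means logical replication can start
```

Running a check like this in CI for each environment catches the classic failure mode: CDC works in staging but the production server was never restarted with the new wal_level.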

Step 2: Handle the initial snapshot

The log only has changes. Downstream consumers need the current state too. Most CDC tools do a one-time snapshot first, then switch to tailing the log from the position recorded when the snapshot began. Plan for this window — snapshotting a 500GB table takes time, and changes keep accumulating while it runs.
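
The two-phase handoff can be sketched in miniature — `table_rows` and `log` stand in for the real table and transaction log, and `start_position` is the log position recorded when the snapshot began:

```python
def snapshot_then_tail(table_rows, log, start_position):
    # Phase 1: emit a read event for every current row (the snapshot).
    for row in table_rows:
        yield {"op": "r", "after": row}       # "r" = snapshot read
    # Phase 2: tail the log from the position recorded at snapshot
    # start, so nothing committed during the snapshot is missed.
    for position, event in enumerate(log):
        if position >= start_position:
            yield event
```

Note that events from phase 2 may re-apply changes the snapshot already saw; downstream writes must therefore be idempotent upserts, not blind inserts.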

Step 3: Protect against schema changes

A column added upstream needs to propagate. A column dropped could break consumers. Decide: hard-fail loudly on schema drift, or route old-schema events through a compatibility layer?
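
One way to make that decision executable — `EXPECTED_COLUMNS` is a hypothetical consumer-side schema; here dropped columns hard-fail and new columns pass through a trivial compatibility layer:

```python
EXPECTED_COLUMNS = {"id", "email", "created_at"}  # consumer's known schema

class SchemaDrift(Exception):
    pass

def check_schema(row: dict) -> dict:
    cols = set(row)
    missing = EXPECTED_COLUMNS - cols   # dropped upstream: would break us
    extra = cols - EXPECTED_COLUMNS     # added upstream: needs a decision
    if missing:
        # Hard-fail loudly: a stopped pipeline beats silent corruption.
        raise SchemaDrift(f"columns dropped upstream: {sorted(missing)}")
    if extra:
        # Compatibility layer: ignore unknown columns until the
        # consumer schema is updated to accept them.
        return {k: v for k, v in row.items() if k in EXPECTED_COLUMNS}
    return row
```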

Step 4: Guarantee ordering where it matters

Most CDC tools preserve per-row ordering (all events for user-42 arrive in order). Cross-row ordering is usually not preserved. If you need causal consistency across tables, design for it explicitly.
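
Per-row ordering usually falls out of keyed partitioning: all events for the same key land on the same partition, which a consumer reads in order. A Kafka-style sketch (using a stable CRC32 hash, since Python's built-in `hash()` is seeded per process):

```python
import zlib

def partition_for(key, num_partitions: int) -> int:
    # Same key -> same partition, always. Within a partition, events
    # are consumed in order, so per-row ordering is preserved. Events
    # for different keys may land on different partitions, which is
    # exactly why cross-row ordering is NOT guaranteed.
    return zlib.crc32(str(key).encode()) % num_partitions
```

If two tables must be applied in a causally consistent order, one option is to route related rows to the same partition by sharing a key (e.g. partition order rows by their customer id), at the cost of partition skew.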

Step 5: Monitor replication lag

Lag is the most important metric. A 30-second lag is healthy; a 30-minute lag means your search index is 30 minutes stale. Alert on sustained lag over a threshold.
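
Alerting on sustained lag rather than a single sample avoids paging on transient spikes; the threshold and sample count here are illustrative:

```python
def sustained_lag_alert(lag_samples_s, threshold_s=60, sustained=3):
    # Fire only after lag exceeds the threshold for `sustained`
    # consecutive samples; a single spike resets nothing downstream.
    streak = 0
    for lag in lag_samples_s:
        streak = streak + 1 if lag > threshold_s else 0
        if streak >= sustained:
            return True
    return False
```

Lag itself is typically computed as the difference between the current time and the source commit timestamp of the last applied event.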

Tips & Best Practices

  • Test with real schema changes in staging. CDC breaking in prod at 2am is the worst.
  • Never rely on CDC for exact balance reconciliation. Use it for speed; reconcile with a nightly batch.
  • Keep retention high enough to replay. If a consumer is down for 4 hours, can you replay 4 hours of changes? If retention is 1 hour, you're reseeding from snapshot.
  • Use CDC for data movement, not business logic. Consumers that react to CDC events by executing business rules start to resemble distributed transactions gone wrong.
  • Budget for storage carefully. The transaction log grows when a consumer falls behind.
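
The replay-vs-reseed decision in the retention tip reduces to a comparison (a sketch; `retention_hours` is whatever your event bus is configured to keep):

```python
def recovery_action(downtime_hours: float, retention_hours: float) -> str:
    # If the outage outlasts stream retention, the oldest missed events
    # are already gone: the consumer must be reseeded from a fresh
    # snapshot instead of replaying the backlog.
    return "replay" if downtime_hours <= retention_hours else "resnapshot"
```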

Related Patterns