Change Data Capture Stream
Stream row-level changes out of a database in near real-time using the transaction log. No polling, no app changes — downstream systems get inserts, updates, and deletes as they happen.
When to Use This Pattern
Use CDC when you need to stream changes from a database to other systems without polling or modifying the application:
- Replicating OLTP data to a warehouse for analytics
- Keeping a search index (Elasticsearch, OpenSearch) in sync
- Invalidating caches when source data changes
- Building event streams from legacy systems you can't modify
CDC shines when the source DB can't be changed, latency matters (sub-second to a few seconds), and you need every change, not just a snapshot.
How It Works
Databases write every change to a transaction log before applying it. CDC tools (Debezium, AWS DMS, Fivetran HVR, native CDC in SQL Server / Postgres logical replication) tail that log and emit a stream of change events, each describing:
- The table and primary key
- The change type: insert, update, or delete
- The row's before and after state
The stream is typically published to Kafka or a similar event bus. Downstream consumers subscribe and apply changes to their own stores.
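A change event can be pictured as a small envelope around the row. A minimal sketch, assuming a Debezium-style shape (the table, fields, and store are illustrative):

```python
# One CDC change event, modeled on Debezium's envelope. Other tools
# use different field names, but the same three pieces of information.
change_event = {
    "source": {"table": "orders", "ts_ms": 1700000000000},  # table + commit time
    "op": "u",                                   # "c" insert, "u" update, "d" delete
    "before": {"id": 42, "status": "pending"},   # row state before the change
    "after":  {"id": 42, "status": "shipped"},   # row state after the change
}

def apply(store: dict, event: dict) -> None:
    """Apply one change event to a key-value store keyed by primary key."""
    key = (event["after"] or event["before"])["id"]
    if event["op"] == "d":
        store.pop(key, None)          # delete: "after" is None
    else:
        store[key] = event["after"]   # insert/update: upsert the new state
```

A downstream consumer is then just a loop that reads events off the bus and calls something like `apply` against its own store.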
CDC captures every change including sensitive ones. If your CDC stream flows to a less secure environment, you've just leaked PII. Plan encryption, masking, and access control before you connect anything.
Implementation Guide
Step 1: Enable the log feature on the source
Postgres: set wal_level = logical and create a publication. SQL Server: enable CDC on the database and the tables. MySQL: enable binlog_format = ROW. Each has knock-on effects on disk usage and backup — read the docs.
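For Postgres, the setup might look like the following sketch (the publication name and table list are illustrative; changing `wal_level` requires a server restart):

```sql
-- Postgres: allow logical decoding (requires a restart to take effect)
ALTER SYSTEM SET wal_level = 'logical';

-- Publish changes from the tables you want to capture
CREATE PUBLICATION cdc_pub FOR TABLE orders, customers;
```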
Step 2: Handle the initial snapshot
The log only contains changes. Downstream consumers need the current state too. Most CDC tools take a one-time snapshot first, then switch to tailing the log. Plan for this window — snapshotting a 500GB table takes time.
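One consequence is that consumers should treat snapshot rows and live changes uniformly and idempotently, so any overlap between the two phases is harmless. A sketch, assuming Debezium-style op codes (where `"r"` marks a snapshot read):

```python
def consume(store: dict, event: dict) -> None:
    """Handle snapshot reads ("r") and live changes with one code path.

    Snapshot events carry only an "after" state. Applying everything as
    an upsert by primary key keeps the handler idempotent, so replaying
    an event (or seeing a row in both the snapshot and the log) is safe.
    """
    op = event["op"]
    if op in ("r", "c", "u"):              # snapshot read, insert, update
        row = event["after"]
        store[row["id"]] = row             # idempotent upsert
    elif op == "d":                        # delete
        store.pop(event["before"]["id"], None)
```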
Step 3: Protect against schema changes
A column added upstream needs to propagate; a column dropped can break consumers. Decide up front: hard-fail loudly on schema drift, or route old-schema events through a compatibility layer.
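The hard-fail option can be as simple as comparing an event's columns against what consumers expect. A sketch with an illustrative expected schema:

```python
EXPECTED_COLUMNS = {"id", "status", "total"}   # illustrative schema for "orders"

def check_schema(event: dict) -> None:
    """Hard-fail loudly when an event's columns drift from expectations."""
    row = event["after"] or event["before"]
    drift = set(row) ^ EXPECTED_COLUMNS        # columns added or dropped upstream
    if drift:
        table = event["source"]["table"]
        raise ValueError(f"schema drift on {table}: {sorted(drift)}")
```

Failing loudly stops the pipeline at the first drifted event, which is usually preferable to silently writing half-shaped rows downstream.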
Step 4: Guarantee ordering where it matters
Most CDC tools preserve per-row ordering (all events for user-42 arrive in order). Cross-row ordering is usually not preserved. If you need causal consistency across tables, design for it explicitly.
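Per-row ordering usually comes from keying the event bus by primary key, so all events for one row land on the same partition. A sketch of that routing (partition count is illustrative):

```python
import hashlib

def partition_for(primary_key, num_partitions: int = 12) -> int:
    """Route every event for one row to the same partition.

    Events for the same key are appended to one partition and consumed
    in order; events for different keys may interleave across partitions.
    """
    digest = hashlib.sha256(str(primary_key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

This is why cross-row (and cross-table) ordering is not free: rows hash to different partitions, and partitions are consumed independently.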
Step 5: Monitor replication lag
Lag is the most important metric. A 30-second lag is healthy; a 30-minute lag means your search index is 30 minutes stale. Alert on sustained lag over a threshold.
Tips & Best Practices
- Test with real schema changes in staging. CDC breaking in prod at 2am is the worst.
- Never rely on CDC for exact balance reconciliation. Use it for speed; reconcile with a nightly batch.
- Keep retention high enough to replay. If a consumer is down for 4 hours, can you replay 4 hours of changes? If retention is 1 hour, you're reseeding from snapshot.
- Use CDC for data movement, not business logic. Consumers that react to CDC events and then write business rules start to feel like distributed transactions gone wrong.
- Budget for storage carefully. The transaction log grows when a consumer falls behind.
Related patterns
API Polling with Change Detection
Periodically check an external system for changes and trigger workflows when new or updated records are detected. The reliable alternative when webhooks aren't available.
Reverse ETL
Push modelled data from your warehouse back into the SaaS tools that business teams use every day — CRM, marketing, support — so they can act on analytics without a BI detour.
Data Sync Bridge
Keep records synchronized between two or more systems. Handles create/update/delete propagation with conflict resolution. Essential for multi-system environments.