MongoDB Change Streams (Native CDC)
Stream every insert, update, replace, and delete from a MongoDB collection directly into a Spice-accelerated dataset using native MongoDB Change Streams.
This is the recommended way to keep a Spice accelerator (DuckDB, SQLite, PostgreSQL, Turso, Cayenne) continuously in sync with a MongoDB source, with no Kafka, no Debezium, and no external services.
How it works
+------------------+     Change Streams      +-----------------+     ChangeBatch     +--------------------+
|      MongoDB     | ----------------------> |  Spice runtime  | ------------------> |    Accelerator     |
|   replica set /  |     fullDocument=       |    (mongodb     |      (INSERT /      | DuckDB / SQLite /  |
|      sharded     |     updateLookup        |    connector)   |       UPDATE /      | Postgres / Turso / |
|      cluster     |     + resume tokens     |                 |       DELETE)       |      Cayenne       |
+------------------+                         +-----------------+                     +--------------------+
On first start the connector:
- Opens a Change Stream on the source collection with fullDocument=updateLookup.
- Emits a CDC TRUNCATE and applies a full snapshot of the collection as upsert rows.
- Signals readiness, then processes Change Stream events in batches.
Opening the Change Stream before the snapshot prevents gaps between the snapshot and the live stream.
For file-backed accelerators (acceleration mode: file / file_create / file_update, or engine: postgres), Spice persists the most recent Change Stream resume token in a sidecar table named spice_sys_mongodb alongside the accelerator data. The token is committed only after the downstream accelerator write succeeds (at-least-once semantics). On restart, Spice resumes from the persisted token and skips the snapshot.
In-memory accelerators do not persist a resume token; restarts re-bootstrap from a fresh snapshot.
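To make the contrast concrete, here is a minimal sketch of an acceleration block that keeps data in memory. It assumes that mode: memory is the accepted value for in-memory acceleration (this page only names the file modes), and the dataset name and host are placeholders. With a configuration like this, no resume token is stored, so each restart would begin with a fresh snapshot.

```yaml
datasets:
  - from: mongodb:users          # placeholder collection name
    name: users
    params:
      mongodb_host: localhost    # placeholder connection details
      mongodb_db: my_database
    acceleration:
      enabled: true
      engine: duckdb
      mode: memory               # assumption: in-memory mode; no resume token is persisted
      refresh_mode: changes
      primary_key: _id
      on_conflict:
        _id: upsert
```

Switching mode to file is what enables the sidecar table and the snapshot-free restart described above.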
Prerequisites
- MongoDB 4.0+ with Change Streams enabled. MongoDB requires a replica set or sharded cluster; single-node mongod deployments do not support Change Streams.
- The MongoDB user must have the changeStream privilege on the source collection.
- The accelerator must support upsert behavior: use duckdb, sqlite, postgres, turso, or cayenne.
- acceleration.primary_key: _id is required. Delete events only include the document key, so Spice needs _id to route deletes.
- acceleration.on_conflict must specify upsert on _id so update and replace events overwrite existing rows.
Minimal configuration
datasets:
  - from: mongodb:users
    name: users
    params:
      mongodb_host: localhost
      mongodb_port: '27017'
      mongodb_db: my_database
      mongodb_user: my_user
      mongodb_pass: ${secrets:mongodb_pass}
    acceleration:
      enabled: true
      engine: duckdb
      mode: file # Persist resume tokens so restarts skip the snapshot
      refresh_mode: changes
      primary_key: _id
      on_conflict:
        _id: upsert
Tuning
These optional runtime parameters live under dataset params:. Defaults are reasonable; tune only when you have a specific batching or oplog-window concern.
| Parameter Name | Default | Description |
|---|---|---|
| change_stream_batch_max_size | 1000 | Max number of Change Stream events to group into one CDC batch before applying it. |
| change_stream_batch_max_duration | 1s | Max time to wait for a Change Stream batch to fill before applying it. Accepts fundu duration strings. |
| change_stream_max_await_time | 1s | Max time MongoDB waits for new events before returning an empty server batch. Accepts fundu duration strings. |
| change_stream_batch_size | 1000 | Number of Change Stream events MongoDB should request from the server per batch. |
| mongodb_resume_token_invalid_behavior | error | Behavior when a persisted resume token is rejected (e.g. past the oplog window). error surfaces the failure; rebootstrap drops the token and re-snapshots. |
The existing mongodb_unnest_depth parameter applies to Change Stream documents too, so nested BSON is flattened the same way as normal MongoDB reads.
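As a rough illustration of where these parameters go, the sketch below sets all four batching parameters under params. The values are arbitrary examples rather than recommendations, and quoting the numeric values as strings is an assumption that follows the mongodb_port style used elsewhere on this page.

```yaml
datasets:
  - from: mongodb:users
    name: users
    params:
      mongodb_host: localhost
      mongodb_port: '27017'
      mongodb_db: my_database
      mongodb_user: my_user
      mongodb_pass: ${secrets:mongodb_pass}
      # Illustrative values only; the defaults are 1000 / 1s / 1s / 1000.
      change_stream_batch_max_size: '5000'    # larger CDC batches applied to the accelerator
      change_stream_batch_max_duration: 5s    # wait up to 5s for a batch to fill
      change_stream_max_await_time: 2s        # MongoDB waits up to 2s before an empty server batch
      change_stream_batch_size: '2000'        # events requested from the server per batch
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_mode: changes
      primary_key: _id
      on_conflict:
        _id: upsert
```

Larger batch limits generally trade freshness for fewer accelerator writes, so raise them only when write overhead is the measured bottleneck.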
Event mapping
| MongoDB event | Applied as | Notes |
|---|---|---|
| insert | create / upsert | Uses fullDocument. |
| update | update / upsert | Uses fullDocument from fullDocument=updateLookup. |
| replace | update / upsert | Uses fullDocument. |
| delete | delete | Uses documentKey; non-key columns are null. |
| drop, rename, dropDatabase, invalidate | truncate | Collection continuity is no longer guaranteed; the accelerator is reset and re-bootstrapped. |
If MongoDB does not include fullDocument for an update or replace event, Spice fails the stream with a clear error instead of applying a partial row.
Resumability across restarts
For file-accelerated datasets, the persisted resume token lets Spice resume from where it left off without re-snapshotting. When MongoDB rejects the token (typical codes ChangeStreamHistoryLost 286 or ChangeStreamFatalError 280, usually when the oplog window has rolled past the token's position), the behavior is governed by mongodb_resume_token_invalid_behavior:
- error (default): Spice surfaces a clear error and stops; the operator decides what to do.
- rebootstrap: Spice drops the persisted token and re-snapshots the collection.
Re-snapshotting is opt-in: the default is error, which prevents silent, expensive rebootstraps of a large collection.
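For a collection small enough that a fresh snapshot is acceptable, opting in might look like the following sketch. Only the mongodb_resume_token_invalid_behavior line differs from the minimal configuration; the connection values are the same placeholders.

```yaml
datasets:
  - from: mongodb:users
    name: users
    params:
      mongodb_host: localhost
      mongodb_port: '27017'
      mongodb_db: my_database
      mongodb_user: my_user
      mongodb_pass: ${secrets:mongodb_pass}
      # Opt in: drop a rejected resume token and re-snapshot instead of stopping.
      mongodb_resume_token_invalid_behavior: rebootstrap
    acceleration:
      enabled: true
      engine: duckdb
      mode: file            # the resume token is only persisted for file-backed accelerators
      refresh_mode: changes
      primary_key: _id
      on_conflict:
        _id: upsert
```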
Limitations
- Change Streams require a replica set or sharded cluster; they do not work against a single-node mongod.
- refresh_sql is not supported with Change Streams.
- In-memory accelerators do not persist resume tokens; every restart re-snapshots.
See also
- MongoDB Data Connector: complete parameter reference and connection options.
- refresh_mode: changes, the refresh-mode reference.
