Change Data Capture (CDC)
Change Data Capture (CDC) captures insert, update, and delete events from a database's transaction log and delivers them to consumers with low latency. This technique enables Spice to keep locally accelerated datasets synchronized with the source data in near real-time. CDC is efficient because it transfers only changed rows instead of re-fetching the entire dataset.
Benefits
Datasets that are accelerated locally and configured with CDC give Spice both high-performance queries against the local copy and efficient, near real-time updates from the source.
Example Use Case
Consider a fraud detection application that needs to determine whether a pending transaction is likely fraudulent. The application queries a Spice-accelerated table of recent transactions, updated in real time, to check whether the pending transaction resembles known fraudulent ones. With CDC, the table is kept up to date, so the application can quickly identify potential fraud.
Considerations
When configuring datasets to be accelerated with CDC, ensure that the data connector supports CDC and can return a stream of row-level changes. See the Supported Data Connectors section for more information.
The startup time for CDC-accelerated datasets may be longer than for non-CDC-accelerated datasets due to the initial synchronization.
Use CDC-accelerated datasets with a persistent data accelerator configuration (that is, file mode for DuckDB or SQLite, or PostgreSQL). This ensures that when Spice restarts, it can resume from the last known state of the dataset instead of re-fetching the entire dataset.
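A minimal sketch of that recommendation, assuming a DuckDB accelerator: the acceleration block sets mode: file so the accelerated data is persisted across restarts. The dataset name is hypothetical, the from value is borrowed from the Debezium example later on this page, and the connector params are omitted for brevity.

```yaml
datasets:
  - from: debezium:cdc.public.customer_addresses  # source taken from the Debezium example on this page
    name: customer_addresses                      # hypothetical dataset name
    acceleration:
      enabled: true
      engine: duckdb         # SQLite in file mode or PostgreSQL are the other persistent options named above
      mode: file             # persist accelerated data to disk so a restart can resume from the last known state
      refresh_mode: changes  # apply row-level changes from the CDC stream
```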
Tuning ingestion
Spice applies CDC events through a single apply loop that coalesces a contiguous run of buffered change events ("envelopes") into one accelerator write. The following runtime.params control buffering, coalescing, and the commit timeout. They are instance-wide: set them once under the top-level runtime.params, not per dataset. Each parameter also accepts a SPICE_-prefixed environment variable. The runtime.params value takes precedence; if it is not set, Spice falls back to the environment variable and then to the default.
| Parameter | Description | Default |
|---|---|---|
| cdc_prefetch_buffer | Number of source change events buffered ahead of the apply loop. Range 1–16384. | 128 |
| cdc_max_coalesced_envelopes | Maximum number of change events combined into a single accelerator write. Range 1–16384. | 256 |
| cdc_max_coalesced_bytes | Maximum in-memory Arrow size (bytes) of a single coalesced write. Range 1–1073741824 (1 GiB). | 134217728 (128 MiB) |
| cdc_max_coalesce_age_ms | Apply-loop linger window in milliseconds. When > 0, the loop keeps accumulating change events into one write until the envelope cap is reached, the byte budget is reached, or this window elapses, whichever comes first. The window is measured from the start of the previous apply. 0 disables lingering, so each buffered event is applied as soon as it arrives. | 0 (no linger) |
| cdc_commit_timeout_ms | Maximum time to wait for the previous source-side commit before surfacing ingestion as stalled. Range 1–3600000 (1 hour). | 30000 (30s) |
runtime:
  params:
    cdc_max_coalesce_age_ms: 250 # linger up to 250ms to coalesce slowly-arriving events into fewer writes
Out-of-range or unparseable values are rejected with a warning and fall back to the default.
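As an illustration of how these parameters combine, the sketch below favors fewer, larger accelerator writes for a high-volume source. The specific values are illustrative assumptions rather than recommendations; each stays within the ranges listed in the table above.

```yaml
runtime:
  params:
    cdc_prefetch_buffer: 1024            # buffer up to 1024 source events ahead of the apply loop (default 128)
    cdc_max_coalesced_envelopes: 1024    # combine up to 1024 change events into one write (default 256)
    cdc_max_coalesced_bytes: 268435456   # 256 MiB byte budget per coalesced write (default 128 MiB)
    cdc_max_coalesce_age_ms: 250         # linger up to 250 ms to fill a batch (default 0, no linger)
    cdc_commit_timeout_ms: 60000         # wait up to 60 s for the previous source-side commit (default 30 s)
```

Larger batches and a non-zero linger window generally trade some update latency for fewer writes, so a latency-sensitive workload would likely leave cdc_max_coalesce_age_ms at 0.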
Supported Data Connectors
To enable CDC, set refresh_mode: changes in the dataset's acceleration settings. The data connector must be able to provide a stream of row-level changes.
Spice currently supports streaming ingestion via:
- PostgreSQL Logical Replication: recommended for PostgreSQL sources. Spice connects directly to the source using Postgres' native logical replication protocol (wal_level=logical + pgoutput) and streams INSERT/UPDATE/DELETE events into the accelerator. No Kafka, no Debezium, no external services.
- DynamoDB Streams: for Amazon DynamoDB sources. Spice consumes the table's DynamoDB Streams directly and applies INSERT/UPDATE/DELETE events to the accelerator.
- MongoDB Change Streams: for MongoDB replica sets and sharded clusters. Spice opens a native Change Stream on the source collection and applies inserts, updates, replaces, and deletes to the accelerator.
- Apache Kafka: for event-streaming topics. Spice consumes records directly with refresh_mode: append for real-time, append-only acceleration (no separate CDC connector required).
- Debezium (over Kafka): for sources where Debezium + Kafka is already deployed, or for databases without a native Spice CDC path (MySQL, SQL Server, etc.).
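For contrast with refresh_mode: changes, the Apache Kafka entry in the list above could be configured roughly as follows. This is a sketch under assumptions: the topic and dataset names are hypothetical, and the kafka: prefix and kafka_bootstrap_servers parameter are assumed to work as they do in the Debezium example on this page. Check the Kafka data connector reference for the exact parameters.

```yaml
datasets:
  - from: kafka:orders                        # hypothetical topic name; the kafka: prefix is an assumption
    name: orders                              # hypothetical dataset name
    params:
      kafka_bootstrap_servers: localhost:19092  # assumed to match the broker used in the Debezium example
    acceleration:
      enabled: true
      refresh_mode: append                    # append-only ingestion; updates and deletes are not applied
```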
Example
For a complete walkthrough, follow the recipe Streaming changes in real-time with Debezium CDC. The following Spicepod configures a dataset to use CDC with Debezium:
version: v1
kind: Spicepod
name: cdc-debezium
datasets:
  - from: debezium:cdc.public.customer_addresses
    name: cdc
    params:
      debezium_transport: kafka
      debezium_message_format: json
      kafka_bootstrap_servers: localhost:19092
      kafka_security_protocol: PLAINTEXT
    acceleration:
      enabled: true
      engine: sqlite
      mode: file
      refresh_mode: changes
