Debezium (CDC over Kafka)
Consume Debezium change events from a Kafka topic and apply them to a Spice-accelerated dataset.
Use Debezium when:
- You already operate Debezium + Kafka for change data capture; or
- The source database does not have a native Spice CDC path (e.g. MySQL, SQL Server, Oracle).
For sources with a native CDC path, prefer the dedicated connector (PostgreSQL Logical Replication, DynamoDB Streams, or MongoDB Change Streams) to avoid the extra Kafka + Debezium hop.
How it works
┌──────────────┐  Debezium connector  ┌────────┐    Spice consumes    ┌───────────────┐   ChangeBatch    ┌─────────────┐
│ Source DB    │ ───────────────────▶ │ Kafka  │ ───────────────────▶ │ Spice runtime │ ───────────────▶ │ Accelerator │
│ (MySQL,      │  WAL → JSON events   │ topic  │  one consumer group  │ (debezium     │   (INSERT /      │ DuckDB /    │
│ SQL Server,  │                      │        │  per Spice replica   │  connector)   │    UPDATE /      │ SQLite /    │
│ Oracle, …)   │                      │        │                      │               │    DELETE)       │ Postgres    │
└──────────────┘                      └────────┘                      └───────────────┘                  └─────────────┘
On startup, Spice subscribes to the configured Debezium-managed Kafka topic using either a uniquely generated consumer group or one specified via kafka_consumer_group_id. With a persistent acceleration engine (mode: file), data is fetched starting from the last committed offset, so restarts resume without reprocessing historical events.
Prerequisites
- A running Debezium connector publishing change events to a Kafka topic for the source table.
- A reachable Kafka cluster (one or more bootstrap.servers).
- A Spice acceleration engine that supports CDC: duckdb, sqlite, or postgres.
Minimal configuration
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: customer_addresses
    params:
      debezium_transport: kafka # Optional. Only `kafka` is currently supported.
      debezium_message_format: json # Optional. Only `json` is currently supported.
      kafka_bootstrap_servers: localhost:9092
      kafka_security_protocol: PLAINTEXT
    acceleration:
      enabled: true # Required.
      engine: duckdb # duckdb / sqlite / postgres
      mode: file # Persist Kafka offsets so restarts resume.
      refresh_mode: changes # Required.
The from field takes the form debezium:<kafka_topic>. The topic must contain Debezium-formatted change events for a single source table.
SASL/SSL authentication
For Kafka clusters with SASL/SSL enabled:
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: orders
    params:
      kafka_bootstrap_servers: broker1:9092,broker2:9092,broker3:9092
      kafka_security_protocol: sasl_ssl # Default
      kafka_sasl_mechanism: SCRAM-SHA-512 # PLAIN / SCRAM-SHA-256 / SCRAM-SHA-512
      kafka_sasl_username: kafka_user
      kafka_sasl_password: ${secrets:kafka_sasl_password}
      kafka_ssl_ca_location: ./certs/kafka_ca_cert.pem
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_mode: changes
The full set of kafka_* parameters is documented in the Debezium connector reference.
Consumer-group management
The connector manages Kafka consumer groups so offsets persist across restarts:
- Default: Spice auto-generates a unique consumer group ID, stores it in the acceleration metadata, and reuses it on subsequent restarts.
- Custom: Pass kafka_consumer_group_id to use your own group ID. The same ID must be used on every restart; if Spice detects a mismatch against the stored ID, it returns an error to prevent data inconsistency.
To recover from a deliberate consumer-group change, reset the acceleration data so Spice starts fresh.
See the full description in the Debezium connector reference.
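As an illustration, a dataset pinned to a custom consumer group might look like the sketch below. The group ID spice-orders-cdc is a hypothetical placeholder, and the remaining values are carried over from the minimal configuration rather than recommended settings.

```yaml
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: orders
    params:
      kafka_bootstrap_servers: localhost:9092
      kafka_security_protocol: PLAINTEXT
      # Hypothetical group ID. Keep it identical on every restart,
      # otherwise Spice reports a mismatch against the stored ID.
      kafka_consumer_group_id: spice-orders-cdc
    acceleration:
      enabled: true
      engine: duckdb
      mode: file # Persist Kafka offsets so restarts resume.
      refresh_mode: changes
```

If the group ID ever has to change, the stored ID no longer matches, so the acceleration data would need to be reset as described above.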
Schema evolution
Debezium emits change events whose schema may evolve as the upstream table is altered. Set schema_evolution: true to have Spice peek at the latest Kafka message on reload and detect schema changes:
params:
  schema_evolution: true # Default: false
Batching
Two parameters control how many events Spice groups into a single CDC batch before applying it to the accelerator:
| Parameter | Default | Description |
|---|---|---|
| batch_max_size | 10000 | Max number of change events to batch together before processing. |
| batch_max_duration | 1s | Max time to wait for a batch to fill before processing. |
Larger batches improve throughput at the cost of higher per-batch latency.
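For example, a high-volume topic where a few seconds of extra latency is acceptable could be tuned toward larger batches. This is only a sketch: the values are illustrative, not recommendations, and it assumes both parameters are set under params alongside the kafka_* settings (confirm the exact placement in the Debezium connector reference).

```yaml
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: customer_addresses
    params:
      kafka_bootstrap_servers: localhost:9092
      kafka_security_protocol: PLAINTEXT
      # Assumed placement under params; values are illustrative only.
      batch_max_size: 50000 # Default: 10000
      batch_max_duration: 5s # Default: 1s
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_mode: changes
```

With settings like these, a batch is applied once it reaches 50000 events or after 5 seconds, whichever comes first, so quiet periods still flush within the duration limit.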
Metrics
The connector exposes the following component metrics:
| Metric Name | Type | Description |
|---|---|---|
| bytes_consumed_total | Counter | Total number of bytes consumed from the Kafka topic |
| records_consumed_total | Counter | Total number of records (messages) consumed from Kafka topics |
| records_lag | Gauge | Total consumer lag across all topic partitions (number of messages not yet consumed) |
These metrics are opt-in; see the Debezium connector reference for an example metrics: block.
Limitations
- Only kafka is supported as the Debezium transport.
- Only json is supported as the message format.
- Acceleration is required: Debezium cannot be used as a federated, non-accelerated dataset.
See also
- Debezium Data Connector (complete parameter reference).
- Streaming changes in real-time with Debezium CDC (cookbook recipe).
- Streaming changes with Debezium and SASL/SCRAM authentication (authenticated-Kafka recipe).
- refresh_mode: changes (refresh-mode reference).
