Debezium (CDC over Kafka)

Consume Debezium change events from a Kafka topic and apply them to a Spice-accelerated dataset.

Use Debezium when:

  • You already operate Debezium + Kafka for change data capture; or
  • The source database does not have a native Spice CDC path (e.g. MySQL, SQL Server, Oracle).

For sources with a native CDC path, prefer the dedicated connector (PostgreSQL Logical Replication, DynamoDB Streams, or MongoDB Change Streams) to avoid the extra Kafka + Debezium hop.

How it works

┌──────────────┐  Debezium connector   ┌───────┐    Spice consumes     ┌───────────────┐   ChangeBatch   ┌─────────────┐
│ Source DB    │ ────────────────────▶ │ Kafka │ ────────────────────▶ │ Spice runtime │ ──────────────▶ │ Accelerator │
│ (MySQL,      │   WAL → JSON events   │ topic │  one consumer group   │ (debezium     │    (INSERT /    │ DuckDB /    │
│ SQL Server,  │                       │       │   per Spice replica   │ connector)    │    UPDATE /     │ SQLite /    │
│ Oracle, …)   │                       │       │                       │               │    DELETE)      │ Postgres    │
└──────────────┘                       └───────┘                       └───────────────┘                 └─────────────┘

On startup, Spice subscribes to the configured Debezium-managed Kafka topic using either a uniquely generated consumer group or one specified via kafka_consumer_group_id. With a persistent acceleration engine (mode: file), data is fetched starting from the last committed offset, so restarts resume without reprocessing historical events.

Prerequisites

  • A running Debezium connector publishing change events to a Kafka topic for the source table.
  • A reachable Kafka cluster (one or more bootstrap.servers).
  • A Spice acceleration engine that supports CDC: duckdb, sqlite, or postgres.

Minimal configuration

datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: customer_addresses
    params:
      debezium_transport: kafka # Optional. Only `kafka` is currently supported.
      debezium_message_format: json # Optional. Only `json` is currently supported.
      kafka_bootstrap_servers: localhost:9092
      kafka_security_protocol: PLAINTEXT
    acceleration:
      enabled: true # Required.
      engine: duckdb # duckdb / sqlite / postgres
      mode: file # Persist Kafka offsets so restarts resume.
      refresh_mode: changes # Required.

The from field takes the form debezium:<kafka_topic>. The topic must contain Debezium-formatted change events for a single source table.

SASL/SSL authentication

For Kafka clusters with SASL/SSL enabled:

datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: orders
    params:
      kafka_bootstrap_servers: broker1:9092,broker2:9092,broker3:9092
      kafka_security_protocol: sasl_ssl # Default
      kafka_sasl_mechanism: SCRAM-SHA-512 # PLAIN / SCRAM-SHA-256 / SCRAM-SHA-512
      kafka_sasl_username: kafka_user
      kafka_sasl_password: ${secrets:kafka_sasl_password}
      kafka_ssl_ca_location: ./certs/kafka_ca_cert.pem
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_mode: changes

The full set of kafka_* parameters is documented in the Debezium connector reference.

Consumer-group management

The connector manages Kafka consumer groups so offsets persist across restarts:

  • Default: Spice auto-generates a unique consumer group ID, stores it in the acceleration metadata, and reuses it on subsequent restarts.
  • Custom: Pass kafka_consumer_group_id to use your own group ID. The same ID must be used on every restart; if Spice detects a mismatch against the stored ID, it returns an error to prevent data inconsistency.

To recover from a deliberate consumer-group change, reset the acceleration data so Spice starts fresh.
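
A pinned group ID could be configured as in the sketch below. The group name spice-customer-addresses is a made-up placeholder, and placing kafka_consumer_group_id under params is an assumption based on where the other kafka_* settings on this page live; confirm it against the connector reference before relying on it.

```yaml
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: customer_addresses
    params:
      kafka_bootstrap_servers: localhost:9092
      kafka_security_protocol: PLAINTEXT
      # Hypothetical group name; keep it identical on every restart.
      kafka_consumer_group_id: spice-customer-addresses
    acceleration:
      enabled: true
      engine: duckdb
      mode: file # Persist Kafka offsets so restarts resume.
      refresh_mode: changes
```

Changing the value later without resetting the acceleration data should trigger the mismatch error described above.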

See the full description in the Debezium connector reference.

Schema evolution

Debezium emits change events whose schema may evolve as the upstream table is altered. Set schema_evolution: true to have Spice peek at the latest Kafka message on reload and detect schema changes:

params:
  schema_evolution: true # Default: false

Batching

Two parameters control how many events Spice groups into a single CDC batch before applying it to the accelerator:

| Parameter | Default | Description |
| --- | --- | --- |
| batch_max_size | 10000 | Max number of change events to batch together before processing. |
| batch_max_duration | 1s | Max time to wait for a batch to fill before processing. |

Larger batches improve throughput at the cost of higher per-batch latency.
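
As a rough illustration, a throughput-oriented dataset might raise both limits. The values below are arbitrary examples rather than recommendations, and setting these keys under params is an assumption that mirrors the other connector settings on this page.

```yaml
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: customer_addresses
    params:
      kafka_bootstrap_servers: localhost:9092
      kafka_security_protocol: PLAINTEXT
      # Illustrative values only: larger, less frequent batches.
      batch_max_size: 50000 # Default: 10000
      batch_max_duration: 5s # Default: 1s
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_mode: changes
```

With these settings a batch would be applied once 50000 events have accumulated or 5 seconds have elapsed, so changes on a quiet topic could take up to roughly 5 seconds to reach the accelerator.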

Metrics

The connector exposes the following component metrics:

| Metric Name | Type | Description |
| --- | --- | --- |
| bytes_consumed_total | Counter | Total number of bytes consumed from the Kafka topic |
| records_consumed_total | Counter | Total number of records (messages) consumed from Kafka topics |
| records_lag | Gauge | Total consumer lag across all topic partitions (number of messages not yet consumed) |

These metrics are opt-in; see the Debezium connector reference for an example metrics: block.

Limitations

  • Only kafka is supported as the Debezium transport.
  • Only json is supported as the message format.
  • Acceleration is required: Debezium cannot be used as a federated, non-accelerated dataset.

See also