
PostgreSQL Logical Replication (Native CDC)

Stream every INSERT, UPDATE, and DELETE from a PostgreSQL table directly into a Spice-accelerated dataset over Postgres' native logical replication protocol.

This is the recommended way to keep a Spice accelerator (DuckDB, SQLite, PostgreSQL, Cayenne, Arrow) continuously in sync with a PostgreSQL source.

How it works​

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   WAL (pgoutput)   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   ChangeBatch   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ PostgreSQL │──────────────────▢│ Spice runtime │────────────────▢│ Accelerator β”‚
β”‚ wal_level= β”‚ replication β”‚ (postgres β”‚ (INSERT/ β”‚ DuckDB / β”‚
β”‚ logical β”‚ slot β”‚ connector) β”‚ UPDATE / β”‚ SQLite / β”‚
β”‚ β”‚ β”‚ β”‚ DELETE) β”‚ Postgres / β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ Cayenne β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

On first start the connector:

  1. Creates a publication (default name spice_<dataset>_<dataset-hash>_pub) containing the source table.
  2. Creates a replication slot (default spice_<dataset>_<dataset-hash>_<instance-hash>). The <instance-hash> gives each Spice replica its own slot.
  3. Runs a REPEATABLE READ snapshot of the source table so the accelerator starts with all existing rows.
  4. Starts streaming WAL changes from the slot. Each committed transaction is delivered as a ChangeBatch (grouped INSERT/UPDATE/DELETE) and applied to the accelerator.

On subsequent restarts the connector detects the existing slot and resumes from Postgres' stored confirmed_flush_lsn.
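
After the first start, you can verify the objects the connector created directly from psql. Slot and publication names will include the generated hashes; for auto-created slots the plugin is pgoutput:

```sql
-- Publications created by Spice
SELECT pubname FROM pg_publication WHERE pubname LIKE 'spice_%';

-- Replication slots and the LSN they will resume from
SELECT slot_name, plugin, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name LIKE 'spice_%';
```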

Prerequisites​

1. Enable logical replication on the source Postgres​

This requires a server restart.

# postgresql.conf
wal_level = logical
max_replication_slots = 10 # at least one per Spice replica per dataset
max_wal_senders = 10

Verify:

SHOW wal_level;        -- must be 'logical'
SHOW max_replication_slots;
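
If you prefer not to edit postgresql.conf directly, the same settings can be applied with ALTER SYSTEM. Note that wal_level still requires a full server restart to take effect; reloading the configuration is not enough:

```sql
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_replication_slots = 10;  -- at least one per Spice replica per dataset
ALTER SYSTEM SET max_wal_senders = 10;
-- Then restart the server; pg_reload_conf() does not apply wal_level changes.
```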

On managed Postgres services:

| Service | How to enable |
| --- | --- |
| AWS RDS | Set rds.logical_replication = 1 in the parameter group and restart. |
| Aurora PostgreSQL | Set rds.logical_replication = 1; wait for DB reboot. |
| GCP Cloud SQL | Flag: cloudsql.logical_decoding = on. |
| Azure Database for PostgreSQL | Under Replication, set Replication support to LOGICAL. |
| Supabase / Neon | Logical replication is enabled by default. |

2. The source table must have a replica identity​

Spice needs the primary key columns in every UPDATE/DELETE event, so one of the following must be true:

  • The table has a primary key (default β€” nothing to do).

  • Or the table has REPLICA IDENTITY FULL:

    ALTER TABLE public.users REPLICA IDENTITY FULL;

Tables with REPLICA IDENTITY NOTHING are rejected at startup.
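
You can check a table's current replica identity in the catalog. pg_class.relreplident is d (default, primary key), f (full), n (nothing), or i (index):

```sql
SELECT relreplident
FROM pg_class
WHERE oid = 'public.users'::regclass;
```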

3. The Postgres role needs these privileges​

GRANT CONNECT ON DATABASE mydb TO spice;
GRANT USAGE ON SCHEMA public TO spice;
GRANT SELECT ON public.users TO spice;
ALTER ROLE spice WITH REPLICATION; -- or be a superuser
-- If you let Spice create the publication (default):
GRANT CREATE ON DATABASE mydb TO spice;
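
If you would rather not grant CREATE on the database, pre-create the publication as a privileged user and point the dataset at it with the pg_publication parameter (the publication name below is illustrative):

```sql
-- Run as a role that owns public.users (or a superuser)
CREATE PUBLICATION spice_users_pub FOR TABLE public.users;
```

Then set pg_publication: spice_users_pub in the dataset's params.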

Minimal configuration​

datasets:
  - from: postgres:public.users
    name: users
    params:
      pg_host: pg.internal
      pg_port: '5432'
      pg_user: spice
      pg_pass: ${secrets:pg_pass}
      pg_db: myapp
      pg_sslmode: verify-full            # or: disable | prefer | require | verify-ca
      pg_sslrootcert: /etc/ssl/pg-ca.pem # optional; omit to use system root CAs
    acceleration:
      enabled: true
      engine: duckdb        # or: sqlite | postgres | cayenne | arrow
      refresh_mode: changes # <-- triggers WAL streaming
      primary_key: id
      on_conflict:
        id: upsert          # required for UPDATE to become an upsert

Start the runtime. Spice will:

  • Auto-create publication spice_users_<dataset-hash>_pub.
  • Auto-create replication slot spice_users_<dataset-hash>_<instance-hash>.
  • Snapshot public.users into the DuckDB accelerator.
  • Stream every subsequent change as it commits on Postgres.
Tip: Use a persistent accelerator

Pair with mode: file on DuckDB/SQLite (or the PostgreSQL accelerator) so restarts resume from the last acknowledged LSN instead of re-snapshotting.
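
For example, a sketch of the acceleration block with a file-backed DuckDB store, mirroring the minimal configuration above:

```yaml
acceleration:
  enabled: true
  engine: duckdb
  mode: file            # persist the DuckDB database across restarts
  refresh_mode: changes
  primary_key: id
  on_conflict:
    id: upsert
```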

Full configuration reference​

All replication-specific parameters live under params: on the dataset and start with pg_:

| Parameter | Default | Description |
| --- | --- | --- |
| pg_replication_slot | spice_<dataset>_<dataset-hash>_<instance-hash> | Name of the replication slot. Must be unique per replica. |
| pg_publication | spice_<dataset>_<dataset-hash>_pub | Publication name. Shared across replicas. Auto-created if missing. |
| pg_replication_initial_snapshot | true | If true, take an initial snapshot of the table's existing rows before streaming. Set to false if you are pre-seeding the accelerator yourself. |
| pg_replication_temporary_slot | false | If true, the slot is dropped when Spice disconnects. Every restart re-bootstraps. |
| pg_replication_status_interval | 10s | How often StandbyStatusUpdate (LSN acknowledgement) is sent back to Postgres. Lower values free WAL faster; higher values reduce network chatter. Accepts any duration string (500ms, 30s, 2m). |

All existing pg_host, pg_port, pg_user, pg_pass, pg_db, pg_sslmode, pg_connection_string parameters continue to apply β€” see the PostgreSQL Data Connector reference.
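
For example, a params block that overrides the defaults: an explicit slot name, no initial snapshot because the accelerator is pre-seeded, and faster LSN acknowledgement (the slot name and values here are illustrative):

```yaml
params:
  pg_replication_slot: spice_users_replica_a # must be unique per replica
  pg_replication_initial_snapshot: 'false'   # skip the snapshot; accelerator is pre-seeded
  pg_replication_status_interval: 5s         # acknowledge LSNs more often, freeing WAL sooner
```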

pg_sslmode for WAL streaming​

verify-full is the recommended production default.

| pg_sslmode | Replication transport | Cert chain verified | Hostname verified |
| --- | --- | --- | --- |
| disable | plaintext | — | — |
| prefer (default) | plaintext | — | — |
| require | TLS | ❌ | ❌ |
| verify-ca | TLS | ✅ | ❌ |
| verify-full | TLS | ✅ | ✅ |
Note: prefer behaves as plaintext on the replication transport because the replication client does not expose a safe "try TLS, fall back to plaintext" path. Set require, verify-ca, or verify-full to force TLS on the WAL stream.

Accelerator engines​

| Engine | INSERT | UPDATE | DELETE | Notes |
| --- | --- | --- | --- | --- |
| duckdb | ✅ | ✅ (upsert) | ✅ | Recommended for most workloads. |
| sqlite | ✅ | ✅ (upsert) | ✅ | Great for small/medium datasets. |
| postgres | ✅ | ✅ (upsert) | ✅ | Use when the accelerator is another Postgres. |
| cayenne | ✅ | ✅ (upsert) | ✅ | S3-backed Vortex format, good for read-heavy analytics. |
| arrow | ✅ | ❌ (becomes insert) | ✅ | Arrow's in-memory engine does not support on_conflict; UPDATEs append rows. |

For workloads that need true upsert semantics, use DuckDB, SQLite, PostgreSQL, or Cayenne.

Multi-replica deployments​

Every Spice replica must have its own replication slot. Spice hashes the replica's identity into the default slot name:

| Source | Used for |
| --- | --- |
| SPICE_INSTANCE_ID env | Preferred — set it explicitly per replica. |
| HOSTNAME / COMPUTERNAME env | Fallback — works on Kubernetes where each pod has a distinct hostname. |

Example: Kubernetes StatefulSet​

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: spice
spec:
  replicas: 3
  serviceName: spice
  template:
    spec:
      containers:
        - name: spice
          env:
            - name: SPICE_INSTANCE_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name # spice-0, spice-1, spice-2

Example: explicit slot names​

# Replica A
params:
  pg_replication_slot: spice_users_a

# Replica B
params:
  pg_replication_slot: spice_users_b

Each Spice replica can use a different pg_replication_slot while sharing a publication (pg_publication).

Operations​

Monitoring replication lag​

SELECT
  slot_name,
  active,
  confirmed_flush_lsn,
  pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_name LIKE 'spice_%';

Decommissioning a replica​

Warning: Drop unused slots

A permanent replication slot holds on to WAL until dropped. If you retire a Spice replica without cleaning up its slot, Postgres will keep accumulating WAL indefinitely and can run out of disk.

After removing a Spice replica, drop its slot:

SELECT pg_drop_replication_slot('spice_users_<old-instance-hash>');
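
To find candidates before dropping, list inactive Spice slots along with the WAL they are still retaining:

```sql
SELECT slot_name, active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS retained_wal_bytes
FROM pg_replication_slots
WHERE active = false AND slot_name LIKE 'spice_%';
```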

Rebuilding an accelerator from scratch​

Delete the accelerator's local storage (DuckDB file, SQLite file, etc.) and drop the replication slot. On next start, Spice will create a fresh slot, snapshot the table, and resume streaming.

Resilience​

  • Network blips / Postgres restarts: transient β€” retried with exponential backoff (500 ms β†’ 30 s, Β±20 % jitter). The slot's server-side state is the source of truth, so reconnects resume from the last acknowledged LSN β€” no data loss.
  • Auth failures, missing slot, schema mismatch: fatal β€” surfaced as a stream-level error so operators can fix the configuration.
  • Watch dataset_postgres_replication_reconnects_total to detect flaky networks.

Metrics​

Spice emits OpenTelemetry metrics for every replicated Postgres dataset. Metric names follow the pattern dataset_postgres_replication_<metric> with a name=<dataset> attribute.

Core freshness signals (auto-registered):

| Metric | Type | Description |
| --- | --- | --- |
| dataset_postgres_replication_lag_ms | Gauge | now() − commit_time(latest ingested txn). Primary CDC freshness signal. |
| dataset_postgres_replication_lag_bytes | Gauge | server_wal_end_lsn − confirmed_flush_lsn. Unacknowledged WAL held by Spice's slot. |
| dataset_postgres_replication_transactions_total | Counter | Committed transactions applied. |
| dataset_postgres_replication_inserts_total / updates_total / deletes_total | Counter | Row-level events from WAL. |
| dataset_postgres_replication_reconnects_total | Counter | Number of times the stream reconnected after a transient failure. |

Troubleshooting​

| Symptom | Cause and fix |
| --- | --- |
| Error: Table public.X has REPLICA IDENTITY NOTHING | Run ALTER TABLE public.X REPLICA IDENTITY FULL; (or add a primary key). |
| Error: replication slot "..." already exists on startup | Another Spice replica is using the same slot name. Set pg_replication_slot uniquely, or ensure SPICE_INSTANCE_ID differs. |
| Error mentioning permission denied for database during setup | The role needs CREATE on the database, or pre-create the publication/slot yourself. |
| pg_replication_slots.active is true but the accelerator isn't updating | Check Spice logs for schema-mismatch errors. The replication task holds the slot even after failure; restart after fixing. |
| WAL on the source disk growing forever | An abandoned slot. Drop it with pg_drop_replication_slot. |
| UPDATEs on an Arrow-engine dataset don't replace rows | Arrow does not support on_conflict. Switch to duckdb, sqlite, postgres, or cayenne. |
| Huge TEXT/JSONB columns show as NULL after UPDATE | Unchanged TOASTed columns are omitted by pgoutput. Run ALTER TABLE ... REPLICA IDENTITY FULL; if you need them in every event. |

Limitations​

  • One table per dataset. Each Spice dataset replicates exactly one source table; each dataset gets its own slot and publication.
  • No DDL replication. Schema changes on the source are not propagated automatically. Add new columns as nullable on the source first, update the Spice dataset, then reload the Spicepod.
  • Arrow engine does not support on_conflict (upsert) semantics. UPDATEs therefore appear as additional inserts rather than replacing existing rows. For true upsert behavior use DuckDB, SQLite, Postgres, or Cayenne.
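
For example, the add-a-column workflow sketched above looks like this on the source (the column name is illustrative):

```sql
-- 1. Add the column as nullable on the source first
ALTER TABLE public.users ADD COLUMN nickname text;
-- 2. Then add the column to the Spice dataset definition and reload the Spicepod
```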

Comparison with Debezium + Kafka​

| Aspect | Debezium + Kafka | Native WAL streaming (this feature) |
| --- | --- | --- |
| External services | Kafka + Schema Registry + Debezium + Connect | None — Spice connects to Postgres directly |
| Deployment footprint | JVM stack + ZooKeeper/KRaft | Zero extra pods |
| Setup complexity | Multiple topics, connector configs, ACLs | One connector config |
| Operational model | Consumer groups, topic retention | One replication slot per replica |
| Schema registry required | Yes (Avro/Protobuf) | No — schema derived from Postgres catalog |
| Latency | Kafka-bound (~100 ms+) | Commit-driven, typically <100 ms |

For greenfield Postgres β†’ Spice CDC, prefer native WAL streaming. If Kafka is already deployed for other reasons, the Debezium path continues to work.

See also​