Skip to main content
Version: Next

PostgreSQL Logical Replication (Native CDC)

Stream every INSERT, UPDATE, and DELETE from a PostgreSQL table directly into a Spice-accelerated dataset over Postgres' native logical replication protocol.

This is the recommended way to keep a Spice accelerator (DuckDB, SQLite, PostgreSQL, Cayenne, Arrow) continuously in sync with a PostgreSQL source.

How it works​

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   WAL (pgoutput)   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   ChangeBatch   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ PostgreSQL │──────────────────▢│ Spice runtime │────────────────▢│ Accelerator β”‚
β”‚ wal_level= β”‚ replication β”‚ (postgres β”‚ (INSERT/ β”‚ DuckDB / β”‚
β”‚ logical β”‚ slot β”‚ connector) β”‚ UPDATE / β”‚ SQLite / β”‚
β”‚ β”‚ β”‚ β”‚ DELETE) β”‚ Postgres / β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ Cayenne β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

On first start the connector:

  1. Creates a publication (default name spice_<dataset>_<hash>_pub) containing the source table.
  2. Creates a replication slot (default spice_<dataset>_<dataset-hash>_<instance-hash>). The <instance-hash> gives each Spice replica its own slot.
  3. Runs a REPEATABLE READ snapshot of the source table so the accelerator starts with all existing rows.
  4. Starts streaming WAL changes from the slot. Each committed transaction is delivered as a ChangeBatch (grouped INSERT/UPDATE/DELETE) and applied to the accelerator.

On subsequent restarts the connector detects the existing slot and resumes from Postgres' stored confirmed_flush_lsn.

Prerequisites​

1. Enable logical replication on the source Postgres​

This requires a server restart.

# postgresql.conf
wal_level = logical
max_replication_slots = 10 # at least one per Spice replica per dataset
max_wal_senders = 10

Verify:

SHOW wal_level;        -- must be 'logical'
SHOW max_replication_slots;

On managed Postgres services:

ServiceHow to enable
AWS RDSSet rds.logical_replication = 1 in the parameter group and restart.
Aurora PostgreSQLSet rds.logical_replication = 1; wait for DB reboot.
GCP Cloud SQLFlag: cloudsql.logical_decoding = on.
Azure DatabaseUnder Replication, set Replication support to LOGICAL.
Supabase / NeonLogical replication is enabled by default.

2. The source table must have a replica identity​

Spice needs the primary key columns in every UPDATE/DELETE event, so one of the following must be true:

  • The table has a primary key (default β€” nothing to do).

  • Or the table has REPLICA IDENTITY FULL:

    ALTER TABLE public.users REPLICA IDENTITY FULL;

Tables with REPLICA IDENTITY NOTHING are rejected at startup.

3. The Postgres role needs these privileges​

GRANT CONNECT ON DATABASE mydb TO spice;
GRANT USAGE ON SCHEMA public TO spice;
GRANT SELECT ON public.users TO spice;
ALTER ROLE spice WITH REPLICATION; -- or be a superuser
-- If you let Spice create the publication (default):
GRANT CREATE ON DATABASE mydb TO spice;

Minimal configuration​

datasets:
- from: postgres:public.users
name: users
params:
pg_host: pg.internal
pg_port: '5432'
pg_user: spice
pg_pass: ${secrets:pg_pass}
pg_db: myapp
pg_sslmode: verify-full # or: disable | prefer | require | verify-ca
pg_sslrootcert: /etc/ssl/pg-ca.pem # optional; omit to use system root CAs
acceleration:
enabled: true
engine: duckdb # or: sqlite | postgres | cayenne | arrow
refresh_mode: changes # <-- triggers WAL streaming
primary_key: id
on_conflict:
id: upsert # required for UPDATE to become an upsert

Start the runtime. Spice will:

  • Auto-create publication spice_users_<dataset-hash>_pub.
  • Auto-create replication slot spice_users_<dataset-hash>_<instance-hash>.
  • Snapshot public.users into the DuckDB accelerator.
  • Stream every subsequent change as it commits on Postgres.
Use a persistent accelerator

Pair with mode: file on DuckDB/SQLite (or the PostgreSQL accelerator) so restarts resume from the last acknowledged LSN instead of re-snapshotting.

Full configuration reference​

All replication-specific parameters live under params: on the dataset and start with pg_:

ParameterDefaultDescription
pg_replication_slotspice_<dataset>_<dataset-hash>_<instance-hash>Name of the replication slot. Must be unique per replica.
pg_publicationspice_<dataset>_<dataset-hash>_pubPublication name. Shared across replicas. Auto-created if missing.
pg_replication_initial_snapshottrueIf true, take an initial snapshot of the table's existing rows before streaming. Set to false if you are pre-seeding the accelerator yourself.
pg_replication_temporary_slotfalseIf true, the slot is dropped when Spice disconnects. Every restart re-bootstraps.
pg_replication_status_interval10sHow often StandbyStatusUpdate (LSN acknowledgement) is sent back to Postgres. Lower values free WAL faster; higher values reduce network chatter. Accepts any duration string (500ms, 30s, 2m).
pg_replication_bootstrap_batch_size8192Rows per emitted batch during the initial-snapshot bootstrap. Larger batches reduce per-batch overhead at the cost of more memory per batch. Maximum: 1048576.

All existing pg_host, pg_port, pg_user, pg_pass, pg_db, pg_sslmode, pg_connection_string parameters continue to apply β€” see the PostgreSQL Data Connector reference.

pg_sslmode for WAL streaming​

verify-full is the recommended production default.

pg_sslmodeReplication transportCert chain verifiedHostname verified
disableplaintextβ€”β€”
prefer (default)plaintextβ€”β€”
requireTLS❌❌
verify-caTLSβœ…βŒ
verify-fullTLSβœ…βœ…
info

prefer behaves as plaintext on the replication transport because the replication client does not expose a safe "try TLS, fall back to plaintext" path. Set require, verify-ca, or verify-full to force TLS on the WAL stream.

Accelerator engines​

EngineINSERTUPDATEDELETENotes
duckdbβœ…βœ… (upsert)βœ…Recommended for most workloads.
sqliteβœ…βœ… (upsert)βœ…Great for small/medium datasets.
postgresβœ…βœ… (upsert)βœ…Use when the accelerator is another Postgres.
cayenneβœ…βœ… (upsert)βœ…S3-backed Vortex format, good for read-heavy analytics.
arrowβœ…βœ… (upsert with primary key)βœ…Arrow's in-memory engine uses a hash index for primary-key upserts. Without a primary key, UPDATEs are appended as new rows. DELETE and TRUNCATE are applied via Arrow's DeletionTableProvider.

For Arrow workloads that need true upsert semantics (so UPDATEs replace existing rows instead of duplicating them), configure a primary_key. DuckDB, SQLite, PostgreSQL, and Cayenne also support upsert behavior.

Multi-replica deployments​

Every Spice replica must have its own replication slot. Spice hashes the replica's identity into the default slot name:

SourceUsed for
SPICE_INSTANCE_ID envPreferred β€” set it explicitly per replica.
HOSTNAME / COMPUTERNAME envFallback β€” works on Kubernetes where each pod has a distinct hostname.

Example: Kubernetes StatefulSet​

apiVersion: apps/v1
kind: StatefulSet
metadata:
name: spice
spec:
replicas: 3
serviceName: spice
template:
spec:
containers:
- name: spice
env:
- name: SPICE_INSTANCE_ID
valueFrom:
fieldRef:
fieldPath: metadata.name # spice-0, spice-1, spice-2

Example: explicit slot names​

# Replica A
params:
pg_replication_slot: spice_users_a

# Replica B
params:
pg_replication_slot: spice_users_b

Each Spice replica can use a different pg_replication_slot while sharing a publication (pg_publication).

Operations​

Monitoring replication lag​

SELECT
slot_name,
active,
confirmed_flush_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_name LIKE 'spice_%';

Decommissioning a replica​

Drop unused slots

A permanent replication slot holds on to WAL until dropped. If you retire a Spice replica without cleaning up its slot, Postgres will keep accumulating WAL indefinitely and can run out of disk.

After removing a Spice replica, drop its slot:

SELECT pg_drop_replication_slot('spice_users_<old-instance-hash>');

Rebuilding an accelerator from scratch​

Delete the accelerator's local storage (DuckDB file, SQLite file, etc.) and drop the replication slot. On next start, Spice will create a fresh slot, snapshot the table, and resume streaming.

Resilience​

  • Network blips / Postgres restarts: transient β€” retried with exponential backoff (500 ms β†’ 30 s, Β±20 % jitter). The slot's server-side state is the source of truth, so reconnects resume from the last acknowledged LSN β€” no data loss.
  • Auth failures, missing slot, schema mismatch: fatal β€” surfaced as a stream-level error so operators can fix the configuration.
  • Watch dataset_postgres_replication_reconnects_total to detect flaky networks.

Metrics​

Spice emits OpenTelemetry metrics for every replicated Postgres dataset. Metric names follow the pattern dataset_postgres_replication_<metric> with a name=<dataset> attribute.

Core freshness signals (auto-registered):

MetricTypeDescription
dataset_postgres_replication_lag_msGaugenow() βˆ’ commit_time(latest ingested txn). Primary CDC freshness signal.
dataset_postgres_replication_lag_bytesGaugeserver_wal_end_lsn βˆ’ confirmed_flush_lsn. Unacknowledged WAL held by Spice's slot.
dataset_postgres_replication_transactions_totalCounterCommitted transactions applied.
dataset_postgres_replication_inserts_total / updates_total / deletes_totalCounterRow-level events from WAL.
dataset_postgres_replication_reconnects_totalCounterNumber of times the stream reconnected after a transient failure.

Troubleshooting​

SymptomCause and fix
Error: Table public.X has REPLICA IDENTITY NOTHINGRun ALTER TABLE public.X REPLICA IDENTITY FULL; (or add a primary key).
Error: replication slot "..." already exists on startupAnother Spice replica is using the same slot name. Set pg_replication_slot uniquely, or ensure SPICE_INSTANCE_ID differs.
Error mentioning permission denied for database during setupThe role needs CREATE on the database, or pre-create the publication/slot yourself.
pg_replication_slots.active is true but the accelerator isn't updatingCheck Spice logs for schema-mismatch errors. The replication task holds the slot even after failure β€” restart after fixing.
wal on the source disk growing foreverAn abandoned slot. Drop it with pg_drop_replication_slot.
UPDATEs on Arrow-engine dataset don't replace rowsConfigure a primary_key so Arrow can use its hash index for upserts, or switch to duckdb, sqlite, postgres, or cayenne.
Huge TEXT/JSONB columns show as NULL after UPDATEUnchanged TOASTed columns are omitted by pgoutput. Run ALTER TABLE ... REPLICA IDENTITY FULL; if you need them in every event.

Limitations​

  • One table per dataset. Each Spice dataset replicates exactly one source table; each dataset gets its own slot and publication.
  • No DDL replication. Schema changes on the source are not propagated automatically. Add new columns as nullable on the source first, update the Spice dataset, then reload the Spicepod.
  • Arrow engine supports on_conflict upserts when a primary_key is configured. Without a primary key, UPDATEs appear as additional inserts rather than replacing existing rows. DELETE and TRUNCATE are applied either way.

Comparison with Debezium + Kafka​

AspectDebezium + KafkaNative WAL streaming (this feature)
External servicesKafka + Schema Registry + Debezium + ConnectNone β€” Spice connects to Postgres directly
Deployment footprintJVM stack + ZooKeeper/KRaftZero extra pods
Setup complexityMultiple topics, connector configs, ACLsOne connector config
Operational modelConsumer groups, topic retentionOne replication slot per replica
Schema registry requiredYes (Avro/Protobuf)No β€” schema derived from Postgres catalog
LatencyKafka-bound (~100 ms+)Commit-driven, typically <100 ms

For greenfield Postgres β†’ Spice CDC, prefer native WAL streaming. If Kafka is already deployed for other reasons, the Debezium path continues to work.

See also​