Version: Next

DuckDB Data Accelerator Deployment Guide

Production operating guide for the DuckDB data accelerator covering memory vs file mode, checkpointing, spill, and observability.

Authentication & Secrets

DuckDB is an embedded, in-process engine. No external authentication is required. For file-mode, protect the DuckDB database file with filesystem permissions and encrypt at rest (LUKS/dm-crypt, EBS encryption, etc.).

Resilience & Durability

Memory vs File Mode

Mode	Durability	Spill-to-disk	Restart behavior
`memory`	None — lost on restart.	Via configured `temp_directory`.	Full refresh on startup.
`file`	Crash-safe via DuckDB WAL.	Via configured `temp_directory`.	Incremental refresh resumes.

Use mode: file for any dataset larger than a few hundred MB or where restart speed matters.

Checkpointing

The DuckDB accelerator enables PRAGMA enable_checkpoint_on_shutdown at connection setup. Graceful shutdown writes a clean checkpoint, making restart near-instantaneous. Ungraceful shutdowns leave a WAL to replay, slowing the first subsequent startup.

Spill Directory

Large queries (sort, aggregate, join) can spill to disk. The spill directory is controlled by runtime.query.temp_directory. Point this at a fast local volume (NVMe SSD) and ensure adequate free space (2-4× the largest join input is a safe starting point).

Vacuum

DuckDB does not require explicit VACUUM; its storage layout compacts on checkpoint. No Spice-level vacuum automation is provided.

Capacity & Sizing

Connection Pool

Parameter	Default	Description
`connection_pool_size`	`max(10, number of datasets on the same instance)`	Maximum connections in the shared DuckDB pool.
(pool min idle)	`10`	Minimum idle connections.

Datasets sharing a DuckDB instance share the pool. For write-heavy refresh plus read-heavy query workloads, size the pool to cover expected concurrency plus a small headroom; DuckDB's serializable concurrency model limits benefit beyond the point of write contention.

Memory

DuckDB self-tunes its memory limit based on system memory. For containers, set a memory_limit pragma via the connection string to prevent OOM due to cgroup misdetection. Plan for the DuckDB working set plus ~2× for query execution headroom.

Index Parameters

Parameter	Description
`duckdb_index_scan_percentage`	Optimizer hint: fraction of rows below which index scan is preferred over table scan.
`duckdb_index_scan_max_count`	Optimizer hint: maximum rows for which index scan is preferred.
`on_refresh_sort_columns`	Columns to sort by during refresh. Caution: current implementation uses `CREATE OR REPLACE`, which drops constraints and indexes.

DuckDB supports traditional B-tree / ART indexes via SQL CREATE INDEX against the accelerated table. Define them once the dataset schema is stable.

Metrics

Generic acceleration metrics are available with the dataset_acceleration_ prefix. DuckDB-specific OpenTelemetry instruments are not currently registered at the runtime layer. For DuckDB-internal telemetry, query DuckDB directly via Spice:

SELECT * FROM duckdb_memory();
PRAGMA database_size;

See Component Metrics for enabling and exporting runtime metrics.

Task History

DuckDB acceleration operations participate in task history through the shared acceleration spans (accelerated_table_refresh, sql_query) plus DuckDB's SQL execution wrapped in DataFusion plan nodes.

Known Limitations

on_refresh_sort_columns drops indexes: The current implementation issues CREATE OR REPLACE TABLE ... ORDER BY ..., which drops pre-existing indexes and constraints. Re-run CREATE INDEX statements after sort-column refreshes or pin DDL changes via startup scripts.
Single writer: A DuckDB file has one writer at a time. Two Spice instances must not share the same file in write mode.
Version pinning: DuckDB database files are tied to the DuckDB binary version. Upgrading Spice to a version with a newer embedded DuckDB may require re-materialization.
No built-in remote replication: Cross-host replication is not provided; use file-level replication or a cloud block-store snapshot.

Troubleshooting

Symptom	Likely cause	Resolution
Slow first startup after restart	WAL replay due to ungraceful shutdown.	Use graceful shutdown (`SIGTERM`). Subsequent starts will be fast once the checkpoint is clean.
OOM on refresh	DuckDB memory limit too high for container cgroup.	Set a `memory_limit` pragma via the connection string.
Disk fills during large queries	Spill directory on undersized volume.	Point `runtime.query.temp_directory` at a larger volume; monitor free space.
Query uses table scan when an index exists	`duckdb_index_scan_percentage` / `duckdb_index_scan_max_count` too low.	Tune thresholds; `EXPLAIN` to confirm.
Indexes disappear after refresh	`on_refresh_sort_columns` triggers `CREATE OR REPLACE`.	Re-create indexes post-refresh, or avoid sort-column refreshes until the underlying behavior is updated.
`IO Error: Could not set lock on file`	Another process holds a write lock.	Ensure single-writer semantics; verify no other Spice instance is using the same file.

Authentication & Secrets​

Resilience & Durability​

Memory vs File Mode​

Checkpointing​

Spill Directory​

Vacuum​

Capacity & Sizing​

Connection Pool​

Memory​

Index Parameters​

Metrics​

Task History​

Known Limitations​

Troubleshooting​