# DuckDB Data Connector Deployment Guide
Production operating guide for the DuckDB data connector (used to federate queries against an existing DuckDB database file).
## Authentication & Secrets
DuckDB is an embedded engine; the connector reads a local DuckDB database file. No network authentication is involved.
| Parameter | Description |
|---|---|
| `open` | Absolute path to the DuckDB database file. |
| `duckdb_connection_string` | Alternative: a DuckDB connection URI with options. |
Protect the DuckDB file with filesystem permissions. Store it on encrypted storage (LUKS/dm-crypt, EBS encryption, etc.) for data-at-rest protection. For data loaded from cloud object stores inside DuckDB, configure AWS/Azure/GCS credentials via DuckDB extensions rather than Spice parameters.
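As a sketch, a Spicepod dataset entry pointing the connector at a local file might look like the following. The dataset name, table reference, and file path are illustrative; `open` is the parameter documented above, and the exact `from: duckdb:...` addressing form should be confirmed against the Spice reference docs.

```yaml
# Illustrative only: names, paths, and the table reference are examples.
datasets:
  - from: duckdb:analytics.events          # table inside the DuckDB file
    name: events
    params:
      open: /var/lib/spice/analytics.duckdb  # absolute path to the database file
```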
## Resilience Controls

### File Concurrency
DuckDB supports a single writer with many readers per database file. If the file is shared with another process that holds a write lock, the connector returns an I/O error on open. Co-locate the writer and the Spice reader on the same host, or use DuckDB's read-only mode (`access_mode: read_only`) when federating a file produced by an upstream ETL job.
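When an upstream ETL job owns the write lock, the Spice reader can avoid contending for it by opening the file read-only. A hedged sketch, assuming `access_mode` is passed alongside the other connector parameters (its exact placement may differ in your Spice version):

```yaml
# Illustrative only: lets an external writer keep the write lock.
datasets:
  - from: duckdb:analytics.events
    name: events
    params:
      open: /var/lib/spice/analytics.duckdb
      access_mode: read_only   # do not acquire the write lock
```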
### Crash Recovery
DuckDB's write-ahead log (WAL) provides crash recovery for any process that wrote to the file. The Spice connector does not itself write: the data connector is read-only, while the separate DuckDB accelerator handles write paths.
## Capacity & Sizing
- Memory: DuckDB self-manages its default memory limit based on system memory. For constrained environments, set a `memory_limit` pragma via the connection string.
- Disk: Plan for 1.5–2× the raw data size to accommodate DuckDB's internal compression, WAL, and temporary spill files during query execution.
- Temporary spill: Large queries spill to DuckDB's temp directory; ensure adequate disk and set `temp_directory` to a fast local volume if the default (same location as the database file) is on slow storage.
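Both limits can also be applied in SQL once a session is open. A minimal sketch using standard DuckDB settings; the values and spill path are example choices, not recommendations:

```sql
-- Cap DuckDB's working memory (example value; size to your host).
SET memory_limit = '4GB';

-- Direct spill files to a fast local volume (example path).
SET temp_directory = '/mnt/nvme/duckdb-tmp';
```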
## Metrics
The DuckDB connector does not register connector-specific instruments. Monitor via Spice's query metrics (`query_duration_ms`, `query_processed_rows`). See Component Metrics for general configuration.
For DuckDB-internal metrics, use DuckDB's `duckdb_memory()` and `PRAGMA database_size` via a SQL query against the connector.
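For example, the following queries surface DuckDB's own view of memory and on-disk usage (a sketch; `duckdb_memory()` requires a reasonably recent DuckDB release):

```sql
-- Memory usage broken down by DuckDB-internal component.
SELECT * FROM duckdb_memory();

-- Database file, WAL, and block usage for the attached file.
PRAGMA database_size;
```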
## Task History
DuckDB queries participate in task history through DataFusion's execution-plan spans.
## Known Limitations
- Read-only via the data connector: For a writable, Spice-managed DuckDB, use the DuckDB accelerator instead.
- Single-writer: A DuckDB file cannot be written by two processes concurrently. Coordinate writers out-of-band.
- Version compatibility: DuckDB files are tied to the DuckDB binary version. Upgrading DuckDB in Spice may require regenerating older database files.
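To diagnose a version-compatibility issue, the DuckDB library version that Spice is running can be queried directly through the connector and compared against the version that produced the file:

```sql
-- Report the DuckDB library version in use.
SELECT version();
```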
## Troubleshooting
| Symptom | Likely cause | Resolution |
|---|---|---|
| `IO Error: Could not set lock on file` | Another process holds the DuckDB write lock. | Ensure only one writer; open in read-only mode if Spice should not hold a write lock. |
| `Catalog Error: Table ... does not exist` | Table name mismatch, or the database is not at the expected path. | Query `SELECT * FROM information_schema.tables` via the connector to list tables. |
| Queries spill aggressively; slow performance | Working set exceeds memory. | Increase system memory or set a smaller batch size; direct temp files to faster storage. |
| `Serialization Error: Failed to deserialize ... database ... not a valid database` | DuckDB version mismatch. | Upgrade/downgrade Spice's DuckDB version to match the file producer. |
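For the catalog-error case, a fuller form of the diagnostic query lists every table visible through the attached database, which makes path and naming mismatches easy to spot:

```sql
-- Enumerate all tables the connector can see in the attached database.
SELECT table_catalog, table_schema, table_name
FROM information_schema.tables
ORDER BY table_catalog, table_schema, table_name;
```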
