Performance Tuning
This guide provides recommendations for optimizing Spice performance across data acceleration, query execution, caching, and resource allocation.
Accelerator Selection
Choose the appropriate Data Accelerator based on dataset characteristics and query patterns.
| Scenario | Recommended Accelerator | Key Configuration |
|---|---|---|
| Small datasets (under 1 GB), low latency | arrow | Default in-memory |
| Medium datasets (1-100 GB), complex SQL | duckdb with mode: file | Set duckdb_memory_limit |
| Large datasets (100 GB - 1+ TB) | cayenne | Tune cache parameters |
| Write-heavy workloads | cayenne with zstd | Set cayenne_compression_strategy: zstd |
| Point lookups, large datasets | cayenne | Vortex provides 100x faster random access |
| Point lookups with explicit indexes | duckdb or sqlite | Configure indexes |
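For the first row of the table, the arrow accelerator needs no extra parameters. A minimal sketch (the bucket path and dataset name are hypothetical):

```yaml
# Hypothetical small lookup table; arrow keeps it fully in memory
datasets:
  - from: s3://bucket/small-table/
    name: lookup
    acceleration:
      enabled: true
      engine: arrow
```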
Spice Cayenne Performance Optimization
Spice Cayenne uses the Vortex columnar format for high-performance analytics on large datasets.
Point Lookups and Random Access
Vortex provides 100x faster random access compared to Apache Parquet through:
- Segment statistics: Per-segment min/max/null_count for predicate pushdown (zone-map equivalent)
- Fast random access encodings: FSST, FastLanes, and ALP support O(1) or near-O(1) random access
- Compute push-down: Filter execution on compressed data without full decompression
- Array statistics: is_sorted, is_constant, min, max for query optimization
For point lookups on large datasets, Spice Cayenne often matches or exceeds the performance of traditional B-tree indexes while consuming no additional memory for index structures.
Cache Configuration
Spice Cayenne maintains two in-memory caches that significantly impact query performance:
```yaml
datasets:
  - from: s3://bucket/data/
    name: analytics
    acceleration:
      engine: cayenne
      mode: file
      params:
        cayenne_footer_cache_mb: 256 # Increase for many files
        cayenne_segment_cache_mb: 512 # Increase for hot data patterns
```
Footer Cache Sizing:
The footer cache stores file metadata. Size based on file count:
- 1-10 KB per file
- Default 128 MB supports ~10,000-100,000 files
- Increase for datasets with more files
Segment Cache Sizing:
The segment cache stores decompressed data. Size based on working set:
- Estimate the volume of frequently accessed data
- Cache hits avoid decompression overhead
- Monitor cache hit rates via observability metrics
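Putting the two sizing rules together, a sketch for a hypothetical dataset of ~50,000 files with a ~1 GB hot working set (the file count and working-set figures are illustrative assumptions):

```yaml
# Footer cache: 50,000 files x ~10 KB/file ≈ 500 MB
# Segment cache: sized to hold the ~1 GB frequently accessed working set
datasets:
  - from: s3://bucket/data/
    name: analytics
    acceleration:
      engine: cayenne
      mode: file
      params:
        cayenne_footer_cache_mb: 512
        cayenne_segment_cache_mb: 1024
```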
Compression Strategy
| Strategy | Read Performance | Write Performance | Compression Ratio |
|---|---|---|---|
| btrblocks (default) | Fastest | Moderate | Higher |
| zstd | Moderate | Faster | High |
Choose btrblocks for read-heavy analytics workloads. Use zstd only when size on disk is the primary concern—setting zstd trades query performance for reduced storage size.
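As a sketch, the write-heavy configuration from the accelerator selection table above would look like this (the bucket path and dataset name are hypothetical):

```yaml
datasets:
  - from: s3://bucket/ingest/
    name: ingest
    acceleration:
      engine: cayenne
      mode: file
      params:
        cayenne_compression_strategy: zstd # Faster writes, reduced storage size
```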
DuckDB Performance Optimization
DuckDB provides mature SQL support with sophisticated query optimization.
Memory Configuration
```yaml
datasets:
  - from: postgres:schema.table
    name: orders
    acceleration:
      engine: duckdb
      mode: file
      params:
        duckdb_memory_limit: 4GB
        duckdb_file: /data/orders.duckdb
```
Guidelines:
- Set duckdb_memory_limit to control memory per instance
- DuckDB defaults to 80% of system memory per instance
- Reserve 30% of container memory for the runtime
- Multiple datasets using the same duckdb_file share a connection pool
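Applying these guidelines to a hypothetical 16 GB container: reserving ~30% (≈5 GB) for the runtime leaves ~11 GB, so an explicit limit well below DuckDB's 80% default is appropriate. A sketch:

```yaml
# 16 GB container: ~5 GB reserved for the runtime, remainder for accelerators
datasets:
  - from: postgres:schema.table
    name: orders
    acceleration:
      engine: duckdb
      mode: file
      params:
        duckdb_memory_limit: 8GB # Explicit limit instead of the 80% default
```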
Connection Pool Tuning
```yaml
acceleration:
  engine: duckdb
  params:
    connection_pool_size: 20 # Default: 10
```
Increase connection_pool_size for high-concurrency workloads. Each connection consumes memory.
Index Configuration
DuckDB supports ART (Adaptive Radix Tree) indexes for faster point lookups:
```yaml
datasets:
  - from: postgres:schema.orders
    name: orders
    acceleration:
      engine: duckdb
      mode: file
      indexes:
        order_id: enabled
        '(customer_id, created_at)': enabled
```
Indexes consume memory and do not spill to disk. Creating an index requires the entire dataset to be loaded into memory. Monitor memory usage when adding indexes. For more details on ART index performance, see the ART paper.
Zone-Maps and Sorted Data
DuckDB automatically creates zone-maps (min/max statistics) for each row group, enabling efficient predicate pushdown. In practice, zone-maps on sorted data often outperform ART indexes for range and equality queries while consuming no additional memory.
Why Zone-Maps Outperform Indexes:
- Zero memory overhead (statistics stored with data)
- No index maintenance during writes
- Automatic predicate pushdown during scans
- Effective when data is sorted by query filter columns
Optimization Pattern: Sorted Views
Accelerate a view with ORDER BY to create sorted physical data, then set duckdb_preserve_insertion_order: true to maintain sort order:
```yaml
datasets:
  - from: iceberg:catalog/namespace/table
    name: raw_data_by_arrival
    time_column: processed_time
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_mode: append
      primary_key: id
      on_conflict:
        id: upsert
      params:
        duckdb_memory_limit: 12GiB
        duckdb_preserve_insertion_order: false # Raw data doesn't need order

views:
  - name: data_sorted
    sql: |
      SELECT id, account_id, pool_id, value, created_at
      FROM raw_data_by_arrival
      WHERE __deleted = 'false'
      ORDER BY account_id, pool_id
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_check_interval: 210s
      params:
        duckdb_file: data_sorted.duckdb
        duckdb_memory_limit: 6GiB
        duckdb_preserve_insertion_order: true # Maintains ORDER BY sort
```
Key Configuration:
| Parameter | Value | Purpose |
|---|---|---|
| duckdb_preserve_insertion_order | true on sorted view | Maintains physical sort order from ORDER BY |
| duckdb_preserve_insertion_order | false on source table | Faster writes without order guarantees |
| Separate duckdb_file | Per view | Isolates sorted data from source tables |
Queries filtering on account_id or (account_id, pool_id) benefit from zone-map pruning, skipping entire row groups that don't match the filter predicates.
Aggregate Pushdown
Enable aggregate pushdown for improved performance on supported aggregate queries:
```yaml
acceleration:
  engine: duckdb
  params:
    optimizer_duckdb_aggregate_pushdown: enabled
```
Requires query_federation to be disabled. Supports count, sum, avg, min, and max functions.
DataFusion Query Engine
Spice uses Apache DataFusion as its query execution engine for Arrow and Spice Cayenne accelerators. DataFusion provides vectorized, multi-threaded query execution with automatic memory management and spilling.
Query Parallelism
DataFusion automatically parallelizes queries across available CPU cores. By default, the number of partitions equals the number of CPU cores, providing maximum parallelism.
DataFusion's FairSpillPool divides the configured memory_limit evenly among partitions. With more CPU cores, each partition receives less memory, which may increase spilling for memory-intensive queries.
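As a concrete illustration of the per-partition split (the core count is hypothetical):

```yaml
# FairSpillPool divides memory_limit evenly among partitions.
# On a hypothetical 16-core machine (16 partitions by default):
#   8 GiB / 16 partitions = 512 MiB per partition before spilling
runtime:
  query:
    memory_limit: 8GiB
```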
Join Algorithm Selection
DataFusion supports multiple join algorithms and automatically selects the best one based on query statistics:
| Algorithm | Memory Usage | Best For |
|---|---|---|
| Hash Join | Higher | Fast execution with sufficient memory (default) |
| Sort-Merge Join | Lower | Memory-constrained environments, pre-sorted data |
| Nested Loop Join | Variable | Cross joins, non-equi joins |
DataFusion prefers hash joins by default for equi-joins. Hash joins do not currently support spilling, so memory-constrained environments may benefit from sort-merge joins for large datasets.
Dynamic Filter Pushdown
DataFusion pushes filters from operators (TopK, Join, Aggregate) into file scans to prune data early. This optimization is enabled by default and can skip entire row groups or files based on statistics. For example, a SELECT * FROM t ORDER BY timestamp DESC LIMIT 10 query pushes timestamp filters down to file scans, pruning files that cannot contain top-10 candidates.
Parquet Read Optimizations
Spice uses DataFusion's Parquet reader, which applies several optimizations automatically when reading Parquet files from S3, file, Iceberg, and Delta Lake connectors:
- Row group pruning: Skips entire row groups (typically 128 MB) based on min/max statistics in Parquet metadata
- Page Index filtering: Uses page-level min/max statistics (typically 8 KB chunks) for finer-grained pruning
- Bloom filter evaluation: Checks bloom filters for equality predicates when available in Parquet files
- Projection pushdown: Reads only the columns referenced in the query
These optimizations are applied automatically and require no configuration. The effectiveness depends on data layout—sorting data by frequently filtered columns maximizes row group pruning.
File Format Filtering and Optimization
Spice connects to various file formats (Parquet, Iceberg, Delta Lake) and uses DataFusion's query execution engine to push down predicates and prune data at the file, row group, and page level. Understanding these optimizations helps when designing data layouts for optimal query performance.
Parquet
Apache Parquet stores data in row groups with per-column statistics. DataFusion uses these statistics to skip row groups that cannot contain matching rows.
Row Group Pruning:
Each Parquet row group contains min/max statistics for each column. When a query includes a WHERE clause, DataFusion evaluates whether each row group could contain matching rows based on these statistics. Row groups that cannot match are skipped entirely.
For example, with a predicate WHERE timestamp > '2024-01-01', DataFusion skips row groups where the maximum timestamp is before 2024-01-01.
Page Index:
Parquet's Page Index provides finer-grained statistics at the page level within row groups. DataFusion uses the Page Index to skip individual pages, reducing I/O for selective queries.
Bloom Filters:
Parquet files can include bloom filters for membership testing. For equality predicates like WHERE user_id = 'abc123', DataFusion checks the bloom filter before reading column data. If the bloom filter indicates the value is not present, the row group is skipped.
Late Materialization (Filter Pushdown):
DataFusion supports applying filters during Parquet decoding rather than after. This optimization, called late materialization, filters rows before materializing all columns, reducing memory usage for selective queries.
Iceberg
Apache Iceberg provides hidden partitioning and multi-level metadata filtering that simplifies query optimization.
Hidden Partitioning:
Iceberg automatically derives partition values from source columns using transforms like day(timestamp), month(timestamp), or bucket(user_id, 16). Queries filter on the source column directly (e.g., WHERE timestamp > '2024-01-01'), and Iceberg automatically prunes partitions without requiring users to specify partition columns in predicates.
Two-Level Metadata Filtering:
Iceberg uses a hierarchical metadata structure that enables filtering at multiple levels:
- Manifest list filtering: The manifest list contains partition value ranges for each manifest file. Manifests that cannot contain matching rows are skipped entirely.
- Manifest filtering: Each manifest file contains per-file column statistics (min/max, null counts, value counts). Data files that cannot contain matching rows are skipped.
This two-level approach can eliminate entire groups of files before reading any data, providing significant performance benefits for large tables.
Column-Level Statistics:
Iceberg manifests store column-level statistics including:
- lower_bound and upper_bound for min/max filtering
- null_count for null handling optimization
- value_count for cardinality estimation
Partition Evolution:
Iceberg supports changing partition schemes without rewriting existing data. Historical data retains its original partitioning while new data uses the updated scheme. Queries automatically account for both partition layouts.
Delta Lake
Delta Lake provides data skipping and Z-ordering for query optimization.
Data Skipping:
Delta Lake collects column statistics (min, max, null counts) during writes. The delta.dataSkippingNumIndexedCols table property controls how many columns have statistics collected (counted from the first column in the schema). Queries filter using these statistics to skip files that cannot contain matching rows.
Generated Columns:
Delta Lake supports generated columns that derive values from other columns. When partitioned by a generated column (e.g., eventDate generated from CAST(eventTime AS DATE)), queries filtering on the source column automatically benefit from partition pruning.
Z-Ordering:
Z-ordering colocates related data in the same files by clustering on specified columns. After running OPTIMIZE ... ZORDER BY (column), queries filtering on the Z-ordered columns benefit from improved data skipping.
Compaction:
Delta Lake's OPTIMIZE command compacts small files into larger ones, reducing the number of files to scan and improving query performance through better I/O patterns.
Vortex (Spice Cayenne)
Spice Cayenne uses Vortex, which provides segment-level statistics and compute push-down on compressed data.
Segment Statistics:
Vortex's ChunkedLayout maintains per-segment statistics including min, max, null_count, is_sorted, and is_constant for each column. These statistics function similarly to DuckDB's zone-maps, enabling segment pruning during query execution.
Compute Push-Down:
Vortex supports executing filter operations directly on compressed data. For encodings like FSST (strings), FastLanes (integers), and ALP (floats), predicates can be evaluated without full decompression, reducing CPU and memory usage.
Encoding-Aware Optimization:
Vortex tracks encoding metadata that enables additional optimizations:
- is_sorted: Enables binary search for point lookups
- is_constant: Returns values immediately without scanning
- Encoding-specific optimizations based on data characteristics
See Spice Cayenne Performance Optimization for cache tuning and other Cayenne-specific settings.
Performance Implications
| Optimization | Parquet | Iceberg | Delta Lake | Vortex |
|---|---|---|---|---|
| Row group/file pruning | ✅ | ✅ | ✅ | ✅ |
| Page-level filtering | ✅ | ✅ (via Parquet) | ✅ (via Parquet) | ✅ (segment-level) |
| Bloom filters | ✅ | ✅ (via Parquet) | ❌ | ❌ |
| Hidden partitioning | ❌ | ✅ | ❌ | ❌ |
| Manifest-level filtering | ❌ | ✅ | ❌ | ❌ |
| Compute on compressed data | ❌ | ❌ | ❌ | ✅ |
| Z-ordering | ❌ | ✅ | ✅ | ❌ |
Optimization Recommendations:
- Sort data by filter columns: Row group and segment statistics are most effective when data is sorted by commonly filtered columns
- Use appropriate file sizes: Larger row groups (128 MB+) provide better compression but reduce pruning granularity
- Collect statistics on filter columns: Ensure filter columns are within the statistics collection limit (e.g., delta.dataSkippingNumIndexedCols)
- Consider Z-ordering for multi-column filters: When queries filter on multiple columns, Z-ordering colocates related data
Query Memory Management
Configure DataFusion query memory limits to prevent out-of-memory errors:
```yaml
runtime:
  query:
    memory_limit: 8GiB
    temp_directory: /tmp/spice
    spill_compression: zstd
```
Memory Limit
Set memory_limit based on available container/machine memory minus accelerator requirements:
runtime memory_limit = Total Memory - Accelerator Memory - OS/Runtime Overhead (30%)
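Worked example for a hypothetical 32 GB machine with 12 GB allocated to accelerators: 32 GB − 12 GB − ~10 GB (30% overhead) leaves roughly 10 GB for the query engine:

```yaml
# 32 GB total - 12 GB accelerators - ~10 GB (30%) overhead ≈ 10 GB
runtime:
  query:
    memory_limit: 10GiB
```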
Spill Compression
| Compression | Disk Usage | CPU Overhead |
|---|---|---|
| zstd (default) | Lowest | Moderate |
| lz4_frame | Medium | Lowest |
| uncompressed | Highest | None |
Choose lz4_frame when CPU is limited and disk is abundant. Choose uncompressed for debugging or when spill files are rare. DataFusion uses Arrow IPC Stream format for spill files.
Batch Processing
DataFusion processes data in batches of 8192 rows by default. This batch size balances memory usage with vectorized execution efficiency. Larger batches improve CPU cache utilization and SIMD operations but consume more memory per partition.
Temporary Directory
Place spill files on fast storage (SSD/NVMe) separate from data files:
```yaml
runtime:
  query:
    temp_directory: /fast-ssd/spice-temp
```
The max_temp_directory_size setting limits the total size of temporary files (default: 100 GB).
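A sketch lowering the spill cap, assuming max_temp_directory_size sits under runtime.query alongside temp_directory:

```yaml
runtime:
  query:
    temp_directory: /fast-ssd/spice-temp
    max_temp_directory_size: 50GB # Default: 100 GB
```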
Caching Configuration
Spice supports multiple caching layers for query acceleration.
SQL Results Cache
Cache query results for repeated queries:
```yaml
runtime:
  caching:
    sql_results:
      enabled: true
      max_size: 512MiB
      item_ttl: 5m
      eviction_policy: tiny_lfu # Higher hit rate than lru
      cache_key_type: plan # Matches semantically equivalent queries
```
Cache Key Types:
| Type | Behavior | Use Case |
|---|---|---|
| plan (default) | Uses query logical plan | Varied query formatting |
| sql | Uses exact SQL string | Identical repeated queries |
Use sql for lowest latency with identical queries. Use plan for semantic query matching.
Stale-While-Revalidate
Serve stale cached results while refreshing in the background:
```yaml
runtime:
  caching:
    sql_results:
      enabled: true
      item_ttl: 1m
      stale_while_revalidate_ttl: 5m # Serve stale for 5m while refreshing
```
This pattern reduces query latency spikes during cache refresh.
Data Refresh Optimization
Refresh Mode Selection
| Mode | Memory Impact | Use Case |
|---|---|---|
| full | 2.5x dataset | Small-medium datasets, complete updates |
| append | Minimal | Time-series, logs, immutable data |
| changes | Minimal | CDC-enabled sources |
| caching | Minimal | Dynamic content, API data |
Append Mode Optimization
Use time_column for efficient incremental updates:
```yaml
datasets:
  - from: s3://bucket/events/
    name: events
    time_column: event_time
    acceleration:
      engine: cayenne
      mode: file
      refresh_mode: append
      refresh_check_interval: 5m
```
Partitioned Data
For data with a granular time column and a separate time-based column used for partitioning (e.g., day buckets), set time_partition_column to the partition column:
```yaml
datasets:
  - from: s3://bucket/events/
    name: events
    time_column: event_time
    time_partition_column: event_date # Physical partition column
    acceleration:
      refresh_mode: append
```
In this scenario, event_date is the day bucket used for physical partitioning (e.g., s3://bucket/events/event_date=2025-10-01/), while event_time provides the granular timestamp for precise filtering.
| event_id | event_time (time_column) | event_date (time_partition_column) | event_type | user_id |
|---|---|---|---|---|
| 8f2a-1 | 2025-10-01 08:14:22.123 | 2025-10-01 | page_view | u_442 |
| 8f2a-2 | 2025-10-01 22:01:05.884 | 2025-10-01 | click | u_901 |
| 9c11-a | 2025-10-02 01:12:44.001 | 2025-10-02 | purchase | u_442 |
| 9c11-b | 2025-10-02 14:30:12.550 | 2025-10-02 | page_view | u_118 |
Last-Modified Optimization
For object storage with append-only files, use last_modified to skip unchanged files:
```yaml
datasets:
  - from: s3://bucket/logs/
    name: logs
    time_column: last_modified # Special value using file metadata
    acceleration:
      refresh_mode: append
```
Resource Allocation
Kubernetes
Configure resource requests and limits based on workload:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spice
spec:
  containers:
    - name: spice
      image: spiceai/spiceai:latest
      resources:
        requests:
          memory: '8Gi'
          cpu: '4'
        limits:
          memory: '12Gi'
          # Do not set CPU limits - can cause throttling
      volumeMounts:
        - name: data
          mountPath: /data
        - name: temp
          mountPath: /tmp/spice
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: spice-data
    - name: temp
      emptyDir:
        medium: Memory # Use RAM for temp files if available
```
Avoid setting CPU limits. CPU limits can cause throttling even when CPU is available, degrading query performance. Set CPU requests to guarantee scheduling.
Storage Recommendations
| Storage Type | Use Case |
|---|---|
| SSD/NVMe | Acceleration data files, query spill files |
| HDD | Cold data archival, infrequent access |
| RAM (tmpfs) | Temporary files for high-performance workloads |
Monitoring and Profiling
Task History
Enable query plan capture for slow query analysis:
```yaml
runtime:
  task_history:
    enabled: true
    captured_plan: explain analyze
    min_sql_duration: 1s # Only capture plans for queries >1s
```
Metrics
Monitor key performance metrics:
- query_duration_ms - Query execution time
- query_executions - Query throughput
- dataset_load_state - Acceleration status
- Cache hit rates
See Observability for metric configuration.
Performance Checklist
Use this checklist when optimizing Spice deployments:
- Select appropriate accelerator based on dataset size and query patterns
- Configure memory limits for DuckDB and/or Spice Cayenne caches
- Set runtime.query.memory_limit with spill directory on fast storage
- Enable caching for repeated queries
- Use refresh_mode: append for time-series data
- Configure indexes for point lookup queries (DuckDB/SQLite)
- Set resource limits in Kubernetes
- Enable observability for monitoring
Related Documentation
Spice Documentation:
- Managing Memory Usage - Memory configuration reference
- Data Accelerators - Accelerator documentation
- Spice Cayenne Data Accelerator - Spice Cayenne-specific tuning
- DuckDB Data Accelerator - DuckDB-specific tuning
- Caching - Cache configuration
- Observability - Metrics and monitoring
External References:
- Apache DataFusion - Query execution engine for Arrow and Spice Cayenne
- DataFusion Configuration - DataFusion configuration settings
- DataFusion Tuning Guide - Performance tuning for DataFusion
- DuckDB Indexing - Zone-maps and ART index documentation
- Vortex - Columnar format used by Spice Cayenne
- Vortex Benchmarks - Performance benchmarks for Vortex
- Apache Parquet File Format - Row groups, statistics, and Page Index
- Iceberg Partitioning - Hidden partitioning and partition evolution
- Iceberg Performance - Metadata filtering and column statistics
- Delta Lake Optimizations - Data skipping, Z-ordering, and compaction
