Spice v1.9.0-rc.2 (Nov 11, 2025)
Announcing the release of Spice v1.9.0-rc.2! 🌶
This is the second release candidate for v1.9.0, which introduces Spice Cayenne, a new high-performance data accelerator built on the Vortex columnar format that delivers better-than-DuckDB performance without single-file scaling limitations, along with a preview of Multi-Node Distributed Query based on Apache Ballista. v1.9.0-rc.2 also upgrades to DataFusion v50 and DuckDB v1.4.1 for even higher query performance, expands search capabilities with full-text search on views and multi-column embeddings, delivers significant DynamoDB and DuckDB accelerator improvements, extends the HTTP Data Connector to support endpoints as tables, and includes many security and reliability improvements.
What's New in v1.9.0-rc.2​
Cayenne Data Accelerator (Beta)​
Introducing Cayenne, SQL as an acceleration format: a new high-performance Data Accelerator that simplifies multi-file data acceleration by using an embedded database (SQLite) for metadata while storing data in the Vortex columnar format, a Linux Foundation project. Cayenne delivers query and ingestion performance better than DuckDB's file-based acceleration, without DuckDB's memory overhead or the scaling challenges of single DuckDB files.
Cayenne uses SQLite to manage acceleration metadata (schemas, snapshots, statistics, file tracking) through simple SQL transactions, while storing data in Vortex's compressed columnar format. This architecture provides:
Key Features:
- SQLite + Vortex Architecture: All metadata is stored in SQLite tables with standard SQL transactions, while data lives in Vortex's compressed, chunked columnar format designed for zero-copy access and efficient scanning.
- Simplified Operations: No complex file hierarchies, no JSON/Avro metadata files, no separate catalog servers—just SQL tables and Vortex data files. The entire metadata schema is intentionally simple for maximum reliability.
- Fast Metadata Access: Single SQL query retrieves all metadata needed for query planning—no multiple round trips to storage, no S3 throttling, no reconstruction of metadata state from scattered files.
- Efficient Small Changes: Dramatically reduces small file proliferation. Snapshots are just rows in SQLite tables, not new files on disk. Supports millions of snapshots without performance degradation.
- High Concurrency: Changes consist of two steps: stage Vortex files (if any), then run a single SQL transaction. Much faster conflict resolution and support for many more concurrent updates than file-based formats.
- Advanced Data Lifecycle: Full ACID transactions, delete support, and retention SQL execution on refresh commit.
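The two-step commit described above (stage Vortex files, then run one SQL transaction) can be sketched with Python's built-in sqlite3. The schema, table names, and file paths below are illustrative only, not Cayenne's actual metadata layout:

```python
import sqlite3

# Illustrative only: a simplified metadata schema, not Cayenne's real one.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, row_count INTEGER)")
db.execute("CREATE TABLE snapshots (id INTEGER PRIMARY KEY, committed_at TEXT)")

def commit_refresh(staged_files):
    # Step 1 (already done by the caller): Vortex files are staged on disk.
    # Step 2: register the files and the new snapshot in ONE transaction,
    # so readers see either the old snapshot or the new one, never a mix.
    with db:  # sqlite3 context manager commits as a single atomic transaction
        db.executemany("INSERT INTO files VALUES (?, ?)", staged_files)
        db.execute("INSERT INTO snapshots (committed_at) VALUES (datetime('now'))")

commit_refresh([("part-0001.vortex", 1_000_000)])
print(db.execute("SELECT COUNT(*) FROM snapshots").fetchone()[0])  # -> 1
```

Because a snapshot is just a row insert, conflict resolution reduces to ordinary database transaction semantics rather than file-level compare-and-swap.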
Example Spicepod.yml configuration:
```yaml
datasets:
  - from: s3:my_table
    name: accelerated_data_30d
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      refresh_mode: append
      retention_sql: DELETE FROM accelerated_data_30d WHERE created_at < NOW() - INTERVAL '30 days'
```
Note: the Cayenne Data Accelerator is in Beta and has known limitations.
For more details, refer to the Cayenne Documentation, the Vortex project, and the DuckLake announcement that partly inspired this design.
Multi-Node Distributed Query (Preview)​
Apache Ballista Integration: Spice now supports distributed query execution based on Apache Ballista, enabling distributed queries across multiple executor nodes for improved performance on large datasets. This feature is in preview in v1.9.0-rc.2.
Architecture:
A distributed Spice cluster consists of:
- Scheduler: Responsible for distributed query planning and work queue management for the executor fleet
- Executors: One or more nodes responsible for running physical query plans
Getting Started:
Start a scheduler instance using an existing Spicepod. The scheduler is the only spiced instance that needs to be configured:
```bash
# Start scheduler (note the flight bind address override if you want it reachable outside localhost)
spiced --cluster-mode scheduler --flight 0.0.0.0:50051
```
Start one or more executors configured with the scheduler's flight URI:
```bash
# Start executor (automatically selects a free port if 50051 is taken)
spiced --cluster-mode executor --scheduler-url spiced://localhost:50051
```
Query Execution:
Queries run through the scheduler will now show a distributed_plan in EXPLAIN output, demonstrating how the query is distributed across executor nodes:
```sql
EXPLAIN SELECT count(id) FROM my_dataset;
```
Current Limitations:
- Accelerated datasets are currently not supported. This feature is designed for querying partitioned data lake formats (Parquet, Delta Lake, Iceberg, etc.)
- The feature is in preview and may have stability or performance limitations
- Specific acceleration support is planned for future releases
DataFusion v50 Upgrade​
Spice.ai is built on the Apache DataFusion query engine. The v50 release brings significant performance improvements and enhanced reliability:
Performance Improvements 🚀:
- Dynamic Filter Pushdown: Enhanced dynamic filter pushdown for custom ExecutionPlans, ensuring filters propagate correctly through all physical operators for improved query performance.
- Partition Pruning: Expanded partition pruning support ensures that unnecessary partitions are skipped when filters are not used, reducing data scanning overhead and improving query execution times.
- Apache Spark Compatible Functions: Added support for Spark-compatible functions including array, bit_get/bit_count, bitmap_count, crc32/sha1, date_add/date_sub, if, last_day, like/ilike, luhn_check, mod/pmod, next_day, parse_url, rint, and width_bucket.
Bug Fixes & Reliability: Resolved issues with partition name validation and empty execution plans when vector index lists are empty. Fixed timestamp support for partition expressions, enabling better partitioning for time-series data.
See the Apache DataFusion 50.0.0 Release for more details.
DuckDB v1.4.1 Upgrade and Accelerator Improvements​
DuckDB v1.4.1: DuckDB has been upgraded to v1.4.1, which includes several performance optimizations.
Composite ART Index Support: DuckDB in Spice now supports composite (multi-column) Adaptive Radix Tree (ART) indexes for accelerated table scans. When queries filter on multiple columns fully covered by a composite index, the optimizer automatically uses index scans instead of full table scans, delivering significant performance improvements for selective queries.
Example configuration:
```yaml
datasets:
  - from: file://data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      indexes:
        '(region, product_id)': enabled
```
Performance example with composite index on 7.5M rows:
```sql
SELECT * FROM sales WHERE region = 'US' AND product_id = 12345;
-- Without index: 0.282s
-- With composite index (region, product_id): 0.037s
-- Performance improvement: 7.6x faster with composite index
```
DuckDB Intermediate Materialization: Queries with indexes now use intermediate materialization (WITH ... AS MATERIALIZED) to leverage faster index scans. Currently supported for non-federated queries (query_federation: disabled) against a single table with indexes only. When predicates cover more columns than the index, the optimizer rewrites queries to first materialize index-filtered results, then apply remaining predicates. This optimization can deliver significant performance improvements for selective queries.
Example configuration:
```yaml
datasets:
  - from: file://sales_data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      params:
        query_federation: disabled # Currently required for intermediate materialization
      indexes:
        '(region, product_id)': enabled
```
Performance example:
```sql
-- Query with indexed columns (region, product_id) plus additional filter (amount)
SELECT * FROM sales
WHERE region = 'US' AND product_id = 12345 AND amount > 1000;
-- Optimized execution time: 0.031s (with intermediate materialization)
-- Standard execution time: 0.108s (without optimization)
-- Performance improvement: ~3.5x faster
```
The optimizer automatically rewrites the query to:
```sql
WITH _intermediate_materialize AS MATERIALIZED (
  SELECT * FROM sales WHERE region = 'US' AND product_id = 12345
)
SELECT * FROM _intermediate_materialize WHERE amount > 1000;
```
Parquet Buffering for Partitioned Writes: DuckDB partitioned writes in table mode now support Parquet buffering, reducing memory usage and improving write performance for large datasets.
Retention SQL on Refresh Commit: DuckDB accelerations now support running retention SQL on refresh commit, enabling automatic data cleanup and lifecycle management during refresh operations.
UTC Timezone for DuckDB: DuckDB now uses UTC as the default timezone, ensuring consistent behavior for time-based queries across different environments.
Example Spicepod.yml configuration:
```yaml
datasets:
  - from: s3://my_bucket/large_table/
    name: partitioned_data
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      retention:
        sql: DELETE FROM partitioned_data WHERE event_time < NOW() - INTERVAL '7 days'
```
HTTP Data Connector​
- Querying endpoints as tables: The HTTP/HTTPS Data Connectors now support querying HTTP endpoints directly as tables in SQL queries with dynamic filters. This feature transforms REST APIs into queryable data sources, making it easy to integrate external service data.
- Query HTTP endpoints that return structured data (JSON, CSV, etc.) as if they were database tables.
- Configurable retry logic, timeouts, and POST request support for more complex API interactions.
Example Spicepod.yml configuration:
```yaml
datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    params:
      file_format: json
      max_retries: 3
      client_timeout: 10s
```
Example SQL query:
```sql
SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' AND request_query = 'q=michael'
LIMIT 10;
```
If a request_body is supplied, it is sent to the endpoint as a POST request:
Example SQL query:
```sql
SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' AND request_query = 'q=michael' AND request_body = '{"name": "michael"}'
LIMIT 10;
```
HTTP endpoints can be accelerated using refresh_sql:
```yaml
datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    acceleration:
      enabled: true
      refresh_mode: full
      refresh_sql: |
        SELECT request_path, request_query, content
        FROM tvmaze
        WHERE request_path = '/search/people'
          AND request_query IN ('q=michael', 'q=luke')
```
DynamoDB Data Connector Improvements​
Improved Query Performance: The DynamoDB Data Connector now includes improved filter handling for edge cases, parallel scan support for faster data ingestion, and better error handling for misconfigured queries. These improvements enable more reliable and performant access to DynamoDB data.
Example Spicepod.yml configuration:
```yaml
datasets:
  - from: dynamodb:my_table
    name: ddb_data
    params:
      scan_segments: 10 # Defaults to `auto`, which calculates optimal segments based on row count
```
S3 Versioning Support​
Atomic Range Reads for Versioned Files: Spice now supports S3 Versioning for all connectors using object-store (S3, Delta Lake, etc.), ensuring range reads over versioned files are atomically correct. When S3 versioning is enabled, Spice automatically tracks version IDs during file discovery and uses them for all subsequent range reads, preventing inconsistencies from concurrent file modifications.
Version tracking is automatic when S3 versioning is enabled on the bucket. Current limitations:
- Multi-file connections (e.g., partitioned datasets) do not yet support version tracking across all files
Search & Embeddings Enhancements​
Full-Text Search on Views: Full-text search indexes are now supported on views, enabling advanced search scenarios over pre-aggregated or transformed data. This extends the power of Spice's search capabilities beyond base datasets.
Multi-Column Embeddings on Views: Views now support embedding columns, enabling vector search and semantic retrieval on view data. This is useful for search over aggregated or joined datasets.
Vector Engines on Views: Vector search engines are now available for views, enabling similarity search over complex queries and transformations.
Example Spicepod.yml configuration:
```yaml
views:
  - name: aggregated_reviews
    sql: SELECT review_id, review_text FROM reviews WHERE rating > 4
    embeddings:
      - column: review_text
        model: openai:text-embedding-3-small
```
Dedicated Query Thread Pool (Now Enabled by Default)​
Dedicated Query Thread Pool: Query execution and accelerated refreshes now run on their own dedicated thread pool, separate from the HTTP server. This prevents heavy query workloads from slowing down API responses, keeping health checks fast and avoiding unnecessary Kubernetes pod restarts under load.
This feature was opt-in in previous releases and is now enabled by default in v1.9.0-rc.2. To disable it and revert to the previous behavior, add the following spicepod.yaml configuration:
```yaml
runtime:
  params:
    dedicated_thread_pool: none
```
Query Performance Optimizations​
Stale-While-Revalidate Cache Control: Query results now support "stale-while-revalidate" cache control, allowing stale cached data to be served immediately while asynchronously refreshing the cache entry in the background. This improves response times for frequently-accessed queries while maintaining data freshness. Requires cache key type to be set to "sql (raw)" for proper operation.
Optimized Prepared Statements: Prepared statement handling has been optimized for better performance with parameterized queries, reducing planning overhead and improving execution time for repeated queries.
Large RecordBatch Chunking: Large Arrow RecordBatch objects are now automatically chunked to control memory usage during query execution, preventing memory exhaustion for queries returning large result sets.
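The chunking idea can be sketched in plain Python. The chunk size and helper below are illustrative; Spice's actual implementation operates on Arrow RecordBatches internally:

```python
def chunk_offsets(total_rows: int, max_rows_per_chunk: int):
    """Yield (offset, length) slices that cover total_rows without
    ever materializing more than max_rows_per_chunk rows at once."""
    offset = 0
    while offset < total_rows:
        length = min(max_rows_per_chunk, total_rows - offset)
        yield (offset, length)
        offset += length

# A 10M-row batch split into at-most-4M-row slices:
print(list(chunk_offsets(10_000_000, 4_000_000)))
# -> [(0, 4000000), (4000000, 4000000), (8000000, 2000000)]
```

Each slice can then be processed and released before the next is materialized, which bounds peak memory for large result sets.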
Query Result Cache: Stale-While-Revalidate​
HTTP Cache-Control Support: The query result cache now supports the stale-while-revalidate Cache-Control directive, enabling faster response times by serving stale cached results immediately while asynchronously refreshing the cache in the background. This feature is particularly useful for applications that can tolerate slightly stale data in exchange for improved performance.
How it works:
When a cache entry is stale but within the stale-while-revalidate window, Spice will:
- Immediately return the stale cached result to the client
- Asynchronously re-execute the query in the background to refresh the cache
- Future requests will use the refreshed data
Configuration:
Use the Cache-Control HTTP header with the stale-while-revalidate directive:
```
Cache-Control: max-age=300, stale-while-revalidate=60
```
This configuration caches results for 5 minutes (300 seconds), and allows serving stale results for an additional 60 seconds while refreshing in the background.
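The staleness decision can be sketched as follows; this is a simplified model of the directive's semantics, not Spice's internal code:

```python
def cache_decision(age_s: float, max_age_s: float, swr_s: float) -> str:
    """Classify a cached entry per 'Cache-Control: max-age=<max_age>,
    stale-while-revalidate=<swr>' semantics."""
    if age_s <= max_age_s:
        return "fresh"        # serve from cache, no refresh needed
    if age_s <= max_age_s + swr_s:
        return "stale-serve"  # serve stale now, refresh in background
    return "expired"          # re-execute the query synchronously

# max-age=300, stale-while-revalidate=60:
print(cache_decision(120, 300, 60))  # -> fresh
print(cache_decision(330, 300, 60))  # -> stale-serve
print(cache_decision(400, 300, 60))  # -> expired
```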
Requirements:
- Must use plan or raw SQL cache keys (set `cache_key_type` to `sql` or `plan` in the results_caching configuration)
- Background revalidation re-executes queries through the normal query path
- Timestamp tracking automatically determines cache entry age for staleness checks
Example configuration via HTTP header:
```
POST /v1/sql
Cache-Control: max-age=600, stale-while-revalidate=120
X-Cache-Key-Type: sql
```
This feature improves application responsiveness while ensuring data freshness through background updates.
Security & Reliability Improvements​
Enhanced HTTP Client Security: HTTP client usage across the runtime has been hardened with improved TLS validation, certificate pinning for critical endpoints, and better error handling for network failures.
ODBC Connector Improvements: Removed unwrap calls from the ODBC connector, improving error handling and reliability. Fixed secret handling and Kubernetes secret integration.
CLI Permissions Hardening: Tightened file permissions for the CLI and install script, ensuring secure defaults for configuration files and credentials.
Oracle Instant Client Pinning: Oracle Instant Client downloads are now pinned to specific SHAs, ensuring reproducible builds and preventing supply chain attacks.
AWS Authentication Improvements​
Improved Credential Retry Logic: AWS SDK credential initialization has been significantly improved with more robust retry logic and better error handling. The system now automatically retries transient credential resolution failures using Fibonacci backoff, allowing Spice to tolerate extended AWS outages (up to ~48 hours) without manual intervention.
Key features:
- Automatic retry with backoff: Implements Fibonacci backoff for transient credential failures (network issues, temporary AWS service disruptions)
- Configurable retry limits: Supports up to 300 retry attempts with a maximum retry interval of 600 seconds
- Better error handling: Distinguishes between retryable errors (connector errors) and non-retryable errors (misconfiguration)
- Unauthenticated access support: Properly supports unauthenticated access to public S3 buckets without requiring credentials
- Improved error messages: Provides detailed logging with attempt numbers, retry intervals, and error context for better troubleshooting
The improvements ensure more reliable AWS service integration, particularly in environments with intermittent network connectivity or during AWS service degradations.
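The retry schedule can be sketched as follows. The 300-attempt limit and 600-second cap come from the release notes; the function itself is an illustrative model, not the SDK's code:

```python
def fibonacci_backoff(max_attempts: int = 300, cap_s: int = 600):
    """Yield wait times (seconds) following a Fibonacci sequence,
    capped at cap_s, for up to max_attempts retries."""
    a, b = 1, 1
    for _ in range(max_attempts):
        yield min(a, cap_s)
        a, b = b, a + b

waits = list(fibonacci_backoff())
print(waits[:8])   # -> [1, 1, 2, 3, 5, 8, 13, 21]
print(max(waits))  # -> 600 (interval never exceeds the cap)
```

Summed over 300 attempts, a capped schedule like this spans roughly two days of retries, which is how an outage window on the order of ~48 hours can be tolerated without manual intervention.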
Observability & Tracing​
DataFusion Log Emission: The Spice runtime now emits DataFusion internal logs, providing deeper visibility into query planning and execution for debugging and performance analysis.
AI Completions Tracing: Fixed tracing so that ai_completions operations are correctly parented under sql_query traces, improving observability for AI-powered queries.
Git Data Connector (Alpha)​
Version-Controlled Data Access: The new Git Data Connector (Alpha) enables querying datasets stored in Git repositories. This connector is ideal for use cases involving configuration files, documentation, or any data tracked in version control.
Example Spicepod.yml configuration:
datasets:
- from: git:https://github.com/myorg/myrepo
name: git_metrics
params:
file_format: csv
For more details, refer to the Git Data Connector Documentation.
Spice Java SDK 0.4.0​
The Spice Java SDK has been upgraded to v0.4.0 with support for a configurable Arrow memory limit: spice-java v0.4.0
```java
SpiceClient client = SpiceClient.builder()
    .withArrowMemoryLimitMB(1024) // 1GB limit
    .build();
```
CLI Improvements​
Install Specific Versions: The spice install command now supports installing specific versions of the Spice runtime and CLI. This enables easy version management, downgrading, or installation of specific releases for testing or compatibility requirements.
Usage:
```bash
# Install a specific version
spice install v1.8.3

# Install a specific version with AI flavor
spice install v1.8.3 ai

# Install latest version (existing behavior)
spice install
spice install ai
```
Note: Homebrew installations require manual version management via brew install spiceai/spiceai/spice@<version>.
Persistent Query History: The Spice CLI REPL (SQL, search, and chat interfaces) now persists command history to ~/.spice/query_history.txt, making your query history available across sessions. The history file is automatically created if it doesn't exist, with graceful fallback if the home directory cannot be determined.
New REPL Commands:
- `.clear` - Clear the screen using ANSI escape codes for a clean workspace
- `.clear history` - Clear and persist the query history, removing all stored commands
Tab Completion: Tab completion now includes suggestions based on your command history, making it faster to re-run or modify previous queries.
Example usage:
sql> SELECT * FROM my_table;
sql> .clear # Clears the screen
sql> .clear history # Clears command history
sql> # Use arrow keys or tab to access previous commands
Additional Improvements & Bug Fixes​
- Reliability: Fixed refresh worker panics with recovery handling to prevent runtime crashes during acceleration refreshes.
- Reliability: Improved error messages for missing or invalid `spicepod.yaml` files, providing actionable feedback for misconfiguration.
- Reliability: Fixed DuckDB metadata pointer loading issues for snapshots.
- Performance: Ensured `ListingTable` partitions are pruned correctly when filters are not used.
- Reliability: Fixed vector dimension determination for partitioned indexes.
- Search: Fixed casing issues in Reciprocal Rank Fusion (RRF) for hybrid search queries.
- Search: Fixed search field handling as metadata for chunked search indexes.
- Validation: Added timestamp support for partition expressions.
- Validation: Fixed the `regexp_match` function for DuckDB datasets.
- Validation: Fixed partition name validation for improved reliability.
Contributors​
Breaking Changes​
No breaking changes.
Cookbook Updates​
New HTTP Data Connector Recipe: A new recipe demonstrating how to query REST APIs and HTTP(S) endpoints. See HTTP Connector Recipe for details.
The Spice Cookbook includes 82 recipes to help you get started with Spice quickly and easily.
Upgrading​
To upgrade to v1.9.0-rc.2, use one of the following methods:
CLI:
```bash
spice upgrade
```
Homebrew:
```bash
brew upgrade spiceai/spiceai/spice
```
Docker:
Pull the spiceai/spiceai:1.9.0-rc.2 image:
```bash
docker pull spiceai/spiceai:1.9.0-rc.2
```
For available tags, see DockerHub.
Helm:
```bash
helm repo update
helm upgrade spiceai spiceai/spiceai
```
AWS Marketplace:
🎉 Spice is now available in the AWS Marketplace!
What's Changed​
Dependencies​
- DataFusion: Upgraded to v50
- Apache Arrow: Upgraded to v56
- DuckDB: Upgraded to v1.4.1
- Delta Kernel: Upgraded to v0.16.0
Changelog​
- Fix for search field as metadata for chunked search indexes by @Jeadie in #7429
- Bump object_store from 0.12.3 to 0.12.4 by @app/dependabot in #7433
- Properly respect disabling snapshots by @phillipleblanc in #7431
- Revert "Properly respect disabling snapshots" by @sgrebnov in #7439
- Revert "Disable snapshots by default" by @sgrebnov in #7438
- Add preview warning for write access mode by @sgrebnov in #7440
- fix: regexp_match for DuckDB datasets by @kczimm in #7443
- Add feature is currently in preview warning for snapshots by @sgrebnov in #7442
- [Logger] Also emit Datafusion logs by @mach-kernel in #7441
- add missing snapshot by @kczimm in #7446
- Fix tracing so that ai_completions are parented under sql_query by @lukekim in #7415
- Enable snapshot acceleration by default by @phillipleblanc in #7451
- Disable acceleration refresh metrics by @krinart in #7450
- Add v1.8 release notes by @phillipleblanc in #7430
- fix: partition name validation by @kczimm in #7452
- Fix lint error due to ignore without reasons by @krinart in #7454
- Add models and CUDA support to spiced install script by @lukekim in #7457
- Post-release 1.8 updates by @phillipleblanc in #7455
- Remove println in datafusion by @phillipleblanc in #7461
- Update end_game.md to notify once release is done by @sgrebnov in #7460
- Remove italics from snapshot logging by @phillipleblanc in #7463
- Update openapi.json by @app/github-actions in #7466
- Fix generate spicepod schema by @phillipleblanc in #7464
- Fix generate acknowledements by @phillipleblanc in #7465
- Update spicepod.schema.json by @app/github-actions in #7469
- fix: Ensure ListingTable partitions are pruned when filters are not used by @peasee in #7471
- Create `runtime-secrets` crate by @phillipleblanc in #7474
- Create `runtime-parameters` crate by @phillipleblanc in #7475
- Don't download the snapshot if the acceleration is present by @phillipleblanc in #7477
- Fix casing for keywords and additional columns by @Jeadie in #7770
- Bump actions/upload-artifact from 4 to 5 by @app/dependabot in #7750
- Bump criterion from 0.5.1 to 0.7.0 by @app/dependabot in #7740
- Bump rustls-native-certs from 0.8.1 to 0.8.2 by @app/dependabot in #7744
- Git Data Connector (Alpha) by @lukekim in #7772
- Pepper accelerator delete support by @lukekim in #7616
- Update Helm chart instructions for Helm in end_game.md by @sgrebnov in #7776
- Turso data accelerator by @lukekim in #7472
- Apply retention SQL filter to refresh fetch by @phillipleblanc in #7778
- Add Parquet buffering option for DuckDB partitioned writes (tables mode) by @sgrebnov in #7735
- fix: EmptyExec when list indexes is empty by @kczimm in #7784
- 1.8.3 post-release housekeeping by @mach-kernel in #7783
- feat: Upgrade to Datafusion v50 by @peasee in #7777
- fix: Replace vortex datafusion with public crate by @peasee in #7791
- Full-text search on views by @Jeadie in #7733
- Revert "Apply retention SQL filter to refresh fetch (#7778)" by @phillipleblanc in #7796
- fix: Add ingest duration and acceleration size metrics to testoperator by @peasee in #7792
- Set local timezone to UTC for DuckDB by @phillipleblanc in #7797
- add Timestamp support for partition expressions by @kczimm in #7803
- Fix trunk lint by @krinart in #7804
- Add missing mongodb params by @krinart in #7807
- Embedding columns on view components by @Jeadie in #7795
- Add Turso as a Pepper Catalog metastore by @lukekim in #7793
- Run retention_sql on refresh commit for DuckDB by @lukekim in #7785
- docs: Update datafusion upgrade checklist by @peasee in #7812
- Vector engines on views by @Jeadie in #7808
- Handle refresh worker panics and add recovery test by @phillipleblanc in #7815
- chunk large record batches to control memory usage by @kczimm in #7802
- fix: cannot determine vector dimension for partitioned indexes by @kczimm in #7818
- Upgrade to Turso v0.3 by @lukekim in #7821
- fix: Ensure custom *Exec ExecutionPlans push down dynamic filters by @peasee in #7811
- handle casing in RRF by @Jeadie in #7825
- Enable 'turso' for pepper acceleration by default by @sgrebnov in #7826
- Improved DynamoDB Data Connector by @krinart in #7715
- Initial support for llama.cpp as LLM inference backend by @lukekim in #7794
- Pepper: Implement retention SQL on refresh commit by @phillipleblanc in #7814
- Fix Dockerfiles for arm64 by @lukekim in #7834
- [DynamoDB] Handle filter edge-cases by @krinart in #7830
- [DynamoDB] Support parallelization for `Scan` request by @krinart in #7829
- Don't feature gate Pepper by @lukekim in #7832
- Fix llama.cpp static link by @lukekim in #7835
- fix: docker nightly builds by @kczimm in #7837
- Use GitHub-hosted macOS runner only for tag releases by @lukekim in #7836
- Fix Bug: DuckDB INTERNAL Error: Failed to load metadata pointer by @sgrebnov in #7839
- Fix docker arm64 build to use aegis in pure-rust mode by @lukekim in #7840
- Revert "Use GitHub-hosted macOS runner only for tag releases" by @lukekim in #7843
- Rename Pepper to Cayenne by @lukekim in #7844
- Tighten CLI permissions and install script by @lukekim in #7845
- Set mvcc for Cayenne Turso metastore by @lukekim in #7850
- Optimize Prepared Statements by @lukekim in #7859
- Remove unwrap from ODBC connector, fix secrets, and kuberenetes secre… by @lukekim in #7846
- Improve and secure HTTP client usage by @lukekim in #7847
- Pin Oracle Instant Client download to a SHA by @lukekim in #7851
- Improve experience for missing or invalid Spicepod.yaml by @lukekim in #7849
- chore: Fix PR linting by @peasee in #7865
- Revert FlightIPC issues by @Jeadie in #7870
- Improve error message by adding 'cayenne' to the list of valid accelerator engines by @sgrebnov in #7882
- fix: allow parameter index without dollar signs by @kczimm in #7887
- Temporarily disable `supports_limit_pushdown` for `SchemaCastScanExec` by @sgrebnov in #7893
- Remove '.embeddings[].metadata' by @Jeadie in #7897
- Optimize macOS and Windows builds by @lukekim in #7863
- fix: Kafka message delivery failed by @kczimm in #7883
- docs: Update component criteria by @peasee in #7891
- fix: Make integration run with no relevant changes, disable makefile targets on PR by @peasee in #7896
- Add Cayenne benchmark and concurrency tests and remove indexes for Turso MVCC by @lukekim in #7879
- Revert llama.cpp engine by @lukekim in #7898
- Make Cayenne snapshotting more robust by @sgrebnov in #7899
- Release notes v1.9.0-rc1 by @Jeadie in #7902
- Fix `dataset_acceleration_last_refresh_time_ms` unit to milliseconds in description by @ewgenius in #7901
- Fix lint error in record_explain_plan functionality by @sgrebnov in #7906
- Cleanup old snapshots after full refresh by @lukekim in #7908
- Cayenne deletion vector caching support by @lukekim in #7903
- Split filters into partition filters (for pruning) and data filters by @lukekim in #7889
- fix: Update benchmark snapshots by @app/github-actions in #7911
- fix: Update benchmark snapshots by @app/github-actions in #7912
- fix: Update benchmark snapshots by @app/github-actions in #7913
- Update spicepod.schema.json by @app/github-actions in #7916
- fix: Update benchmark snapshots by @app/github-actions in #7917
- Add Cayenne & Turso accelerators to E2E CI test matrix by @lukekim in #7922
- Make preview warnings consistent by @lukekim in #7921
- Filter and write optimizations by @lukekim in #7918
- fix: Set sccache region explicitly by @peasee in #7928
- fix: Enable integration test merge group checks by @peasee in #7927
- Update testoperator release branch from 1.8 to 1.9 by @peasee in #7926
- Update DuckDB to 1.4.1 with composite ART scans by @mach-kernel in #7884
- Don't build Windows on trunk pushes by @lukekim in #7931
- fix: Use correct minio secret in build binary push by @peasee in #7934
- Update test-framework workers to use dedicated Flight client by @sgrebnov in #7938
- Fix financebench, configure s3vectors for appropriate snapshotting by @Jeadie in #7935
- Don't try to initialize accelerator if it is disabled by @lukekim in #7932
- Add spark UDFs to Spice by @Jeadie in #7936
- Fix extra async_trait in cayenne metadata catalog by @phillipleblanc in #7942
- deps: Upgrade to Rust 1.90 by @peasee in #7941
- Add cayenne accelerator to README.md by @ewgenius in #7905
- fix: Update benchmark snapshots by @app/github-actions in #7948
- Run integration tests with `AWS_EC2_METADATA_DISABLED` flag by @sgrebnov in #7952
- Only retry credentials on ConnectorError by @kczimm in #7944
- fix: Improve join reordering by ensuring `JoinSelection` is applied by @peasee in #7828
- fix: Remove unused deps, consolidate workspace deps by @peasee in #7953
- bump async-openai commit by @kczimm in #7929
- deps: Use vortex fork by @peasee in #7954
- Enable tracing in delta lake integration tests by @sgrebnov in #7951
- Update datasets in S3 vectors test case by @Jeadie in #7956
- Add spiced metrics scraping to test operator by @lukekim in #7937
- Memoize S3 vectors ListIndex API call with configurable TTL by @kczimm in #7910
- Cayenne performance optimizations by @lukekim in #7907
- Setup HotFix issue template by @ewgenius in #7957
- Fix AWS SDK credential cache retry handling by @phillipleblanc in #7943
- Infer RRF `join_key` from `TableProvider::constraints` and implement `SearchQueryProvider::constraints` by @Jeadie in #7959
- [Optimizer]: DuckDB intermediate materialization (non-federated) by @mach-kernel in #7964
- 1.7.3 post-release housekeeping by @ewgenius in #7962
- Fix `digest_many` UDF for `ColumnarValue::Array` by @Jeadie in #7960
- Fix spiced metrics reporting as part of benchmark tests by @sgrebnov in #7967
- Avoid pushing down Spice specific UDFs to accelerators during federation by @Jeadie in #7909
- CLI file persisted history with `.clear` and `.clear history` commands by @lukekim in #7970
- ResultsCache Cache-Control `stale-while-revalidate` by @lukekim in #7963
- Use GetVectors API instead of returnData by @kczimm in #7083
- Make DuckDB intermediate materialization logic more robust by @sgrebnov in #7971
- [Cayenne] Configurable target Vortex file size by @lukekim in #7972
- fix: Update benchmark snapshots by @app/github-actions in #7974
- Bump github.com/klauspost/compress from 1.17.11 to 1.18.1 by @app/dependabot in #7872
- fix: Update benchmark snapshots by @app/github-actions in #7978
- fix: Update benchmark snapshots by @app/github-actions in #7982
- Run Integration tests on spiceai-dev-runners by @sgrebnov in #7985
- [CLI] Fix cursor issue due to flush by @lukekim in #7981
- fix: Support S3 versioning, Vortex dynamic filter pushdown by @peasee in #7984
- Make `cluster` a default feature by @lukekim in #7994
- Optimize DuckDB Intermediate Index Materialization for No-Index Case by @sgrebnov in #7998
- HTTP connector with dynamic filter support by @lukekim in #7969
- Revert federation 'can_execute_plan' by @Jeadie in #7999
- Fix stale caching by @lukekim in #7995
- Fix count(*) for http connector by @krinart in #8001
- [CLI] Install specific version by @lukekim in #8006
- Fix stale with revalidate request/response by @lukekim in #8005
- Fallback `RequestContext` for cluster queries by @Jeadie in #8007
- Use use_rustls_tls for Spice Cloud /connect by @lukekim in #8008
- Use delta-kernel-rs 0.16x + Parquet reader with object meta API changes by @mach-kernel in #8011
- fix: Update datafusion & arrow-rs with S3 versioning fix by @lukekim in #8012

