13 posts tagged with "datafusion"

DataFusion query engine related topics and usage

Spice v1.11.0 (Jan 28, 2026)

· 58 min read
William Croxson
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.11.0-stable! ⚡

In Spice v1.11.0, Spice Cayenne reaches Beta status with acceleration snapshots, key-based deletion vectors, and Amazon S3 Express One Zone support. DataFusion has been upgraded to v51, along with Arrow v57.2 and iceberg-rust v0.8.0. v1.11 also brings several DynamoDB and DynamoDB Streams improvements, such as JSON nesting, and significant improvements to Distributed Query, with active-active schedulers and mTLS for enterprise-grade high availability and secure cluster communication.

This release also adds new SMB, NFS, and ScyllaDB Data Connectors (Alpha), Prepared Statements with full SDK support (gospice, spice-rs, spice-dotnet, spice-java, spice.js, and spicepy), Google LLM Support for expanded AI inference capabilities, and significant improvements to caching, observability, and Hash Indexing for Arrow Acceleration.

What's New in v1.11.0

Spice Cayenne Accelerator Reaches Beta

Spice Cayenne has been promoted to Beta status with acceleration snapshots support and numerous performance and stability improvements.

Key Enhancements:

  • Key-based Deletion Vectors: Improved deletion vector support using key-based lookups for more efficient data management and faster delete operations. Key-based deletion vectors are more memory-efficient than positional vectors for sparse deletions (see the sketch after this list).
  • S3 Express One Zone Support: Store Cayenne data files in S3 Express One Zone for single-digit millisecond latency, ideal for latency-sensitive query workloads that require persistence.
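
As a sketch of how key-based deletion vectors come into play, consider a retention-style DELETE like the Cayenne retention_sql examples elsewhere on this page (the events table and event_time column below are illustrative, not from the release notes):

-- Illustrative only: rows removed from a Cayenne-accelerated dataset are
-- tracked as key-based deletion vectors rather than positional ones
DELETE FROM events WHERE event_time < NOW() - INTERVAL '30 days';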

Improved Reliability:

  • Resolved FuturesUnordered reentrant drop crashes
  • Fixed memory growth issues related to Vortex metrics allocation
  • Metadata catalog now properly respects cayenne_file_path location
  • Added warnings for unparseable configuration values

For more details, refer to the Cayenne Documentation.

DataFusion v51 Upgrade

Apache DataFusion has been upgraded to v51, bringing significant performance improvements, new SQL features, and enhanced observability.

DataFusion v51 ClickBench Performance

Performance Improvements:

  • Faster CASE Expression Evaluation: Expressions now short-circuit earlier, reuse partial results, and avoid unnecessary scattering, speeding up common ETL patterns
  • Better Defaults for Remote Parquet Reads: DataFusion now fetches the last 512KB of Parquet files by default, typically avoiding 2 I/O requests per file
  • Faster Parquet Metadata Parsing: Leverages Arrow 57's new thrift metadata parser for up to 4x faster metadata parsing

New SQL Features:

  • SQL Pipe Operators: Support for |> syntax for inline transforms
  • DESCRIBE <query>: Returns the schema of any query without executing it
  • Named Arguments in SQL Functions: PostgreSQL-style param => value syntax for scalar, aggregate, and window functions
  • Decimal32/Decimal64 Support: New Arrow types supported including aggregations like SUM, AVG, and MIN/MAX

Example pipe operator:

SELECT * FROM t
|> WHERE a > 10
|> ORDER BY b
|> LIMIT 5;
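
DESCRIBE and named arguments follow similarly. A minimal sketch: the table t matches the pipe example above, and my_udf is a hypothetical function used only to show the param => value syntax:

-- Return the schema of a query without executing it
DESCRIBE SELECT a, b FROM t WHERE a > 10;

-- PostgreSQL-style named arguments (my_udf is hypothetical)
SELECT my_udf(input => a, scale => 2) FROM t;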

Improved Observability:

  • Improved EXPLAIN ANALYZE Metrics: New metrics including output_bytes, selectivity for filters, reduction_factor for aggregates, and detailed timing breakdowns
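
The new metrics appear in the annotated plan that EXPLAIN ANALYZE prints after executing the query. For example:

EXPLAIN ANALYZE SELECT category, COUNT(*) FROM t GROUP BY category;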

Arrow 57.2 Upgrade

Apache Arrow has been upgraded to v57.2, bringing major performance improvements and new capabilities.

Arrow 57 Parquet Metadata Parsing Performance

Key Features:

  • 4x Faster Parquet Metadata Parsing: A rewritten thrift metadata parser delivers up to 4x faster metadata parsing, especially beneficial for low-latency use cases and files with large amounts of metadata
  • Parquet Variant Support: Experimental support for reading and writing the new Parquet Variant type for semi-structured data, including shredded variant values
  • Parquet Geometry Support: Read and write support for Parquet Geometry types (GEOMETRY and GEOGRAPHY) with GeospatialStatistics
  • New arrow-avro Crate: Efficient conversion between Apache Avro and Arrow RecordBatches with projection pushdown and vectorized execution support

DynamoDB Connector Enhancements

  • Added JSON nesting for DynamoDB Streams
  • Improved batch deletion handling

Distributed Query Improvements

High Availability Clusters: Spice now supports running multiple active schedulers in an active/active configuration for production deployments. This eliminates the scheduler as a single point of failure and enables graceful handling of node failures.

  • Multiple schedulers run simultaneously, each capable of accepting queries
  • Schedulers coordinate via a shared S3-compatible object store
  • Executors discover all schedulers automatically
  • A load balancer distributes client queries across schedulers

Example HA configuration:

runtime:
  scheduler:
    state_location: s3://my-bucket/spice-cluster
    params:
      region: us-east-1

mTLS Verification: Cluster communication between scheduler and executors now supports mutual TLS verification for enhanced security.

Credential Propagation: S3, ABFS, and GCS credentials are now automatically propagated to executors in cluster mode, enabling access to cloud storage across the distributed query cluster.

Improved Resilience:

  • Exponential backoff for scheduler disconnection recovery
  • Increased gRPC message size limit from 16MB to 100MB for large query plans
  • HTTP health endpoint for cluster executors
  • Automatic executor role inference when --scheduler-address is provided

For more details, refer to the Distributed Query Documentation.

iceberg-rust v0.8.0 Upgrade

Spice has been upgraded to iceberg-rust v0.8.0, bringing improved Iceberg table support.

Key Features:

  • V3 Metadata Support: Full support for Iceberg V3 table metadata format
  • INSERT INTO Partitioned Tables: DataFusion integration now supports inserting data into partitioned Iceberg tables
  • Improved Delete File Handling: Better support for position and equality delete files, including shared delete file loading and caching
  • SQL Catalog Updates: Implement update_table and register_table for SQL catalog
  • S3 Tables Catalog: Implement update_table for S3 Tables catalog
  • Enhanced Arrow Integration: Convert Arrow schema to Iceberg schema with auto-assigned field IDs, _file column support, and Date32 type support

Acceleration Snapshots

Acceleration snapshots enable point-in-time recovery and data versioning for accelerated datasets. Snapshots capture the state of accelerated data at specific points, allowing for fast bootstrap recovery and rollback capabilities.

Key Features:

  • Flexible Triggers: Configure when snapshots are created based on time intervals or stream batch counts
  • Automatic Compaction: Reduce storage overhead by compacting older snapshots (DuckDB only)
  • Bootstrap Integration: Snapshots can reset cache expiry on load for seamless recovery (DuckDB with Caching refresh mode)
  • Smart Creation Policies: Only create snapshots when data has actually changed

Example configuration:

datasets:
  - from: s3://my-bucket/data.parquet
    name: my_dataset
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      snapshots: enabled
      snapshots_trigger: time_interval
      snapshots_trigger_threshold: 1h
      snapshots_creation_policy: on_changed

Snapshots API and CLI: New API endpoints and CLI commands for managing snapshots programmatically.

CLI Commands:

# List all snapshots for a dataset
spice acceleration snapshots taxi_trips

# Get details of a specific snapshot
spice acceleration snapshot taxi_trips 3

# Set the current snapshot for rollback (requires runtime restart)
spice acceleration set-snapshot taxi_trips 2

HTTP API Endpoints:

  • GET /v1/datasets/{dataset}/acceleration/snapshots: List all snapshots for a dataset
  • GET /v1/datasets/{dataset}/acceleration/snapshots/{id}: Get details of a specific snapshot
  • POST /v1/datasets/{dataset}/acceleration/snapshots/current: Set the current snapshot for rollback

For more details, refer to the Acceleration Snapshots Documentation.

Caching Acceleration Mode Improvements

The Caching Acceleration Mode introduced in v1.10.0 has received significant performance optimizations and reliability fixes in this release.

Performance Optimizations:

  • Non-blocking Cache Writes: Cache misses no longer block query responses. Data is written to the cache asynchronously after the query returns, reducing query latency for cache miss scenarios.
  • Batch Cache Writes: Multiple cache entries are now written in batches rather than individually, significantly improving write throughput for high-volume cache operations.

Reliability Fixes:

  • Correct SWR Refresh Behavior: The stale-while-revalidate (SWR) pattern now correctly refreshes only the specific entries that were accessed instead of refreshing all stale rows in the dataset. This prevents unnecessary source queries and reduces load on upstream data sources.
  • Deduplicated Refresh Requests: Fixed an issue where JSON array responses could trigger multiple redundant refresh operations. Refresh requests are now properly deduplicated.
  • Fixed Cache Hit Detection: Resolved an issue where queries that didn't include fetched_at in their projection would always result in cache misses, even when cached data was available.
  • Unfiltered Query Optimization: SELECT * queries without filters now return cached data directly without unnecessary filtering overhead.

For more details, refer to the Caching Acceleration Mode Documentation.

Prepared Statements

Improved Query Performance and Security: Spice now supports prepared statements, enabling parameterized queries that improve both performance through query plan caching and security by preventing SQL injection attacks.

Key Features:

  • Query Plan Caching: Prepared statements cache query plans, reducing planning overhead for repeated queries
  • SQL Injection Prevention: Parameters are safely bound, preventing SQL injection vulnerabilities
  • Arrow Flight SQL Support: Full prepared statement support via Arrow Flight SQL protocol

SDK Support:

  • gospice (Go): full support since v8.0.0, via SqlWithParams() with typed constructors (Int32Param, StringParam, TimestampParam, etc.)
  • spice-rs (Rust): full support since v3.0.0, via query_with_params() with RecordBatch parameters
  • spice-dotnet (.NET): full support since v0.3.0, via QueryWithParams() with typed parameter builders
  • spice-java (Java): full support since v0.5.0, via queryWithParams() with typed Param constructors (Param.int64(), Param.string(), etc.)
  • spice.js (JavaScript): full support since v3.1.0, via query() with parameterized query support
  • spicepy (Python): full support since v3.1.0, via query() with parameterized query support

Example (Go):

import "github.com/spiceai/gospice/v8"

client, _ := spice.NewClient()
defer client.Close()

// Parameterized query with typed parameters
results, _ := client.SqlWithParams(ctx,
"SELECT * FROM products WHERE price > $1 AND category = $2",
spice.Float64Param(10.0),
spice.StringParam("electronics"),
)

Example (Java):

import ai.spice.SpiceClient;
import ai.spice.Param;
import org.apache.arrow.adbc.core.ArrowReader;

try (SpiceClient client = new SpiceClient()) {
    // With automatic type inference
    ArrowReader reader = client.queryWithParams(
        "SELECT * FROM products WHERE price > $1 AND category = $2",
        10.0, "electronics");

    // With explicit typed parameters (reusing the variable, since Java
    // forbids redeclaring it in the same scope)
    reader = client.queryWithParams(
        "SELECT * FROM products WHERE price > $1 AND category = $2",
        Param.float64(10.0),
        Param.string("electronics"));
}

For more details, refer to the Parameterized Queries Documentation.

Spice Java SDK v0.5.0

Parameterized Query Support for Java: The Spice Java SDK v0.5.0 introduces parameterized queries using ADBC (Arrow Database Connectivity), providing a safer and more efficient way to execute queries with dynamic parameters.

Key Features:

  • SQL Injection Prevention: Parameters are safely bound, preventing SQL injection vulnerabilities
  • Automatic Type Inference: Java types are automatically mapped to Arrow types (e.g., double → Float64, String → Utf8)
  • Explicit Type Control: Use the new Param class with typed factory methods (Param.int64(), Param.string(), Param.decimal128(), etc.) for precise control over Arrow types
  • Updated Dependencies: Apache Arrow Flight SQL upgraded to 18.3.0, plus new ADBC driver support

Example:

import java.math.BigDecimal;

import ai.spice.SpiceClient;
import ai.spice.Param;
import org.apache.arrow.adbc.core.ArrowReader;

try (SpiceClient client = new SpiceClient()) {
    // With automatic type inference
    ArrowReader reader = client.queryWithParams(
        "SELECT * FROM taxi_trips WHERE trip_distance > $1 LIMIT 10",
        5.0);

    // With explicit typed parameters for precise control
    reader = client.queryWithParams(
        "SELECT * FROM orders WHERE order_id = $1 AND amount >= $2",
        Param.int64(12345),
        Param.decimal128(new BigDecimal("99.99"), 10, 2));
}

Maven:

<dependency>
  <groupId>ai.spice</groupId>
  <artifactId>spiceai</artifactId>
  <version>0.5.0</version>
</dependency>

For more details, refer to the Spice Java SDK Repository.

Google LLM Support

Expanded AI Provider Support: Spice now supports Google embedding and chat models via the Google AI provider, expanding the available LLM options for AI inference workloads alongside existing providers like OpenAI, Anthropic, and AWS Bedrock.

Key Features:

  • Google Chat Models: Access Google's Gemini models for chat completions
  • Google Embeddings: Generate embeddings using Google's text embedding models
  • Unified API: Use the same OpenAI-compatible API endpoints for all LLM providers

Example spicepod.yaml configuration:

models:
  - from: google:gemini-2.0-flash
    name: gemini
    params:
      google_api_key: ${secrets:GOOGLE_API_KEY}

embeddings:
  - from: google:text-embedding-004
    name: google_embeddings
    params:
      google_api_key: ${secrets:GOOGLE_API_KEY}

For more details, refer to the Google LLM Documentation (see docs PR #1286).

URL Tables

Query data sources directly via URL in SQL without prior dataset registration. Supports S3, Azure Blob Storage, and HTTP/HTTPS URLs with automatic format detection and partition inference.

Supported Patterns:

  • Single files: SELECT * FROM 's3://bucket/data.parquet'
  • Directories/prefixes: SELECT * FROM 's3://bucket/data/'
  • Glob patterns: SELECT * FROM 's3://bucket/year=*/month=*/data.parquet'

Key Features:

  • Automatic file format detection (Parquet, CSV, JSON, etc.)
  • Hive-style partition inference with filter pushdown
  • Schema inference from files
  • Works with both SQL and DataFrame APIs
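
For example, a single HTTPS-hosted file can be queried directly, with the format detected from the file extension (the URL below is a placeholder):

SELECT * FROM 'https://example.com/data.csv' LIMIT 10;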

Example with hive partitioning:

-- Partitions are automatically inferred from paths
SELECT * FROM 's3://bucket/data/' WHERE year = '2024' AND month = '01'

Enable via spicepod.yml:

runtime:
  params:
    url_tables: enabled

Cluster Mode Async Query APIs (experimental)

New asynchronous query APIs for long-running queries in cluster mode:

  • /v1/queries endpoint: Submit queries and retrieve results asynchronously

OpenTelemetry Improvements

Unified Telemetry Endpoint: OTel metrics ingestion has been consolidated to the Flight port (50051), simplifying deployment by removing the separate OTel port (50052). The push-based metrics exporter continues to support integration with OpenTelemetry collectors.

Note: This is a breaking change. Update your configurations if you were using the dedicated OTel port 50052. Internal cluster communication now uses port 50052 exclusively.

Observability Improvements

Enhanced Dashboards: Updated Grafana and Datadog example dashboards with:

  • Snapshot monitoring widgets
  • Improved accelerated datasets section
  • Renamed ingestion lag charts for clarity

Additional Histogram Buckets: Added more buckets to histogram metrics for better latency distribution visibility.

For more details, refer to the Monitoring Documentation.

Hash Indexing for Arrow Acceleration (experimental)

Arrow-based accelerations now support hash indexing for faster point lookups on equality predicates. Hash indexes provide O(1) average-case lookup performance for columns with high cardinality.

Features:

  • Primary key hash index support
  • Secondary index support for non-primary key columns
  • Composite key support with proper null value handling

Example configuration:

datasets:
  - from: postgres:users
    name: users
    acceleration:
      enabled: true
      engine: arrow
      primary_key: user_id
      indexes:
        '(tenant_id, user_id)': unique # Composite hash index
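
Given the configuration above, a query that binds both composite-key columns with equality predicates is eligible for an O(1) hash probe instead of a scan (the literal values are illustrative):

-- Covered by the '(tenant_id, user_id)' composite hash index
SELECT * FROM users WHERE tenant_id = 42 AND user_id = 1001;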

For more details, refer to the Hash Index Documentation.

SMB and NFS Data Connectors

Network-Attached Storage Connectors: New data connectors for SMB (Server Message Block) and NFS (Network File System) protocols enable direct federated queries against network-attached storage without requiring data movement to cloud object stores.

Key Features:

  • SMB Protocol Support: Connect to Windows file shares and Samba servers with authentication support
  • NFS Protocol Support: Connect to Unix/Linux NFS exports for direct data access
  • Federated Queries: Query Parquet, CSV, JSON, and other file formats directly from network storage with full SQL support
  • Acceleration Support: Accelerate data from SMB/NFS sources using DuckDB, Spice Cayenne, or other accelerators

Example spicepod.yaml configuration:

datasets:
  # SMB share
  - from: smb://fileserver/share/data.parquet
    name: smb_data
    params:
      smb_username: ${secrets:SMB_USER}
      smb_password: ${secrets:SMB_PASS}

  # NFS export
  - from: nfs://nfsserver/export/data.parquet
    name: nfs_data

For more details, refer to the Data Connectors Documentation.

ScyllaDB Data Connector

A new data connector for ScyllaDB, the high-performance NoSQL database compatible with Apache Cassandra. Query ScyllaDB tables directly or accelerate them for faster analytics.

Example configuration:

datasets:
  - from: scylladb:my_keyspace.my_table
    name: scylla_data
    acceleration:
      enabled: true
      engine: duckdb

For more details, refer to the ScyllaDB Data Connector Documentation.

Flight SQL TLS Connection Fixes

TLS Connection Support: Fixed TLS connection issues when using grpc+tls:// scheme with Flight SQL endpoints. Added support for custom CA certificate files via the new flightsql_tls_ca_certificate_file parameter.

Developer Experience Improvements

  • Turso v0.3.2 Upgrade: Upgraded Turso accelerator for improved performance and reliability
  • Rust 1.91 Upgrade: Updated to Rust 1.91 for latest language features and performance improvements
  • Spice Cloud CLI: Added spice cloud CLI commands for cloud deployment management
  • Improved Spicepod Schema: Improved JSON schema generation for better IDE support and validation
  • Acceleration Snapshots: Added configurable snapshots_create_interval for periodic acceleration snapshots independent of refresh cycles
  • Tiered Caching with Localpod: The Localpod connector now supports caching refresh mode, enabling multi-layer acceleration where a persistent cache feeds a fast in-memory cache
  • GitHub Data Connector: Added workflows and workflow runs support for GitHub repositories
  • NDJSON/LDJSON Support: Added support for Newline Delimited JSON and Line Delimited JSON file formats

Additional Improvements & Bug Fixes

  • Model Listing: New functionality to list available models across multiple AI providers
  • DuckDB Partitioned Tables: Primary key constraints now supported in partitioned DuckDB table mode
  • Post-refresh Sorting: New on_refresh_sort_columns parameter for DuckDB enables data ordering after writes
  • Improved Install Scripts: Removed jq dependency and improved cross-platform compatibility
  • Better Error Messages: Improved error messaging for bucket UDF arguments and deprecated OpenAI parameters
  • Reliability: Fixed DynamoDB IAM role authentication with new dynamodb_auth: iam_role parameter
  • Reliability: Fixed cluster executors to use scheduler's temp_directory parameter for shuffle files
  • Reliability: Initialize secrets before object stores in cluster executor mode
  • Reliability: Added page-level retry with backoff for transient GitHub GraphQL errors
  • Performance: Improved statistics for rewritten DistributeFileScanOptimizer plans
  • Developer Experience: Added max_message_size configuration for Flight service

Contributors

Breaking Changes

OTel Ingestion Port Change

OTel ingestion has been moved to the Flight port (50051), removing the separate OTel port 50052. Port 50052 is now used exclusively for internal cluster communication. Update your configurations if you were using the dedicated OTel port.

Distributed Query Cluster Mode Requires mTLS

Distributed query cluster mode now requires mTLS for secure communication between cluster nodes. This is a security enhancement to prevent unauthorized nodes from joining the cluster and accessing secrets.

Migration Steps:

  1. Generate certificates using spice cluster tls init and spice cluster tls add
  2. Update scheduler and executor startup commands with --node-mtls-* arguments
  3. For development/testing, use --allow-insecure-connections to opt out of mTLS

Renamed CLI Arguments:

  • --cluster-mode → --role
  • --cluster-ca-certificate-file → --node-mtls-ca-certificate-file
  • --cluster-certificate-file → --node-mtls-certificate-file
  • --cluster-key-file → --node-mtls-key-file
  • --cluster-address → --node-bind-address
  • --cluster-advertise-address → --node-advertise-address
  • --cluster-scheduler-url → --scheduler-address

Removed CLI Arguments:

  • --cluster-api-key: Replaced by mTLS authentication

Cookbook Updates

New ScyllaDB Data Connector Recipe: New recipe demonstrating how to use the ScyllaDB Data Connector. See ScyllaDB Data Connector Recipe for details.

New SMB Data Connector Recipe: New recipe demonstrating how to use the SMB Data Connector. See SMB Data Connector Recipe for details.

The Spice Cookbook includes 86 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.11.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.11.0 image:

docker pull spiceai/spiceai:1.11.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai --version 1.11.0

AWS Marketplace:

Spice is available in the AWS Marketplace.

Dependencies

What's Changed

Changelog

Spice v1.11.0-rc.2 (Jan 22, 2026)

· 24 min read
Viktor Yershov
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.11.0-rc.2!

v1.11.0-rc.2 is the second release candidate for v1.11, made available for advance testing. It brings Spice Cayenne to Beta status with acceleration snapshots support, adds a new ScyllaDB Data Connector, and upgrades to DataFusion v51, Arrow 57.2, and iceberg-rust v0.8.0. It includes significant improvements to distributed query, caching, and observability.

What's New in v1.11.0-rc.2

Spice Cayenne Accelerator Reaches Beta

Spice Cayenne has been promoted to Beta status with acceleration snapshots support and numerous stability improvements.

Improved Reliability:

  • Fixed timezone database issues in Docker images that caused acceleration panics
  • Resolved FuturesUnordered reentrant drop crashes
  • Fixed memory growth issues related to Vortex metrics allocation
  • Metadata catalog now properly respects cayenne_file_path location
  • Added warnings for unparseable configuration values

Example configuration:

datasets:
  - from: s3://my-bucket/data.parquet
    name: my_dataset
    acceleration:
      enabled: true
      engine: cayenne
      mode: file

DataFusion v51 Upgrade

Apache DataFusion has been upgraded to v51, bringing significant performance improvements, new SQL features, and enhanced observability.

DataFusion v51 ClickBench Performance

Performance Improvements:

  • Faster CASE Expression Evaluation: Expressions now short-circuit earlier, reuse partial results, and avoid unnecessary scattering, speeding up common ETL patterns
  • Better Defaults for Remote Parquet Reads: DataFusion now fetches the last 512KB of Parquet files by default, typically avoiding 2 I/O requests per file
  • Faster Parquet Metadata Parsing: Leverages Arrow 57's new thrift metadata parser for up to 4x faster metadata parsing

New SQL Features:

  • SQL Pipe Operators: Support for |> syntax for inline transforms
  • DESCRIBE <query>: Returns the schema of any query without executing it
  • Named Arguments in SQL Functions: PostgreSQL-style param => value syntax for scalar, aggregate, and window functions
  • Decimal32/Decimal64 Support: New Arrow types supported including aggregations like SUM, AVG, and MIN/MAX

Example pipe operator:

SELECT * FROM t
|> WHERE a > 10
|> ORDER BY b
|> LIMIT 5;
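
DESCRIBE works on any query without executing it, returning only the schema. A minimal sketch using the same table t:

DESCRIBE SELECT a, b FROM t WHERE a > 10;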

Improved Observability:

  • Improved EXPLAIN ANALYZE Metrics: New metrics including output_bytes, selectivity for filters, reduction_factor for aggregates, and detailed timing breakdowns

Arrow 57.2 Upgrade

Spice has been upgraded to Apache Arrow Rust 57.2.0, bringing major performance improvements and new capabilities.

Arrow 57 Parquet Metadata Parsing Performance

Key Features:

  • 4x Faster Parquet Metadata Parsing: A rewritten thrift metadata parser delivers up to 4x faster metadata parsing, especially beneficial for low-latency use cases and files with large amounts of metadata
  • Parquet Variant Support: Experimental support for reading and writing the new Parquet Variant type for semi-structured data, including shredded variant values
  • Parquet Geometry Support: Read and write support for Parquet Geometry types (GEOMETRY and GEOGRAPHY) with GeospatialStatistics
  • New arrow-avro Crate: Efficient conversion between Apache Avro and Arrow RecordBatches with projection pushdown and vectorized execution support

iceberg-rust v0.8.0 Upgrade

Spice has been upgraded to iceberg-rust v0.8.0, bringing improved Iceberg table support.

Key Features:

  • V3 Metadata Support: Full support for Iceberg V3 table metadata format
  • INSERT INTO Partitioned Tables: DataFusion integration now supports inserting data into partitioned Iceberg tables
  • Improved Delete File Handling: Better support for position and equality delete files, including shared delete file loading and caching
  • SQL Catalog Updates: Implement update_table and register_table for SQL catalog
  • S3 Tables Catalog: Implement update_table for S3 Tables catalog
  • Enhanced Arrow Integration: Convert Arrow schema to Iceberg schema with auto-assigned field IDs, _file column support, and Date32 type support

Acceleration Snapshots

Acceleration snapshots enable point-in-time recovery and data versioning for accelerated datasets. Snapshots capture the state of accelerated data at specific points, allowing for fast bootstrap recovery and rollback capabilities.

Key Feature Improvements in v1.11:

  • Flexible Triggers: Configure when snapshots are created based on time intervals or stream batch counts
  • Automatic Compaction: Reduce storage overhead by compacting older snapshots (DuckDB only)
  • Bootstrap Integration: Snapshots can reset cache expiry on load for seamless recovery (DuckDB with Caching refresh mode)
  • Smart Creation Policies: Only create snapshots when data has actually changed

Example configuration:

datasets:
  - from: s3://my-bucket/data.parquet
    name: my_dataset
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      snapshots: enabled
      snapshots_trigger: time_interval
      snapshots_trigger_threshold: 1h
      snapshots_creation_policy: on_changed

Snapshots API and CLI: New API endpoints and CLI commands for managing snapshots programmatically. List, create, and restore snapshots directly from the command line or via HTTP.

For more details, refer to the Acceleration Snapshots Documentation.

ScyllaDB Data Connector

A new data connector for ScyllaDB, the high-performance NoSQL database compatible with Apache Cassandra. Query ScyllaDB tables directly or accelerate them for faster analytics.

Example configuration:

datasets:
  - from: scylladb:my_keyspace.my_table
    name: scylla_data
    acceleration:
      enabled: true
      engine: duckdb

For more details, refer to the ScyllaDB Data Connector Documentation.

Distributed Query Improvements

mTLS Verification: Cluster communication between scheduler and executors now supports mutual TLS verification for enhanced security.

Credential Propagation: Azure and GCS credentials are now automatically propagated to executors in cluster mode, enabling access to cloud storage across the distributed query cluster.

Improved Resilience:

  • Exponential backoff for scheduler disconnection recovery
  • Increased gRPC message size limit from 16MB to 100MB for large query plans
  • HTTP health endpoint for cluster executors
  • Automatic executor role inference when --scheduler-address is provided

For more details, refer to the Distributed Query Documentation.

Caching Acceleration Mode Improvements

The Caching Acceleration Mode introduced in v1.10.0 has received significant performance optimizations and reliability fixes in this release.

Performance Optimizations:

  • Non-blocking Cache Writes: Cache misses no longer block query responses. Data is written to the cache asynchronously after the query returns, reducing query latency for cache miss scenarios.
  • Batch Cache Writes: Multiple cache entries are now written in batches rather than individually, significantly improving write throughput for high-volume cache operations.

Reliability Fixes:

  • Correct SWR Refresh Behavior: The stale-while-revalidate (SWR) pattern now correctly refreshes only the specific entries that were accessed instead of refreshing all stale rows in the dataset. This prevents unnecessary source queries and reduces load on upstream data sources.
  • Deduplicated Refresh Requests: Fixed an issue where JSON array responses could trigger multiple redundant refresh operations. Refresh requests are now properly deduplicated.
  • Fixed Cache Hit Detection: Resolved an issue where queries that didn't include fetched_at in their projection would always result in cache misses, even when cached data was available.
  • Unfiltered Query Optimization: SELECT * queries without filters now return cached data directly without unnecessary filtering overhead.

For more details, refer to the Caching Acceleration Mode Documentation.

DynamoDB Connector Enhancements

  • Added JSON nesting for DynamoDB Streams
  • Proper batch deletion handling

URL Tables

Query data sources directly via URL in SQL without prior dataset registration. Supports S3, Azure Blob Storage, and HTTP/HTTPS URLs with automatic format detection and partition inference.

Supported Patterns:

  • Single files: SELECT * FROM 's3://bucket/data.parquet'
  • Directories/prefixes: SELECT * FROM 's3://bucket/data/'
  • Glob patterns: SELECT * FROM 's3://bucket/year=*/month=*/data.parquet'

Key Features:

  • Automatic file format detection (Parquet, CSV, JSON, etc.)
  • Hive-style partition inference with filter pushdown
  • Schema inference from files
  • Works with both SQL and DataFrame APIs

Example with hive partitioning:

-- Partitions are automatically inferred from paths
SELECT * FROM 's3://bucket/data/' WHERE year = '2024' AND month = '01'

Enable via spicepod.yml:

runtime:
  params:
    url_tables: enabled

Cluster Mode Async Query APIs (experimental)

New asynchronous query APIs for long-running queries in cluster mode:

  • /v1/queries endpoint: Submit queries and retrieve results asynchronously
  • Arrow Flight async support: Non-blocking query execution via Arrow Flight protocol

Observability Improvements

Enhanced Dashboards: Updated Grafana and Datadog example dashboards with:

  • Snapshot monitoring widgets
  • Improved accelerated datasets section
  • Renamed ingestion lag charts for clarity

Additional Histogram Buckets: Added more buckets to histogram metrics for better latency distribution visibility.

For more details, refer to the Monitoring Documentation.

Additional Improvements

  • Model Listing: New functionality to list available models across multiple AI providers
  • DuckDB Partitioned Tables: Primary key constraints now supported in partitioned DuckDB table mode
  • Post-refresh Sorting: New on_refresh_sort_columns parameter for DuckDB enables data ordering after writes
  • Improved Install Scripts: Removed jq dependency and improved cross-platform compatibility
  • Better Error Messages: Improved error messaging for bucket UDF arguments and deprecated OpenAI parameters

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

New ScyllaDB Data Connector Recipe: New recipe demonstrating how to use the ScyllaDB Data Connector. See ScyllaDB Data Connector Recipe for details.

New SMB Data Connector Recipe: New recipe demonstrating how to use the SMB Data Connector. See SMB Data Connector Recipe for details.

The Spice Cookbook includes 86 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.11.0-rc.2, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:v1.11.0-rc.2 image:

docker pull spiceai/spiceai:v1.11.0-rc.2

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

Spice is available in the AWS Marketplace.

Dependencies

Changelog

Spice v1.9.0 (Nov 19, 2025)

· 59 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.9.0-stable! 🌶

v1.9.0-stable introduces Spice Cayenne, a new high-performance data accelerator built on the Vortex columnar format that delivers performance better than DuckDB's without its single-file scaling limitations, and a preview of Multi-Node Distributed Query based on Apache Ballista. v1.9.0 also upgrades to DataFusion v50, DuckDB v1.4.2, and Delta-Kernel v0.16 for even higher query performance, expands search capabilities with full-text search on views and multi-column embeddings, and delivers many additional features and improvements.

What's New in v1.9.0

Cayenne Data Accelerator (Beta)

Introducing Cayenne: SQL as an Acceleration Format: A new high-performance Data Accelerator that simplifies multi-file data acceleration by using an embedded database (SQLite) for metadata while storing data in the Vortex columnar format, a Linux Foundation project. Cayenne delivers query and ingestion performance better than DuckDB's file-based acceleration without DuckDB's memory overhead and the scaling challenges of single DuckDB files.

Cayenne uses SQLite to manage acceleration metadata (schemas, snapshots, statistics, file tracking) through simple SQL transactions, while storing data in Vortex's compressed columnar format.

Key Features:

  • SQLite + Vortex Architecture: All metadata is stored in SQLite tables with standard SQL transactions, while data lives in Vortex's compressed, chunked columnar format designed for zero-copy access and efficient scanning.
  • Simplified Operations: No complex file hierarchies, no JSON/Avro metadata files, no separate catalog servers—just SQL tables and Vortex data files. The entire metadata schema is intentionally simple for maximum reliability.
  • Fast Metadata Access: Single SQL query retrieves all metadata needed for query planning—no multiple round trips to storage, no S3 throttling, no reconstruction of metadata state from scattered files.
  • Efficient Small Changes: Dramatically reduces small file proliferation. Snapshots are just rows in SQLite tables, not new files on disk. Supports millions of snapshots without performance degradation.
  • High Concurrency: Changes consist of two steps: stage Vortex files (if any), then run a single SQL transaction. Much faster conflict resolution and support for many more concurrent updates than file-based formats.
  • Advanced Data Lifecycle: Full ACID transactions, delete support, and retention SQL execution on refresh commit.

Example Spicepod.yml configuration:

datasets:
  - from: s3:my_table
    name: accelerated_data_30d
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      refresh_mode: append
      retention_sql: DELETE FROM accelerated_data_30d WHERE created_at < NOW() - INTERVAL '30 days'

Note: the Cayenne Data Accelerator is in Beta, with limitations.

For more details, refer to the Cayenne Documentation, the Vortex project, and the DuckLake announcement that partly inspired this design.

Multi-Node Distributed Query (Preview)

Apache Ballista Integration: Spice now supports distributed query execution based on Apache Ballista, enabling distributed queries across multiple executor nodes for improved performance on large datasets. This feature is in preview in v1.9.0.

Architecture:

A distributed Spice cluster consists of:

  • Scheduler: Responsible for distributed query planning and work queue management for the executor fleet
  • Executors: One or more nodes responsible for running physical query plans

Getting Started:

Start a scheduler instance using an existing Spicepod. The scheduler is the only spiced instance that needs to be configured:

# Start scheduler (note the flight bind address override if you want it reachable outside localhost)
spiced --cluster-mode scheduler --flight 0.0.0.0:50051

Start one or more executors configured with the scheduler's flight URI:

# Start executor (automatically selects a free port if 50051 is taken)
spiced --cluster-mode executor --scheduler-url spiced://localhost:50051

Query Execution:

Queries run through the scheduler will now show a distributed_plan in EXPLAIN output, demonstrating how the query is distributed across executor nodes:

EXPLAIN SELECT count(id) FROM my_dataset;

Current Limitations:

  • Accelerated datasets are currently not supported. This feature is designed for querying partitioned data lake formats (Parquet, Delta Lake, Iceberg, etc.)
  • The feature is in preview and may have stability or performance limitations
  • Specific acceleration support is planned for future releases

For more details, refer to the Distributed Query Documentation.

DataFusion v50 Upgrade

Spice.ai is built on the Apache DataFusion query engine. The v50 release brings significant performance improvements and enhanced reliability:

Performance Improvements 🚀:

  • Dynamic Filter Pushdown: Enhanced dynamic filter pushdown for custom ExecutionPlans, ensuring filters propagate correctly through all physical operators for improved query performance.

  • Partition Pruning: Expanded partition pruning support ensures that unnecessary partitions are skipped when filters are not used, reducing data scanning overhead and improving query execution times.

Apache Spark Compatible Functions: Added support for Spark-compatible functions including array, bit_get/bit_count, bitmap_count, crc32/sha1, date_add/date_sub, if, last_day, like/ilike, luhn_check, mod/pmod, next_day, parse_url, rint, and width_bucket.
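
A quick sketch of a few of these functions (signatures follow their Spark counterparts; the literal values are illustrative):

SELECT
  last_day(DATE '2025-11-19'),        -- last day of that month (2025-11-30)
  next_day(DATE '2025-11-19', 'Mon'), -- next Monday after the date
  width_bucket(5.35, 0.0, 10.0, 5),   -- equi-width histogram bucket index
  luhn_check('79927398713');          -- Luhn checksum validation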

Bug Fixes & Reliability: Resolved issues with partition name validation and empty execution plans when vector index lists are empty. Fixed timestamp support for partition expressions, enabling better partitioning for time-series data.

See the Apache DataFusion 50.0.3 Release for more details.

DuckDB v1.4.2 Upgrade and Accelerator Improvements

DuckDB v1.4.2: DuckDB has been upgraded to v1.4.2, which includes several performance optimizations.

Composite ART Index Support: DuckDB in Spice now supports composite (multi-column) Adaptive Radix Tree (ART) indexes for accelerated table scans. When queries filter on multiple columns fully covered by a composite index, the optimizer automatically uses index scans instead of full table scans, delivering significant performance improvements for selective queries.

Example configuration:

datasets:
  - from: file://data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      indexes:
        '(region, product_id)': enabled

Performance example with composite index on 7.5M rows:

SELECT * FROM sales WHERE region = 'US' AND product_id = 12345;

-- Without index: 0.282s
-- With composite index (region, product_id): 0.037s
-- Performance improvement: 7.6x faster with composite index

DuckDB Intermediate Materialization: Queries with indexes now use intermediate materialization (WITH ... AS MATERIALIZED) to leverage faster index scans. Currently supported for non-federated queries (query_federation: disabled) against a single table with indexes only. When predicates cover more columns than the index, the optimizer rewrites queries to first materialize index-filtered results, then apply remaining predicates. This optimization can deliver significant performance improvements for selective queries.

Example configuration:

datasets:
  - from: file://sales_data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      params:
        query_federation: disabled # Currently required for intermediate materialization
      indexes:
        '(region, product_id)': enabled

Performance example:

-- Query with indexed columns (region, product_id) plus additional filter (amount)
SELECT * FROM sales
WHERE region = 'US' AND product_id = 12345 AND amount > 1000;

-- Optimized execution time: 0.031s (with intermediate materialization)
-- Standard execution time: 0.108s (without optimization)
-- Performance improvement: ~3.5x faster

The optimizer automatically rewrites the query to:

WITH _intermediate_materialize AS MATERIALIZED (
  SELECT * FROM sales WHERE region = 'US' AND product_id = 12345
)
SELECT * FROM _intermediate_materialize WHERE amount > 1000;

Parquet Buffering for Partitioned Writes: DuckDB partitioned writes in table mode now support Parquet buffering, reducing memory usage and improving write performance for large datasets.

Retention SQL on Refresh Commit: DuckDB accelerations now support running retention SQL on refresh commit, enabling automatic data cleanup and lifecycle management during refresh operations.

UTC Timezone for DuckDB: DuckDB now uses UTC as the default timezone, ensuring consistent behavior for time-based queries across different environments.

Example Spicepod.yml configuration:

datasets:
  - from: s3://my_bucket/large_table/
    name: partitioned_data
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      retention:
        sql: DELETE FROM partitioned_data WHERE event_time < NOW() - INTERVAL '7 days'

For more details, refer to the DuckDB Data Accelerator Documentation.

HTTP Data Connector

  • Querying endpoints as tables: The HTTP/HTTPS Data Connectors now support querying HTTP endpoints directly as tables in SQL queries with dynamic filters. This feature turns REST APIs into queryable data sources, making it easy to integrate external service data.

  • Query HTTP endpoint that returns structured data (JSON, CSV, etc.) as if it were a database table

  • Configurable retry logic, timeouts, and POST request support for more complex API interactions

Example Spicepod.yml configuration:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    params:
      file_format: json
      max_retries: 3
      client_timeout: 10s
      allowed_request_paths: /search/people
      request_query_filters: enabled
      request_body_filters: enabled

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael'
LIMIT 10;

If a request_body is supplied, it will be posted to the endpoint:

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael' and request_body = '{"name": "michael"}'
LIMIT 10;

HTTP endpoints can be accelerated using refresh_sql:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    params:
      file_format: json
      allowed_request_paths: /search/people
      request_query_filters: enabled
      request_body_filters: enabled
    acceleration:
      enabled: true
      refresh_mode: full
      refresh_sql: |
        SELECT request_path, request_query, content
        FROM tvmaze
        WHERE request_path = '/search/people'
          AND request_query IN ('q=michael', 'q=luke')

For more details, refer to the HTTP Data Connector Documentation.

DynamoDB Data Connector Improvements

Improved Query Performance: The DynamoDB Data Connector now includes improved filter handling for edge cases, parallel scan support for faster data ingestion, and better error handling for misconfigured queries. These improvements enable more reliable and performant access to DynamoDB data.

Example Spicepod.yml configuration:

datasets:
  - from: dynamodb:my_table
    name: ddb_data
    params:
      scan_segments: 10 # Default `auto`, which calculates optimal segments based on the number of rows

For more details, refer to the DynamoDB Data Connector Documentation.

S3 Data Connector Improvements

S3 Versioning Support: Spice now supports S3 Versioning for all connectors using object-store (S3, Delta Lake, etc.), ensuring range reads over versioned files are atomically correct. When S3 versioning is enabled, Spice automatically tracks version IDs during file discovery and uses them for all subsequent range reads, preventing inconsistencies from concurrent file modifications.

Current limitations:

  • Multi-file connections (e.g., partitioned datasets) do not yet support version tracking across all files
  • Version tracking is automatic when S3 versioning is enabled on the bucket

S3 Single-File Refresh Skipping: Spice now optimizes S3 single-file dataset refreshes by caching file metadata (ETag, Version ID, size, timestamp) and skipping unnecessary data fetches when the underlying file hasn't changed. This optimization dramatically reduces bandwidth usage and improves refresh performance for scenarios where data doesn't change frequently. The feature is enabled by default for accelerated S3 single-file datasets and includes metrics tracking for skipped refreshes.

Example configuration:

datasets:
  - from: s3://my-bucket/data.parquet
    name: s3_data
    acceleration:
      enabled: true
      engine: duckdb
      refresh_check_interval: 10s

When the file's metadata hasn't changed between refresh checks, Spice will skip the data fetch entirely, logging:

Skipping refresh for dataset 's3_data': file metadata unchanged

For more details, refer to the S3 Data Connector Documentation.

Search & Embeddings Enhancements

Full-Text Search on Views: Full-text search indexes are now supported on views, enabling advanced search scenarios over pre-aggregated or transformed data. This extends the power of Spice's search capabilities beyond base datasets.

Multi-Column Embeddings on Views: Views now support embedding columns, enabling vector search and semantic retrieval on view data. This is useful for search over aggregated or joined datasets.

Vector Engines on Views: Vector search engines are now available for views, enabling similarity search over complex queries and transformations.

Example Spicepod.yml configuration:

views:
  - name: aggregated_reviews
    sql: SELECT review_id, review_text FROM reviews WHERE rating > 4
    embeddings:
      - column: review_text
        model: openai:text-embedding-3-small

For more details, refer to the Search Documentation and Embeddings Documentation.

Dedicated Query Thread Pool (Now Enabled by Default)

Dedicated Query Thread Pool: Query execution and accelerated refreshes now run on their own dedicated thread pool, separate from the HTTP server. This prevents heavy query workloads from slowing down API responses, keeping health checks fast and avoiding unnecessary Kubernetes pod restarts under load.

This feature was opt-in in previous releases and is now enabled by default. To disable it and revert to the previous behavior, add the following spicepod.yaml configuration:

runtime:
  params:
    dedicated_thread_pool: none

For more details, refer to the Runtime Configuration Documentation.

Query Performance Optimizations

Stale-While-Revalidate Cache Control: Query results now support "stale-while-revalidate" cache control, allowing stale cached data to be served immediately while asynchronously refreshing the cache entry in the background. This improves response times for frequently-accessed queries while maintaining data freshness. Requires cache key type to be set to "sql (raw)" for proper operation.

Optimized Prepared Statements: Prepared statement handling has been optimized for better performance with parameterized queries, reducing planning overhead and improving execution time for repeated queries.

Large RecordBatch Chunking: Large Arrow RecordBatch objects are now automatically chunked to control memory usage during query execution, preventing memory exhaustion for queries returning large result sets.

Query Result Caching: Compressed Encoding, Stale-While-Revalidate Cache Control

Zstd Compression Encoding: Query result caching now supports optional Zstandard (zstd) compression encoding to reduce memory usage for cached query results. This is particularly beneficial for large result sets, reducing cache memory footprint while maintaining fast decompression times. Encoding can be configured via the encoding parameter with options none (default) or zstd.

Example configuration:

runtime:
  caching:
    sql_results:
      enabled: true
      max_size: 128MiB
      item_ttl: 1m
      encoding: zstd # Enable zstd compression

HTTP Cache-Control Support: The query result cache now supports the stale-while-revalidate Cache-Control directive, enabling faster response times by serving stale cached results immediately while asynchronously refreshing the cache in the background. This feature is particularly useful for applications that can tolerate slightly stale data in exchange for improved performance.

Example configuration:

runtime:
  caching:
    sql_results:
      enabled: true
      max_size: 128MiB
      item_ttl: 1m
      stale_while_revalidate_ttl: 1m # Serve stale items for up to 1 minute after `item_ttl` expires

How it works:

When a cache entry is stale but within the stale-while-revalidate window, Spice will:

  1. Immediately return the stale cached result to the client
  2. Asynchronously re-execute the query in the background to refresh the cache
  3. Future requests will use the refreshed data

Configuration:

Use the Cache-Control HTTP header with the stale-while-revalidate directive:

Cache-Control: max-age=300, stale-while-revalidate=60

This configuration caches results for 5 minutes (300 seconds), and allows serving stale results for an additional 60 seconds while refreshing in the background.

Requirements:

  • Must use plan or raw SQL cache keys (set cache_key_type to sql or plan in results_caching configuration)
  • Background revalidation re-executes queries through the normal query path
  • Timestamp tracking automatically determines cache entry age for staleness checks

Example configuration via HTTP header:

GET /v1/sql
Cache-Control: max-age=600, stale-while-revalidate=120
X-Cache-Key-Type: sql

This feature improves application responsiveness while ensuring data freshness through background updates.

For more details, refer to the Results Caching Documentation.

Security & Reliability Improvements

Enhanced HTTP Client Security: HTTP client usage across the runtime has been hardened with improved TLS validation, certificate pinning for critical endpoints, and better error handling for network failures.

ODBC Connector Improvements: Removed unwrap calls from the ODBC connector, improving error handling and reliability. Fixed secret handling and Kubernetes secret integration.

CLI Permissions Hardening: Tightened file permissions for the CLI and install script, ensuring secure defaults for configuration files and credentials.

Oracle Instant Client Pinning: Oracle Instant Client downloads are now pinned to specific SHAs, ensuring reproducible builds and preventing supply chain attacks.

AWS Authentication Improvements

Improved Credential Retry Logic: AWS SDK credential initialization has been significantly improved with more robust retry logic and better error handling. The system now automatically retries transient credential resolution failures using Fibonacci backoff, allowing Spice to tolerate extended AWS outages (up to ~48 hours) without manual intervention.

Key features:

  • Automatic retry with backoff: Implements Fibonacci backoff for transient credential failures (network issues, temporary AWS service disruptions)
  • Better error handling: Distinguishes between retryable errors (connector errors) and non-retryable errors (misconfiguration)
  • Unauthenticated access support: Properly supports unauthenticated access to public S3 buckets without requiring credentials
  • Improved error messages: Provides detailed logging with attempt numbers, retry intervals, and error context for better troubleshooting

The improvements ensure more reliable AWS service integration, particularly in environments with intermittent network connectivity or during AWS service degradations.

Observability & Tracing

DataFusion Log Emission: The Spice runtime now emits DataFusion internal logs, providing deeper visibility into query planning and execution for debugging and performance analysis.

AI Completions Tracing: Fixed tracing so that ai_completions operations are correctly parented under sql_query traces, improving observability for AI-powered queries.

Git Data Connector (Alpha)

Version-Controlled Data Access: The new Git Data Connector (Alpha) enables querying datasets stored in Git repositories. This connector is ideal for use cases involving configuration files, documentation, or any data tracked in version control.

Example Spicepod.yml configuration:

datasets:
  - from: git:https://github.com/myorg/myrepo
    name: git_metrics
    params:
      file_format: csv

For more details, refer to the Git Data Connector Documentation.

Spice Java SDK 0.4.0

The Spice Java SDK has been upgraded with support for configurable Arrow memory limit: spice-java v0.4.0

SpiceClient client = SpiceClient.builder()
    .withArrowMemoryLimitMB(1024) // 1GB limit
    .build();

For more details, refer to the Java SDK Documentation.

CLI Improvements

Install Specific Versions: The spice install command now supports installing specific versions of the Spice runtime and CLI. This enables easy version management, downgrading, or installation of specific releases for testing or compatibility requirements.

Usage:

# Install a specific version
spice install v1.8.3

# Install a specific version with AI flavor
spice install v1.8.3 ai

# Install latest version (existing behavior)
spice install
spice install ai

Note: Homebrew installations require manual version management via brew install spiceai/spiceai/spice@<version>.

Persistent Query History: The Spice CLI REPL (SQL, search, and chat interfaces) now persists command history to ~/.spice/query_history.txt, making your query history available across sessions. The history file is automatically created if it doesn't exist, with graceful fallback if the home directory cannot be determined.

New REPL Commands:

  • .clear - Clear the screen using ANSI escape codes for a clean workspace
  • .clear history - Clear and persist the query history, removing all stored commands

Tab Completion: Tab completion now includes suggestions based on your command history, making it faster to re-run or modify previous queries.

Example usage:

sql> SELECT * FROM my_table;
sql> .clear # Clears the screen
sql> .clear history # Clears command history
sql> # Use arrow keys or tab to access previous commands

For more details, refer to the CLI Documentation.

Additional Improvements & Bug Fixes

  • Reliability: Fixed refresh worker panics with recovery handling to prevent runtime crashes during acceleration refreshes.
  • Reliability: Improved error messages for missing or invalid spicepod.yaml files, providing actionable feedback for misconfiguration.
  • Reliability: Fixed DuckDB metadata pointer loading issues for snapshots.
  • Performance: Ensured ListingTable partitions are pruned correctly when filters are not used.
  • Reliability: Fixed vector dimension determination for partitioned indexes.
  • Search: Fixed casing issues in Reciprocal Rank Fusion (RRF) for hybrid search queries.
  • Search: Fixed search field handling as metadata for chunked search indexes.
  • Validation: Added timestamp support for partition expressions.
  • Validation: Fixed regexp_match function for DuckDB datasets.
  • Validation: Fixed partition name validation for improved reliability.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

New HTTP Data Connector Recipe: New recipe demonstrating how to query REST APIs and HTTP(s) endpoints. See HTTP Connector Recipe for details.

The Spice Cookbook includes 82 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.9.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.9.0 image:

docker pull spiceai/spiceai:1.9.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Dependencies

Changelog

Spice v1.9.0-rc.4 (Nov 18, 2025)

· 22 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.9.0-rc.4! 🌶

This release candidate brings DuckDB v1.4.2, Cayenne partitioning improvements, and comprehensive security hardening across the CLI, data connectors, runtime, and MCP. v1.9.0-rc.4 also includes MySQL and PostgreSQL connector improvements with fixed nullability inferences and full-text search support, DynamoDB consistency improvements, HTTP connector validation and UX enhancements, and numerous reliability and performance optimizations. Significant improvements were also made to test and automation infrastructure to ensure high quality releases.

v1.9.0 introduces Spice Cayenne, a new high-performance data accelerator built on the Vortex columnar format that delivers better than DuckDB performance without single-file scaling limitations, and a preview of Multi-Node Distributed Query based on Apache Ballista. v1.9.0 also upgrades to DataFusion v50 for even higher query performance, expands search capabilities with full-text search on views and multi-column embeddings, and delivers many additional features and improvements.

What's New in v1.9.0

Cayenne Data Accelerator (Beta)

Introducing Cayenne: SQL as an Acceleration Format: A new high-performance Data Accelerator that simplifies multi-file data acceleration by using an embedded database (SQLite) for metadata while storing data in the Vortex columnar format, a Linux Foundation project. Cayenne delivers query and ingestion performance better than DuckDB's file-based acceleration without DuckDB's memory overhead and the scaling challenges of single DuckDB files.

Cayenne uses SQLite to manage acceleration metadata (schemas, snapshots, statistics, file tracking) through simple SQL transactions, while storing data in Vortex's compressed columnar format. This architecture provides:

Key Features:

  • SQLite + Vortex Architecture: All metadata is stored in SQLite tables with standard SQL transactions, while data lives in Vortex's compressed, chunked columnar format designed for zero-copy access and efficient scanning.
  • Simplified Operations: No complex file hierarchies, no JSON/Avro metadata files, no separate catalog servers—just SQL tables and Vortex data files. The entire metadata schema is intentionally simple for maximum reliability.
  • Fast Metadata Access: Single SQL query retrieves all metadata needed for query planning—no multiple round trips to storage, no S3 throttling, no reconstruction of metadata state from scattered files.
  • Efficient Small Changes: Dramatically reduces small file proliferation. Snapshots are just rows in SQLite tables, not new files on disk. Supports millions of snapshots without performance degradation.
  • High Concurrency: Changes consist of two steps: stage Vortex files (if any), then run a single SQL transaction. Much faster conflict resolution and support for many more concurrent updates than file-based formats.
  • Advanced Data Lifecycle: Full ACID transactions, delete support, and retention SQL execution on refresh commit.

Example Spicepod.yml configuration:

datasets:
  - from: s3:my_table
    name: accelerated_data_30d
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      refresh_mode: append
      retention_sql: DELETE FROM accelerated_data WHERE created_at < NOW() - INTERVAL '30 days'

Note: the Cayenne Data Accelerator is in Beta, with limitations.

For more details, refer to the Cayenne Documentation, the Vortex project, and the DuckLake announcement that partly inspired this design.

Multi-Node Distributed Query (Preview)

Apache Ballista Integration: Spice now supports distributed query execution based on Apache Ballista, enabling distributed queries across multiple executor nodes for improved performance on large datasets. This feature is in preview in v1.9.0-rc.3.

Architecture:

A distributed Spice cluster consists of:

  • Scheduler: Responsible for distributed query planning and work queue management for the executor fleet
  • Executors: One or more nodes responsible for running physical query plans

Getting Started:

Start a scheduler instance using an existing Spicepod. The scheduler is the only spiced instance that needs to be configured:

# Start scheduler (note the flight bind address override if you want it reachable outside localhost)
spiced --cluster-mode scheduler --flight 0.0.0.0:50051

Start one or more executors configured with the scheduler's flight URI:

# Start executor (automatically selects a free port if 50051 is taken)
spiced --cluster-mode executor --scheduler-url spiced://localhost:50051

Query Execution:

Queries run through the scheduler will now show a distributed_plan in EXPLAIN output, demonstrating how the query is distributed across executor nodes:

EXPLAIN SELECT count(id) FROM my_dataset;

Current Limitations:

  • Accelerated datasets are currently not supported. This feature is designed for querying partitioned data lake formats (Parquet, Delta Lake, Iceberg, etc.)
  • The feature is in preview and may have stability or performance limitations
  • Specific acceleration support is planned for future releases

DataFusion v50 Upgrade

Spice.ai is built on the Apache DataFusion query engine. The v50 release brings significant performance improvements and enhanced reliability:

Performance Improvements 🚀:

  • Dynamic Filter Pushdown: Enhanced dynamic filter pushdown for custom ExecutionPlans, ensuring filters propagate correctly through all physical operators for improved query performance.

  • Partition Pruning: Expanded partition pruning support ensures that unnecessary partitions are skipped when filters are not used, reducing data scanning overhead and improving query execution times.

Apache Spark Compatible Functions: Added support for Spark-compatible functions including array, bit_get/bit_count, bitmap_count, crc32/sha1, date_add/date_sub, if, last_day, like/ilike, luhn_check, mod/pmod, next_day, parse_url, rint, and width_bucket.
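
For example, a few of these Spark-compatible functions in one query (a minimal sketch; argument conventions follow the Spark versions of these functions):

SELECT
  last_day(DATE '2025-11-01') AS month_end,   -- last day of the month: 2025-11-30
  width_bucket(7.5, 0, 10, 5) AS bucket,      -- equal-width bucket of 7.5 in [0, 10) with 5 buckets
  luhn_check('79927398713') AS is_valid_luhn; -- Luhn checksum validation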

Bug Fixes & Reliability: Resolved issues with partition name validation and empty execution plans when vector index lists are empty. Fixed timestamp support for partition expressions, enabling better partitioning for time-series data.

See the Apache DataFusion 50.0.3 Release for more details.

DuckDB v1.4.2 Upgrade and Accelerator Improvements

DuckDB v1.4.2: DuckDB has been upgraded to v1.4.2, which includes several performance optimizations.

Composite ART Index Support: DuckDB in Spice now supports composite (multi-column) Adaptive Radix Tree (ART) indexes for accelerated table scans. When queries filter on multiple columns fully covered by a composite index, the optimizer automatically uses index scans instead of full table scans, delivering significant performance improvements for selective queries.

Example configuration:

datasets:
  - from: file://data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      indexes:
        '(region, product_id)': enabled

Performance example with composite index on 7.5M rows:

SELECT * FROM sales WHERE region = 'US' AND product_id = 12345;

-- Without index: 0.282s
-- With composite index (region, product_id): 0.037s
-- Performance improvement: 7.6x faster with composite index

DuckDB Intermediate Materialization: Queries with indexes now use intermediate materialization (WITH ... AS MATERIALIZED) to leverage faster index scans. Currently supported for non-federated queries (query_federation: disabled) against a single table with indexes only. When predicates cover more columns than the index, the optimizer rewrites queries to first materialize index-filtered results, then apply remaining predicates. This optimization can deliver significant performance improvements for selective queries.

Example configuration:

datasets:
  - from: file://sales_data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      params:
        query_federation: disabled # Required currently for intermediate materialization
      indexes:
        '(region, product_id)': enabled

Performance example:

-- Query with indexed columns (region, product_id) plus additional filter (amount)
SELECT * FROM sales
WHERE region = 'US' AND product_id = 12345 AND amount > 1000;

-- Optimized execution time: 0.031s (with intermediate materialization)
-- Standard execution time: 0.108s (without optimization)
-- Performance improvement: ~3.5x faster

The optimizer automatically rewrites the query to:

WITH _intermediate_materialize AS MATERIALIZED (
SELECT * FROM sales WHERE region = 'US' AND product_id = 12345
)
SELECT * FROM _intermediate_materialize WHERE amount > 1000;

Parquet Buffering for Partitioned Writes: DuckDB partitioned writes in table mode now support Parquet buffering, reducing memory usage and improving write performance for large datasets.

Retention SQL on Refresh Commit: DuckDB accelerations now support running retention SQL on refresh commit, enabling automatic data cleanup and lifecycle management during refresh operations.

UTC Timezone for DuckDB: DuckDB now uses UTC as the default timezone, ensuring consistent behavior for time-based queries across different environments.

Example Spicepod.yml configuration:

datasets:
  - from: s3://my_bucket/large_table/
    name: partitioned_data
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      retention:
        sql: DELETE FROM partitioned_data WHERE event_time < NOW() - INTERVAL '7 days'

HTTP Data Connector

  • Querying endpoints as tables: The HTTP/HTTPS Data Connector now supports querying HTTP endpoints directly as tables in SQL queries with dynamic filters. This feature transforms REST APIs into queryable data sources, making it easy to integrate external service data.

  • Query HTTP endpoints that return structured data (JSON, CSV, etc.) as if they were database tables

  • Configurable retry logic, timeouts, and POST request support for more complex API interactions

Example Spicepod.yml configuration:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    params:
      file_format: json
      max_retries: 3
      client_timeout: 10s

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael'
LIMIT 10;

If a request_body is supplied, it will be sent to the endpoint as a POST request:

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael' and request_body = '{"name": "michael"}'
LIMIT 10;

HTTP endpoints can be accelerated using refresh_sql:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    acceleration:
      enabled: true
      refresh_mode: full
      refresh_sql: |
        SELECT request_path, request_query, content
        FROM tvmaze
        WHERE request_path = '/search/people'
          AND request_query IN ('q=michael', 'q=luke')

DynamoDB Data Connector Improvements

Improved Query Performance: The DynamoDB Data Connector now includes improved filter handling for edge cases, parallel scan support for faster data ingestion, and better error handling for misconfigured queries. These improvements enable more reliable and performant access to DynamoDB data.

Example Spicepod.yml configuration:

datasets:
  - from: dynamodb:my_table
    name: ddb_data
    params:
      scan_segments: 10 # Defaults to `auto`, which calculates optimal segments based on row count

S3 Versioning Support

Atomic Range Reads for Versioned Files: Spice now supports S3 Versioning for all connectors using object-store (S3, Delta Lake, etc.), ensuring range reads over versioned files are atomically correct. When S3 versioning is enabled, Spice automatically tracks version IDs during file discovery and uses them for all subsequent range reads, preventing inconsistencies from concurrent file modifications.

Current limitations:

  • Multi-file connections (e.g., partitioned datasets) do not yet support version tracking across all files
  • Version tracking is automatic when S3 versioning is enabled on the bucket
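
For illustration, a single-file S3 dataset such as the following (names are hypothetical) benefits automatically once versioning is enabled on the bucket; no additional Spicepod parameters are required:

datasets:
  - from: s3://my_versioned_bucket/data/events.parquet
    name: events
    params:
      file_format: parquet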

Search & Embeddings Enhancements

Full-Text Search on Views: Full-text search indexes are now supported on views, enabling advanced search scenarios over pre-aggregated or transformed data. This extends the power of Spice's search capabilities beyond base datasets.

Multi-Column Embeddings on Views: Views now support embedding columns, enabling vector search and semantic retrieval on view data. This is useful for search over aggregated or joined datasets.

Vector Engines on Views: Vector search engines are now available for views, enabling similarity search over complex queries and transformations.

Example Spicepod.yml configuration:

views:
  - name: aggregated_reviews
    sql: SELECT review_id, review_text FROM reviews WHERE rating > 4
    embeddings:
      - column: review_text
        model: openai:text-embedding-3-small

Dedicated Query Thread Pool (Now Enabled by Default)

Dedicated Query Thread Pool: Query execution and accelerated refreshes now run on their own dedicated thread pool, separate from the HTTP server. This prevents heavy query workloads from slowing down API responses, keeping health checks fast and avoiding unnecessary Kubernetes pod restarts under load.

This feature was opt-in in previous releases and is now enabled by default. To disable it and revert to the previous behavior, add the following spicepod.yaml configuration:

runtime:
  params:
    dedicated_thread_pool: none

Query Performance Optimizations

Stale-While-Revalidate Cache Control: Query results now support "stale-while-revalidate" cache control, allowing stale cached data to be served immediately while asynchronously refreshing the cache entry in the background. This improves response times for frequently-accessed queries while maintaining data freshness. Requires cache key type to be set to "sql (raw)" for proper operation.

Optimized Prepared Statements: Prepared statement handling has been optimized for better performance with parameterized queries, reducing planning overhead and improving execution time for repeated queries.

Large RecordBatch Chunking: Large Arrow RecordBatch objects are now automatically chunked to control memory usage during query execution, preventing memory exhaustion for queries returning large result sets.
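
Conceptually, the chunking works like the following sketch (illustrative only, not the runtime's actual implementation), which splits a batch into zero-copy slices of bounded row count:

use arrow::record_batch::RecordBatch;

// Split a RecordBatch into zero-copy slices of at most `max_rows` rows each.
fn chunk_batch(batch: &RecordBatch, max_rows: usize) -> Vec<RecordBatch> {
    let mut chunks = Vec::new();
    let mut offset = 0;
    while offset < batch.num_rows() {
        let len = max_rows.min(batch.num_rows() - offset);
        // `slice` shares the underlying Arrow buffers, so no data is copied.
        chunks.push(batch.slice(offset, len));
        offset += len;
    }
    chunks
}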

Query Result Cache: Stale-While-Revalidate

HTTP Cache-Control Support: The query result cache now supports the stale-while-revalidate Cache-Control directive, enabling faster response times by serving stale cached results immediately while asynchronously refreshing the cache in the background. This feature is particularly useful for applications that can tolerate slightly stale data in exchange for improved performance.

How it works:

When a cache entry is stale but within the stale-while-revalidate window, Spice will:

  1. Immediately return the stale cached result to the client
  2. Asynchronously re-execute the query in the background to refresh the cache
  3. Future requests will use the refreshed data

Configuration:

Use the Cache-Control HTTP header with the stale-while-revalidate directive:

Cache-Control: max-age=300, stale-while-revalidate=60

This configuration caches results for 5 minutes (300 seconds), and allows serving stale results for an additional 60 seconds while refreshing in the background.

Requirements:

  • Must use plan or raw SQL cache keys (set cache_key_type to sql or plan in results_caching configuration)
  • Background revalidation re-executes queries through the normal query path
  • Timestamp tracking automatically determines cache entry age for staleness checks
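
For example, a Spicepod results-caching configuration along these lines (a sketch based on the requirement above; consult the caching documentation for exact field placement):

runtime:
  results_caching:
    enabled: true
    cache_key_type: sql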

Example configuration via HTTP header:

GET /v1/sql
Cache-Control: max-age=600, stale-while-revalidate=120
X-Cache-Key-Type: sql
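
The equivalent request via curl might look like the following sketch (assuming the runtime's HTTP endpoint on the default port 8090 and the SQL text sent as a POST body):

curl -X POST http://localhost:8090/v1/sql \
  -H 'Content-Type: text/plain' \
  -H 'Cache-Control: max-age=600, stale-while-revalidate=120' \
  -H 'X-Cache-Key-Type: sql' \
  -d 'SELECT count(*) FROM my_dataset'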

This feature improves application responsiveness while ensuring data freshness through background updates.

Security & Reliability Improvements

Enhanced HTTP Client Security: HTTP client usage across the runtime has been hardened with improved TLS validation, certificate pinning for critical endpoints, and better error handling for network failures.

ODBC Connector Improvements: Removed unwrap calls from the ODBC connector, improving error handling and reliability. Fixed secret handling and Kubernetes secret integration.

CLI Permissions Hardening: Tightened file permissions for the CLI and install script, ensuring secure defaults for configuration files and credentials.

Oracle Instant Client Pinning: Oracle Instant Client downloads are now pinned to specific SHAs, ensuring reproducible builds and preventing supply chain attacks.

AWS Authentication Improvements

Improved Credential Retry Logic: AWS SDK credential initialization has been significantly improved with more robust retry logic and better error handling. The system now automatically retries transient credential resolution failures using Fibonacci backoff, allowing Spice to tolerate extended AWS outages (up to ~48 hours) without manual intervention.

Key features:

  • Automatic retry with backoff: Implements Fibonacci backoff for transient credential failures (network issues, temporary AWS service disruptions)
  • Configurable retry limits: Supports up to 300 retry attempts with a maximum retry interval of 600 seconds
  • Better error handling: Distinguishes between retryable errors (connector errors) and non-retryable errors (misconfiguration)
  • Unauthenticated access support: Properly supports unauthenticated access to public S3 buckets without requiring credentials
  • Improved error messages: Provides detailed logging with attempt numbers, retry intervals, and error context for better troubleshooting

The improvements ensure more reliable AWS service integration, particularly in environments with intermittent network connectivity or during AWS service degradations.
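
The backoff schedule itself is simple. A minimal sketch of capped Fibonacci backoff (illustrative, not the actual runtime code), using the 600-second maximum interval described above:

// Generate Fibonacci backoff intervals in seconds (1, 1, 2, 3, 5, 8, ...),
// capping each interval at `cap_secs`.
fn fibonacci_backoff(max_attempts: usize, cap_secs: u64) -> Vec<u64> {
    let (mut prev, mut curr) = (0u64, 1u64);
    let mut intervals = Vec::with_capacity(max_attempts);
    for _ in 0..max_attempts {
        intervals.push(curr.min(cap_secs));
        let next = (prev + curr).min(cap_secs);
        prev = curr;
        curr = next;
    }
    intervals
}

With up to 300 attempts capped at 600 seconds each, the cumulative wait is on the order of the ~48-hour tolerance window mentioned above.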

Observability & Tracing

DataFusion Log Emission: The Spice runtime now emits DataFusion internal logs, providing deeper visibility into query planning and execution for debugging and performance analysis.

AI Completions Tracing: Fixed tracing so that ai_completions operations are correctly parented under sql_query traces, improving observability for AI-powered queries.

Git Data Connector (Alpha)

Version-Controlled Data Access: The new Git Data Connector (Alpha) enables querying datasets stored in Git repositories. This connector is ideal for use cases involving configuration files, documentation, or any data tracked in version control.

Example Spicepod.yml configuration:

datasets:
  - from: git:https://github.com/myorg/myrepo
    name: git_metrics
    params:
      file_format: csv

For more details, refer to the Git Data Connector Documentation.

Spice Java SDK 0.4.0

The Spice Java SDK has been upgraded with support for a configurable Arrow memory limit: spice-java v0.4.0

SpiceClient client = SpiceClient.builder()
    .withArrowMemoryLimitMB(1024) // 1GB limit
    .build();

CLI Improvements

Install Specific Versions: The spice install command now supports installing specific versions of the Spice runtime and CLI. This enables easy version management, downgrading, or installation of specific releases for testing or compatibility requirements.

Usage:

# Install a specific version
spice install v1.8.3

# Install a specific version with AI flavor
spice install v1.8.3 ai

# Install latest version (existing behavior)
spice install
spice install ai

Note: Homebrew installations require manual version management via brew install spiceai/spiceai/spice@<version>.

Persistent Query History: The Spice CLI REPL (SQL, search, and chat interfaces) now persists command history to ~/.spice/query_history.txt, making your query history available across sessions. The history file is automatically created if it doesn't exist, with graceful fallback if the home directory cannot be determined.

New REPL Commands:

  • .clear - Clear the screen using ANSI escape codes for a clean workspace
  • .clear history - Clear and persist the query history, removing all stored commands

Tab Completion: Tab completion now includes suggestions based on your command history, making it faster to re-run or modify previous queries.

Example usage:

sql> SELECT * FROM my_table;
sql> .clear # Clears the screen
sql> .clear history # Clears command history
sql> # Use arrow keys or tab to access previous commands

Additional Improvements & Bug Fixes

  • Reliability: Fixed refresh worker panics with recovery handling to prevent runtime crashes during acceleration refreshes.
  • Reliability: Improved error messages for missing or invalid spicepod.yaml files, providing actionable feedback for misconfiguration.
  • Reliability: Fixed DuckDB metadata pointer loading issues for snapshots.
  • Performance: Ensured ListingTable partitions are pruned correctly when filters are not used.
  • Reliability: Fixed vector dimension determination for partitioned indexes.
  • Search: Fixed casing issues in Reciprocal Rank Fusion (RRF) for hybrid search queries.
  • Search: Fixed search field handling as metadata for chunked search indexes.
  • Validation: Added timestamp support for partition expressions.
  • Validation: Fixed regexp_match function for DuckDB datasets.
  • Validation: Fixed partition name validation for improved reliability.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

New HTTP Data Connector Recipe: New recipe demonstrating how to query REST APIs and HTTP(s) endpoints. See HTTP Connector Recipe for details.

The Spice Cookbook includes 82 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.9.0-rc.4, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.9.0-rc.4 image:

docker pull spiceai/spiceai:1.9.0-rc.4

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Dependencies

Changelog (rc.4)

Spice v1.9.0-rc.2 (Nov 11, 2025)

· 32 min read
Sergei Grebnov
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.9.0-rc.2! 🌶

This is the second release candidate for v1.9.0, which introduces Spice Cayenne, a new high-performance data accelerator built on the Vortex columnar format that delivers better than DuckDB performance without single-file scaling limitations, and a preview of Multi-Node Distributed Query based on Apache Ballista. v1.9.0-rc.2 also upgrades to DataFusion v50 and DuckDB v1.4.1 for even higher query performance, expands search capabilities with full-text search on views and multi-column embeddings, includes significant DynamoDB and DuckDB accelerator improvements, expands the HTTP data connector to support endpoints as tables, and delivers many security and reliability improvements.

What's New in v1.9.0-rc.2

Cayenne Data Accelerator (Beta)

Introducing Cayenne: SQL as an Acceleration Format: A new high-performance Data Accelerator that simplifies multi-file data acceleration by using an embedded database (SQLite) for metadata while storing data in the Vortex columnar format, a Linux Foundation project. Cayenne delivers query and ingestion performance better than DuckDB's file-based acceleration without DuckDB's memory overhead and the scaling challenges of single DuckDB files.

Cayenne uses SQLite to manage acceleration metadata (schemas, snapshots, statistics, file tracking) through simple SQL transactions, while storing data in Vortex's compressed columnar format. This architecture provides:

Key Features:

  • SQLite + Vortex Architecture: All metadata is stored in SQLite tables with standard SQL transactions, while data lives in Vortex's compressed, chunked columnar format designed for zero-copy access and efficient scanning.
  • Simplified Operations: No complex file hierarchies, no JSON/Avro metadata files, no separate catalog servers—just SQL tables and Vortex data files. The entire metadata schema is intentionally simple for maximum reliability.
  • Fast Metadata Access: Single SQL query retrieves all metadata needed for query planning—no multiple round trips to storage, no S3 throttling, no reconstruction of metadata state from scattered files.
  • Efficient Small Changes: Dramatically reduces small file proliferation. Snapshots are just rows in SQLite tables, not new files on disk. Supports millions of snapshots without performance degradation.
  • High Concurrency: Changes consist of two steps: stage Vortex files (if any), then run a single SQL transaction. Much faster conflict resolution and support for many more concurrent updates than file-based formats.
  • Advanced Data Lifecycle: Full ACID transactions, delete support, and retention SQL execution on refresh commit.

Example Spicepod.yml configuration:

datasets:
  - from: s3:my_table
    name: accelerated_data_30d
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      refresh_mode: append
      retention_sql: DELETE FROM accelerated_data WHERE created_at < NOW() - INTERVAL '30 days'

Note: the Cayenne Data Accelerator is in Beta, with limitations.

For more details, refer to the Cayenne Documentation, the Vortex project, and the DuckLake announcement that partly inspired this design.

Multi-Node Distributed Query (Preview)

Apache Ballista Integration: Spice now supports distributed query execution based on Apache Ballista, enabling distributed queries across multiple executor nodes for improved performance on large datasets. This feature is in preview in v1.9.0-rc.2.

Architecture:

A distributed Spice cluster consists of:

  • Scheduler: Responsible for distributed query planning and work queue management for the executor fleet
  • Executors: One or more nodes responsible for running physical query plans

Getting Started:

Start a scheduler instance using an existing Spicepod. The scheduler is the only spiced instance that needs to be configured:

# Start scheduler (note the flight bind address override if you want it reachable outside localhost)
spiced --cluster-mode scheduler --flight 0.0.0.0:50051

Start one or more executors configured with the scheduler's flight URI:

# Start executor (automatically selects a free port if 50051 is taken)
spiced --cluster-mode executor --scheduler-url spiced://localhost:50051

Query Execution:

Queries run through the scheduler will now show a distributed_plan in EXPLAIN output, demonstrating how the query is distributed across executor nodes:

EXPLAIN SELECT count(id) FROM my_dataset;

Current Limitations:

  • Accelerated datasets are currently not supported. This feature is designed for querying partitioned data lake formats (Parquet, Delta Lake, Iceberg, etc.)
  • The feature is in preview and may have stability or performance limitations
  • Specific acceleration support is planned for future releases

DataFusion v50 Upgrade

Spice.ai is built on the Apache DataFusion query engine. The v50 release brings significant performance improvements and enhanced reliability:

Performance Improvements 🚀:

  • Dynamic Filter Pushdown: Enhanced dynamic filter pushdown for custom ExecutionPlans, ensuring filters propagate correctly through all physical operators for improved query performance.

  • Partition Pruning: Expanded partition pruning support ensures that unnecessary partitions are skipped when filters are not used, reducing data scanning overhead and improving query execution times.

Apache Spark Compatible Functions: Added support for Spark-compatible functions including array, bit_get/bit_count, bitmap_count, crc32/sha1, date_add/date_sub, if, last_day, like/ilike, luhn_check, mod/pmod, next_day, parse_url, rint, and width_bucket.

Bug Fixes & Reliability: Resolved issues with partition name validation and empty execution plans when vector index lists are empty. Fixed timestamp support for partition expressions, enabling better partitioning for time-series data.

See the Apache DataFusion 50.0.0 Release for more details.

DuckDB v1.4.1 Upgrade and Accelerator Improvements

DuckDB v1.4.1: DuckDB has been upgraded to v1.4.1, which includes several performance optimizations.

Composite ART Index Support: DuckDB in Spice now supports composite (multi-column) Adaptive Radix Tree (ART) indexes for accelerated table scans. When queries filter on multiple columns fully covered by a composite index, the optimizer automatically uses index scans instead of full table scans, delivering significant performance improvements for selective queries.

Example configuration:

datasets:
  - from: file://data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      indexes:
        '(region, product_id)': enabled

Performance example with composite index on 7.5M rows:

SELECT * FROM sales WHERE region = 'US' AND product_id = 12345;

-- Without index: 0.282s
-- With composite index (region, product_id): 0.037s
-- Performance improvement: 7.6x faster with composite index

DuckDB Intermediate Materialization: Queries with indexes now use intermediate materialization (WITH ... AS MATERIALIZED) to leverage faster index scans. Currently supported for non-federated queries (query_federation: disabled) against a single table with indexes only. When predicates cover more columns than the index, the optimizer rewrites queries to first materialize index-filtered results, then apply remaining predicates. This optimization can deliver significant performance improvements for selective queries.

Example configuration:

datasets:
  - from: file://sales_data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      params:
        query_federation: disabled # Required currently for intermediate materialization
      indexes:
        '(region, product_id)': enabled

Performance example:

-- Query with indexed columns (region, product_id) plus additional filter (amount)
SELECT * FROM sales
WHERE region = 'US' AND product_id = 12345 AND amount > 1000;

-- Optimized execution time: 0.031s (with intermediate materialization)
-- Standard execution time: 0.108s (without optimization)
-- Performance improvement: ~3.5x faster

The optimizer automatically rewrites the query to:

WITH _intermediate_materialize AS MATERIALIZED (
SELECT * FROM sales WHERE region = 'US' AND product_id = 12345
)
SELECT * FROM _intermediate_materialize WHERE amount > 1000;

Parquet Buffering for Partitioned Writes: DuckDB partitioned writes in table mode now support Parquet buffering, reducing memory usage and improving write performance for large datasets.

Retention SQL on Refresh Commit: DuckDB accelerations now support running retention SQL on refresh commit, enabling automatic data cleanup and lifecycle management during refresh operations.

UTC Timezone for DuckDB: DuckDB now uses UTC as the default timezone, ensuring consistent behavior for time-based queries across different environments.

Example Spicepod.yml configuration:

datasets:
  - from: s3://my_bucket/large_table/
    name: partitioned_data
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      retention:
        sql: DELETE FROM partitioned_data WHERE event_time < NOW() - INTERVAL '7 days'

HTTP Data Connector

  • Querying endpoints as tables: The HTTP/HTTPS Data Connector now supports querying HTTP endpoints directly as tables in SQL queries with dynamic filters. This feature transforms REST APIs into queryable data sources, making it easy to integrate external service data.

  • Query HTTP endpoints that return structured data (JSON, CSV, etc.) as if they were database tables

  • Configurable retry logic, timeouts, and POST request support for more complex API interactions

Example Spicepod.yml configuration:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    params:
      file_format: json
      max_retries: 3
      client_timeout: 10s

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael'
LIMIT 10;

If a request_body is supplied, it will be sent to the endpoint as a POST request:

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael' and request_body = '{"name": "michael"}'
LIMIT 10;

HTTP endpoints can be accelerated using refresh_sql:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    acceleration:
      enabled: true
      refresh_mode: full
      refresh_sql: |
        SELECT request_path, request_query, content
        FROM tvmaze
        WHERE request_path = '/search/people'
          AND request_query IN ('q=michael', 'q=luke')

DynamoDB Data Connector Improvements

Improved Query Performance: The DynamoDB Data Connector now includes improved filter handling for edge cases, parallel scan support for faster data ingestion, and better error handling for misconfigured queries. These improvements enable more reliable and performant access to DynamoDB data.

Example Spicepod.yml configuration:

datasets:
  - from: dynamodb:my_table
    name: ddb_data
    params:
      scan_segments: 10 # Defaults to `auto`, which calculates optimal segments based on row count

S3 Versioning Support

Atomic Range Reads for Versioned Files: Spice now supports S3 Versioning for all connectors using object-store (S3, Delta Lake, etc.), ensuring range reads over versioned files are atomically correct. When S3 versioning is enabled, Spice automatically tracks version IDs during file discovery and uses them for all subsequent range reads, preventing inconsistencies from concurrent file modifications.

Current limitations:

  • Multi-file connections (e.g., partitioned datasets) do not yet support version tracking across all files
  • Version tracking is automatic when S3 versioning is enabled on the bucket

Search & Embeddings Enhancements

Full-Text Search on Views: Full-text search indexes are now supported on views, enabling advanced search scenarios over pre-aggregated or transformed data. This extends the power of Spice's search capabilities beyond base datasets.

Multi-Column Embeddings on Views: Views now support embedding columns, enabling vector search and semantic retrieval on view data. This is useful for search over aggregated or joined datasets.

Vector Engines on Views: Vector search engines are now available for views, enabling similarity search over complex queries and transformations.

Example Spicepod.yml configuration:

views:
  - name: aggregated_reviews
    sql: SELECT review_id, review_text FROM reviews WHERE rating > 4
    embeddings:
      - column: review_text
        model: openai:text-embedding-3-small

Dedicated Query Thread Pool (Now Enabled by Default)

Dedicated Query Thread Pool: Query execution and accelerated refreshes now run on their own dedicated thread pool, separate from the HTTP server. This prevents heavy query workloads from slowing down API responses, keeping health checks fast and avoiding unnecessary Kubernetes pod restarts under load.

This feature was opt-in in previous releases and is now enabled by default in v1.9.0-rc.2. To disable it and revert to the previous behavior, add the following spicepod.yaml configuration:

runtime:
  params:
    dedicated_thread_pool: none

Query Performance Optimizations

Stale-While-Revalidate Cache Control: Query results now support "stale-while-revalidate" cache control, allowing stale cached data to be served immediately while asynchronously refreshing the cache entry in the background. This improves response times for frequently-accessed queries while maintaining data freshness. Requires cache key type to be set to "sql (raw)" for proper operation.

Optimized Prepared Statements: Prepared statement handling has been optimized for better performance with parameterized queries, reducing planning overhead and improving execution time for repeated queries.

Large RecordBatch Chunking: Large Arrow RecordBatch objects are now automatically chunked to control memory usage during query execution, preventing memory exhaustion for queries returning large result sets.

Query Result Cache: Stale-While-Revalidate

HTTP Cache-Control Support: The query result cache now supports the stale-while-revalidate Cache-Control directive, enabling faster response times by serving stale cached results immediately while asynchronously refreshing the cache in the background. This feature is particularly useful for applications that can tolerate slightly stale data in exchange for improved performance.

How it works:

When a cache entry is stale but within the stale-while-revalidate window, Spice will:

  1. Immediately return the stale cached result to the client
  2. Asynchronously re-execute the query in the background to refresh the cache
  3. Future requests will use the refreshed data

Configuration:

Use the Cache-Control HTTP header with the stale-while-revalidate directive:

Cache-Control: max-age=300, stale-while-revalidate=60

This configuration caches results for 5 minutes (300 seconds), and allows serving stale results for an additional 60 seconds while refreshing in the background.

Requirements:

  • Must use plan or raw SQL cache keys (set cache_key_type to sql or plan in results_caching configuration)
  • Background revalidation re-executes queries through the normal query path
  • Timestamp tracking automatically determines cache entry age for staleness checks

Example configuration via HTTP header:

GET /v1/sql
Cache-Control: max-age=600, stale-while-revalidate=120
X-Cache-Key-Type: sql

This feature improves application responsiveness while ensuring data freshness through background updates.

Security & Reliability Improvements

Enhanced HTTP Client Security: HTTP client usage across the runtime has been hardened with improved TLS validation, certificate pinning for critical endpoints, and better error handling for network failures.

ODBC Connector Improvements: Removed unwrap calls from the ODBC connector, improving error handling and reliability. Fixed secret handling and Kubernetes secret integration.

CLI Permissions Hardening: Tightened file permissions for the CLI and install script, ensuring secure defaults for configuration files and credentials.

Oracle Instant Client Pinning: Oracle Instant Client downloads are now pinned to specific SHAs, ensuring reproducible builds and preventing supply chain attacks.

AWS Authentication Improvements

Improved Credential Retry Logic: AWS SDK credential initialization has been significantly improved with more robust retry logic and better error handling. The system now automatically retries transient credential resolution failures using Fibonacci backoff, allowing Spice to tolerate extended AWS outages (up to ~48 hours) without manual intervention.

Key features:

  • Automatic retry with backoff: Implements Fibonacci backoff for transient credential failures (network issues, temporary AWS service disruptions)
  • Configurable retry limits: Supports up to 300 retry attempts with a maximum retry interval of 600 seconds
  • Better error handling: Distinguishes between retryable errors (connector errors) and non-retryable errors (misconfiguration)
  • Unauthenticated access support: Properly supports unauthenticated access to public S3 buckets without requiring credentials
  • Improved error messages: Provides detailed logging with attempt numbers, retry intervals, and error context for better troubleshooting

The improvements ensure more reliable AWS service integration, particularly in environments with intermittent network connectivity or during AWS service degradations.

Observability & Tracing

DataFusion Log Emission: The Spice runtime now emits DataFusion internal logs, providing deeper visibility into query planning and execution for debugging and performance analysis.

AI Completions Tracing: Fixed tracing so that ai_completions operations are correctly parented under sql_query traces, improving observability for AI-powered queries.

Git Data Connector (Alpha)

Version-Controlled Data Access: The new Git Data Connector (Alpha) enables querying datasets stored in Git repositories. This connector is ideal for use cases involving configuration files, documentation, or any data tracked in version control.

Example Spicepod.yml configuration:

datasets:
  - from: git:https://github.com/myorg/myrepo
    name: git_metrics
    params:
      file_format: csv

For more details, refer to the Git Data Connector Documentation.

Spice Java SDK 0.4.0

The Spice Java SDK has been upgraded with support for a configurable Arrow memory limit: spice-java v0.4.0

SpiceClient client = SpiceClient.builder()
    .withArrowMemoryLimitMB(1024) // 1GB limit
    .build();

CLI Improvements

Install Specific Versions: The spice install command now supports installing specific versions of the Spice runtime and CLI. This enables easy version management, downgrading, or installation of specific releases for testing or compatibility requirements.

Usage:

# Install a specific version
spice install v1.8.3

# Install a specific version with AI flavor
spice install v1.8.3 ai

# Install latest version (existing behavior)
spice install
spice install ai

Note: Homebrew installations require manual version management via brew install spiceai/spiceai/spice@<version>.

Persistent Query History: The Spice CLI REPL (SQL, search, and chat interfaces) now persists command history to ~/.spice/query_history.txt, making your query history available across sessions. The history file is automatically created if it doesn't exist, with graceful fallback if the home directory cannot be determined.

New REPL Commands:

  • .clear - Clear the screen using ANSI escape codes for a clean workspace
  • .clear history - Clear and persist the query history, removing all stored commands

Tab Completion: Tab completion now includes suggestions based on your command history, making it faster to re-run or modify previous queries.

Example usage:

sql> SELECT * FROM my_table;
sql> .clear # Clears the screen
sql> .clear history # Clears command history
sql> # Use arrow keys or tab to access previous commands

Additional Improvements & Bug Fixes

  • Reliability: Fixed refresh worker panics with recovery handling to prevent runtime crashes during acceleration refreshes.
  • Reliability: Improved error messages for missing or invalid spicepod.yaml files, providing actionable feedback for misconfiguration.
  • Reliability: Fixed DuckDB metadata pointer loading issues for snapshots.
  • Performance: Ensured ListingTable partitions are pruned correctly when filters are not used.
  • Reliability: Fixed vector dimension determination for partitioned indexes.
  • Search: Fixed casing issues in Reciprocal Rank Fusion (RRF) for hybrid search queries.
  • Search: Fixed search field handling as metadata for chunked search indexes.
  • Validation: Added timestamp support for partition expressions.
  • Validation: Fixed regexp_match function for DuckDB datasets.
  • Validation: Fixed partition name validation for improved reliability.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

New HTTP Data Connector Recipe: New recipe demonstrating how to query REST APIs and HTTP(s) endpoints. See HTTP Connector Recipe for details.

The Spice Cookbook includes 82 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.9.0-rc.2, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.9.0-rc.2 image:

docker pull spiceai/spiceai:1.9.0-rc.2

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Dependencies

Changelog

Spice v1.9.0-rc.1 (Nov 4, 2025)

· 16 min read
William Croxson
Senior Software Engineer at Spice AI

This is the first release candidate for v1.9.0, which introduces Cayenne, a new high-performance data accelerator built on the Vortex columnar format that delivers DuckDB-comparable performance without scaling limitations. This release also upgrades to DataFusion v50 for improved query performance, expands search capabilities with full-text search on views and multi-column embeddings, includes significant DynamoDB and DuckDB accelerator improvements, and delivers security and reliability enhancements.

What's New in v1.9.0-rc.1

Cayenne Data Accelerator (Alpha)

Introducing Cayenne: SQL as an Acceleration Format: A new high-performance data accelerator that simplifies multi-file data acceleration by using an embedded database (SQLite) for metadata while storing data in the Vortex columnar format. Cayenne delivers query and ingestion performance comparable to or better than DuckDB's file-based acceleration, without DuckDB's memory overhead and the scaling challenges of single DuckDB files.

Cayenne uses SQLite to manage acceleration metadata (schemas, snapshots, statistics, file tracking) through simple SQL transactions, while storing actual data in Vortex's compressed columnar format. This architecture provides:

Key Features:

  • SQLite + Vortex Architecture: All metadata is stored in SQLite tables with standard SQL transactions, while data lives in Vortex's compressed, chunked columnar format designed for zero-copy access and efficient scanning.
  • Simplified Operations: No complex file hierarchies, no JSON/Avro metadata files, no separate catalog servers—just SQL tables and Vortex data files. The entire metadata schema is intentionally simple for maximum reliability.
  • Fast Metadata Access: Single SQL query retrieves all metadata needed for query planning—no multiple round trips to storage, no S3 throttling, no reconstruction of metadata state from scattered files.
  • Efficient Small Changes: Dramatically reduces small file proliferation. Snapshots are just rows in SQLite tables, not new files on disk. Supports millions of snapshots without performance degradation.
  • High Concurrency: Changes consist of two steps: stage Vortex files (if any), then run a single SQL transaction. Much faster conflict resolution and support for many more concurrent updates than file-based formats.
  • Advanced Data Lifecycle: Full ACID transactions, delete support, and retention SQL execution on refresh commit.

Example Spicepod.yml configuration:

datasets:
  - from: s3:my_table
    name: accelerated_data
    acceleration:
      enabled: true
      engine: cayenne
      retention:
        sql: DELETE FROM accelerated_data WHERE created_at < NOW() - INTERVAL '30 days'

Note: the Cayenne Data Accelerator is in Alpha, with limitations.

For more details, refer to the Cayenne Documentation, the Vortex project, and the DuckLake announcement that partly inspired this design.

DataFusion v50 Upgrade

Spice.ai is built on the DataFusion query engine. The v50 release brings significant performance improvements and enhanced reliability:

Performance Improvements 🚀:

  • Dynamic Filter Pushdown: Enhanced dynamic filter pushdown for custom ExecutionPlans, ensuring filters propagate correctly through all physical operators for improved query performance.
  • Partition Pruning: Expanded partition pruning support ensures that unnecessary partitions are skipped when filters are not used, reducing data scanning overhead and improving query execution times.

Bug Fixes & Reliability: Resolved issues with partition name validation and empty execution plans when vector index lists are empty. Fixed timestamp support for partition expressions, enabling better partitioning for time-series data.

See the Apache DataFusion 50.0.0 Release for more details.

DynamoDB Data Connector Improvements

Improved Query Performance: The DynamoDB Data Connector now includes improved filter handling for edge cases, parallel scan support for faster data ingestion, and better error handling for misconfigured queries. These improvements enable more reliable and performant access to DynamoDB data.

Example Spicepod.yml configuration:

datasets:
  - from: dynamodb:my_table
    name: ddb_data
    params:
      scan_segments: 10 # Defaults to `auto`, which calculates optimal segments based on row count

Search & Embeddings Enhancements

Full-Text Search on Views: Full-text search indexes are now supported on views, enabling advanced search scenarios over pre-aggregated or transformed data. This extends the power of Spice's search capabilities beyond base datasets.

Multi-Column Embeddings on Views: Views now support embedding columns, enabling vector search and semantic retrieval on view data. This is useful for search over aggregated or joined datasets.

Vector Engines on Views: Vector search engines are now available for views, enabling similarity search over complex queries and transformations.

Example Spicepod.yml configuration:

views:
  - name: aggregated_reviews
    sql: SELECT review_id, review_text FROM reviews WHERE rating > 4
    embeddings:
      - column: review_text
        model: openai:text-embedding-3-small

DuckDB Accelerator Improvements

Parquet Buffering for Partitioned Writes: DuckDB partitioned writes in table mode now support Parquet buffering, reducing memory usage and improving write performance for large datasets.

Retention SQL on Refresh Commit: DuckDB accelerations now support running retention SQL on refresh commit, enabling automatic data cleanup and lifecycle management during refresh operations.

UTC Timezone for DuckDB: DuckDB now uses UTC as the default timezone, ensuring consistent behavior for time-based queries across different environments.

Example Spicepod.yml configuration:

datasets:
  - from: s3://my_bucket/large_table/
    name: partitioned_data
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      retention:
        sql: DELETE FROM partitioned_data WHERE event_time < NOW() - INTERVAL '7 days'

Query Performance Optimizations

Optimized Prepared Statements: Prepared statement handling has been optimized for better performance with parameterized queries, reducing planning overhead and improving execution time for repeated queries.

Large RecordBatch Chunking: Large Arrow RecordBatch objects are now automatically chunked to control memory usage during query execution, preventing memory exhaustion for queries returning large result sets.

Security & Reliability Improvements

Enhanced HTTP Client Security: HTTP client usage across the runtime has been hardened with improved TLS validation, certificate pinning for critical endpoints, and better error handling for network failures.

ODBC Connector Improvements: Removed unwrap calls from the ODBC connector, improving error handling and reliability. Fixed secret handling and Kubernetes secret integration.

CLI Permissions Hardening: Tightened file permissions for the CLI and install script, ensuring secure defaults for configuration files and credentials.

Oracle Instant Client Pinning: Oracle Instant Client downloads are now pinned to specific SHAs, ensuring reproducible builds and preventing supply chain attacks.

Observability & Tracing

DataFusion Log Emission: The Spice runtime now emits DataFusion internal logs, providing deeper visibility into query planning and execution for debugging and performance analysis.

AI Completions Tracing: Fixed tracing so that ai_completions operations are correctly parented under sql_query traces, improving observability for AI-powered queries.

Git Data Connector (Alpha)

Version-Controlled Data Access: The new Git Data Connector (Alpha) enables querying datasets stored in Git repositories. This connector is ideal for use cases involving configuration files, documentation, or any data tracked in version control.

Example Spicepod.yml configuration:

datasets:
  - from: git:https://github.com/myorg/myrepo
    name: git_metrics
    params:
      file_format: csv

For more details, refer to the Git Data Connector Documentation.

Additional Improvements & Bug Fixes

  • Reliability: Fixed refresh worker panics with recovery handling to prevent runtime crashes during acceleration refreshes.
  • Reliability: Improved error messages for missing or invalid spicepod.yaml files, providing actionable feedback for misconfiguration.
  • Reliability: Fixed DuckDB metadata pointer loading issues for snapshots.
  • Performance: Ensured ListingTable partitions are pruned correctly when filters are not used.
  • Reliability: Fixed vector dimension determination for partitioned indexes.
  • Search: Fixed casing issues in Reciprocal Rank Fusion (RRF) for hybrid search queries.
  • Search: Fixed search field handling as metadata for chunked search indexes.
  • Validation: Added timestamp support for partition expressions.
  • Validation: Fixed regexp_match function for DuckDB datasets.
  • Validation: Fixed partition name validation for improved reliability.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

No major cookbook updates.

The Spice Cookbook includes 81 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.9.0-rc.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.9.0-rc.1 image:

docker pull spiceai/spiceai:1.9.0-rc.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Changelog

Spice v1.7.0 (Sep 23, 2025)

· 21 min read
Sergei Grebnov
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.7.0! ⚡

Spice v1.7.0 upgrades to DataFusion v49 for improved performance and query optimization, introduces real-time full-text search indexing for CDC streams, EmbeddingGemma support for high-quality embeddings, new search table functions powering the /v1/search API, embedding request caching for faster and cost-efficient search and indexing, and OpenAI Responses API tool calls with streaming. This release also includes numerous bug fixes across CDC streams, vector search, the Kafka Data Connector, and error reporting.

What's New in v1.7.0

DataFusion v49 Highlights

DataFusion ClickBench Performance Graph. Source: DataFusion 49.0.0 Release Blog.

Performance Improvements 🚀

  • Equivalence System Upgrade: Faster planning for queries with many columns, enabling more sophisticated sort-based optimizations.
  • Dynamic Filters & TopK Pushdown: Queries with ORDER BY and LIMIT now use dynamic filters and physical filter pushdown, skipping unnecessary data reads for much faster top-k queries.
  • Compressed Spill Files: Intermediate files written during sort/group spill to disk are now compressed, reducing disk usage and improving performance.
  • WITHIN GROUP for Ordered-Set Aggregates: Support for ordered-set aggregate functions (e.g., percentile_disc) with WITHIN GROUP.
  • REGEXP_INSTR Function: Find regex match positions in strings.
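
For example, the last two items enable queries like these (a minimal sketch against a hypothetical sales table):

-- Median price via an ordered-set aggregate
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY price) AS median_price
FROM sales;

-- 1-based position of the first regex match (0 if no match)
SELECT regexp_instr('spice-ai', '-') AS first_dash;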

Spice Runtime Highlights

EmbeddingGemma Support: Spice now supports EmbeddingGemma, Google's state-of-the-art embedding model for text and documents. EmbeddingGemma provides high-quality, efficient embeddings for semantic search, retrieval, and recommendation tasks. You can use EmbeddingGemma via HuggingFace in your Spicepod configuration:

Example spicepod.yml snippet:

embeddings:
  - from: huggingface:huggingface.co/google/embeddinggemma-300m
    name: embeddinggemma
    params:
      hf_token: ${secrets:HUGGINGFACE_TOKEN}

Learn more about EmbeddingGemma in the official documentation.

POST /v1/search API Uses Search Table Functions: The /v1/search API now uses the new text_search and vector_search Table Functions for improved performance.
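
These table functions can also be invoked directly in SQL. A hypothetical example (the table and search text are illustrative; see the search documentation for the exact signature):

SELECT *
FROM text_search('questions', 'how do I enable caching')
LIMIT 5;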

Embedding Request Caching: The runtime now supports caching embedding requests, reducing latency and cost for repeated content and search requests.

Example spicepod.yml snippet:

runtime:
  caching:
    embeddings:
      enabled: true
      max_size: 128mb
      item_ttl: 5s

See the Caching documentation for details.

Real-Time Indexing for Full-Text Search: Full-text search indexing is now supported for connectors that enable real-time changes, such as Debezium CDC streams. Adding a full-text index on a column with refresh_mode: changes works as it does for full/append-mode refreshes, enabling instant search on new data.

Example spicepod.yml snippet:

datasets:
  - from: debezium:cdc.public.question
    name: questions
    acceleration:
      enabled: true
      engine: duckdb
      primary_key: id
      refresh_mode: changes # Use 'changes'
      params: *kafka_params
    columns:
      - name: title
        full_text_search:
          enabled: true # Enable full-text-search indexing
          row_id:
            - id

OpenAI Responses API Tool Calls with Streaming: The OpenAI Responses API now supports tool calls with streaming, enabling advanced model interactions such as web_search and code_interpreter with real-time response streaming. This allows you to invoke OpenAI-hosted tools and receive results as they are generated.

Learn more in the OpenAI Model Provider documentation.

Runtime Output Level Configuration: You can now set the output_level parameter in the Spicepod runtime configuration to control logging verbosity, in addition to the existing CLI and environment variable support. Supported values are info, verbose, and very_verbose. The value is applied in the following priority: CLI, then environment variables, then YAML configuration.

Example spicepod.yml snippet:

runtime:
  output_level: info # or verbose, very_verbose

For more details on configuring output level, see the Troubleshooting documentation.

Bug Fixes

Several bugs and issues have been resolved in this release, including:

  • CDC Streams: Fixed issues where refresh_mode: changes could prevent the Spice runtime from becoming Ready, and improved support for full-text indexing on CDC streams.
  • Vector Search: Fixed bugs where the vector search HTTP pipeline could not find more than one IndexedTableProvider, and resolved field-mismatch errors in the vector_search UDTF.
  • Kafka Integration: Improved Kafka schema inference with configurable sample size, improved consumer group persistence for SQLite and Postgres accelerations, and added cooperative mode support.
  • Perplexity Web Search: Fixed bug where Perplexity web search sometimes used incorrect query schema (limit).
  • Databricks: Fixed issue with unparsing embedded columns.
  • Error Reporting: ThrottlingException is now reported correctly instead of as InternalError.
  • Iceberg Data Connector: Added support for LIMIT pushdown.
  • Amazon S3 Vectors: Fixed ingestion issues with zero-vectors and improved handling when vector index is full.
  • Tracing: Fixed vector search tracing to correctly report SQL status.

Contributors

New Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

The Spice Cookbook includes 78 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.7.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.7.0 image:

docker pull spiceai/spiceai:1.7.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Dependencies

Changelog

Spice v1.4.0 (June 18, 2025)

· 19 min read
William Croxson
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.4.0! ⚡

This release upgrades DataFusion to v47 and Arrow to v55 for faster queries, more efficient Parquet/CSV handling, and improved reliability. It introduces the AWS Glue Catalog and Data Connectors for native access to Glue-managed data on S3, and adds support for Databricks U2M OAuth for secure Databricks user authentication.

New Cron-based dataset refreshes and worker schedules enable automated task management, while dataset and search results caching improvements further optimize query, search, and RAG performance.

What's New in v1.4.0

DataFusion v47 Highlights

Spice.ai is built on the DataFusion query engine. The v47 release brings:

Performance Improvements 🚀: This release delivers major query speedups through specialized GroupsAccumulator implementations for first_value, last_value, and min/max on Duration types, eliminating unnecessary sorting and computation. TopK operations are now up to 10x faster thanks to early exit optimizations, while sort performance is further enhanced by reusing row converters, removing redundant clones, and optimizing sort-preserving merge streams. Logical operations benefit from short-circuit evaluation for AND/OR, reducing overhead, and additional enhancements address high latency from sequential metadata fetching, improve int/string comparison efficiency, and simplify logical expressions for better execution.
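
For example, top-k queries of the following shape benefit from the early-exit optimization (the trips table and its columns are hypothetical):

SELECT trip_id, fare_amount
FROM trips
ORDER BY fare_amount DESC
LIMIT 10;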

Bug Fixes & Compatibility Improvements 🛠️: The release addresses issues with external sort, aggregation, and window functions, improves handling of NULL values and type casting in arrays and binary operations, and corrects problems with complex joins and nested window expressions. It also addresses SQL unparsing for subqueries, aliases, and UNION BY NAME.

See the Apache DataFusion 47.0.0 Changelog for details.

Arrow v55 Highlights

Arrow v55 delivers faster Parquet gzip compression, improved array concatenation, and better support for large files (4GB+) and modular encryption. Parquet metadata reads are now more efficient, with support for range requests and enhanced compatibility for INT96 timestamps and timezones. CSV parsing is more robust, with clearer error messages. These updates boost performance, compatibility, and reliability.

See the Arrow 55.0.0 Changelog and Arrow 55.1.0 Changelog for details.

Runtime Highlights

Search Result Caching: Spice now supports runtime caching for search results, improving performance for subsequent searches and chat completion requests that use the document_similarity LLM tool. Caching is configurable with options like maximum size, item TTL, eviction policy, and hashing algorithm.

Example spicepod.yml configuration:

runtime:
  caching:
    search_results:
      enabled: true
      max_size: 128mb
      item_ttl: 5s
      eviction_policy: lru
      hashing_algorithm: siphash

For more information, refer to the Caching documentation.

AWS Glue Catalog Connector Alpha: Connect to AWS Glue Data Catalogs to query Iceberg, Parquet, or CSV tables in S3.

Example spicepod.yml configuration:

catalogs:
  - from: glue
    name: my_glue_catalog
    params:
      glue_key: <your-access-key-id>
      glue_secret: <your-secret-access-key>
      glue_region: <your-region>
    include:
      - 'testdb.hive_*'
      - 'testdb.iceberg_*'

sql> show tables;
+-----------------+--------------+-------------------+------------+
| table_catalog   | table_schema | table_name        | table_type |
+-----------------+--------------+-------------------+------------+
| my_glue_catalog | testdb       | hive_table_001    | BASE TABLE |
| my_glue_catalog | testdb       | iceberg_table_001 | BASE TABLE |
| spice           | runtime      | task_history      | BASE TABLE |
+-----------------+--------------+-------------------+------------+

For more information, refer to the Glue Catalog Connector documentation.

AWS Glue Data Connector Alpha: Connect to specific tables in AWS Glue Data Catalogs to query Iceberg, Parquet, or CSV in S3.

Example spicepod.yml configuration:

datasets:
  - from: glue:my_database.my_table
    name: my_table
    params:
      glue_auth: key
      glue_region: us-east-1
      glue_key: ${secrets:AWS_ACCESS_KEY_ID}
      glue_secret: ${secrets:AWS_SECRET_ACCESS_KEY}

For more information, refer to the Glue Data Connector documentation.

Databricks U2M OAuth: Spice now supports User-to-Machine (U2M) authentication for Databricks when called with a compatible client, such as the Spice Cloud Platform.

Example spicepod.yml configuration:

datasets:
  - from: databricks:spiceai_sandbox.default.messages
    name: messages
    params:
      databricks_endpoint: ${secrets:DATABRICKS_ENDPOINT}
      databricks_cluster_id: ${secrets:DATABRICKS_CLUSTER_ID}
      databricks_client_id: ${secrets:DATABRICKS_CLIENT_ID}

Dataset Refresh Schedules: Accelerated datasets now support a refresh_cron parameter, automatically refreshing the dataset on a defined cron schedule. Cron-scheduled refreshes respect the global dataset_refresh_parallelism parameter.

Example spicepod.yml configuration:

datasets:
  - name: my_dataset
    from: s3://my-bucket/my_file.parquet
    acceleration:
      refresh_cron: 0 0 * * * # Daily refresh at midnight

For more information, refer to the Dataset Refresh Schedules documentation.

Worker Execution Schedules: Workers now support a cron parameter and will automatically execute an LLM prompt or SQL query on the defined cron schedule, in conjunction with a provided params.prompt.

Example spicepod.yml configuration:

workers:
  - name: email_reporter
    models:
      - from: gpt-4o
    params:
      prompt: 'Inspect the latest emails, and generate a summary report for them. Post the summary report to the connected Teams channel'
    cron: 0 2 * * * # Daily at 2am

For more information, refer to the Worker Execution Schedules documentation.

SQL Worker Actions: Spice now supports workers with sql actions for automated SQL query execution on a cron schedule:

workers:
  - name: my_worker
    cron: 0 * * * *
    sql: 'SELECT * FROM lineitem'

For more information, refer to the Workers with a SQL action documentation.

Contributors

Breaking Changes

  • No breaking changes.

Cookbook Updates

The Spice Cookbook now includes 70 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.4.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.4.0 image:

docker pull spiceai/spiceai:1.4.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

Changelog

  • Update trunk to 1.4.0-unstable (#5878) by @phillipleblanc in #5878
  • Update openapi.json (#5885) by @app/github-actions in #5885
  • feat: Testoperator reports benchmark failure summary (#5889) by @peasee in #5889
  • fix: Publish binaries to dev when platform option is all (#5905) by @peasee in #5905
  • feat: Print dispatch current test count of total (#5906) by @peasee in #5906
  • Include multiple duckdb files acceleration scenarios into testoperator dispatch (#5913) by @sgrebnov in #5913
  • feat: Support building testoperator on dev (#5915) by @peasee in #5915
  • Update spicepod.schema.json (#5927) by @app/github-actions in #5927
  • Update ROADMAP & SECURITY for 1.3.0 (#5926) by @phillipleblanc in #5926
  • docs: Update qa_analytics.csv (#5928) by @peasee in #5928
  • fix: Properly publish binaries to dev on push (#5931) by @peasee in #5931
  • Load request context extensions on every flight incoming call (#5916) by @ewgenius in #5916
  • Fix deferred loading for datasets with embeddings (#5932) by @ewgenius in #5932
  • Schedule AI benchmarks to run every Mon and Thu evening PST (#5940) by @sgrebnov in #5940
  • Fix explain plan snapshots for TPCDS queries Q36, Q70 & Q86 not being deterministic after DF 46 upgrade (#5942) by @phillipleblanc in #5942
  • chore: Upgrade to Rust 1.86 (#5945) by @peasee in #5945
  • Standardise HTTP settings across CLI (#5769) by @Jeadie in #5769
  • Fix deferred flag for Databricks SQL warehouse mode (#5958) by @ewgenius in #5958
  • Add deferred catalog loading (#5950) by @ewgenius in #5950
  • Refactor deferred_load using ComponentInitialization enum for better clarity (#5961) by @ewgenius in #5961
  • Post-release housekeeping (#5964) by @phillipleblanc in #5964
  • add LTO for release builds (#5709) by @kczimm in #5709
  • Fix dependabot/192 (#5976) by @Jeadie in #5976
  • Fix Test-to-SQL benchmark scheduled run (#5977) by @sgrebnov in #5977
  • Fix JSON to ScalarValue type conversion to match DataFusion behavior (#5979) by @sgrebnov in #5979
  • Add v1.3.1 release notes (#5978) by @lukekim in #5978
  • Regenerate nightly build workflow (#5995) by @ewgenius in #5995
  • Fix DataFusion dependency loading in Databricks request context extension (#5987) by @ewgenius in #5987
  • Update spicepod.schema.json (#6000) by @app/github-actions in #6000
  • feat: Run MySQL SF100 on dev runners (#5986) by @peasee in #5986
  • fix: Remove caching RwLock (#6001) by @peasee in #6001
  • 1.3.1 Post-release housekeeping (#6002) by @phillipleblanc in #6002
  • feat: Add initial scheduler crate (#5923) by @peasee in #5923
  • fix flight request context scope (#6004) by @ewgenius in #6004
  • fix: Ensure snapshots on different scale factors are retained (#6009) by @peasee in #6009
  • fix: Allow dev runners in dispatch files (#6011) by @peasee in #6011
  • refactor: Deprecate results_cache for caching.sql_results (#6008) by @peasee in #6008
  • Fix models benchmark results reporting (#6013) by @sgrebnov in #6013
  • fix: Run PR checks for tools/ changes (#6014) by @peasee in #6014
  • feat: Add a CronRequestChannel for scheduler (#6005) by @peasee in #6005
  • feat: Add refresh_cron acceleration parameter, start scheduler on table load (#6016) by @peasee in #6016
  • Update license check to allow dual license crates (#6021) by @sgrebnov in #6021
  • Initial worker concept (#5973) by @Jeadie in #5973
  • Don't fail if cargo-deny already installed (license check) (#6023) by @sgrebnov in #6023
  • Upgrade to DataFusion 47 and Arrow 55 (#5966) by @sgrebnov in #5966
  • Read Iceberg tables from Glue Catalog Connector (#5965) by @kczimm in #5965
  • Handle multiple highlights in v1/search UX (#5963) by @Jeadie in #5963
  • feat: Add cron scheduler configurations for workers (#6033) by @peasee in #6033
  • feat: Add search cache configuration and results wrapper (#6020) by @peasee in #6020
  • Fix GitHub Actions Ubuntu for more workflows (#6040) by @phillipleblanc in #6040
  • Fix Actions for testoperator dispatch manual (#6042) by @phillipleblanc in #6042
  • refactor: Remove worker type (#6039) by @peasee in #6039
  • feat: Support cron dataset refreshes (#6037) by @peasee in #6037
  • Upgrade datafusion-federation to 0.4.2 (#6022) by @phillipleblanc in #6022
  • Define SearchPipeline and use in runtime/vector_search.rs. (#6044) by @Jeadie in #6044
  • fix: Scheduler test when scheduler is running (#6051) by @peasee in #6051
  • doc: Spice Cloud Connector Limitation (#6035) by @Sevenannn in #6035
  • Add support for on_conflict:upsert for Arrow MemTable (#6059) by @sgrebnov in #6059
  • Enhance Arrow Flight DoPut operation tracing (#6053) by @sgrebnov in #6053
  • Update openapi.json (#6032) by @app/github-actions in #6032
  • Add tools enabled to MCP server capabilities (#6060) by @Jeadie in #6060
  • Upgrade to delta_kernel 0.11 (#6045) by @phillipleblanc in #6045
  • refactor: Replace refresh oneshot with notify (#6050) by @peasee in #6050
  • Enable Upsert OnConflictBehavior for runtime.task_history table (#6068) by @sgrebnov in #6068
  • feat: Add a workers integration test (#6069) by @peasee in #6069
  • Fix DuckDB acceleration ORDER BY rand() and ORDER BY NULL (#6071) by @phillipleblanc in #6071
  • Update Models Benchmarks to report unsuccessful evals as errors (#6070) by @sgrebnov in #6070
  • Revert: fix: Use HTTPS ubuntu sources (#6082) by @Sevenannn in #6082
  • Add initial support for Spice Cloud Platform management (#6089) by @sgrebnov in #6089
  • Run spiceai cloud connector TPC tests using spice dev apps (#6049) by @Sevenannn in #6049
  • feat: Add SQL worker action (#6093) by @peasee in #6093
  • Post-release housekeeping (#6097) by @phillipleblanc in #6097
  • Fix search bench (#6091) by @Jeadie in #6091
  • fix: Update benchmark snapshots (#6094) by @app/github-actions in #6094
  • fix: Update benchmark snapshots (#6095) by @app/github-actions in #6095
  • Glue catalog connector for hive style parquet (#6054) by @kczimm in #6054
  • Update openapi.json (#6100) by @app/github-actions in #6100
  • Improve Flight Client DoPut / Publish error handling (#6105) by @sgrebnov in #6105
  • Define PostApplyCandidateGeneration to handle all filters & projections. (#6096) by @Jeadie in #6096
  • refactor: Update the tracing task names for scheduled tasks (#6101) by @peasee in #6101
  • task: Switch GH runners in PR and testoperator (#6052) by @peasee in #6052
  • feat: Connect search caching for HTTP and tools (#6108) by @peasee in #6108
  • test: Add multi-dataset cron test (#6102) by @peasee in #6102
  • Sanitize the ListingTableURL (#6110) by @phillipleblanc in #6110
  • Avoid partial writes by FlightTableWriter (#6104) by @sgrebnov in #6104
  • fix: Update the TPCDS postgres acceleration indexes (#6111) by @peasee in #6111
  • Make Glue Catalog refreshable (#6103) by @kczimm in #6103
  • Refactor Glue catalog to use a new Glue data connector (#6125) by @kczimm in #6125
  • Emit retry error on flight transient connection failure (#6123) by @Sevenannn in #6123
  • Update Flight DoPut implementation to send single final PutResult (#6124) by @sgrebnov in #6124
  • feat: Add metrics for search results cache (#6129) by @peasee in #6129
  • update MCP crate (#6130) by @Jeadie in #6130
  • feat: Add search cache status header, respect cache control (#6131) by @peasee in #6131
  • fix: Allow specifying individual caching blocks (#6133) by @peasee in #6133
  • Update openapi.json (#6132) by @app/github-actions in #6132
  • Add CSV support to Glue data connector (#6138) by @kczimm in #6138
  • Update Spice Cloud Platform management UX (#6140) by @sgrebnov in #6140
  • Add TPCH bench for Glue catalog (#6055) by @kczimm in #6055
  • Enforce max_tokens_per_request limit in OpenAI embedding logic (#6144) by @sgrebnov in #6144
  • Enable Spice Cloud Control Plane connect (management) for FinanceBench (#6147) by @sgrebnov in #6147
  • Add integration test for Spice Cloud Platform management (#6150) by @sgrebnov in #6150
  • fix: Invalidate search cache on refresh (#6137) by @peasee in #6137
  • fix: Prevent registering cron schedule with change stream accelerations (#6152) by @peasee in #6152
  • test: Add an append cron integration test (#6151) by @peasee in #6151
  • fix: Cache search results with no-cache directive (#6155) by @peasee in #6155
  • fix: Glue catalog dispatch runner type (#6157) by @peasee in #6157
  • Fix: Glue S3 location for directories and Iceberg credentials (#6174) by @kczimm in #6174
  • Support multiple columns in FTS (#6156) by @Jeadie in #6156
  • fix: Add --cache-control flag for search CLI (#6158) by @peasee in #6158
  • Add Glue data connector tpch bench test for parquet and csv (#6170) by @kczimm in #6170
  • fix: Apply results cache deprecation correctly (#6177) by @peasee in #6177
  • Fix regression in Parquet pushdown (#6178) by @phillipleblanc in #6178
  • Fix CUDA build (use candle-core 0.8.4 and cudarc v0.12) (#6181) by @sgrebnov in #6181
  • return empty stream if no external_links present (#6192) by @kczimm in #6192
  • Use arrow pretty print util instead of init dataframe / logical plan in display_records (#6191) by @Sevenannn in #6191
  • task: Enable additional TPCDS test scenarios in dispatcher (#6160) by @peasee in #6160
  • chore: Update dependencies (#6196) by @peasee in #6196
  • Fix FlightSQL GetDbSchemas and GetTables schemas to fully match the protocol (#6197) by @sgrebnov in #6197
  • Use spice-rs in test operator and retry on connection reset error (#6136) by @Sevenannn in #6136
  • Fix load status metric description (#6219) by @phillipleblanc in #6219
  • Run extended tests on PRs against release branch, update glue_iceberg_integration_test_catalog test (#6204) by @Sevenannn in #6204
  • query schema for is_nullable (#6229) by @kczimm in #6229
  • fix: use the query error message when queries fail (#6228) by @kczimm in #6228
  • fix glue iceberg catalog integration test (#6249) by @Sevenannn in #6249
  • cache table providers in glue catalog (#6252) by @kczimm in #6252
  • fix: databricks sql_warehouse schema contains duplicate fields (#6255) by @phillipleblanc in #6255

Full Changelog: v1.3.2...v1.4.0

Spice v1.3.0 (May 19, 2025)

· 9 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.3.0! 🏎️

Spice v1.3.0 accelerates data and AI applications with significantly improved query performance, reliability, and expanded Databricks integration. New support for the Databricks SQL Statement Execution API enables direct SQL queries on Databricks SQL Warehouses, complementing Mosaic AI model serving and embeddings (introduced in v1.2.2) and existing Databricks catalog and dataset integrations. This release upgrades to DataFusion v46, optimizes results caching performance, and strengthens security with least-privilege sandboxing improvements.

What's New in v1.3.0

  • Databricks SQL Statement Execution API Support: Added support for the Databricks SQL Statement Execution API, enabling direct SQL queries against Databricks SQL Warehouses for optimized performance in analytics and reporting workflows.

    Example spicepod.yml configuration:

    datasets:
      - from: databricks:spiceai.datasets.my_awesome_table
        name: my_awesome_table
        params:
          mode: sql_warehouse
          databricks_endpoint: ${env:DATABRICKS_ENDPOINT}
          databricks_sql_warehouse_id: ${env:DATABRICKS_SQL_WAREHOUSE_ID}
          databricks_token: ${env:DATABRICKS_TOKEN}

    For details, see the Databricks Data Connector documentation.

  • Improved Results Cache Performance & Hashing Algorithm: Spice now supports an alternative results cache hashing algorithm, ahash, in addition to the default, siphash. Configure it via:

    runtime:
      results_cache:
        hashing_algorithm: ahash # or siphash

    The hashing algorithm determines how cache keys are hashed before being stored, impacting both lookup speed and protection against potential DoS attacks.

    Using ahash improves performance for large queries or query plans. Combined with results cache optimizations, it reduces 99th-percentile request latency and increases total requests/second for queries with large result sets (100k+ cached rows). The following charts show performance tested against TPC-H Query #17 on a scale factor 5 dataset (30+ million rows, 5GB):

    Charts: improvements in 99th-percentile query latency and requests/second, compared against v1.2.2 by cache key type and hashing algorithm.

    Note: ahash was not available in v1.2.2, so it is excluded from comparisons.

    To learn more, refer to the Results Cache Hashing Algorithm documentation.

  • SQL Query Performance: Optimized the critical SQL query path, reducing overhead and improving response times for simple queries by 10-20%.

  • DuckDB Acceleration: Fixed a bug in the DuckDB acceleration engine causing query failures under high concurrency when querying datasets accelerated into multiple DuckDB files.

  • Container Security: The container image now runs as a non-root user with enhanced sandboxing and includes only essential dependencies for a slimmer, more secure image.

DataFusion v46 Highlights

Spice.ai is built on the DataFusion query engine. The v46 release brings:

  • Faster Performance 🚀: DataFusion 46 introduces significant performance enhancements, including a 2x faster median() function for large datasets without grouping, 10–100% speed improvements in FIRST_VALUE and LAST_VALUE window functions by avoiding sorting, and a 40x faster uuid() function. Additional optimizations, such as a 50% faster repeat() string function, accelerated chr() and to_hex() functions, improved grouping algorithms, and Parquet row group pruning with NOT LIKE filters, further boost overall query efficiency (a few of these functions are shown after this list).

  • New range() Table Function: A new table-valued function range(start, stop, step) has been added to make it easy to generate integer sequences — similar to PostgreSQL’s generate_series() or Spark’s range(). Example: SELECT * FROM range(1, 10, 2);

  • UNION [ALL | DISTINCT] BY NAME Support: DataFusion now supports UNION BY NAME and UNION ALL BY NAME, which align columns by name instead of position. This matches functionality found in systems like Spark and DuckDB and simplifies combining heterogeneously ordered result sets.

    Example:

    SELECT col1, col2 FROM t1
    UNION ALL BY NAME
    SELECT col2, col1 FROM t2;
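
A few of the accelerated functions from the first bullet, for illustration (the sales table and price column are hypothetical; the noted speedups restate the release notes above):

SELECT median(price) FROM sales; -- ~2x faster on large, ungrouped inputs
SELECT uuid();                   -- ~40x faster
SELECT repeat('ab', 3);          -- 'ababab'; ~50% faster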

See the DataFusion 46.0.0 release notes for details.

Spice.ai adopts the latest minus one DataFusion release for quality assurance and stability. The upgrade to DataFusion v47 is planned for Spice v1.4.0 in June.

Contributors

Breaking Changes

The container image now always runs as a non-root user (UID/GID 65534) with minimal dependencies, resulting in a smaller, more secure image. Standard Linux tools, including bash, are no longer included.

Kubernetes Deployments:

  • Use of the v1.3.0+ Helm chart is required, which includes a securityContext ensuring the sandbox user has required file access.

  • For deployments using a Helm chart version lower than v1.3.0, add the following securityContext to the pod specification:

securityContext:
  runAsUser: 65534
  runAsGroup: 65534
  fsGroup: 65534

See the Docker Sandbox Guide for details on how to update custom Docker images to restore the previous behavior.

Cookbook Updates

  • Added Accelerated Views: Pre-calculate and materialize data derived from one or more underlying datasets.

The Spice Cookbook now includes 67 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.3.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.3.0 image:

docker pull spiceai/spiceai:1.3.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

Changelog

See the full list of changes at: v1.2.2...v1.3.0

Spice v1.2.0 (Apr 28, 2025)

· 16 min read
Evgenii Khramkov
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.2.0! 🚀

Spice v1.2.0 is a significant update. It upgrades DataFusion to v45 and Arrow to v54. This release brings faster query performance, support for parameterized queries in SQL and HTTP APIs, and the ability to accelerate views. Several bugs have been fixed and dependencies updated for better stability and speed.

DataFusion v45 Highlights

Spice.ai is built on the DataFusion query engine. The v45 release brings:

  • Faster Performance 🚀: DataFusion is now the fastest single-node engine for querying Apache Parquet files in the ClickBench benchmark. Performance improved by over 33% from v33 to v45. Arrow StringView is now on by default, making string and binary data queries much faster, especially with Parquet files.

  • Better Quality 📋: DataFusion now runs over 5 million SQL tests per push using the SQLite sqllogictest suite. There are new checks for logical plan correctness and more thorough pre-release testing.

  • New SQL Functions ✨: Added show functions, to_local_time, regexp_count, map_extract, array_distance, array_any_value, greatest, least, and arrays_overlap.
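
For illustration, a few of the new functions (the values in comments are what these calls should return):

SELECT greatest(1, 5, 3);              -- 5
SELECT least(2.5, 1.0);                -- 1.0
SELECT regexp_count('aaa', 'a');       -- 3
SELECT arrays_overlap([1, 2], [2, 3]); -- true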

See the DataFusion 45.0.0 release notes for details.

Spice.ai upgrades to the latest minus one DataFusion release to ensure adequate testing and stability. The next upgrade to DataFusion v46 is planned for Spice v1.3.0 in May.

What's New in v1.2.0

  • Parameterized Queries: Parameterized queries are now supported with the Flight SQL API and HTTP API. Positional arguments ($1) and named arguments (:param) are both supported. Logical plans for SQL statements are cached for faster repeated queries (both placeholder styles are shown below).

    See the API Documentation for additional details.
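
    For illustration, both placeholder styles (the orders table is hypothetical; parameter values are bound by the client at execution time):

    -- Positional placeholder:
    SELECT * FROM orders WHERE total > $1;

    -- Named placeholder:
    SELECT * FROM orders WHERE total > :min_total;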

  • Accelerated Views: Views, not just datasets, can now be accelerated. This provides much better performance for views that perform heavy computation.

    Example spicepod.yaml:

    views:
      - name: accelerated_view
        acceleration:
          enabled: true
          engine: duckdb
          primary_key: id
          refresh_check_interval: 1h
        sql: |
          select * from dataset_a
          union all
          select * from dataset_b

    See the Data Acceleration documentation.

  • Memory Usage Metrics & Configuration: The runtime now tracks memory usage as a metric, and a new runtime memory_limit parameter is available. The memory limit applies specifically to the runtime and should be used in addition to existing memory usage configuration, such as duckdb_memory_limit. Queries whose memory usage exceeds the limit will spill to disk.

    See the Memory Reference for details.

  • New Worker Component: Workers are new configurable compute units in the Spice runtime. They help manage compute across models and tools, handle errors, and balance load. Workers are configured in the workers section of spicepod.yaml.

    Example spicepod.yaml:

    workers:
      - name: round-robin
        description: |
          Distributes requests between 'foo' and 'bar' models in a round-robin fashion.
        models:
          - from: foo
          - from: bar
      - name: fallback
        description: |
          Tries 'bar' first, then 'foo', then 'baz' if earlier models fail.
        models:
          - from: foo
            order: 2
          - from: bar
            order: 1
          - from: baz
            order: 3

    See the Workers Documentation for details.

  • Databricks Model Provider: Databricks models can now be used with from: databricks:model_name.

    Example spicepod.yaml:

    models:
      - from: databricks:llama-3_2_1_1b_instruct
        name: llama-instruct
        params:
          databricks_endpoint: dbc-46470731-42e5.cloud.databricks.com
          databricks_token: ${ secrets:SPICE_DATABRICKS_TOKEN }

See the Databricks model documentation.

  • spice chat CLI Improvements: The spice chat command now supports an optional --temperature parameter. A one-shot chat can also be sent with spice chat <message>.

  • More Type Support: Added support for Postgres JSON type and DuckDB Dictionary type.

  • Other Improvements:

    • New image tags let you pick memory allocators for different use-cases: jemalloc, sysalloc, and mimalloc.
    • Better error handling and logging for chat and model operations.

Contributors

Cookbook Updates

The Spice Cookbook now includes 68 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.2.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.2.0 image:

docker pull spiceai/spiceai:1.2.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

Spice is now built with Rust 1.85.0 and the Rust 2024 edition.

Changelog

- Update end_game.md (#5312) by @peasee in https://github.com/spiceai/spiceai/pull/5312
- feat: Add initial testoperator query validation (#5311) by @peasee in https://github.com/spiceai/spiceai/pull/5311
- Update Helm + Prepare for next release (#5317) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5317
- Update spicepod.schema.json (#5319) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5319
- add integration test for reading encrypted PDFs from S3 (#5308) by @kczimm in https://github.com/spiceai/spiceai/pull/5308
- Stop `load_components` during runtime shutdown (#5306) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5306
- Update openapi.json (#5321) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5321
- feat: Implement record batch data validation (#5331) by @peasee in https://github.com/spiceai/spiceai/pull/5331
- Update QA analytics for v1.1.1 (#5320) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5320
- fix: Update benchmark snapshots (#5337) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5337
- Enforce pulls with Spice v1.0.4 (#5339) by @lukekim in https://github.com/spiceai/spiceai/pull/5339
- Upgrade to DataFusion 45, Arrow 54, Rust 1.85 & Edition 2024 (#5334) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5334
- feat: Allow validating testoperator in benchmark workflow (#5342) by @peasee in https://github.com/spiceai/spiceai/pull/5342
- Upgrade `delta_kernel` to 0.9 (#5343) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5343
- deps: Update odbc-api (#5344) by @peasee in https://github.com/spiceai/spiceai/pull/5344
- Fix schema inference for Snowflake tables with large number of columns (#5348) by @ewgenius in https://github.com/spiceai/spiceai/pull/5348
- feat: Update testoperator dispatch for validation, version metric (#5349) by @peasee in https://github.com/spiceai/spiceai/pull/5349
- fix: validate_results not validate (#5352) by @peasee in https://github.com/spiceai/spiceai/pull/5352
- revert to previous pdf-extract; remove test for encrypted pdf support (#5355) by @kczimm in https://github.com/spiceai/spiceai/pull/5355
- Stablize the test `verify_similarity_search_chat_completion` (#5284) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5284
- Turn off `delta_kernel::log_segment` logging and refactor log filtering (#5367) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5367
- Upgrade to DuckDB 1.2.2 (#5375) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5375
- Update Readme - fix broken and outdated links (#5376) by @ewgenius in https://github.com/spiceai/spiceai/pull/5376
- Upgrade dependabot dependencies (#5385) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5385
- fix: Remove IMAP oauth (#5386) by @peasee in https://github.com/spiceai/spiceai/pull/5386
- Bump Helm chart to 1.1.2 (#5389) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5389
- Refactor accelerator registry as part of runtime. (#5318) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5318
- Include `vnd.spiceai.sql/nsql.v1+json` response examples (openapi docs) (#5388) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5388
- docs: Update endgame template with SpiceQA, update qa analytics (#5391) by @peasee in https://github.com/spiceai/spiceai/pull/5391
- Make graceful shutdown timeout configurable (#5358) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5358
- docs: Update release criteria with note on max columns (#5401) by @peasee in https://github.com/spiceai/spiceai/pull/5401
- Update openapi.json (#5392) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5392
- FinanceBench: update scorer instructions and switch scoring model to `gpt-4.1` (#5395) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5395
- feat: Write OTel metrics for testoperator (#5397) by @peasee in https://github.com/spiceai/spiceai/pull/5397
- Update nsql openapi title (#5403) by @ewgenius in https://github.com/spiceai/spiceai/pull/5403
- Track `ai_inferences_count` with used tools flag. Extensible runtime request context. (#5393) by @ewgenius in https://github.com/spiceai/spiceai/pull/5393
- Include newly detected view as changed view (#5408) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5408
- Track used_tools in ai_inferences_with_spice_count as number (#5409) by @ewgenius in https://github.com/spiceai/spiceai/pull/5409
- Update openapi.json (#5406) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5406
- Tweak enforce pulls with Spice (#5411) by @lukekim in https://github.com/spiceai/spiceai/pull/5411
- Allow `flightsql` and `spiceai` connectors to override flight max message size (#5407) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5407
- Retry model graded scorer once on successful, empty response (#5405) by @Jeadie in https://github.com/spiceai/spiceai/pull/5405
- use span task name in 'spice trace' tree, not span_id (#5412) by @Jeadie in https://github.com/spiceai/spiceai/pull/5412
- Rename to `track_ai_inferences_with_spice_count` in all places (#5410) by @ewgenius in https://github.com/spiceai/spiceai/pull/5410
- Update qa_analytics.csv (#5421) by @peasee in https://github.com/spiceai/spiceai/pull/5421
- Remove the filter for the `list_datasets` tool in the AI inferences metric count. (#5417) by @ewgenius in https://github.com/spiceai/spiceai/pull/5417
- fix: Testoperator uses an exact API key for benchmark metric submission (#5413) by @peasee in https://github.com/spiceai/spiceai/pull/5413
- feat: Enable testoperator metrics in workflow (#5422) by @peasee in https://github.com/spiceai/spiceai/pull/5422
- Upgrade mistral.rs (#5404) by @Jeadie in https://github.com/spiceai/spiceai/pull/5404
- Include all FinanceBench documents in benchmark tests (#5426) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5426
- Handle second Ctrl-C to force runtime termination (#5427) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5427
- Add optional `--temperature` parameter for `spice chat` CLI command (#5429) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5429
- Remove `with_runtime_status` from the `RuntimeBuilder` (#5430) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5430
- Fix spice chat error handling (#5433) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5433
- Add more test models to FinanceBench benchmark (#5431) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5431
- support 'from: databricks:model_name' (#5434) by @Jeadie in https://github.com/spiceai/spiceai/pull/5434
- Upgrade Pulls with Spice to v1.0.6 and add concurrency control (#5442) by @lukekim in https://github.com/spiceai/spiceai/pull/5442
- Upgrade DataFusion table providers (#5443) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5443
- Test spice chat in e2e_test_spice_cli (#5447) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5447
- Allow for one-shot chat request using `spice chat <message>` (#5444) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5444
- Enable parallel data sampling for NSQL (#5449) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5449
- Upgrade Go from v1.23.4 to v1.24.2 (#5462) by @lukekim in https://github.com/spiceai/spiceai/pull/5462
- Update PULL_REQUEST_TEMPLATE.md (#5465) by @lukekim in https://github.com/spiceai/spiceai/pull/5465
- Enable captured outputs by default when spiced is started by the CLI (spice run) (#5464) by @lukekim in https://github.com/spiceai/spiceai/pull/5464
- Parameterized queries via Flight SQL API (#5420) by @kczimm in https://github.com/spiceai/spiceai/pull/5420
- fix: Update benchmarks readme badge (#5466) by @peasee in https://github.com/spiceai/spiceai/pull/5466
- delay auth check for binding parameterized queries (#5475) by @kczimm in https://github.com/spiceai/spiceai/pull/5475
- Add support for `?` placeholder syntax in parameterized queries (#5463) by @kczimm in https://github.com/spiceai/spiceai/pull/5463
- enable task name override for non static span names (#5423) by @Jeadie in https://github.com/spiceai/spiceai/pull/5423
- Allow parameter queries with no parameters (#5481) by @kczimm in https://github.com/spiceai/spiceai/pull/5481
- Support unparsing UNION for distinct results (#5483) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5483
- add rust-toolchain.toml (#5485) by @kczimm in https://github.com/spiceai/spiceai/pull/5485
- Add parameterized query support to the HTTP API (#5484) by @kczimm in https://github.com/spiceai/spiceai/pull/5484
- E2E test for spice chat <message> behavior (#5451) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5451
- Renable and fix huggingface models integration tests (#5478) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5478
- Update openapi.json (#5488) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5488
- feat: Record memory usage as a metric (#5489) by @peasee in https://github.com/spiceai/spiceai/pull/5489
- fix: update dispatcher to run all benchmarks, rename metric, update spicepods, add scale factor (#5500) by @peasee in https://github.com/spiceai/spiceai/pull/5500
- Fix ILIKE filters support (#5502) by @ewgenius in https://github.com/spiceai/spiceai/pull/5502
- fix: Update test spicepod locations and names (#5505) by @peasee in https://github.com/spiceai/spiceai/pull/5505
- fix: Update benchmark snapshots (#5508) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5508
- fix: Update benchmark snapshots (#5512) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5512
- Fix Delta Lake bug for: Found unmasked nulls for non-nullable StructArray field "predicate" (#5515) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5515
- fix: working directory for duckdb e2e test spicepods (#5510) by @peasee in https://github.com/spiceai/spiceai/pull/5510
- Tweaks to README.md (#5516) by @lukekim in https://github.com/spiceai/spiceai/pull/5516
- Cache logical plans of SQL statements (#5487) by @kczimm in https://github.com/spiceai/spiceai/pull/5487
- Fix `content-type: application/json` (#5517) by @Jeadie in https://github.com/spiceai/spiceai/pull/5517
- Validate postgres results in testoperator dispatch (#5504) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5504
- fix: Update benchmark snapshots (#5511) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5511
- Fix results cache by SQL with prepared statements (#5518) by @kczimm in https://github.com/spiceai/spiceai/pull/5518
- Add initial support for views acceleration (#5509) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5509
- fix: Update benchmark snapshots (#5527) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5527
- Support switching the memory allocator Spice uses via `alloc-*` features. (#5528) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5528
- fix: Update benchmark snapshots (#5525) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5525
- Add test spicepod for tpch mysql-duckdb[file acceleration] (#5521) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5521
- Fix nightly arm build - change tag `-default` to `-models` (#5529) by @ewgenius in https://github.com/spiceai/spiceai/pull/5529
- LLM router via `worker` spicepod component (#5513) by @Jeadie in https://github.com/spiceai/spiceai/pull/5513
- Apply Spice advanced acceleration logic and params support to accelerated views (#5526) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5526
- Enable DatasetCheckpoint logic for accelerated views (#5533) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5533
- Fix public '.model' name for router workers (#5535) by @Jeadie in https://github.com/spiceai/spiceai/pull/5535
- feat: Add Runtime memory limit parameter (#5536) by @peasee in https://github.com/spiceai/spiceai/pull/5536
- For fallback worker, check first item in `chat/completion` stream. (#5537) by @Jeadie in https://github.com/spiceai/spiceai/pull/5537
- Move rate limit check to after parameterized query binding (#5540) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5540
- Update spicepod.schema.json (#5545) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5545
- Accelerate views: refresh_on_startup, ready_state, jitter params support (#5547) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5547
- Add integration test for accelerated views (#5550) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5550
- Don't install make or expect on spiceai-macos runners (#5554) by @lukekim in https://github.com/spiceai/spiceai/pull/5554
- `event_stream` crate for emitting events from tracing::Span; used in v1/chat/completions streaming. (#5474) by @Jeadie in https://github.com/spiceai/spiceai/pull/5474
- Fix typo in method (#5559) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5559
- Run test operator every day and current and previous commits (#5557) by @lukekim in https://github.com/spiceai/spiceai/pull/5557
- Add aws_allow_http parameter for delta lake connector (#5541) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5541
- feat: Add branch name to metric dimensions in testoperator (#5563) by @peasee in https://github.com/spiceai/spiceai/pull/5563
- fix: Update the tpch benchmark snapshots for: ./test/spicepods/tpch/sf1/federated/odbc[databricks].yaml (#5565) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5565
- fix: Split scheduled dispatch into a separate job (#5567) by @peasee in https://github.com/spiceai/spiceai/pull/5567
- fix: Use outputs.SPICED_COMMIT (#5568) by @peasee in https://github.com/spiceai/spiceai/pull/5568
- fix: Use refs in testoperator dispatch instead of commits (#5569) by @peasee in https://github.com/spiceai/spiceai/pull/5569
- fix: actions/checkout ref does not take a full ref (#5571) by @peasee in https://github.com/spiceai/spiceai/pull/5571
- fix: Testoperator dispatch (#5572) by @peasee in https://github.com/spiceai/spiceai/pull/5572
- Respect `update-snapshots` when running all benchmarks manually (#5577) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5577
- Use FETCH_HEAD instead of ${{ inputs.ref }} to list commits in setup_spiced (#5579) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5579
- Add additional test scenarios for benchmarks (#5582) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5582
- fix: Update the tpch benchmark snapshots for: test/spicepods/tpch/sf1/accelerated/databricks[delta_lake]-duckdb[file].yaml (#5590) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5590
- fix: Update the tpch benchmark snapshots for: test/spicepods/tpch/sf1/accelerated/mysql-duckdb[file].yaml (#5591) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5591
- Fix Snowflake data connector rows ordering (#5599) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5599
- fix: Update benchmark snapshots (#5595) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5595
- fix: Update the tpch benchmark snapshots for: test/spicepods/tpch/sf1/accelerated/databricks[delta_lake]-arrow.yaml (#5594) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5594
- fix: Update benchmark snapshots (#5589) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5589
- fix: Update benchmark snapshots (#5583) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5583
- Downgrade DuckDB to 1.1.3 (#5607) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5607
- Add prepared statement integration tests (#5544) by @kczimm in https://github.com/spiceai/spiceai/pull/5544

Full Changelog: v1.1.2...v1.2.0