13 posts tagged with "datafusion"

DataFusion query engine related topics and usage

Spice v1.11.0 (Jan 28, 2026)

· 58 min read
William Croxson
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.11.0-stable! ⚡

In Spice v1.11.0, Spice Cayenne reaches Beta status with acceleration snapshots, key-based deletion vectors, and Amazon S3 Express One Zone support. DataFusion has been upgraded to v51, along with Arrow v57.2 and iceberg-rust v0.8.0. v1.11 also brings several DynamoDB and DynamoDB Streams improvements, such as JSON nesting, and significant improvements to Distributed Query, with active-active schedulers and mTLS for enterprise-grade high availability and secure cluster communication.

This release also adds new SMB, NFS, and ScyllaDB Data Connectors (Alpha), Prepared Statements with full SDK support (gospice, spice-rs, spice-dotnet, spice-java, spice.js, and spicepy), Google LLM Support for expanded AI inference capabilities, and significant improvements to caching, observability, and Hash Indexing for Arrow Acceleration.

What's New in v1.11.0

Spice Cayenne Accelerator Reaches Beta

Spice Cayenne has been promoted to Beta status with acceleration snapshots support and numerous performance and stability improvements.

Key Enhancements:

  • Key-based Deletion Vectors: Improved deletion vector support using key-based lookups for more efficient data management and faster delete operations. Key-based deletion vectors are more memory-efficient than positional vectors for sparse deletions (see the sketch after this list).
  • S3 Express One Zone Support: Store Cayenne data files in S3 Express One Zone for single-digit millisecond latency, ideal for latency-sensitive query workloads that require persistence.
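
As a sketch of how key-based deletion vectors come into play, consider a retention-style DELETE like the Cayenne retention_sql examples elsewhere on this page (the events table and event_time column below are illustrative, not from the release notes):

-- Illustrative only: rows removed from a Cayenne-accelerated dataset are
-- tracked as key-based deletion vectors rather than positional ones
DELETE FROM events WHERE event_time < NOW() - INTERVAL '30 days';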

Improved Reliability:

  • Resolved FuturesUnordered reentrant drop crashes
  • Fixed memory growth issues related to Vortex metrics allocation
  • Metadata catalog now properly respects cayenne_file_path location
  • Added warnings for unparseable configuration values

For more details, refer to the Cayenne Documentation.

DataFusion v51 Upgrade

Apache DataFusion has been upgraded to v51, bringing significant performance improvements, new SQL features, and enhanced observability.

DataFusion v51 ClickBench Performance

Performance Improvements:

  • Faster CASE Expression Evaluation: Expressions now short-circuit earlier, reuse partial results, and avoid unnecessary scattering, speeding up common ETL patterns
  • Better Defaults for Remote Parquet Reads: DataFusion now fetches the last 512KB of Parquet files by default, typically avoiding 2 I/O requests per file
  • Faster Parquet Metadata Parsing: Leverages Arrow 57's new thrift metadata parser for up to 4x faster metadata parsing

New SQL Features:

  • SQL Pipe Operators: Support for |> syntax for inline transforms
  • DESCRIBE <query>: Returns the schema of any query without executing it
  • Named Arguments in SQL Functions: PostgreSQL-style param => value syntax for scalar, aggregate, and window functions
  • Decimal32/Decimal64 Support: New Arrow types supported including aggregations like SUM, AVG, and MIN/MAX

Example pipe operator:

SELECT * FROM t
|> WHERE a > 10
|> ORDER BY b
|> LIMIT 5;
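
DESCRIBE and named arguments follow similarly. A minimal sketch: the table t matches the pipe example above, and my_udf is a hypothetical function used only to show the param => value syntax:

-- Return the schema of a query without executing it
DESCRIBE SELECT a, b FROM t WHERE a > 10;

-- PostgreSQL-style named arguments (my_udf is hypothetical)
SELECT my_udf(input => a, scale => 2) FROM t;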

Improved Observability:

  • Improved EXPLAIN ANALYZE Metrics: New metrics including output_bytes, selectivity for filters, reduction_factor for aggregates, and detailed timing breakdowns
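
The new metrics appear in the annotated plan that EXPLAIN ANALYZE prints after executing the query. For example:

EXPLAIN ANALYZE SELECT category, COUNT(*) FROM t GROUP BY category;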

Arrow 57.2 Upgrade

Apache Arrow has been upgraded to v57.2, bringing major performance improvements and new capabilities.

Arrow 57 Parquet Metadata Parsing Performance

Key Features:

  • 4x Faster Parquet Metadata Parsing: A rewritten thrift metadata parser delivers up to 4x faster metadata parsing, especially beneficial for low-latency use cases and files with large amounts of metadata
  • Parquet Variant Support: Experimental support for reading and writing the new Parquet Variant type for semi-structured data, including shredded variant values
  • Parquet Geometry Support: Read and write support for Parquet Geometry types (GEOMETRY and GEOGRAPHY) with GeospatialStatistics
  • New arrow-avro Crate: Efficient conversion between Apache Avro and Arrow RecordBatches with projection pushdown and vectorized execution support

DynamoDB Connector Enhancements

  • Added JSON nesting for DynamoDB Streams
  • Improved batch deletion handling

Distributed Query Improvements

High Availability Clusters: Spice now supports running multiple active schedulers in an active/active configuration for production deployments. This eliminates the scheduler as a single point of failure and enables graceful handling of node failures.

  • Multiple schedulers run simultaneously, each capable of accepting queries
  • Schedulers coordinate via a shared S3-compatible object store
  • Executors discover all schedulers automatically
  • A load balancer distributes client queries across schedulers

Example HA configuration:

runtime:
  scheduler:
    state_location: s3://my-bucket/spice-cluster
    params:
      region: us-east-1

mTLS Verification: Cluster communication between scheduler and executors now supports mutual TLS verification for enhanced security.

Credential Propagation: S3, ABFS, and GCS credentials are now automatically propagated to executors in cluster mode, enabling access to cloud storage across the distributed query cluster.

Improved Resilience:

  • Exponential backoff for scheduler disconnection recovery
  • Increased gRPC message size limit from 16MB to 100MB for large query plans
  • HTTP health endpoint for cluster executors
  • Automatic executor role inference when --scheduler-address is provided

For more details, refer to the Distributed Query Documentation.

iceberg-rust v0.8.0 Upgrade

Spice has been upgraded to iceberg-rust v0.8.0, bringing improved Iceberg table support.

Key Features:

  • V3 Metadata Support: Full support for Iceberg V3 table metadata format
  • INSERT INTO Partitioned Tables: DataFusion integration now supports inserting data into partitioned Iceberg tables
  • Improved Delete File Handling: Better support for position and equality delete files, including shared delete file loading and caching
  • SQL Catalog Updates: Implement update_table and register_table for SQL catalog
  • S3 Tables Catalog: Implement update_table for S3 Tables catalog
  • Enhanced Arrow Integration: Convert Arrow schema to Iceberg schema with auto-assigned field IDs, _file column support, and Date32 type support

Acceleration Snapshots

Acceleration snapshots enable point-in-time recovery and data versioning for accelerated datasets. Snapshots capture the state of accelerated data at specific points, allowing for fast bootstrap recovery and rollback capabilities.

Key Features:

  • Flexible Triggers: Configure when snapshots are created based on time intervals or stream batch counts
  • Automatic Compaction: Reduce storage overhead by compacting older snapshots (DuckDB only)
  • Bootstrap Integration: Snapshots can reset cache expiry on load for seamless recovery (DuckDB with Caching refresh mode)
  • Smart Creation Policies: Only create snapshots when data has actually changed

Example configuration:

datasets:
  - from: s3://my-bucket/data.parquet
    name: my_dataset
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      snapshots: enabled
      snapshots_trigger: time_interval
      snapshots_trigger_threshold: 1h
      snapshots_creation_policy: on_changed

Snapshots API and CLI: New API endpoints and CLI commands for managing snapshots programmatically.

CLI Commands:

# List all snapshots for a dataset
spice acceleration snapshots taxi_trips

# Get details of a specific snapshot
spice acceleration snapshot taxi_trips 3

# Set the current snapshot for rollback (requires runtime restart)
spice acceleration set-snapshot taxi_trips 2

HTTP API Endpoints:

  • GET /v1/datasets/{dataset}/acceleration/snapshots: List all snapshots for a dataset
  • GET /v1/datasets/{dataset}/acceleration/snapshots/{id}: Get details of a specific snapshot
  • POST /v1/datasets/{dataset}/acceleration/snapshots/current: Set the current snapshot for rollback

For more details, refer to the Acceleration Snapshots Documentation.

Caching Acceleration Mode Improvements

The Caching Acceleration Mode introduced in v1.10.0 has received significant performance optimizations and reliability fixes in this release.

Performance Optimizations:

  • Non-blocking Cache Writes: Cache misses no longer block query responses. Data is written to the cache asynchronously after the query returns, reducing query latency for cache miss scenarios.
  • Batch Cache Writes: Multiple cache entries are now written in batches rather than individually, significantly improving write throughput for high-volume cache operations.

Reliability Fixes:

  • Correct SWR Refresh Behavior: The stale-while-revalidate (SWR) pattern now correctly refreshes only the specific entries that were accessed instead of refreshing all stale rows in the dataset. This prevents unnecessary source queries and reduces load on upstream data sources.
  • Deduplicated Refresh Requests: Fixed an issue where JSON array responses could trigger multiple redundant refresh operations. Refresh requests are now properly deduplicated.
  • Fixed Cache Hit Detection: Resolved an issue where queries that didn't include fetched_at in their projection would always result in cache misses, even when cached data was available.
  • Unfiltered Query Optimization: SELECT * queries without filters now return cached data directly without unnecessary filtering overhead.

For more details, refer to the Caching Acceleration Mode Documentation.

Prepared Statements

Improved Query Performance and Security: Spice now supports prepared statements, enabling parameterized queries that improve both performance through query plan caching and security by preventing SQL injection attacks.

Key Features:

  • Query Plan Caching: Prepared statements cache query plans, reducing planning overhead for repeated queries
  • SQL Injection Prevention: Parameters are safely bound, preventing SQL injection vulnerabilities
  • Arrow Flight SQL Support: Full prepared statement support via Arrow Flight SQL protocol

SDK Support:

  • gospice (Go): full support since v8.0.0, via SqlWithParams() with typed constructors (Int32Param, StringParam, TimestampParam, etc.)
  • spice-rs (Rust): full support since v3.0.0, via query_with_params() with RecordBatch parameters
  • spice-dotnet (.NET): full support since v0.3.0, via QueryWithParams() with typed parameter builders
  • spice-java (Java): full support since v0.5.0, via queryWithParams() with typed Param constructors (Param.int64(), Param.string(), etc.)
  • spice.js (JavaScript): full support since v3.1.0, via query() with parameterized query support
  • spicepy (Python): full support since v3.1.0, via query() with parameterized query support

Example (Go):

import "github.com/spiceai/gospice/v8"

client, _ := spice.NewClient()
defer client.Close()

// Parameterized query with typed parameters
results, _ := client.SqlWithParams(ctx,
"SELECT * FROM products WHERE price > $1 AND category = $2",
spice.Float64Param(10.0),
spice.StringParam("electronics"),
)

Example (Java):

import ai.spice.SpiceClient;
import ai.spice.Param;
import org.apache.arrow.adbc.core.ArrowReader;

try (SpiceClient client = new SpiceClient()) {
    // With automatic type inference
    ArrowReader reader = client.queryWithParams(
        "SELECT * FROM products WHERE price > $1 AND category = $2",
        10.0, "electronics");

    // With explicit typed parameters (reusing the variable, since Java
    // forbids redeclaring it in the same scope)
    reader = client.queryWithParams(
        "SELECT * FROM products WHERE price > $1 AND category = $2",
        Param.float64(10.0),
        Param.string("electronics"));
}

For more details, refer to the Parameterized Queries Documentation.

Spice Java SDK v0.5.0

Parameterized Query Support for Java: The Spice Java SDK v0.5.0 introduces parameterized queries using ADBC (Arrow Database Connectivity), providing a safer and more efficient way to execute queries with dynamic parameters.

Key Features:

  • SQL Injection Prevention: Parameters are safely bound, preventing SQL injection vulnerabilities
  • Automatic Type Inference: Java types are automatically mapped to Arrow types (e.g., double → Float64, String → Utf8)
  • Explicit Type Control: Use the new Param class with typed factory methods (Param.int64(), Param.string(), Param.decimal128(), etc.) for precise control over Arrow types
  • Updated Dependencies: Apache Arrow Flight SQL upgraded to 18.3.0, plus new ADBC driver support

Example:

import java.math.BigDecimal;

import ai.spice.SpiceClient;
import ai.spice.Param;
import org.apache.arrow.adbc.core.ArrowReader;

try (SpiceClient client = new SpiceClient()) {
    // With automatic type inference
    ArrowReader reader = client.queryWithParams(
        "SELECT * FROM taxi_trips WHERE trip_distance > $1 LIMIT 10",
        5.0);

    // With explicit typed parameters for precise control
    reader = client.queryWithParams(
        "SELECT * FROM orders WHERE order_id = $1 AND amount >= $2",
        Param.int64(12345),
        Param.decimal128(new BigDecimal("99.99"), 10, 2));
}

Maven:

<dependency>
  <groupId>ai.spice</groupId>
  <artifactId>spiceai</artifactId>
  <version>0.5.0</version>
</dependency>

For more details, refer to the Spice Java SDK Repository.

Google LLM Support

Expanded AI Provider Support: Spice now supports Google embedding and chat models via the Google AI provider, expanding the available LLM options for AI inference workloads alongside existing providers like OpenAI, Anthropic, and AWS Bedrock.

Key Features:

  • Google Chat Models: Access Google's Gemini models for chat completions
  • Google Embeddings: Generate embeddings using Google's text embedding models
  • Unified API: Use the same OpenAI-compatible API endpoints for all LLM providers

Example spicepod.yaml configuration:

models:
  - from: google:gemini-2.0-flash
    name: gemini
    params:
      google_api_key: ${secrets:GOOGLE_API_KEY}

embeddings:
  - from: google:text-embedding-004
    name: google_embeddings
    params:
      google_api_key: ${secrets:GOOGLE_API_KEY}

For more details, refer to the Google LLM Documentation (see docs PR #1286).

URL Tables

Query data sources directly via URL in SQL without prior dataset registration. Supports S3, Azure Blob Storage, and HTTP/HTTPS URLs with automatic format detection and partition inference.

Supported Patterns:

  • Single files: SELECT * FROM 's3://bucket/data.parquet'
  • Directories/prefixes: SELECT * FROM 's3://bucket/data/'
  • Glob patterns: SELECT * FROM 's3://bucket/year=*/month=*/data.parquet'

Key Features:

  • Automatic file format detection (Parquet, CSV, JSON, etc.)
  • Hive-style partition inference with filter pushdown
  • Schema inference from files
  • Works with both SQL and DataFrame APIs
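
For example, a single HTTPS-hosted file can be queried directly, with the format detected from the file extension (the URL below is a placeholder):

SELECT * FROM 'https://example.com/data.csv' LIMIT 10;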

Example with hive partitioning:

-- Partitions are automatically inferred from paths
SELECT * FROM 's3://bucket/data/' WHERE year = '2024' AND month = '01'

Enable via spicepod.yml:

runtime:
  params:
    url_tables: enabled

Cluster Mode Async Query APIs (experimental)

New asynchronous query APIs for long-running queries in cluster mode:

  • /v1/queries endpoint: Submit queries and retrieve results asynchronously

OpenTelemetry Improvements

Unified Telemetry Endpoint: OTel metrics ingestion has been consolidated to the Flight port (50051), simplifying deployment by removing the separate OTel port (50052). The push-based metrics exporter continues to support integration with OpenTelemetry collectors.

Note: This is a breaking change. Update your configurations if you were using the dedicated OTel port 50052. Internal cluster communication now uses port 50052 exclusively.

Observability Improvements

Enhanced Dashboards: Updated Grafana and Datadog example dashboards with:

  • Snapshot monitoring widgets
  • Improved accelerated datasets section
  • Renamed ingestion lag charts for clarity

Additional Histogram Buckets: Added more buckets to histogram metrics for better latency distribution visibility.

For more details, refer to the Monitoring Documentation.

Hash Indexing for Arrow Acceleration (experimental)

Arrow-based accelerations now support hash indexing for faster point lookups on equality predicates. Hash indexes provide O(1) average-case lookup performance for columns with high cardinality.

Features:

  • Primary key hash index support
  • Secondary index support for non-primary key columns
  • Composite key support with proper null value handling

Example configuration:

datasets:
  - from: postgres:users
    name: users
    acceleration:
      enabled: true
      engine: arrow
      primary_key: user_id
      indexes:
        '(tenant_id, user_id)': unique # Composite hash index
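
Given the configuration above, a query that binds both composite-key columns with equality predicates is eligible for an O(1) hash probe instead of a scan (the literal values are illustrative):

-- Covered by the '(tenant_id, user_id)' composite hash index
SELECT * FROM users WHERE tenant_id = 42 AND user_id = 1001;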

For more details, refer to the Hash Index Documentation.

SMB and NFS Data Connectors

Network-Attached Storage Connectors: New data connectors for SMB (Server Message Block) and NFS (Network File System) protocols enable direct federated queries against network-attached storage without requiring data movement to cloud object stores.

Key Features:

  • SMB Protocol Support: Connect to Windows file shares and Samba servers with authentication support
  • NFS Protocol Support: Connect to Unix/Linux NFS exports for direct data access
  • Federated Queries: Query Parquet, CSV, JSON, and other file formats directly from network storage with full SQL support
  • Acceleration Support: Accelerate data from SMB/NFS sources using DuckDB, Spice Cayenne, or other accelerators

Example spicepod.yaml configuration:

datasets:
  # SMB share
  - from: smb://fileserver/share/data.parquet
    name: smb_data
    params:
      smb_username: ${secrets:SMB_USER}
      smb_password: ${secrets:SMB_PASS}

  # NFS export
  - from: nfs://nfsserver/export/data.parquet
    name: nfs_data

For more details, refer to the Data Connectors Documentation.

ScyllaDB Data Connector

A new data connector for ScyllaDB, the high-performance NoSQL database compatible with Apache Cassandra. Query ScyllaDB tables directly or accelerate them for faster analytics.

Example configuration:

datasets:
  - from: scylladb:my_keyspace.my_table
    name: scylla_data
    acceleration:
      enabled: true
      engine: duckdb

For more details, refer to the ScyllaDB Data Connector Documentation.

Flight SQL TLS Connection Fixes

TLS Connection Support: Fixed TLS connection issues when using grpc+tls:// scheme with Flight SQL endpoints. Added support for custom CA certificate files via the new flightsql_tls_ca_certificate_file parameter.

Developer Experience Improvements

  • Turso v0.3.2 Upgrade: Upgraded Turso accelerator for improved performance and reliability
  • Rust 1.91 Upgrade: Updated to Rust 1.91 for latest language features and performance improvements
  • Spice Cloud CLI: Added spice cloud CLI commands for cloud deployment management
  • Improved Spicepod Schema: Improved JSON schema generation for better IDE support and validation
  • Acceleration Snapshots: Added configurable snapshots_create_interval for periodic acceleration snapshots independent of refresh cycles
  • Tiered Caching with Localpod: The Localpod connector now supports caching refresh mode, enabling multi-layer acceleration where a persistent cache feeds a fast in-memory cache
  • GitHub Data Connector: Added workflows and workflow runs support for GitHub repositories
  • NDJSON/LDJSON Support: Added support for Newline Delimited JSON and Line Delimited JSON file formats

Additional Improvements & Bug Fixes

  • Model Listing: New functionality to list available models across multiple AI providers
  • DuckDB Partitioned Tables: Primary key constraints now supported in partitioned DuckDB table mode
  • Post-refresh Sorting: New on_refresh_sort_columns parameter for DuckDB enables data ordering after writes
  • Improved Install Scripts: Removed jq dependency and improved cross-platform compatibility
  • Better Error Messages: Improved error messaging for bucket UDF arguments and deprecated OpenAI parameters
  • Reliability: Fixed DynamoDB IAM role authentication with new dynamodb_auth: iam_role parameter
  • Reliability: Fixed cluster executors to use scheduler's temp_directory parameter for shuffle files
  • Reliability: Initialize secrets before object stores in cluster executor mode
  • Reliability: Added page-level retry with backoff for transient GitHub GraphQL errors
  • Performance: Improved statistics for rewritten DistributeFileScanOptimizer plans
  • Developer Experience: Added max_message_size configuration for Flight service

Contributors

Breaking Changes

OTel Ingestion Port Change

OTel ingestion has been moved to the Flight port (50051), removing the separate OTel port 50052. Port 50052 is now used exclusively for internal cluster communication. Update your configurations if you were using the dedicated OTel port.

Distributed Query Cluster Mode Requires mTLS

Distributed query cluster mode now requires mTLS for secure communication between cluster nodes. This is a security enhancement to prevent unauthorized nodes from joining the cluster and accessing secrets.

Migration Steps:

  1. Generate certificates using spice cluster tls init and spice cluster tls add
  2. Update scheduler and executor startup commands with --node-mtls-* arguments
  3. For development/testing, use --allow-insecure-connections to opt out of mTLS

Renamed CLI Arguments:

  • --cluster-mode → --role
  • --cluster-ca-certificate-file → --node-mtls-ca-certificate-file
  • --cluster-certificate-file → --node-mtls-certificate-file
  • --cluster-key-file → --node-mtls-key-file
  • --cluster-address → --node-bind-address
  • --cluster-advertise-address → --node-advertise-address
  • --cluster-scheduler-url → --scheduler-address

Removed CLI Arguments:

  • --cluster-api-key: Replaced by mTLS authentication

Cookbook Updates

New ScyllaDB Data Connector Recipe: New recipe demonstrating how to use the ScyllaDB Data Connector. See ScyllaDB Data Connector Recipe for details.

New SMB Data Connector Recipe: New recipe demonstrating how to use the SMB Data Connector. See SMB Data Connector Recipe for details.

The Spice Cookbook includes 86 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.11.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.11.0 image:

docker pull spiceai/spiceai:1.11.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai --version 1.11.0

AWS Marketplace:

Spice is available in the AWS Marketplace.

Dependencies

What's Changed

Changelog

Spice v1.11.0-rc.2 (Jan 22, 2026)

· 24 min read
Viktor Yershov
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.11.0-rc.2!

v1.11.0-rc.2 is the second release candidate for v1.11, made available for advance testing. It brings Spice Cayenne to Beta status with acceleration snapshots support, adds a new ScyllaDB Data Connector, and upgrades to DataFusion v51, Arrow 57.2, and iceberg-rust v0.8.0. It includes significant improvements to distributed query, caching, and observability.

What's New in v1.11.0-rc.2

Spice Cayenne Accelerator Reaches Beta

Spice Cayenne has been promoted to Beta status with acceleration snapshots support and numerous stability improvements.

Improved Reliability:

  • Fixed timezone database issues in Docker images that caused acceleration panics
  • Resolved FuturesUnordered reentrant drop crashes
  • Fixed memory growth issues related to Vortex metrics allocation
  • Metadata catalog now properly respects cayenne_file_path location
  • Added warnings for unparseable configuration values

Example configuration:

datasets:
  - from: s3://my-bucket/data.parquet
    name: my_dataset
    acceleration:
      enabled: true
      engine: cayenne
      mode: file

DataFusion v51 Upgrade

Apache DataFusion has been upgraded to v51, bringing significant performance improvements, new SQL features, and enhanced observability.

DataFusion v51 ClickBench Performance

Performance Improvements:

  • Faster CASE Expression Evaluation: Expressions now short-circuit earlier, reuse partial results, and avoid unnecessary scattering, speeding up common ETL patterns
  • Better Defaults for Remote Parquet Reads: DataFusion now fetches the last 512KB of Parquet files by default, typically avoiding 2 I/O requests per file
  • Faster Parquet Metadata Parsing: Leverages Arrow 57's new thrift metadata parser for up to 4x faster metadata parsing

New SQL Features:

  • SQL Pipe Operators: Support for |> syntax for inline transforms
  • DESCRIBE <query>: Returns the schema of any query without executing it
  • Named Arguments in SQL Functions: PostgreSQL-style param => value syntax for scalar, aggregate, and window functions
  • Decimal32/Decimal64 Support: New Arrow types supported including aggregations like SUM, AVG, and MIN/MAX

Example pipe operator:

SELECT * FROM t
|> WHERE a > 10
|> ORDER BY b
|> LIMIT 5;
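
DESCRIBE works on any query without executing it, returning only the schema. A minimal sketch using the same table t:

DESCRIBE SELECT a, b FROM t WHERE a > 10;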

Improved Observability:

  • Improved EXPLAIN ANALYZE Metrics: New metrics including output_bytes, selectivity for filters, reduction_factor for aggregates, and detailed timing breakdowns

Arrow 57.2 Upgrade

Spice has been upgraded to Apache Arrow Rust 57.2.0, bringing major performance improvements and new capabilities.

Arrow 57 Parquet Metadata Parsing Performance

Key Features:

  • 4x Faster Parquet Metadata Parsing: A rewritten thrift metadata parser delivers up to 4x faster metadata parsing, especially beneficial for low-latency use cases and files with large amounts of metadata
  • Parquet Variant Support: Experimental support for reading and writing the new Parquet Variant type for semi-structured data, including shredded variant values
  • Parquet Geometry Support: Read and write support for Parquet Geometry types (GEOMETRY and GEOGRAPHY) with GeospatialStatistics
  • New arrow-avro Crate: Efficient conversion between Apache Avro and Arrow RecordBatches with projection pushdown and vectorized execution support

iceberg-rust v0.8.0 Upgrade

Spice has been upgraded to iceberg-rust v0.8.0, bringing improved Iceberg table support.

Key Features:

  • V3 Metadata Support: Full support for Iceberg V3 table metadata format
  • INSERT INTO Partitioned Tables: DataFusion integration now supports inserting data into partitioned Iceberg tables
  • Improved Delete File Handling: Better support for position and equality delete files, including shared delete file loading and caching
  • SQL Catalog Updates: Implement update_table and register_table for SQL catalog
  • S3 Tables Catalog: Implement update_table for S3 Tables catalog
  • Enhanced Arrow Integration: Convert Arrow schema to Iceberg schema with auto-assigned field IDs, _file column support, and Date32 type support

Acceleration Snapshots

Acceleration snapshots enable point-in-time recovery and data versioning for accelerated datasets. Snapshots capture the state of accelerated data at specific points, allowing for fast bootstrap recovery and rollback capabilities.

Key Feature Improvements in v1.11:

  • Flexible Triggers: Configure when snapshots are created based on time intervals or stream batch counts
  • Automatic Compaction: Reduce storage overhead by compacting older snapshots (DuckDB only)
  • Bootstrap Integration: Snapshots can reset cache expiry on load for seamless recovery (DuckDB with Caching refresh mode)
  • Smart Creation Policies: Only create snapshots when data has actually changed

Example configuration:

datasets:
  - from: s3://my-bucket/data.parquet
    name: my_dataset
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      snapshots: enabled
      snapshots_trigger: time_interval
      snapshots_trigger_threshold: 1h
      snapshots_creation_policy: on_changed

Snapshots API and CLI: New API endpoints and CLI commands for managing snapshots programmatically. List, create, and restore snapshots directly from the command line or via HTTP.

For more details, refer to the Acceleration Snapshots Documentation.

ScyllaDB Data Connector

A new data connector for ScyllaDB, the high-performance NoSQL database compatible with Apache Cassandra. Query ScyllaDB tables directly or accelerate them for faster analytics.

Example configuration:

datasets:
  - from: scylladb:my_keyspace.my_table
    name: scylla_data
    acceleration:
      enabled: true
      engine: duckdb

For more details, refer to the ScyllaDB Data Connector Documentation.

Distributed Query Improvements

mTLS Verification: Cluster communication between scheduler and executors now supports mutual TLS verification for enhanced security.

Credential Propagation: Azure and GCS credentials are now automatically propagated to executors in cluster mode, enabling access to cloud storage across the distributed query cluster.

Improved Resilience:

  • Exponential backoff for scheduler disconnection recovery
  • Increased gRPC message size limit from 16MB to 100MB for large query plans
  • HTTP health endpoint for cluster executors
  • Automatic executor role inference when --scheduler-address is provided

For more details, refer to the Distributed Query Documentation.

Caching Acceleration Mode Improvements

The Caching Acceleration Mode introduced in v1.10.0 has received significant performance optimizations and reliability fixes in this release.

Performance Optimizations:

  • Non-blocking Cache Writes: Cache misses no longer block query responses. Data is written to the cache asynchronously after the query returns, reducing query latency for cache miss scenarios.
  • Batch Cache Writes: Multiple cache entries are now written in batches rather than individually, significantly improving write throughput for high-volume cache operations.

Reliability Fixes:

  • Correct SWR Refresh Behavior: The stale-while-revalidate (SWR) pattern now correctly refreshes only the specific entries that were accessed instead of refreshing all stale rows in the dataset. This prevents unnecessary source queries and reduces load on upstream data sources.
  • Deduplicated Refresh Requests: Fixed an issue where JSON array responses could trigger multiple redundant refresh operations. Refresh requests are now properly deduplicated.
  • Fixed Cache Hit Detection: Resolved an issue where queries that didn't include fetched_at in their projection would always result in cache misses, even when cached data was available.
  • Unfiltered Query Optimization: SELECT * queries without filters now return cached data directly without unnecessary filtering overhead.

For more details, refer to the Caching Acceleration Mode Documentation.

DynamoDB Connector Enhancements

  • Added JSON nesting for DynamoDB Streams
  • Proper batch deletion handling

URL Tables

Query data sources directly via URL in SQL without prior dataset registration. Supports S3, Azure Blob Storage, and HTTP/HTTPS URLs with automatic format detection and partition inference.

Supported Patterns:

  • Single files: SELECT * FROM 's3://bucket/data.parquet'
  • Directories/prefixes: SELECT * FROM 's3://bucket/data/'
  • Glob patterns: SELECT * FROM 's3://bucket/year=*/month=*/data.parquet'

Key Features:

  • Automatic file format detection (Parquet, CSV, JSON, etc.)
  • Hive-style partition inference with filter pushdown
  • Schema inference from files
  • Works with both SQL and DataFrame APIs

Example with hive partitioning:

-- Partitions are automatically inferred from paths
SELECT * FROM 's3://bucket/data/' WHERE year = '2024' AND month = '01'

Enable via spicepod.yml:

runtime:
  params:
    url_tables: enabled

Cluster Mode Async Query APIs (experimental)

New asynchronous query APIs for long-running queries in cluster mode:

  • /v1/queries endpoint: Submit queries and retrieve results asynchronously
  • Arrow Flight async support: Non-blocking query execution via Arrow Flight protocol

Observability Improvements

Enhanced Dashboards: Updated Grafana and Datadog example dashboards with:

  • Snapshot monitoring widgets
  • Improved accelerated datasets section
  • Renamed ingestion lag charts for clarity

Additional Histogram Buckets: Added more buckets to histogram metrics for better latency distribution visibility.

For more details, refer to the Monitoring Documentation.

Additional Improvements

  • Model Listing: New functionality to list available models across multiple AI providers
  • DuckDB Partitioned Tables: Primary key constraints now supported in partitioned DuckDB table mode
  • Post-refresh Sorting: New on_refresh_sort_columns parameter for DuckDB enables data ordering after writes
  • Improved Install Scripts: Removed jq dependency and improved cross-platform compatibility
  • Better Error Messages: Improved error messaging for bucket UDF arguments and deprecated OpenAI parameters

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

New ScyllaDB Data Connector Recipe: New recipe demonstrating how to use the ScyllaDB Data Connector. See ScyllaDB Data Connector Recipe for details.

New SMB Data Connector Recipe: New recipe demonstrating how to use the SMB Data Connector. See SMB Data Connector Recipe for details.

The Spice Cookbook includes 86 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.11.0-rc.2, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:v1.11.0-rc.2 image:

docker pull spiceai/spiceai:v1.11.0-rc.2

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

Spice is available in the AWS Marketplace.

Dependencies

Changelog

Spice v1.9.0 (Nov 19, 2025)

· 59 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.9.0-stable! 🌶

v1.9.0-stable introduces Spice Cayenne, a new high-performance data accelerator built on the Vortex columnar format that delivers performance better than DuckDB's without its single-file scaling limitations, and a preview of Multi-Node Distributed Query based on Apache Ballista. v1.9.0 also upgrades to DataFusion v50, DuckDB v1.4.2, and Delta-Kernel v0.16 for even higher query performance, expands search capabilities with full-text search on views and multi-column embeddings, and delivers many additional features and improvements.

What's New in v1.9.0

Cayenne Data Accelerator (Beta)

Introducing Cayenne: SQL as an Acceleration Format: A new high-performance Data Accelerator that simplifies multi-file data acceleration by using an embedded database (SQLite) for metadata while storing data in the Vortex columnar format, a Linux Foundation project. Cayenne delivers query and ingestion performance better than DuckDB's file-based acceleration without DuckDB's memory overhead and the scaling challenges of single DuckDB files.

Cayenne uses SQLite to manage acceleration metadata (schemas, snapshots, statistics, file tracking) through simple SQL transactions, while storing data in Vortex's compressed columnar format.

Key Features:

  • SQLite + Vortex Architecture: All metadata is stored in SQLite tables with standard SQL transactions, while data lives in Vortex's compressed, chunked columnar format designed for zero-copy access and efficient scanning.
  • Simplified Operations: No complex file hierarchies, no JSON/Avro metadata files, no separate catalog servers—just SQL tables and Vortex data files. The entire metadata schema is intentionally simple for maximum reliability.
  • Fast Metadata Access: Single SQL query retrieves all metadata needed for query planning—no multiple round trips to storage, no S3 throttling, no reconstruction of metadata state from scattered files.
  • Efficient Small Changes: Dramatically reduces small file proliferation. Snapshots are just rows in SQLite tables, not new files on disk. Supports millions of snapshots without performance degradation.
  • High Concurrency: Changes consist of two steps: stage Vortex files (if any), then run a single SQL transaction. Much faster conflict resolution and support for many more concurrent updates than file-based formats.
  • Advanced Data Lifecycle: Full ACID transactions, delete support, and retention SQL execution on refresh commit.

Example Spicepod.yml configuration:

datasets:
  - from: s3:my_table
    name: accelerated_data_30d
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      refresh_mode: append
      retention_sql: DELETE FROM accelerated_data_30d WHERE created_at < NOW() - INTERVAL '30 days'

Note: the Cayenne Data Accelerator is in Beta, with limitations.

For more details, refer to the Cayenne Documentation, the Vortex project, and the DuckLake announcement that partly inspired this design.

Multi-Node Distributed Query (Preview)

Apache Ballista Integration: Spice now supports distributed query execution based on Apache Ballista, enabling distributed queries across multiple executor nodes for improved performance on large datasets. This feature is in preview in v1.9.0.

Architecture:

A distributed Spice cluster consists of:

  • Scheduler: Responsible for distributed query planning and work queue management for the executor fleet
  • Executors: One or more nodes responsible for running physical query plans

Getting Started:

Start a scheduler instance using an existing Spicepod. The scheduler is the only spiced instance that needs to be configured:

# Start scheduler (note the flight bind address override if you want it reachable outside localhost)
spiced --cluster-mode scheduler --flight 0.0.0.0:50051

Start one or more executors configured with the scheduler's flight URI:

# Start executor (automatically selects a free port if 50051 is taken)
spiced --cluster-mode executor --scheduler-url spiced://localhost:50051

Query Execution:

Queries run through the scheduler will now show a distributed_plan in EXPLAIN output, demonstrating how the query is distributed across executor nodes:

EXPLAIN SELECT count(id) FROM my_dataset;

Current Limitations:

  • Accelerated datasets are currently not supported. This feature is designed for querying partitioned data lake formats (Parquet, Delta Lake, Iceberg, etc.)
  • The feature is in preview and may have stability or performance limitations
  • Specific acceleration support is planned for future releases

For more details, refer to the Distributed Query Documentation.

DataFusion v50 Upgrade

Spice.ai is built on the Apache DataFusion query engine. The v50 release brings significant performance improvements and enhanced reliability:

Performance Improvements 🚀:

  • Dynamic Filter Pushdown: Enhanced dynamic filter pushdown for custom ExecutionPlans, ensuring filters propagate correctly through all physical operators for improved query performance.

  • Partition Pruning: Expanded partition pruning support ensures that unnecessary partitions are skipped when filters are not used, reducing data scanning overhead and improving query execution times.

Apache Spark Compatible Functions: Added support for Spark-compatible functions including array, bit_get/bit_count, bitmap_count, crc32/sha1, date_add/date_sub, if, last_day, like/ilike, luhn_check, mod/pmod, next_day, parse_url, rint, and width_bucket.
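
A quick sketch of a few of these functions (signatures follow their Spark counterparts; the literal values are illustrative):

SELECT
  last_day(DATE '2025-11-19'),        -- last day of that month (2025-11-30)
  next_day(DATE '2025-11-19', 'Mon'), -- next Monday after the date
  width_bucket(5.35, 0.0, 10.0, 5),   -- equi-width histogram bucket index
  luhn_check('79927398713');          -- Luhn checksum validation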

Bug Fixes & Reliability: Resolved issues with partition name validation and empty execution plans when vector index lists are empty. Fixed timestamp support for partition expressions, enabling better partitioning for time-series data.

See the Apache DataFusion 50.0.3 Release for more details.

DuckDB v1.4.2 Upgrade and Accelerator Improvements

DuckDB v1.4.2: DuckDB has been upgraded to v1.4.2, which includes several performance optimizations.

Composite ART Index Support: DuckDB in Spice now supports composite (multi-column) Adaptive Radix Tree (ART) indexes for accelerated table scans. When queries filter on multiple columns fully covered by a composite index, the optimizer automatically uses index scans instead of full table scans, delivering significant performance improvements for selective queries.

Example configuration:

datasets:
  - from: file://data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      indexes:
        '(region, product_id)': enabled

Performance example with composite index on 7.5M rows:

SELECT * FROM sales WHERE region = 'US' AND product_id = 12345;

-- Without index: 0.282s
-- With composite index (region, product_id): 0.037s
-- Performance improvement: 7.6x faster with composite index

DuckDB Intermediate Materialization: Queries with indexes now use intermediate materialization (WITH ... AS MATERIALIZED) to leverage faster index scans. Currently supported for non-federated queries (query_federation: disabled) against a single table with indexes only. When predicates cover more columns than the index, the optimizer rewrites queries to first materialize index-filtered results, then apply remaining predicates. This optimization can deliver significant performance improvements for selective queries.

Example configuration:

datasets:
  - from: file://sales_data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      params:
        query_federation: disabled # Currently required for intermediate materialization
      indexes:
        '(region, product_id)': enabled

Performance example:

-- Query with indexed columns (region, product_id) plus additional filter (amount)
SELECT * FROM sales
WHERE region = 'US' AND product_id = 12345 AND amount > 1000;

-- Optimized execution time: 0.031s (with intermediate materialization)
-- Standard execution time: 0.108s (without optimization)
-- Performance improvement: ~3.5x faster

The optimizer automatically rewrites the query to:

WITH _intermediate_materialize AS MATERIALIZED (
  SELECT * FROM sales WHERE region = 'US' AND product_id = 12345
)
SELECT * FROM _intermediate_materialize WHERE amount > 1000;

Parquet Buffering for Partitioned Writes: DuckDB partitioned writes in table mode now support Parquet buffering, reducing memory usage and improving write performance for large datasets.

Retention SQL on Refresh Commit: DuckDB accelerations now support running retention SQL on refresh commit, enabling automatic data cleanup and lifecycle management during refresh operations.

UTC Timezone for DuckDB: DuckDB now uses UTC as the default timezone, ensuring consistent behavior for time-based queries across different environments.

Example Spicepod.yml configuration:

datasets:
  - from: s3://my_bucket/large_table/
    name: partitioned_data
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      retention:
        sql: DELETE FROM partitioned_data WHERE event_time < NOW() - INTERVAL '7 days'

For more details, refer to the DuckDB Data Accelerator Documentation.

HTTP Data Connector

  • Querying endpoints as tables: The HTTP/HTTPS Data Connectors now support querying HTTP endpoints directly as tables in SQL queries with dynamic filters. This feature turns REST APIs into queryable data sources, making it easy to integrate external service data.

  • Query HTTP endpoint that returns structured data (JSON, CSV, etc.) as if it were a database table

  • Configurable retry logic, timeouts, and POST request support for more complex API interactions

Example Spicepod.yml configuration:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    params:
      file_format: json
      max_retries: 3
      client_timeout: 10s
      allowed_request_paths: /search/people
      request_query_filters: enabled
      request_body_filters: enabled

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael'
LIMIT 10;

If a request_body is supplied, it will be posted to the endpoint:

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael' and request_body = '{"name": "michael"}'
LIMIT 10;

HTTP endpoints can be accelerated using refresh_sql:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    params:
      file_format: json
      allowed_request_paths: /search/people
      request_query_filters: enabled
      request_body_filters: enabled
    acceleration:
      enabled: true
      refresh_mode: full
      refresh_sql: |
        SELECT request_path, request_query, content
        FROM tvmaze
        WHERE request_path = '/search/people'
          AND request_query IN ('q=michael', 'q=luke')

For more details, refer to the HTTP Data Connector Documentation.

DynamoDB Data Connector Improvements

Improved Query Performance: The DynamoDB Data Connector now includes improved filter handling for edge cases, parallel scan support for faster data ingestion, and better error handling for misconfigured queries. These improvements enable more reliable and performant access to DynamoDB data.

Example Spicepod.yml configuration:

datasets:
  - from: dynamodb:my_table
    name: ddb_data
    params:
      scan_segments: 10 # Default `auto`, which calculates optimal segments based on the number of rows

For more details, refer to the DynamoDB Data Connector Documentation.

S3 Data Connector Improvements

S3 Versioning Support: Spice now supports S3 Versioning for all connectors using object-store (S3, Delta Lake, etc.), ensuring range reads over versioned files are atomically correct. When S3 versioning is enabled, Spice automatically tracks version IDs during file discovery and uses them for all subsequent range reads, preventing inconsistencies from concurrent file modifications.

Current limitations:

  • Multi-file connections (e.g., partitioned datasets) do not yet support version tracking across all files
  • Version tracking is automatic when S3 versioning is enabled on the bucket

S3 Single-File Refresh Skipping: Spice now optimizes S3 single-file dataset refreshes by caching file metadata (ETag, Version ID, size, timestamp) and skipping unnecessary data fetches when the underlying file hasn't changed. This optimization dramatically reduces bandwidth usage and improves refresh performance for scenarios where data doesn't change frequently. The feature is enabled by default for accelerated S3 single-file datasets and includes metrics tracking for skipped refreshes.

Example configuration:

datasets:
  - from: s3://my-bucket/data.parquet
    name: s3_data
    acceleration:
      enabled: true
      engine: duckdb
      refresh_check_interval: 10s

When the file's metadata hasn't changed between refresh checks, Spice will skip the data fetch entirely, logging:

Skipping refresh for dataset 's3_data': file metadata unchanged

For more details, refer to the S3 Data Connector Documentation.

Search & Embeddings Enhancements

Full-Text Search on Views: Full-text search indexes are now supported on views, enabling advanced search scenarios over pre-aggregated or transformed data. This extends the power of Spice's search capabilities beyond base datasets.

Multi-Column Embeddings on Views: Views now support embedding columns, enabling vector search and semantic retrieval on view data. This is useful for search over aggregated or joined datasets.

Vector Engines on Views: Vector search engines are now available for views, enabling similarity search over complex queries and transformations.

Example Spicepod.yml configuration:

views:
  - name: aggregated_reviews
    sql: SELECT review_id, review_text FROM reviews WHERE rating > 4
    embeddings:
      - column: review_text
        model: openai:text-embedding-3-small

For more details, refer to the Search Documentation and Embeddings Documentation.

Dedicated Query Thread Pool (Now Enabled by Default)

Dedicated Query Thread Pool: Query execution and accelerated refreshes now run on their own dedicated thread pool, separate from the HTTP server. This prevents heavy query workloads from slowing down API responses, keeping health checks fast and avoiding unnecessary Kubernetes pod restarts under load.

This feature was opt-in in previous releases and is now enabled by default. To disable it and revert to the previous behavior, add the following spicepod.yaml configuration:

runtime:
  params:
    dedicated_thread_pool: none

For more details, refer to the Runtime Configuration Documentation.

Query Performance Optimizations

Stale-While-Revalidate Cache Control: Query results now support "stale-while-revalidate" cache control, allowing stale cached data to be served immediately while asynchronously refreshing the cache entry in the background. This improves response times for frequently-accessed queries while maintaining data freshness. Requires cache key type to be set to "sql (raw)" for proper operation.

Optimized Prepared Statements: Prepared statement handling has been optimized for better performance with parameterized queries, reducing planning overhead and improving execution time for repeated queries.

Large RecordBatch Chunking: Large Arrow RecordBatch objects are now automatically chunked to control memory usage during query execution, preventing memory exhaustion for queries returning large result sets.

Query Result Caching: Compressed Encoding, Stale-While-Revalidate Cache Control

Zstd Compression Encoding: Query result caching now supports optional Zstandard (zstd) compression encoding to reduce memory usage for cached query results. This is particularly beneficial for large result sets, reducing cache memory footprint while maintaining fast decompression times. Encoding can be configured via the encoding parameter with options none (default) or zstd.

Example configuration:

runtime:
  caching:
    sql_results:
      enabled: true
      max_size: 128MiB
      item_ttl: 1m
      encoding: zstd # Enable zstd compression

HTTP Cache-Control Support: The query result cache now supports the stale-while-revalidate Cache-Control directive, enabling faster response times by serving stale cached results immediately while asynchronously refreshing the cache in the background. This feature is particularly useful for applications that can tolerate slightly stale data in exchange for improved performance.

Example configuration:

runtime:
  caching:
    sql_results:
      enabled: true
      max_size: 128MiB
      item_ttl: 1m
      stale_while_revalidate_ttl: 1m # Serve stale items for up to 1 minute after `item_ttl` expires

How it works:

When a cache entry is stale but within the stale-while-revalidate window, Spice will:

  1. Immediately return the stale cached result to the client
  2. Asynchronously re-execute the query in the background to refresh the cache
  3. Future requests will use the refreshed data

Configuration:

Use the Cache-Control HTTP header with the stale-while-revalidate directive:

Cache-Control: max-age=300, stale-while-revalidate=60

This configuration caches results for 5 minutes (300 seconds), and allows serving stale results for an additional 60 seconds while refreshing in the background.

Requirements:

  • Must use plan or raw SQL cache keys (set cache_key_type to sql or plan in results_caching configuration)
  • Background revalidation re-executes queries through the normal query path
  • Timestamp tracking automatically determines cache entry age for staleness checks

Example configuration via HTTP header:

GET /v1/sql
Cache-Control: max-age=600, stale-while-revalidate=120
X-Cache-Key-Type: sql

This feature improves application responsiveness while ensuring data freshness through background updates.

For more details, refer to the Results Caching Documentation.

Security & Reliability Improvements

Enhanced HTTP Client Security: HTTP client usage across the runtime has been hardened with improved TLS validation, certificate pinning for critical endpoints, and better error handling for network failures.

ODBC Connector Improvements: Removed unwrap calls from the ODBC connector, improving error handling and reliability. Fixed secret handling and Kubernetes secret integration.

CLI Permissions Hardening: Tightened file permissions for the CLI and install script, ensuring secure defaults for configuration files and credentials.

Oracle Instant Client Pinning: Oracle Instant Client downloads are now pinned to specific SHAs, ensuring reproducible builds and preventing supply chain attacks.

AWS Authentication Improvements

Improved Credential Retry Logic: AWS SDK credential initialization has been significantly improved with more robust retry logic and better error handling. The system now automatically retries transient credential resolution failures using Fibonacci backoff, allowing Spice to tolerate extended AWS outages (up to ~48 hours) without manual intervention.

Key features:

  • Automatic retry with backoff: Implements Fibonacci backoff for transient credential failures (network issues, temporary AWS service disruptions)
  • Better error handling: Distinguishes between retryable errors (connector errors) and non-retryable errors (misconfiguration)
  • Unauthenticated access support: Properly supports unauthenticated access to public S3 buckets without requiring credentials
  • Improved error messages: Provides detailed logging with attempt numbers, retry intervals, and error context for better troubleshooting

The improvements ensure more reliable AWS service integration, particularly in environments with intermittent network connectivity or during AWS service degradations.

Observability & Tracing

DataFusion Log Emission: The Spice runtime now emits DataFusion internal logs, providing deeper visibility into query planning and execution for debugging and performance analysis.

AI Completions Tracing: Fixed tracing so that ai_completions operations are correctly parented under sql_query traces, improving observability for AI-powered queries.

Git Data Connector (Alpha)

Version-Controlled Data Access: The new Git Data Connector (Alpha) enables querying datasets stored in Git repositories. This connector is ideal for use cases involving configuration files, documentation, or any data tracked in version control.

Example Spicepod.yml configuration:

datasets:
  - from: git:https://github.com/myorg/myrepo
    name: git_metrics
    params:
      file_format: csv

For more details, refer to the Git Data Connector Documentation.

Spice Java SDK 0.4.0

The Spice Java SDK has been upgraded with support for configurable Arrow memory limit: spice-java v0.4.0

SpiceClient client = SpiceClient.builder()
    .withArrowMemoryLimitMB(1024) // 1GB limit
    .build();

For more details, refer to the Java SDK Documentation.

CLI Improvements

Install Specific Versions: The spice install command now supports installing specific versions of the Spice runtime and CLI. This enables easy version management, downgrading, or installation of specific releases for testing or compatibility requirements.

Usage:

# Install a specific version
spice install v1.8.3

# Install a specific version with AI flavor
spice install v1.8.3 ai

# Install latest version (existing behavior)
spice install
spice install ai

Note: Homebrew installations require manual version management via brew install spiceai/spiceai/spice@<version>.

Persistent Query History: The Spice CLI REPL (SQL, search, and chat interfaces) now persists command history to ~/.spice/query_history.txt, making your query history available across sessions. The history file is automatically created if it doesn't exist, with graceful fallback if the home directory cannot be determined.

New REPL Commands:

  • .clear - Clear the screen using ANSI escape codes for a clean workspace
  • .clear history - Clear and persist the query history, removing all stored commands

Tab Completion: Tab completion now includes suggestions based on your command history, making it faster to re-run or modify previous queries.

Example usage:

sql> SELECT * FROM my_table;
sql> .clear # Clears the screen
sql> .clear history # Clears command history
sql> # Use arrow keys or tab to access previous commands

For more details, refer to the CLI Documentation.

Additional Improvements & Bug Fixes

  • Reliability: Fixed refresh worker panics with recovery handling to prevent runtime crashes during acceleration refreshes.
  • Reliability: Improved error messages for missing or invalid spicepod.yaml files, providing actionable feedback for misconfiguration.
  • Reliability: Fixed DuckDB metadata pointer loading issues for snapshots.
  • Performance: Ensured ListingTable partitions are pruned correctly when filters are not used.
  • Reliability: Fixed vector dimension determination for partitioned indexes.
  • Search: Fixed casing issues in Reciprocal Rank Fusion (RRF) for hybrid search queries.
  • Search: Fixed search field handling as metadata for chunked search indexes.
  • Validation: Added timestamp support for partition expressions.
  • Validation: Fixed regexp_match function for DuckDB datasets.
  • Validation: Fixed partition name validation for improved reliability.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

New HTTP Data Connector Recipe: New recipe demonstrating how to query REST APIs and HTTP(s) endpoints. See HTTP Connector Recipe for details.

The Spice Cookbook includes 82 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.9.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.9.0 image:

docker pull spiceai/spiceai:1.9.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Dependencies

Changelog

Spice v1.9.0-rc.4 (Nov 18, 2025)

· 22 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.9.0-rc.4! 🌶

This release candidate brings DuckDB v1.4.2, Cayenne partitioning improvements, and comprehensive security hardening across the CLI, data connectors, runtime, and MCP. v1.9.0-rc.4 also includes MySQL and PostgreSQL connector improvements with fixed nullability inferences and full-text search support, DynamoDB consistency improvements, HTTP connector validation and UX enhancements, and numerous reliability and performance optimizations. Significant improvements were also made to test and automation infrastructure to ensure high quality releases.

v1.9.0 introduces Spice Cayenne, a new high-performance data accelerator built on the Vortex columnar format that delivers better than DuckDB performance without single-file scaling limitations, and a preview of Multi-Node Distributed Query based on Apache Ballista. v1.9.0 also upgrades to DataFusion v50 for even higher query performance, expands search capabilities with full-text search on views and multi-column embeddings, and delivers many additional features and improvements.

What's New in v1.9.0

Cayenne Data Accelerator (Beta)

Introducing Cayenne: SQL as an Acceleration Format: A new high-performance Data Accelerator that simplifies multi-file data acceleration by using an embedded database (SQLite) for metadata while storing data in the Vortex columnar format, a Linux Foundation project. Cayenne delivers query and ingestion performance better than DuckDB's file-based acceleration without DuckDB's memory overhead and the scaling challenges of single DuckDB files.

Cayenne uses SQLite to manage acceleration metadata (schemas, snapshots, statistics, file tracking) through simple SQL transactions, while storing data in Vortex's compressed columnar format. This architecture provides:

Key Features:

  • SQLite + Vortex Architecture: All metadata is stored in SQLite tables with standard SQL transactions, while data lives in Vortex's compressed, chunked columnar format designed for zero-copy access and efficient scanning.
  • Simplified Operations: No complex file hierarchies, no JSON/Avro metadata files, no separate catalog servers—just SQL tables and Vortex data files. The entire metadata schema is intentionally simple for maximum reliability.
  • Fast Metadata Access: Single SQL query retrieves all metadata needed for query planning—no multiple round trips to storage, no S3 throttling, no reconstruction of metadata state from scattered files.
  • Efficient Small Changes: Dramatically reduces small file proliferation. Snapshots are just rows in SQLite tables, not new files on disk. Supports millions of snapshots without performance degradation.
  • High Concurrency: Changes consist of two steps: stage Vortex files (if any), then run a single SQL transaction. Much faster conflict resolution and support for many more concurrent updates than file-based formats.
  • Advanced Data Lifecycle: Full ACID transactions, delete support, and retention SQL execution on refresh commit.

Example Spicepod.yml configuration:

datasets:
  - from: s3:my_table
    name: accelerated_data_30d
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      refresh_mode: append
      retention_sql: DELETE FROM accelerated_data WHERE created_at < NOW() - INTERVAL '30 days'

Note: the Cayenne Data Accelerator is in Beta, with limitations.

For more details, refer to the Cayenne Documentation, the Vortex project, and the DuckLake announcement that partly inspired this design.

Multi-Node Distributed Query (Preview)

Apache Ballista Integration: Spice now supports distributed query execution based on Apache Ballista, enabling distributed queries across multiple executor nodes for improved performance on large datasets. This feature is in preview in v1.9.0-rc.3.

Architecture:

A distributed Spice cluster consists of:

  • Scheduler: Responsible for distributed query planning and work queue management for the executor fleet
  • Executors: One or more nodes responsible for running physical query plans

Getting Started:

Start a scheduler instance using an existing Spicepod. The scheduler is the only spiced instance that needs to be configured:

# Start scheduler (note the flight bind address override if you want it reachable outside localhost)
spiced --cluster-mode scheduler --flight 0.0.0.0:50051

Start one or more executors configured with the scheduler's flight URI:

# Start executor (automatically selects a free port if 50051 is taken)
spiced --cluster-mode executor --scheduler-url spiced://localhost:50051

Query Execution:

Queries run through the scheduler will now show a distributed_plan in EXPLAIN output, demonstrating how the query is distributed across executor nodes:

EXPLAIN SELECT count(id) FROM my_dataset;

Current Limitations:

  • Accelerated datasets are currently not supported. This feature is designed for querying partitioned data lake formats (Parquet, Delta Lake, Iceberg, etc.)
  • The feature is in preview and may have stability or performance limitations
  • Specific acceleration support is planned for future releases

DataFusion v50 Upgrade

Spice.ai is built on the Apache DataFusion query engine. The v50 release brings significant performance improvements and enhanced reliability:

Performance Improvements 🚀:

  • Dynamic Filter Pushdown: Enhanced dynamic filter pushdown for custom ExecutionPlans, ensuring filters propagate correctly through all physical operators for improved query performance.

  • Partition Pruning: Expanded partition pruning support ensures that unnecessary partitions are skipped when filters are not used, reducing data scanning overhead and improving query execution times.

Apache Spark Compatible Functions: Added support for Spark-compatible functions including array, bit_get/bit_count, bitmap_count, crc32/sha1, date_add/date_sub, if, last_day, like/ilike, luhn_check, mod/pmod, next_day, parse_url, rint, and width_bucket.
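
For example, a few of these Spark-compatible functions in one query (a minimal sketch; argument conventions follow the Spark versions of these functions):

SELECT
  last_day(DATE '2025-11-01') AS month_end,   -- last day of the month: 2025-11-30
  width_bucket(7.5, 0, 10, 5) AS bucket,      -- equal-width bucket of 7.5 in [0, 10) with 5 buckets
  luhn_check('79927398713') AS is_valid_luhn; -- Luhn checksum validation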

Bug Fixes & Reliability: Resolved issues with partition name validation and empty execution plans when vector index lists are empty. Fixed timestamp support for partition expressions, enabling better partitioning for time-series data.

See the Apache DataFusion 50.0.3 Release for more details.

DuckDB v1.4.2 Upgrade and Accelerator Improvements

DuckDB v1.4.2: DuckDB has been upgraded to v1.4.2, which includes several performance optimizations.

Composite ART Index Support: DuckDB in Spice now supports composite (multi-column) Adaptive Radix Tree (ART) indexes for accelerated table scans. When queries filter on multiple columns fully covered by a composite index, the optimizer automatically uses index scans instead of full table scans, delivering significant performance improvements for selective queries.

Example configuration:

datasets:
  - from: file://data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      indexes:
        '(region, product_id)': enabled

Performance example with composite index on 7.5M rows:

SELECT * FROM sales WHERE region = 'US' AND product_id = 12345;

-- Without index: 0.282s
-- With composite index (region, product_id): 0.037s
-- Performance improvement: 7.6x faster with composite index

DuckDB Intermediate Materialization: Queries with indexes now use intermediate materialization (WITH ... AS MATERIALIZED) to leverage faster index scans. Currently supported for non-federated queries (query_federation: disabled) against a single table with indexes only. When predicates cover more columns than the index, the optimizer rewrites queries to first materialize index-filtered results, then apply remaining predicates. This optimization can deliver significant performance improvements for selective queries.

Example configuration:

datasets:
  - from: file://sales_data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      params:
        query_federation: disabled # Required currently for intermediate materialization
      indexes:
        '(region, product_id)': enabled

Performance example:

-- Query with indexed columns (region, product_id) plus additional filter (amount)
SELECT * FROM sales
WHERE region = 'US' AND product_id = 12345 AND amount > 1000;

-- Optimized execution time: 0.031s (with intermediate materialization)
-- Standard execution time: 0.108s (without optimization)
-- Performance improvement: ~3.5x faster

The optimizer automatically rewrites the query to:

WITH _intermediate_materialize AS MATERIALIZED (
SELECT * FROM sales WHERE region = 'US' AND product_id = 12345
)
SELECT * FROM _intermediate_materialize WHERE amount > 1000;

Parquet Buffering for Partitioned Writes: DuckDB partitioned writes in table mode now support Parquet buffering, reducing memory usage and improving write performance for large datasets.

Retention SQL on Refresh Commit: DuckDB accelerations now support running retention SQL on refresh commit, enabling automatic data cleanup and lifecycle management during refresh operations.

UTC Timezone for DuckDB: DuckDB now uses UTC as the default timezone, ensuring consistent behavior for time-based queries across different environments.

Example Spicepod.yml configuration:

datasets:
  - from: s3://my_bucket/large_table/
    name: partitioned_data
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      retention:
        sql: DELETE FROM partitioned_data WHERE event_time < NOW() - INTERVAL '7 days'

HTTP Data Connector

  • Querying endpoints as tables: The HTTP/HTTPS Data Connector now supports querying HTTP endpoints directly as tables in SQL queries with dynamic filters. This feature transforms REST APIs into queryable data sources, making it easy to integrate external service data.

  • Query HTTP endpoints that return structured data (JSON, CSV, etc.) as if they were database tables

  • Configurable retry logic, timeouts, and POST request support for more complex API interactions

Example Spicepod.yml configuration:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    params:
      file_format: json
      max_retries: 3
      client_timeout: 10s

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael'
LIMIT 10;

If a request_body is supplied, it will be sent to the endpoint as a POST request:

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael' and request_body = '{"name": "michael"}'
LIMIT 10;

HTTP endpoints can be accelerated using refresh_sql:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    acceleration:
      enabled: true
      refresh_mode: full
      refresh_sql: |
        SELECT request_path, request_query, content
        FROM tvmaze
        WHERE request_path = '/search/people'
          AND request_query IN ('q=michael', 'q=luke')

DynamoDB Data Connector Improvements

Improved Query Performance: The DynamoDB Data Connector now includes improved filter handling for edge cases, parallel scan support for faster data ingestion, and better error handling for misconfigured queries. These improvements enable more reliable and performant access to DynamoDB data.

Example Spicepod.yml configuration:

datasets:
  - from: dynamodb:my_table
    name: ddb_data
    params:
      scan_segments: 10 # Defaults to `auto`, which calculates optimal segments based on row count

S3 Versioning Support

Atomic Range Reads for Versioned Files: Spice now supports S3 Versioning for all connectors using object-store (S3, Delta Lake, etc.), ensuring range reads over versioned files are atomically correct. When S3 versioning is enabled, Spice automatically tracks version IDs during file discovery and uses them for all subsequent range reads, preventing inconsistencies from concurrent file modifications.

Current limitations:

  • Multi-file connections (e.g., partitioned datasets) do not yet support version tracking across all files
  • Version tracking is automatic when S3 versioning is enabled on the bucket
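
For illustration, a single-file S3 dataset such as the following (names are hypothetical) benefits automatically once versioning is enabled on the bucket; no additional Spicepod parameters are required:

datasets:
  - from: s3://my_versioned_bucket/data/events.parquet
    name: events
    params:
      file_format: parquet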

Search & Embeddings Enhancements

Full-Text Search on Views: Full-text search indexes are now supported on views, enabling advanced search scenarios over pre-aggregated or transformed data. This extends the power of Spice's search capabilities beyond base datasets.

Multi-Column Embeddings on Views: Views now support embedding columns, enabling vector search and semantic retrieval on view data. This is useful for search over aggregated or joined datasets.

Vector Engines on Views: Vector search engines are now available for views, enabling similarity search over complex queries and transformations.

Example Spicepod.yml configuration:

views:
  - name: aggregated_reviews
    sql: SELECT review_id, review_text FROM reviews WHERE rating > 4
    embeddings:
      - column: review_text
        model: openai:text-embedding-3-small

Dedicated Query Thread Pool (Now Enabled by Default)

Dedicated Query Thread Pool: Query execution and accelerated refreshes now run on their own dedicated thread pool, separate from the HTTP server. This prevents heavy query workloads from slowing down API responses, keeping health checks fast and avoiding unnecessary Kubernetes pod restarts under load.

This feature was opt-in in previous releases and is now enabled by default. To disable it and revert to the previous behavior, add the following spicepod.yaml configuration:

runtime:
  params:
    dedicated_thread_pool: none

Query Performance Optimizations

Stale-While-Revalidate Cache Control: Query results now support "stale-while-revalidate" cache control, allowing stale cached data to be served immediately while asynchronously refreshing the cache entry in the background. This improves response times for frequently-accessed queries while maintaining data freshness. Requires cache key type to be set to "sql (raw)" for proper operation.

Optimized Prepared Statements: Prepared statement handling has been optimized for better performance with parameterized queries, reducing planning overhead and improving execution time for repeated queries.

Large RecordBatch Chunking: Large Arrow RecordBatch objects are now automatically chunked to control memory usage during query execution, preventing memory exhaustion for queries returning large result sets.
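
Conceptually, the chunking works like the following sketch (illustrative only, not the runtime's actual implementation), which splits a batch into zero-copy slices of bounded row count:

use arrow::record_batch::RecordBatch;

// Split a RecordBatch into zero-copy slices of at most `max_rows` rows each.
fn chunk_batch(batch: &RecordBatch, max_rows: usize) -> Vec<RecordBatch> {
    let mut chunks = Vec::new();
    let mut offset = 0;
    while offset < batch.num_rows() {
        let len = max_rows.min(batch.num_rows() - offset);
        // `slice` shares the underlying Arrow buffers, so no data is copied.
        chunks.push(batch.slice(offset, len));
        offset += len;
    }
    chunks
}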

Query Result Cache: Stale-While-Revalidate

HTTP Cache-Control Support: The query result cache now supports the stale-while-revalidate Cache-Control directive, enabling faster response times by serving stale cached results immediately while asynchronously refreshing the cache in the background. This feature is particularly useful for applications that can tolerate slightly stale data in exchange for improved performance.

How it works:

When a cache entry is stale but within the stale-while-revalidate window, Spice will:

  1. Immediately return the stale cached result to the client
  2. Asynchronously re-execute the query in the background to refresh the cache
  3. Future requests will use the refreshed data

Configuration:

Use the Cache-Control HTTP header with the stale-while-revalidate directive:

Cache-Control: max-age=300, stale-while-revalidate=60

This configuration caches results for 5 minutes (300 seconds), and allows serving stale results for an additional 60 seconds while refreshing in the background.

Requirements:

  • Must use plan or raw SQL cache keys (set cache_key_type to sql or plan in results_caching configuration)
  • Background revalidation re-executes queries through the normal query path
  • Timestamp tracking automatically determines cache entry age for staleness checks
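
For example, a Spicepod results-caching configuration along these lines (a sketch based on the requirement above; consult the caching documentation for exact field placement):

runtime:
  results_caching:
    enabled: true
    cache_key_type: sql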

Example configuration via HTTP header:

GET /v1/sql
Cache-Control: max-age=600, stale-while-revalidate=120
X-Cache-Key-Type: sql
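
The equivalent request via curl might look like the following sketch (assuming the runtime's HTTP endpoint on the default port 8090 and the SQL text sent as a POST body):

curl -X POST http://localhost:8090/v1/sql \
  -H 'Content-Type: text/plain' \
  -H 'Cache-Control: max-age=600, stale-while-revalidate=120' \
  -H 'X-Cache-Key-Type: sql' \
  -d 'SELECT count(*) FROM my_dataset'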

This feature improves application responsiveness while ensuring data freshness through background updates.

Security & Reliability Improvements

Enhanced HTTP Client Security: HTTP client usage across the runtime has been hardened with improved TLS validation, certificate pinning for critical endpoints, and better error handling for network failures.

ODBC Connector Improvements: Removed unwrap calls from the ODBC connector, improving error handling and reliability. Fixed secret handling and Kubernetes secret integration.

CLI Permissions Hardening: Tightened file permissions for the CLI and install script, ensuring secure defaults for configuration files and credentials.

Oracle Instant Client Pinning: Oracle Instant Client downloads are now pinned to specific SHAs, ensuring reproducible builds and preventing supply chain attacks.

AWS Authentication Improvements

Improved Credential Retry Logic: AWS SDK credential initialization has been significantly improved with more robust retry logic and better error handling. The system now automatically retries transient credential resolution failures using Fibonacci backoff, allowing Spice to tolerate extended AWS outages (up to ~48 hours) without manual intervention.

Key features:

  • Automatic retry with backoff: Implements Fibonacci backoff for transient credential failures (network issues, temporary AWS service disruptions)
  • Configurable retry limits: Supports up to 300 retry attempts with a maximum retry interval of 600 seconds
  • Better error handling: Distinguishes between retryable errors (connector errors) and non-retryable errors (misconfiguration)
  • Unauthenticated access support: Properly supports unauthenticated access to public S3 buckets without requiring credentials
  • Improved error messages: Provides detailed logging with attempt numbers, retry intervals, and error context for better troubleshooting

The improvements ensure more reliable AWS service integration, particularly in environments with intermittent network connectivity or during AWS service degradations.
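
The backoff schedule itself is simple. A minimal sketch of capped Fibonacci backoff (illustrative, not the actual runtime code), using the 600-second maximum interval described above:

// Generate Fibonacci backoff intervals in seconds (1, 1, 2, 3, 5, 8, ...),
// capping each interval at `cap_secs`.
fn fibonacci_backoff(max_attempts: usize, cap_secs: u64) -> Vec<u64> {
    let (mut prev, mut curr) = (0u64, 1u64);
    let mut intervals = Vec::with_capacity(max_attempts);
    for _ in 0..max_attempts {
        intervals.push(curr.min(cap_secs));
        let next = (prev + curr).min(cap_secs);
        prev = curr;
        curr = next;
    }
    intervals
}

With up to 300 attempts capped at 600 seconds each, the cumulative wait is on the order of the ~48-hour tolerance window mentioned above.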

Observability & Tracing

DataFusion Log Emission: The Spice runtime now emits DataFusion internal logs, providing deeper visibility into query planning and execution for debugging and performance analysis.

AI Completions Tracing: Fixed tracing so that ai_completions operations are correctly parented under sql_query traces, improving observability for AI-powered queries.

Git Data Connector (Alpha)

Version-Controlled Data Access: The new Git Data Connector (Alpha) enables querying datasets stored in Git repositories. This connector is ideal for use cases involving configuration files, documentation, or any data tracked in version control.

Example Spicepod.yml configuration:

datasets:
  - from: git:https://github.com/myorg/myrepo
    name: git_metrics
    params:
      file_format: csv

For more details, refer to the Git Data Connector Documentation.

Spice Java SDK 0.4.0

The Spice Java SDK has been upgraded with support for a configurable Arrow memory limit: spice-java v0.4.0

SpiceClient client = SpiceClient.builder()
    .withArrowMemoryLimitMB(1024) // 1GB limit
    .build();

CLI Improvements

Install Specific Versions: The spice install command now supports installing specific versions of the Spice runtime and CLI. This enables easy version management, downgrading, or installation of specific releases for testing or compatibility requirements.

Usage:

# Install a specific version
spice install v1.8.3

# Install a specific version with AI flavor
spice install v1.8.3 ai

# Install latest version (existing behavior)
spice install
spice install ai

Note: Homebrew installations require manual version management via brew install spiceai/spiceai/spice@<version>.

Persistent Query History: The Spice CLI REPL (SQL, search, and chat interfaces) now persists command history to ~/.spice/query_history.txt, making your query history available across sessions. The history file is automatically created if it doesn't exist, with graceful fallback if the home directory cannot be determined.

New REPL Commands:

  • .clear - Clear the screen using ANSI escape codes for a clean workspace
  • .clear history - Clear and persist the query history, removing all stored commands

Tab Completion: Tab completion now includes suggestions based on your command history, making it faster to re-run or modify previous queries.

Example usage:

sql> SELECT * FROM my_table;
sql> .clear # Clears the screen
sql> .clear history # Clears command history
sql> # Use arrow keys or tab to access previous commands

Additional Improvements & Bug Fixes

  • Reliability: Fixed refresh worker panics with recovery handling to prevent runtime crashes during acceleration refreshes.
  • Reliability: Improved error messages for missing or invalid spicepod.yaml files, providing actionable feedback for misconfiguration.
  • Reliability: Fixed DuckDB metadata pointer loading issues for snapshots.
  • Performance: Ensured ListingTable partitions are pruned correctly when filters are not used.
  • Reliability: Fixed vector dimension determination for partitioned indexes.
  • Search: Fixed casing issues in Reciprocal Rank Fusion (RRF) for hybrid search queries.
  • Search: Fixed search field handling as metadata for chunked search indexes.
  • Validation: Added timestamp support for partition expressions.
  • Validation: Fixed regexp_match function for DuckDB datasets.
  • Validation: Fixed partition name validation for improved reliability.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

New HTTP Data Connector Recipe: New recipe demonstrating how to query REST APIs and HTTP(s) endpoints. See HTTP Connector Recipe for details.

The Spice Cookbook includes 82 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.9.0-rc.4, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.9.0-rc.4 image:

docker pull spiceai/spiceai:1.9.0-rc.4

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Dependencies

Changelog (rc.4)

Spice v1.9.0-rc.2 (Nov 11, 2025)

· 32 min read
Sergei Grebnov
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.9.0-rc.2! 🌶

This is the second release candidate for v1.9.0, which introduces Spice Cayenne, a new high-performance data accelerator built on the Vortex columnar format that delivers better than DuckDB performance without single-file scaling limitations, and a preview of Multi-Node Distributed Query based on Apache Ballista. v1.9.0-rc.2 also upgrades to DataFusion v50 and DuckDB v1.4.1 for even higher query performance, expands search capabilities with full-text search on views and multi-column embeddings, includes significant DynamoDB and DuckDB accelerator improvements, expands the HTTP data connector to support endpoints as tables, and delivers many security and reliability improvements.

What's New in v1.9.0-rc.2

Cayenne Data Accelerator (Beta)

Introducing Cayenne: SQL as an Acceleration Format: A new high-performance Data Accelerator that simplifies multi-file data acceleration by using an embedded database (SQLite) for metadata while storing data in the Vortex columnar format, a Linux Foundation project. Cayenne delivers query and ingestion performance better than DuckDB's file-based acceleration without DuckDB's memory overhead and the scaling challenges of single DuckDB files.

Cayenne uses SQLite to manage acceleration metadata (schemas, snapshots, statistics, file tracking) through simple SQL transactions, while storing data in Vortex's compressed columnar format. This architecture provides:

Key Features:

  • SQLite + Vortex Architecture: All metadata is stored in SQLite tables with standard SQL transactions, while data lives in Vortex's compressed, chunked columnar format designed for zero-copy access and efficient scanning.
  • Simplified Operations: No complex file hierarchies, no JSON/Avro metadata files, no separate catalog servers—just SQL tables and Vortex data files. The entire metadata schema is intentionally simple for maximum reliability.
  • Fast Metadata Access: Single SQL query retrieves all metadata needed for query planning—no multiple round trips to storage, no S3 throttling, no reconstruction of metadata state from scattered files.
  • Efficient Small Changes: Dramatically reduces small file proliferation. Snapshots are just rows in SQLite tables, not new files on disk. Supports millions of snapshots without performance degradation.
  • High Concurrency: Changes consist of two steps: stage Vortex files (if any), then run a single SQL transaction. Much faster conflict resolution and support for many more concurrent updates than file-based formats.
  • Advanced Data Lifecycle: Full ACID transactions, delete support, and retention SQL execution on refresh commit.

Example Spicepod.yml configuration:

datasets:
  - from: s3:my_table
    name: accelerated_data_30d
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      refresh_mode: append
      retention_sql: DELETE FROM accelerated_data WHERE created_at < NOW() - INTERVAL '30 days'

Note: the Cayenne Data Accelerator is in Beta, with limitations.

For more details, refer to the Cayenne Documentation, the Vortex project, and the DuckLake announcement that partly inspired this design.

Multi-Node Distributed Query (Preview)

Apache Ballista Integration: Spice now supports distributed query execution based on Apache Ballista, enabling distributed queries across multiple executor nodes for improved performance on large datasets. This feature is in preview in v1.9.0-rc.2.

Architecture:

A distributed Spice cluster consists of:

  • Scheduler: Responsible for distributed query planning and work queue management for the executor fleet
  • Executors: One or more nodes responsible for running physical query plans

Getting Started:

Start a scheduler instance using an existing Spicepod. The scheduler is the only spiced instance that needs to be configured:

# Start scheduler (note the flight bind address override if you want it reachable outside localhost)
spiced --cluster-mode scheduler --flight 0.0.0.0:50051

Start one or more executors configured with the scheduler's flight URI:

# Start executor (automatically selects a free port if 50051 is taken)
spiced --cluster-mode executor --scheduler-url spiced://localhost:50051

Query Execution:

Queries run through the scheduler will now show a distributed_plan in EXPLAIN output, demonstrating how the query is distributed across executor nodes:

EXPLAIN SELECT count(id) FROM my_dataset;

Current Limitations:

  • Accelerated datasets are currently not supported. This feature is designed for querying partitioned data lake formats (Parquet, Delta Lake, Iceberg, etc.)
  • The feature is in preview and may have stability or performance limitations
  • Specific acceleration support is planned for future releases

DataFusion v50 Upgrade

Spice.ai is built on the Apache DataFusion query engine. The v50 release brings significant performance improvements and enhanced reliability:

Performance Improvements 🚀:

  • Dynamic Filter Pushdown: Enhanced dynamic filter pushdown for custom ExecutionPlans, ensuring filters propagate correctly through all physical operators for improved query performance.

  • Partition Pruning: Expanded partition pruning support ensures that unnecessary partitions are skipped when filters are not used, reducing data scanning overhead and improving query execution times.

Apache Spark Compatible Functions: Added support for Spark-compatible functions including array, bit_get/bit_count, bitmap_count, crc32/sha1, date_add/date_sub, if, last_day, like/ilike, luhn_check, mod/pmod, next_day, parse_url, rint, and width_bucket.

Bug Fixes & Reliability: Resolved issues with partition name validation and empty execution plans when vector index lists are empty. Fixed timestamp support for partition expressions, enabling better partitioning for time-series data.

See the Apache DataFusion 50.0.0 Release for more details.

DuckDB v1.4.1 Upgrade and Accelerator Improvements

DuckDB v1.4.1: DuckDB has been upgraded to v1.4.1, which includes several performance optimizations.

Composite ART Index Support: DuckDB in Spice now supports composite (multi-column) Adaptive Radix Tree (ART) indexes for accelerated table scans. When queries filter on multiple columns fully covered by a composite index, the optimizer automatically uses index scans instead of full table scans, delivering significant performance improvements for selective queries.

Example configuration:

datasets:
  - from: file://data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      indexes:
        '(region, product_id)': enabled

Performance example with composite index on 7.5M rows:

SELECT * FROM sales WHERE region = 'US' AND product_id = 12345;

-- Without index: 0.282s
-- With composite index (region, product_id): 0.037s
-- Performance improvement: 7.6x faster with composite index

DuckDB Intermediate Materialization: Queries with indexes now use intermediate materialization (WITH ... AS MATERIALIZED) to leverage faster index scans. Currently supported for non-federated queries (query_federation: disabled) against a single table with indexes only. When predicates cover more columns than the index, the optimizer rewrites queries to first materialize index-filtered results, then apply remaining predicates. This optimization can deliver significant performance improvements for selective queries.

Example configuration:

datasets:
  - from: file://sales_data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      params:
        query_federation: disabled # Required currently for intermediate materialization
      indexes:
        '(region, product_id)': enabled

Performance example:

-- Query with indexed columns (region, product_id) plus additional filter (amount)
SELECT * FROM sales
WHERE region = 'US' AND product_id = 12345 AND amount > 1000;

-- Optimized execution time: 0.031s (with intermediate materialization)
-- Standard execution time: 0.108s (without optimization)
-- Performance improvement: ~3.5x faster

The optimizer automatically rewrites the query to:

WITH _intermediate_materialize AS MATERIALIZED (
SELECT * FROM sales WHERE region = 'US' AND product_id = 12345
)
SELECT * FROM _intermediate_materialize WHERE amount > 1000;

Parquet Buffering for Partitioned Writes: DuckDB partitioned writes in table mode now support Parquet buffering, reducing memory usage and improving write performance for large datasets.

Retention SQL on Refresh Commit: DuckDB accelerations now support running retention SQL on refresh commit, enabling automatic data cleanup and lifecycle management during refresh operations.

UTC Timezone for DuckDB: DuckDB now uses UTC as the default timezone, ensuring consistent behavior for time-based queries across different environments.

Example Spicepod.yml configuration:

datasets:
  - from: s3://my_bucket/large_table/
    name: partitioned_data
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      retention:
        sql: DELETE FROM partitioned_data WHERE event_time < NOW() - INTERVAL '7 days'

HTTP Data Connector

  • Querying endpoints as tables: The HTTP/HTTPS Data Connector now supports querying HTTP endpoints directly as tables in SQL queries with dynamic filters. This feature transforms REST APIs into queryable data sources, making it easy to integrate external service data.

  • Query HTTP endpoints that return structured data (JSON, CSV, etc.) as if they were database tables

  • Configurable retry logic, timeouts, and POST request support for more complex API interactions

Example Spicepod.yml configuration:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    params:
      file_format: json
      max_retries: 3
      client_timeout: 10s

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael'
LIMIT 10;

If a request_body is supplied, it will be sent to the endpoint as a POST request:

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael' and request_body = '{"name": "michael"}'
LIMIT 10;

HTTP endpoints can be accelerated using refresh_sql:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    acceleration:
      enabled: true
      refresh_mode: full
      refresh_sql: |
        SELECT request_path, request_query, content
        FROM tvmaze
        WHERE request_path = '/search/people'
          AND request_query IN ('q=michael', 'q=luke')

DynamoDB Data Connector Improvements

Improved Query Performance: The DynamoDB Data Connector now includes improved filter handling for edge cases, parallel scan support for faster data ingestion, and better error handling for misconfigured queries. These improvements enable more reliable and performant access to DynamoDB data.

Example Spicepod.yml configuration:

datasets:
  - from: dynamodb:my_table
    name: ddb_data
    params:
      scan_segments: 10 # Defaults to `auto`, which calculates optimal segments based on row count

S3 Versioning Support

Atomic Range Reads for Versioned Files: Spice now supports S3 Versioning for all connectors using object-store (S3, Delta Lake, etc.), ensuring range reads over versioned files are atomically correct. When S3 versioning is enabled, Spice automatically tracks version IDs during file discovery and uses them for all subsequent range reads, preventing inconsistencies from concurrent file modifications.

Current limitations:

  • Multi-file connections (e.g., partitioned datasets) do not yet support version tracking across all files
  • Version tracking is automatic when S3 versioning is enabled on the bucket

Search & Embeddings Enhancements

Full-Text Search on Views: Full-text search indexes are now supported on views, enabling advanced search scenarios over pre-aggregated or transformed data. This extends the power of Spice's search capabilities beyond base datasets.

Multi-Column Embeddings on Views: Views now support embedding columns, enabling vector search and semantic retrieval on view data. This is useful for search over aggregated or joined datasets.

Vector Engines on Views: Vector search engines are now available for views, enabling similarity search over complex queries and transformations.

Example Spicepod.yml configuration:

views:
  - name: aggregated_reviews
    sql: SELECT review_id, review_text FROM reviews WHERE rating > 4
    embeddings:
      - column: review_text
        model: openai:text-embedding-3-small

Dedicated Query Thread Pool (Now Enabled by Default)

Dedicated Query Thread Pool: Query execution and accelerated refreshes now run on their own dedicated thread pool, separate from the HTTP server. This prevents heavy query workloads from slowing down API responses, keeping health checks fast and avoiding unnecessary Kubernetes pod restarts under load.

This feature was opt-in in previous releases and is now enabled by default in v1.9.0-rc.2. To disable it and revert to the previous behavior, add the following spicepod.yaml configuration:

runtime:
  params:
    dedicated_thread_pool: none

Query Performance Optimizations

Stale-While-Revalidate Cache Control: Query results now support "stale-while-revalidate" cache control, allowing stale cached data to be served immediately while asynchronously refreshing the cache entry in the background. This improves response times for frequently-accessed queries while maintaining data freshness. Requires cache key type to be set to "sql (raw)" for proper operation.

Optimized Prepared Statements: Prepared statement handling has been optimized for better performance with parameterized queries, reducing planning overhead and improving execution time for repeated queries.

Large RecordBatch Chunking: Large Arrow RecordBatch objects are now automatically chunked to control memory usage during query execution, preventing memory exhaustion for queries returning large result sets.

Query Result Cache: Stale-While-Revalidate

HTTP Cache-Control Support: The query result cache now supports the stale-while-revalidate Cache-Control directive, enabling faster response times by serving stale cached results immediately while asynchronously refreshing the cache in the background. This feature is particularly useful for applications that can tolerate slightly stale data in exchange for improved performance.

How it works:

When a cache entry is stale but within the stale-while-revalidate window, Spice will:

  1. Immediately return the stale cached result to the client
  2. Asynchronously re-execute the query in the background to refresh the cache
  3. Future requests will use the refreshed data

Configuration:

Use the Cache-Control HTTP header with the stale-while-revalidate directive:

Cache-Control: max-age=300, stale-while-revalidate=60

This configuration caches results for 5 minutes (300 seconds), and allows serving stale results for an additional 60 seconds while refreshing in the background.

Requirements:

  • Must use plan or raw SQL cache keys (set cache_key_type to sql or plan in results_caching configuration)
  • Background revalidation re-executes queries through the normal query path
  • Timestamp tracking automatically determines cache entry age for staleness checks

Example configuration via HTTP header:

GET /v1/sql
Cache-Control: max-age=600, stale-while-revalidate=120
X-Cache-Key-Type: sql

This feature improves application responsiveness while ensuring data freshness through background updates.

Security & Reliability Improvements

Enhanced HTTP Client Security: HTTP client usage across the runtime has been hardened with improved TLS validation, certificate pinning for critical endpoints, and better error handling for network failures.

ODBC Connector Improvements: Removed unwrap calls from the ODBC connector, improving error handling and reliability. Fixed secret handling and Kubernetes secret integration.

CLI Permissions Hardening: Tightened file permissions for the CLI and install script, ensuring secure defaults for configuration files and credentials.

Oracle Instant Client Pinning: Oracle Instant Client downloads are now pinned to specific SHAs, ensuring reproducible builds and preventing supply chain attacks.

AWS Authentication Improvements

Improved Credential Retry Logic: AWS SDK credential initialization has been significantly improved with more robust retry logic and better error handling. The system now automatically retries transient credential resolution failures using Fibonacci backoff, allowing Spice to tolerate extended AWS outages (up to ~48 hours) without manual intervention.

Key features:

  • Automatic retry with backoff: Implements Fibonacci backoff for transient credential failures (network issues, temporary AWS service disruptions)
  • Configurable retry limits: Supports up to 300 retry attempts with a maximum retry interval of 600 seconds
  • Better error handling: Distinguishes between retryable errors (connector errors) and non-retryable errors (misconfiguration)
  • Unauthenticated access support: Properly supports unauthenticated access to public S3 buckets without requiring credentials
  • Improved error messages: Provides detailed logging with attempt numbers, retry intervals, and error context for better troubleshooting

The improvements ensure more reliable AWS service integration, particularly in environments with intermittent network connectivity or during AWS service degradations.

Observability & Tracing

DataFusion Log Emission: The Spice runtime now emits DataFusion internal logs, providing deeper visibility into query planning and execution for debugging and performance analysis.

AI Completions Tracing: Fixed tracing so that ai_completions operations are correctly parented under sql_query traces, improving observability for AI-powered queries.

Git Data Connector (Alpha)

Version-Controlled Data Access: The new Git Data Connector (Alpha) enables querying datasets stored in Git repositories. This connector is ideal for use cases involving configuration files, documentation, or any data tracked in version control.

Example Spicepod.yml configuration:

datasets:
  - from: git:https://github.com/myorg/myrepo
    name: git_metrics
    params:
      file_format: csv

For more details, refer to the Git Data Connector Documentation.

Spice Java SDK 0.4.0

The Spice Java SDK has been upgraded with support for a configurable Arrow memory limit: spice-java v0.4.0

SpiceClient client = SpiceClient.builder()
    .withArrowMemoryLimitMB(1024) // 1GB limit
    .build();

CLI Improvements

Install Specific Versions: The spice install command now supports installing specific versions of the Spice runtime and CLI. This enables easy version management, downgrading, or installation of specific releases for testing or compatibility requirements.

Usage:

# Install a specific version
spice install v1.8.3

# Install a specific version with AI flavor
spice install v1.8.3 ai

# Install latest version (existing behavior)
spice install
spice install ai

Note: Homebrew installations require manual version management via brew install spiceai/spiceai/spice@<version>.

Persistent Query History: The Spice CLI REPL (SQL, search, and chat interfaces) now persists command history to ~/.spice/query_history.txt, making your query history available across sessions. The history file is automatically created if it doesn't exist, with graceful fallback if the home directory cannot be determined.

New REPL Commands:

  • .clear - Clear the screen using ANSI escape codes for a clean workspace
  • .clear history - Clear and persist the query history, removing all stored commands

Tab Completion: Tab completion now includes suggestions based on your command history, making it faster to re-run or modify previous queries.

Example usage:

sql> SELECT * FROM my_table;
sql> .clear # Clears the screen
sql> .clear history # Clears command history
sql> # Use arrow keys or tab to access previous commands

Additional Improvements & Bug Fixes

  • Reliability: Fixed refresh worker panics with recovery handling to prevent runtime crashes during acceleration refreshes.
  • Reliability: Improved error messages for missing or invalid spicepod.yaml files, providing actionable feedback for misconfiguration.
  • Reliability: Fixed DuckDB metadata pointer loading issues for snapshots.
  • Performance: Ensured ListingTable partitions are pruned correctly when filters are not used.
  • Reliability: Fixed vector dimension determination for partitioned indexes.
  • Search: Fixed casing issues in Reciprocal Rank Fusion (RRF) for hybrid search queries.
  • Search: Fixed search field handling as metadata for chunked search indexes.
  • Validation: Added timestamp support for partition expressions.
  • Validation: Fixed regexp_match function for DuckDB datasets.
  • Validation: Fixed partition name validation for improved reliability.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

New HTTP Data Connector Recipe: New recipe demonstrating how to query REST APIs and HTTP(s) endpoints. See HTTP Connector Recipe for details.

The Spice Cookbook includes 82 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.9.0-rc.2, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.9.0-rc.2 image:

docker pull spiceai/spiceai:1.9.0-rc.2

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Dependencies

Changelog

Spice v1.9.0-rc.1 (Nov 4, 2025)

· 16 min read
William Croxson
Senior Software Engineer at Spice AI

This is the first release candidate for v1.9.0, which introduces Cayenne, a new high-performance data accelerator built on the Vortex columnar format that delivers DuckDB-comparable performance without scaling limitations. This release also upgrades to DataFusion v50 for improved query performance, expands search capabilities with full-text search on views and multi-column embeddings, includes significant DynamoDB and DuckDB accelerator improvements, and delivers security and reliability enhancements.

What's New in v1.9.0-rc.1

Cayenne Data Accelerator (Alpha)

Introducing Cayenne: SQL as an Acceleration Format: A new high-performance data accelerator that simplifies multi-file data acceleration by using an embedded database (SQLite) for metadata while storing data in the Vortex columnar format. Cayenne delivers query and ingestion performance comparable to or better than DuckDB's file-based acceleration, without DuckDB's memory overhead and the scaling challenges of single DuckDB files.

Cayenne uses SQLite to manage acceleration metadata (schemas, snapshots, statistics, file tracking) through simple SQL transactions, while storing actual data in Vortex's compressed columnar format. This architecture provides:

Key Features:

  • SQLite + Vortex Architecture: All metadata is stored in SQLite tables with standard SQL transactions, while data lives in Vortex's compressed, chunked columnar format designed for zero-copy access and efficient scanning.
  • Simplified Operations: No complex file hierarchies, no JSON/Avro metadata files, no separate catalog servers—just SQL tables and Vortex data files. The entire metadata schema is intentionally simple for maximum reliability.
  • Fast Metadata Access: Single SQL query retrieves all metadata needed for query planning—no multiple round trips to storage, no S3 throttling, no reconstruction of metadata state from scattered files.
  • Efficient Small Changes: Dramatically reduces small file proliferation. Snapshots are just rows in SQLite tables, not new files on disk. Supports millions of snapshots without performance degradation.
  • High Concurrency: Changes consist of two steps: stage Vortex files (if any), then run a single SQL transaction. Much faster conflict resolution and support for many more concurrent updates than file-based formats.
  • Advanced Data Lifecycle: Full ACID transactions, delete support, and retention SQL execution on refresh commit.

Example Spicepod.yml configuration:

datasets:
  - from: s3:my_table
    name: accelerated_data
    acceleration:
      enabled: true
      engine: cayenne
      retention:
        sql: DELETE FROM accelerated_data WHERE created_at < NOW() - INTERVAL '30 days'

Note: the Cayenne Data Accelerator is in Alpha, with limitations.

For more details, refer to the Cayenne Documentation, the Vortex project, and the DuckLake announcement that partly inspired this design.

DataFusion v50 Upgrade

Spice.ai is built on the DataFusion query engine. The v50 release brings significant performance improvements and enhanced reliability:

Performance Improvements 🚀:

  • Dynamic Filter Pushdown: Enhanced dynamic filter pushdown for custom ExecutionPlans, ensuring filters propagate correctly through all physical operators for improved query performance.
  • Partition Pruning: Expanded partition pruning support ensures that unnecessary partitions are skipped when filters are not used, reducing data scanning overhead and improving query execution times.

Bug Fixes & Reliability: Resolved issues with partition name validation and empty execution plans when vector index lists are empty. Fixed timestamp support for partition expressions, enabling better partitioning for time-series data.

See the Apache DataFusion 50.0.0 Release for more details.

DynamoDB Data Connector Improvements

Improved Query Performance: The DynamoDB Data Connector now includes improved filter handling for edge cases, parallel scan support for faster data ingestion, and better error handling for misconfigured queries. These improvements enable more reliable and performant access to DynamoDB data.

Example Spicepod.yml configuration:

datasets:
  - from: dynamodb:my_table
    name: ddb_data
    params:
      scan_segments: 10 # Defaults to `auto`, which calculates optimal segments based on row count

Search & Embeddings Enhancements

Full-Text Search on Views: Full-text search indexes are now supported on views, enabling advanced search scenarios over pre-aggregated or transformed data. This extends the power of Spice's search capabilities beyond base datasets.

Multi-Column Embeddings on Views: Views now support embedding columns, enabling vector search and semantic retrieval on view data. This is useful for search over aggregated or joined datasets.

Vector Engines on Views: Vector search engines are now available for views, enabling similarity search over complex queries and transformations.

Example Spicepod.yml configuration:

views:
  - name: aggregated_reviews
    sql: SELECT review_id, review_text FROM reviews WHERE rating > 4
    embeddings:
      - column: review_text
        model: openai:text-embedding-3-small

DuckDB Accelerator Improvements

Parquet Buffering for Partitioned Writes: DuckDB partitioned writes in table mode now support Parquet buffering, reducing memory usage and improving write performance for large datasets.

Retention SQL on Refresh Commit: DuckDB accelerations now support running retention SQL on refresh commit, enabling automatic data cleanup and lifecycle management during refresh operations.

UTC Timezone for DuckDB: DuckDB now uses UTC as the default timezone, ensuring consistent behavior for time-based queries across different environments.

Example Spicepod.yml configuration:

datasets:
  - from: s3://my_bucket/large_table/
    name: partitioned_data
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      retention:
        sql: DELETE FROM partitioned_data WHERE event_time < NOW() - INTERVAL '7 days'

Query Performance Optimizations

Optimized Prepared Statements: Prepared statement handling has been optimized for better performance with parameterized queries, reducing planning overhead and improving execution time for repeated queries.

Large RecordBatch Chunking: Large Arrow RecordBatch objects are now automatically chunked to control memory usage during query execution, preventing memory exhaustion for queries returning large result sets.

Security & Reliability Improvements

Enhanced HTTP Client Security: HTTP client usage across the runtime has been hardened with improved TLS validation, certificate pinning for critical endpoints, and better error handling for network failures.

ODBC Connector Improvements: Removed unwrap calls from the ODBC connector, improving error handling and reliability. Fixed secret handling and Kubernetes secret integration.

CLI Permissions Hardening: Tightened file permissions for the CLI and install script, ensuring secure defaults for configuration files and credentials.

Oracle Instant Client Pinning: Oracle Instant Client downloads are now pinned to specific SHAs, ensuring reproducible builds and preventing supply chain attacks.

Observability & Tracing

DataFusion Log Emission: The Spice runtime now emits DataFusion internal logs, providing deeper visibility into query planning and execution for debugging and performance analysis.

AI Completions Tracing: Fixed tracing so that ai_completions operations are correctly parented under sql_query traces, improving observability for AI-powered queries.

Git Data Connector (Alpha)

Version-Controlled Data Access: The new Git Data Connector (Alpha) enables querying datasets stored in Git repositories. This connector is ideal for use cases involving configuration files, documentation, or any data tracked in version control.

Example Spicepod.yml configuration:

datasets:
  - from: git:https://github.com/myorg/myrepo
    name: git_metrics
    params:
      file_format: csv

For more details, refer to the Git Data Connector Documentation.

Additional Improvements & Bug Fixes

  • Reliability: Fixed refresh worker panics with recovery handling to prevent runtime crashes during acceleration refreshes.
  • Reliability: Improved error messages for missing or invalid spicepod.yaml files, providing actionable feedback for misconfiguration.
  • Reliability: Fixed DuckDB metadata pointer loading issues for snapshots.
  • Performance: Ensured ListingTable partitions are pruned correctly when filters are not used.
  • Reliability: Fixed vector dimension determination for partitioned indexes.
  • Search: Fixed casing issues in Reciprocal Rank Fusion (RRF) for hybrid search queries.
  • Search: Fixed search field handling as metadata for chunked search indexes.
  • Validation: Added timestamp support for partition expressions.
  • Validation: Fixed regexp_match function for DuckDB datasets.
  • Validation: Fixed partition name validation for improved reliability.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

No major cookbook updates.

The Spice Cookbook includes 81 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.9.0-rc.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.9.0-rc.1 image:

docker pull spiceai/spiceai:1.9.0-rc.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Changelog

Spice v1.7.0 (Sep 23, 2025)

· 21 min read
Sergei Grebnov
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.7.0! ⚡

Spice v1.7.0 upgrades to DataFusion v49 for improved performance and query optimization, introduces real-time full-text search indexing for CDC streams, EmbeddingGemma support for high-quality embeddings, new search table functions powering the /v1/search API, embedding request caching for faster and cost-efficient search and indexing, and OpenAI Responses API tool calls with streaming. This release also includes numerous bug fixes across CDC streams, vector search, the Kafka Data Connector, and error reporting.

What's New in v1.7.0

DataFusion v49 Highlights

DataFusion ClickBench Performance Graph. Source: DataFusion 49.0.0 Release Blog.

Performance Improvements 🚀

  • Equivalence System Upgrade: Faster planning for queries with many columns, enabling more sophisticated sort-based optimizations.
  • Dynamic Filters & TopK Pushdown: Queries with ORDER BY and LIMIT now use dynamic filters and physical filter pushdown, skipping unnecessary data reads for much faster top-k queries.
  • Compressed Spill Files: Intermediate files written during sort/group spill to disk are now compressed, reducing disk usage and improving performance.
  • WITHIN GROUP for Ordered-Set Aggregates: Support for ordered-set aggregate functions (e.g., percentile_disc) with WITHIN GROUP.
  • REGEXP_INSTR Function: Find regex match positions in strings.
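
For example, the last two items enable queries like these (a minimal sketch against a hypothetical sales table):

-- Median price via an ordered-set aggregate
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY price) AS median_price
FROM sales;

-- 1-based position of the first regex match (0 if no match)
SELECT regexp_instr('spice-ai', '-') AS first_dash;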

Spice Runtime Highlights

EmbeddingGemma Support: Spice now supports EmbeddingGemma, Google's state-of-the-art embedding model for text and documents. EmbeddingGemma provides high-quality, efficient embeddings for semantic search, retrieval, and recommendation tasks. You can use EmbeddingGemma via HuggingFace in your Spicepod configuration:

Example spicepod.yml snippet:

embeddings:
  - from: huggingface:huggingface.co/google/embeddinggemma-300m
    name: embeddinggemma
    params:
      hf_token: ${secrets:HUGGINGFACE_TOKEN}

Learn more about EmbeddingGemma in the official documentation.

POST /v1/search API Uses Search Table Functions: The /v1/search API now uses the new text_search and vector_search Table Functions for improved performance.
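
These table functions can also be invoked directly in SQL. A hypothetical example (the table and search text are illustrative; see the search documentation for the exact signature):

SELECT *
FROM text_search('questions', 'how do I enable caching')
LIMIT 5;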

Embedding Request Caching: The runtime now supports caching embedding requests, reducing latency and cost for repeated content and search requests.

Example spicepod.yml snippet:

runtime:
  caching:
    embeddings:
      enabled: true
      max_size: 128mb
      item_ttl: 5s

See the Caching documentation for details.

Real-Time Indexing for Full-Text Search: Full-text search indexing is now supported for connectors that enable real-time changes, such as Debezium CDC streams. Adding a full-text index on a column with refresh_mode: changes works as it does for full/append-mode refreshes, enabling instant search on new data.

Example spicepod.yml snippet:

datasets:
  - from: debezium:cdc.public.question
    name: questions
    acceleration:
      enabled: true
      engine: duckdb
      primary_key: id
      refresh_mode: changes # Use 'changes'
      params: *kafka_params
    columns:
      - name: title
        full_text_search:
          enabled: true # Enable full-text-search indexing
          row_id:
            - id

OpenAI Responses API Tool Calls with Streaming: The OpenAI Responses API now supports tool calls with streaming, enabling advanced model interactions such as web_search and code_interpreter with real-time response streaming. This allows you to invoke OpenAI-hosted tools and receive results as they are generated.

Learn more in the OpenAI Model Provider documentation.

Runtime Output Level Configuration: You can now set the output_level parameter in the Spicepod runtime configuration to control logging verbosity, in addition to the existing CLI and environment variable support. Supported values are info, verbose, and very_verbose. The value is applied in the following priority: CLI, then environment variables, then YAML configuration.

Example spicepod.yml snippet:

runtime:
  output_level: info # or verbose, very_verbose

For more details on configuring output level, see the Troubleshooting documentation.

Bug Fixes

Several bugs and issues have been resolved in this release, including:

  • CDC Streams: Fixed issues where refresh_mode: changes could prevent the Spice runtime from becoming Ready, and improved support for full-text indexing on CDC streams.
  • Vector Search: Fixed bugs where the vector search HTTP pipeline could not find more than one IndexedTableProvider, and resolved field-mismatch errors in the vector_search UDTF.
  • Kafka Integration: Improved Kafka schema inference with configurable sample size, improved consumer group persistence for SQLite and Postgres accelerations, and added cooperative mode support.
  • Perplexity Web Search: Fixed bug where Perplexity web search sometimes used incorrect query schema (limit).
  • Databricks: Fixed issue with unparsing embedded columns.
  • Error Reporting: ThrottlingException is now reported correctly instead of as InternalError.
  • Iceberg Data Connector: Added support for LIMIT pushdown.
  • Amazon S3 Vectors: Fixed ingestion issues with zero-vectors and improved handling when vector index is full.
  • Tracing: Fixed vector search tracing to correctly report SQL status.

Contributors

New Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

The Spice Cookbook includes 78 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.7.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.7.0 image:

docker pull spiceai/spiceai:1.7.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Dependencies

Changelog

Spice v1.4.0 (June 18, 2025)

· 19 min read
William Croxson
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.4.0! ⚡

This release upgrades DataFusion to v47 and Arrow to v55 for faster queries, more efficient Parquet/CSV handling, and improved reliability. It introduces the AWS Glue Catalog and Data Connectors for native access to Glue-managed data on S3, and adds support for Databricks U2M OAuth for secure Databricks user authentication.

New Cron-based dataset refreshes and worker schedules enable automated task management, while dataset and search results caching improvements further optimize query, search, and RAG performance.

What's New in v1.4.0

DataFusion v47 Highlights

Spice.ai is built on the DataFusion query engine. The v47 release brings:

Performance Improvements 🚀: This release delivers major query speedups through specialized GroupsAccumulator implementations for first_value, last_value, and min/max on Duration types, eliminating unnecessary sorting and computation. TopK operations are now up to 10x faster thanks to early exit optimizations, while sort performance is further enhanced by reusing row converters, removing redundant clones, and optimizing sort-preserving merge streams. Logical operations benefit from short-circuit evaluation for AND/OR, reducing overhead, and additional enhancements address high latency from sequential metadata fetching, improve int/string comparison efficiency, and simplify logical expressions for better execution.
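
For example, top-k queries of the following shape benefit from the early-exit optimization (the trips table and its columns are hypothetical):

SELECT trip_id, fare_amount
FROM trips
ORDER BY fare_amount DESC
LIMIT 10;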

Bug Fixes & Compatibility Improvements 🛠️: The release addresses issues with external sort, aggregation, and window functions, improves handling of NULL values and type casting in arrays and binary operations, and corrects problems with complex joins and nested window expressions. It also addresses SQL unparsing for subqueries, aliases, and UNION BY NAME.

See the Apache DataFusion 47.0.0 Changelog for details.

Arrow v55 Highlights

Arrow v55 delivers faster Parquet gzip compression, improved array concatenation, and better support for large files (4GB+) and modular encryption. Parquet metadata reads are now more efficient, with support for range requests and enhanced compatibility for INT96 timestamps and timezones. CSV parsing is more robust, with clearer error messages. These updates boost performance, compatibility, and reliability.

See the Arrow 55.0.0 Changelog and Arrow 55.1.0 Changelog for details.

Runtime Highlights

Search Result Caching: Spice now supports runtime caching for search results, improving performance for subsequent searches and chat completion requests that use the document_similarity LLM tool. Caching is configurable with options like maximum size, item TTL, eviction policy, and hashing algorithm.

Example spicepod.yml configuration:

runtime:
  caching:
    search_results:
      enabled: true
      max_size: 128mb
      item_ttl: 5s
      eviction_policy: lru
      hashing_algorithm: siphash

For more information, refer to the Caching documentation.

AWS Glue Catalog Connector Alpha: Connect to AWS Glue Data Catalogs to query Iceberg, Parquet, or CSV tables in S3.

Example spicepod.yml configuration:

catalogs:
  - from: glue
    name: my_glue_catalog
    params:
      glue_key: <your-access-key-id>
      glue_secret: <your-secret-access-key>
      glue_region: <your-region>
    include:
      - 'testdb.hive_*'
      - 'testdb.iceberg_*'

sql> show tables;
+-----------------+--------------+-------------------+------------+
| table_catalog   | table_schema | table_name        | table_type |
+-----------------+--------------+-------------------+------------+
| my_glue_catalog | testdb       | hive_table_001    | BASE TABLE |
| my_glue_catalog | testdb       | iceberg_table_001 | BASE TABLE |
| spice           | runtime      | task_history      | BASE TABLE |
+-----------------+--------------+-------------------+------------+

For more information, refer to the Glue Catalog Connector documentation.

AWS Glue Data Connector Alpha: Connect to specific tables in AWS Glue Data Catalogs to query Iceberg, Parquet, or CSV in S3.

Example spicepod.yml configuration:

datasets:
  - from: glue:my_database.my_table
    name: my_table
    params:
      glue_auth: key
      glue_region: us-east-1
      glue_key: ${secrets:AWS_ACCESS_KEY_ID}
      glue_secret: ${secrets:AWS_SECRET_ACCESS_KEY}

For more information, refer to the Glue Data Connector documentation.

Databricks U2M OAuth: Spice now supports User-to-Machine (U2M) authentication for Databricks when called with a compatible client, such as the Spice Cloud Platform.

Example spicepod.yml configuration:

datasets:
  - from: databricks:spiceai_sandbox.default.messages
    name: messages
    params:
      databricks_endpoint: ${secrets:DATABRICKS_ENDPOINT}
      databricks_cluster_id: ${secrets:DATABRICKS_CLUSTER_ID}
      databricks_client_id: ${secrets:DATABRICKS_CLIENT_ID}

Dataset Refresh Schedules: Accelerated datasets now support a refresh_cron parameter, automatically refreshing the dataset on a defined cron schedule. Cron-scheduled refreshes respect the global dataset_refresh_parallelism parameter.

Example spicepod.yml configuration:

datasets:
  - name: my_dataset
    from: s3://my-bucket/my_file.parquet
    acceleration:
      refresh_cron: 0 0 * * * # Daily refresh at midnight

For more information, refer to the Dataset Refresh Schedules documentation.

Worker Execution Schedules: Workers now support a cron parameter and will automatically execute an LLM prompt or SQL query on the defined cron schedule, in conjunction with a provided params.prompt.

Example spicepod.yml configuration:

workers:
  - name: email_reporter
    models:
      - from: gpt-4o
    params:
      prompt: 'Inspect the latest emails, and generate a summary report for them. Post the summary report to the connected Teams channel'
    cron: 0 2 * * * # Daily at 2am

For more information, refer to the Worker Execution Schedules documentation.

SQL Worker Actions: Spice now supports workers with sql actions for automated SQL query execution on a cron schedule:

workers:
  - name: my_worker
    cron: 0 * * * *
    sql: 'SELECT * FROM lineitem'

For more information, refer to the Workers with a SQL action documentation.

Contributors

Breaking Changes

  • No breaking changes.

Cookbook Updates

The Spice Cookbook now includes 70 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.4.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.4.0 image:

docker pull spiceai/spiceai:1.4.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

Changelog

  • Update trunk to 1.4.0-unstable (#5878) by @phillipleblanc in #5878
  • Update openapi.json (#5885) by @app/github-actions in #5885
  • feat: Testoperator reports benchmark failure summary (#5889) by @peasee in #5889
  • fix: Publish binaries to dev when platform option is all (#5905) by @peasee in #5905
  • feat: Print dispatch current test count of total (#5906) by @peasee in #5906
  • Include multiple duckdb files acceleration scenarios into testoperator dispatch (#5913) by @sgrebnov in #5913
  • feat: Support building testoperator on dev (#5915) by @peasee in #5915
  • Update spicepod.schema.json (#5927) by @app/github-actions in #5927
  • Update ROADMAP & SECURITY for 1.3.0 (#5926) by @phillipleblanc in #5926
  • docs: Update qa_analytics.csv (#5928) by @peasee in #5928
  • fix: Properly publish binaries to dev on push (#5931) by @peasee in #5931
  • Load request context extensions on every flight incoming call (#5916) by @ewgenius in #5916
  • Fix deferred loading for datasets with embeddings (#5932) by @ewgenius in #5932
  • Schedule AI benchmarks to run every Mon and Thu evening PST (#5940) by @sgrebnov in #5940
  • Fix explain plan snapshots for TPCDS queries Q36, Q70 & Q86 not being deterministic after DF 46 upgrade (#5942) by @phillipleblanc in #5942
  • chore: Upgrade to Rust 1.86 (#5945) by @peasee in #5945
  • Standardise HTTP settings across CLI (#5769) by @Jeadie in #5769
  • Fix deferred flag for Databricks SQL warehouse mode (#5958) by @ewgenius in #5958
  • Add deferred catalog loading (#5950) by @ewgenius in #5950
  • Refactor deferred_load using ComponentInitialization enum for better clarity (#5961) by @ewgenius in #5961
  • Post-release housekeeping (#5964) by @phillipleblanc in #5964
  • add LTO for release builds (#5709) by @kczimm in #5709
  • Fix dependabot/192 (#5976) by @Jeadie in #5976
  • Fix Test-to-SQL benchmark scheduled run (#5977) by @sgrebnov in #5977
  • Fix JSON to ScalarValue type conversion to match DataFusion behavior (#5979) by @sgrebnov in #5979
  • Add v1.3.1 release notes (#5978) by @lukekim in #5978
  • Regenerate nightly build workflow (#5995) by @ewgenius in #5995
  • Fix DataFusion dependency loading in Databricks request context extension (#5987) by @ewgenius in #5987
  • Update spicepod.schema.json (#6000) by @app/github-actions in #6000
  • feat: Run MySQL SF100 on dev runners (#5986) by @peasee in #5986
  • fix: Remove caching RwLock (#6001) by @peasee in #6001
  • 1.3.1 Post-release housekeeping (#6002) by @phillipleblanc in #6002
  • feat: Add initial scheduler crate (#5923) by @peasee in #5923
  • fix flight request context scope (#6004) by @ewgenius in #6004
  • fix: Ensure snapshots on different scale factors are retained (#6009) by @peasee in #6009
  • fix: Allow dev runners in dispatch files (#6011) by @peasee in #6011
  • refactor: Deprecate results_cache for caching.sql_results (#6008) by @peasee in #6008
  • Fix models benchmark results reporting (#6013) by @sgrebnov in #6013
  • fix: Run PR checks for tools/ changes (#6014) by @peasee in #6014
  • feat: Add a CronRequestChannel for scheduler (#6005) by @peasee in #6005
  • feat: Add refresh_cron acceleration parameter, start scheduler on table load (#6016) by @peasee in #6016
  • Update license check to allow dual license crates (#6021) by @sgrebnov in #6021
  • Initial worker concept (#5973) by @Jeadie in #5973
  • Don't fail if cargo-deny already installed (license check) (#6023) by @sgrebnov in #6023
  • Upgrade to DataFusion 47 and Arrow 55 (#5966) by @sgrebnov in #5966
  • Read Iceberg tables from Glue Catalog Connector (#5965) by @kczimm in #5965
  • Handle multiple highlights in v1/search UX (#5963) by @Jeadie in #5963
  • feat: Add cron scheduler configurations for workers (#6033) by @peasee in #6033
  • feat: Add search cache configuration and results wrapper (#6020) by @peasee in #6020
  • Fix GitHub Actions Ubuntu for more workflows (#6040) by @phillipleblanc in #6040
  • Fix Actions for testoperator dispatch manual (#6042) by @phillipleblanc in #6042
  • refactor: Remove worker type (#6039) by @peasee in #6039
  • feat: Support cron dataset refreshes (#6037) by @peasee in #6037
  • Upgrade datafusion-federation to 0.4.2 (#6022) by @phillipleblanc in #6022
  • Define SearchPipeline and use in runtime/vector_search.rs. (#6044) by @Jeadie in #6044
  • fix: Scheduler test when scheduler is running (#6051) by @peasee in #6051
  • doc: Spice Cloud Connector Limitation (#6035) by @Sevenannn in #6035
  • Add support for on_conflict:upsert for Arrow MemTable (#6059) by @sgrebnov in #6059
  • Enhance Arrow Flight DoPut operation tracing (#6053) by @sgrebnov in #6053
  • Update openapi.json (#6032) by @app/github-actions in #6032
  • Add tools enabled to MCP server capabilities (#6060) by @Jeadie in #6060
  • Upgrade to delta_kernel 0.11 (#6045) by @phillipleblanc in #6045
  • refactor: Replace refresh oneshot with notify (#6050) by @peasee in #6050
  • Enable Upsert OnConflictBehavior for runtime.task_history table (#6068) by @sgrebnov in #6068
  • feat: Add a workers integration test (#6069) by @peasee in #6069
  • Fix DuckDB acceleration ORDER BY rand() and ORDER BY NULL (#6071) by @phillipleblanc in #6071
  • Update Models Benchmarks to report unsuccessful evals as errors (#6070) by @sgrebnov in #6070
  • Revert: fix: Use HTTPS ubuntu sources (#6082) by @Sevenannn in #6082
  • Add initial support for Spice Cloud Platform management (#6089) by @sgrebnov in #6089
  • Run spiceai cloud connector TPC tests using spice dev apps (#6049) by @Sevenannn in #6049
  • feat: Add SQL worker action (#6093) by @peasee in #6093
  • Post-release housekeeping (#6097) by @phillipleblanc in #6097
  • Fix search bench (#6091) by @Jeadie in #6091
  • fix: Update benchmark snapshots (#6094) by @app/github-actions in #6094
  • fix: Update benchmark snapshots (#6095) by @app/github-actions in #6095
  • Glue catalog connector for hive style parquet (#6054) by @kczimm in #6054
  • Update openapi.json (#6100) by @app/github-actions in #6100
  • Improve Flight Client DoPut / Publish error handling (#6105) by @sgrebnov in #6105
  • Define PostApplyCandidateGeneration to handle all filters & projections. (#6096) by @Jeadie in #6096
  • refactor: Update the tracing task names for scheduled tasks (#6101) by @peasee in #6101
  • task: Switch GH runners in PR and testoperator (#6052) by @peasee in #6052
  • feat: Connect search caching for HTTP and tools (#6108) by @peasee in #6108
  • test: Add multi-dataset cron test (#6102) by @peasee in #6102
  • Sanitize the ListingTableURL (#6110) by @phillipleblanc in #6110
  • Avoid partial writes by FlightTableWriter (#6104) by @sgrebnov in #6104
  • fix: Update the TPCDS postgres acceleration indexes (#6111) by @peasee in #6111
  • Make Glue Catalog refreshable (#6103) by @kczimm in #6103
  • Refactor Glue catalog to use a new Glue data connector (#6125) by @kczimm in #6125
  • Emit retry error on flight transient connection failure (#6123) by @Sevenannn in #6123
  • Update Flight DoPut implementation to send single final PutResult (#6124) by @sgrebnov in #6124
  • feat: Add metrics for search results cache (#6129) by @peasee in #6129
  • update MCP crate (#6130) by @Jeadie in #6130
  • feat: Add search cache status header, respect cache control (#6131) by @peasee in #6131
  • fix: Allow specifying individual caching blocks (#6133) by @peasee in #6133
  • Update openapi.json (#6132) by @app/github-actions in #6132
  • Add CSV support to Glue data connector (#6138) by @kczimm in #6138
  • Update Spice Cloud Platform management UX (#6140) by @sgrebnov in #6140
  • Add TPCH bench for Glue catalog (#6055) by @kczimm in #6055
  • Enforce max_tokens_per_request limit in OpenAI embedding logic (#6144) by @sgrebnov in #6144
  • Enable Spice Cloud Control Plane connect (management) for FinanceBench (#6147) by @sgrebnov in #6147
  • Add integration test for Spice Cloud Platform management (#6150) by @sgrebnov in #6150
  • fix: Invalidate search cache on refresh (#6137) by @peasee in #6137
  • fix: Prevent registering cron schedule with change stream accelerations (#6152) by @peasee in #6152
  • test: Add an append cron integration test (#6151) by @peasee in #6151
  • fix: Cache search results with no-cache directive (#6155) by @peasee in #6155
  • fix: Glue catalog dispatch runner type (#6157) by @peasee in #6157
  • Fix: Glue S3 location for directories and Iceberg credentials (#6174) by @kczimm in #6174
  • Support multiple columns in FTS (#6156) by @Jeadie in #6156
  • fix: Add --cache-control flag for search CLI (#6158) by @peasee in #6158
  • Add Glue data connector tpch bench test for parquet and csv (#6170) by @kczimm in #6170
  • fix: Apply results cache deprecation correctly (#6177) by @peasee in #6177
  • Fix regression in Parquet pushdown (#6178) by @phillipleblanc in #6178
  • Fix CUDA build (use candle-core 0.8.4 and cudarc v0.12) (#6181) by @sgrebnov in #6181
  • return empty stream if no external_links present (#6192) by @kczimm in #6192
  • Use arrow pretty print util instead of init dataframe / logical plan in display_records (#6191) by @Sevenannn in #6191
  • task: Enable additional TPCDS test scenarios in dispatcher (#6160) by @peasee in #6160
  • chore: Update dependencies (#6196) by @peasee in #6196
  • Fix FlightSQL GetDbSchemas and GetTables schemas to fully match the protocol (#6197) by @sgrebnov in #6197
  • Use spice-rs in test operator and retry on connection reset error (#6136) by @Sevenannn in #6136
  • Fix load status metric description (#6219) by @phillipleblanc in #6219
  • Run extended tests on PRs against release branch, update glue_iceberg_integration_test_catalog test (#6204) by @Sevenannn in #6204
  • query schema for is_nullable (#6229) by @kczimm in #6229
  • fix: use the query error message when queries fail (#6228) by @kczimm in #6228
  • fix glue iceberg catalog integration test (#6249) by @Sevenannn in #6249
  • cache table providers in glue catalog (#6252) by @kczimm in #6252
  • fix: databricks sql_warehouse schema contains duplicate fields (#6255) by @phillipleblanc in #6255

Full Changelog: v1.3.2...v1.4.0

Spice v1.3.0 (May 19, 2025)

· 9 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.3.0! 🏎️

Spice v1.3.0 accelerates data and AI applications with significantly improved query performance, reliability, and expanded Databricks integration. New support for the Databricks SQL Statement Execution API enables direct SQL queries on Databricks SQL Warehouses, complementing Mosaic AI model serving and embeddings (introduced in v1.2.2) and existing Databricks catalog and dataset integrations. This release upgrades to DataFusion v46, optimizes results caching performance, and strengthens security with least-privilege sandboxing improvements.

What's New in v1.3.0

  • Databricks SQL Statement Execution API Support: Added support for the Databricks SQL Statement Execution API, enabling direct SQL queries against Databricks SQL Warehouses for optimized performance in analytics and reporting workflows.

    Example spicepod.yml configuration:

    datasets:
      - from: databricks:spiceai.datasets.my_awesome_table
        name: my_awesome_table
        params:
          mode: sql_warehouse
          databricks_endpoint: ${env:DATABRICKS_ENDPOINT}
          databricks_sql_warehouse_id: ${env:DATABRICKS_SQL_WAREHOUSE_ID}
          databricks_token: ${env:DATABRICKS_TOKEN}

    For details, see the Databricks Data Connector documentation.

  • Improved Results Cache Performance & Hashing Algorithm: Spice now supports an alternative results cache hashing algorithm, ahash, in addition to the default, siphash. Configure it via:

    runtime:
      results_cache:
        hashing_algorithm: ahash # or siphash

    The hashing algorithm determines how cache keys are hashed before being stored, impacting both lookup speed and protection against potential DoS attacks.

    Using ahash improves performance for large queries or query plans. Combined with results cache optimizations, it reduces 99th-percentile request latency and increases total requests/second for queries with large result sets (100k+ cached rows). The following charts show performance tested against TPC-H Query #17 on a scale factor 5 dataset (30+ million rows, 5GB):

    Charts: improvements in 99th-percentile query latency and requests/second, compared against v1.2.2 by cache key type and hashing algorithm.

    Note: ahash was not available in v1.2.2, so it is excluded from comparisons.

    To learn more, refer to the Results Cache Hashing Algorithm documentation.

  • SQL Query Performance: Optimized the critical SQL query path, reducing overhead and improving response times for simple queries by 10-20%.

  • DuckDB Acceleration: Fixed a bug in the DuckDB acceleration engine causing query failures under high concurrency when querying datasets accelerated into multiple DuckDB files.

  • Container Security: The container image now runs as a non-root user with enhanced sandboxing and includes only essential dependencies for a slimmer, more secure image.

DataFusion v46 Highlights

Spice.ai is built on the DataFusion query engine. The v46 release brings:

  • Faster Performance 🚀: DataFusion 46 introduces significant performance enhancements, including a 2x faster median() function for large datasets without grouping, 10–100% speed improvements in FIRST_VALUE and LAST_VALUE window functions by avoiding sorting, and a 40x faster uuid() function. Additional optimizations, such as a 50% faster repeat() string function, accelerated chr() and to_hex() functions, improved grouping algorithms, and Parquet row group pruning with NOT LIKE filters, further boost overall query efficiency (a few of these functions are shown after this list).

  • New range() Table Function: A new table-valued function range(start, stop, step) has been added to make it easy to generate integer sequences — similar to PostgreSQL’s generate_series() or Spark’s range(). Example: SELECT * FROM range(1, 10, 2);

  • UNION [ALL | DISTINCT] BY NAME Support: DataFusion now supports UNION BY NAME and UNION ALL BY NAME, which align columns by name instead of position. This matches functionality found in systems like Spark and DuckDB and simplifies combining heterogeneously ordered result sets.

    Example:

    SELECT col1, col2 FROM t1
    UNION ALL BY NAME
    SELECT col2, col1 FROM t2;
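
A few of the accelerated functions from the first bullet, for illustration (the sales table and price column are hypothetical; the noted speedups restate the release notes above):

SELECT median(price) FROM sales; -- ~2x faster on large, ungrouped inputs
SELECT uuid();                   -- ~40x faster
SELECT repeat('ab', 3);          -- 'ababab'; ~50% faster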

See the DataFusion 46.0.0 release notes for details.

Spice.ai adopts the latest minus one DataFusion release for quality assurance and stability. The upgrade to DataFusion v47 is planned for Spice v1.4.0 in June.

Contributors

Breaking Changes

The container image now always runs as a non-root user (UID/GID 65534) with minimal dependencies, resulting in a smaller, more secure image. Standard Linux tools, including bash, are no longer included.

Kubernetes Deployments:

  • Use of the v1.3.0+ Helm chart is required, which includes a securityContext ensuring the sandbox user has required file access.

  • For deployments using a Helm chart version lower than v1.3.0, add the following securityContext to the pod specification:

securityContext:
  runAsUser: 65534
  runAsGroup: 65534
  fsGroup: 65534

See the Docker Sandbox Guide for details on how to update custom Docker images to restore the previous behavior.

Cookbook Updates

  • Added Accelerated Views: Pre-calculate and materialize data derived from one or more underlying datasets.

The Spice Cookbook now includes 67 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.3.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.3.0 image:

docker pull spiceai/spiceai:1.3.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

Changelog

See the full list of changes at: v1.2.2...v1.3.0

Spice v1.2.0 (Apr 28, 2025)

· 16 min read
Evgenii Khramkov
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.2.0! 🚀

Spice v1.2.0 is a significant update. It upgrades DataFusion to v45 and Arrow to v54. This release brings faster query performance, support for parameterized queries in SQL and HTTP APIs, and the ability to accelerate views. Several bugs have been fixed and dependencies updated for better stability and speed.

DataFusion v45 Highlights

Spice.ai is built on the DataFusion query engine. The v45 release brings:

  • Faster Performance 🚀: DataFusion is now the fastest single-node engine for querying Apache Parquet files in the ClickBench benchmark. Performance improved by over 33% from v33 to v45. Arrow StringView is now on by default, making string and binary data queries much faster, especially with Parquet files.

  • Better Quality 📋: DataFusion now runs over 5 million SQL tests per push using the SQLite sqllogictest suite. There are new checks for logical plan correctness and more thorough pre-release testing.

  • New SQL Functions ✨: Added show functions, to_local_time, regexp_count, map_extract, array_distance, array_any_value, greatest, least, and arrays_overlap.
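
For illustration, a few of the new functions (the values in comments are what these calls should return):

SELECT greatest(1, 5, 3);              -- 5
SELECT least(2.5, 1.0);                -- 1.0
SELECT regexp_count('aaa', 'a');       -- 3
SELECT arrays_overlap([1, 2], [2, 3]); -- true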

See the DataFusion 45.0.0 release notes for details.

Spice.ai upgrades to the latest minus one DataFusion release to ensure adequate testing and stability. The next upgrade to DataFusion v46 is planned for Spice v1.3.0 in May.

What's New in v1.2.0

  • Parameterized Queries: Parameterized queries are now supported with the Flight SQL API and HTTP API. Positional arguments ($1) and named arguments (:param) are both supported. Logical plans for SQL statements are cached for faster repeated queries (both placeholder styles are shown below).

    See the API Documentation for additional details.
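
    For illustration, both placeholder styles (the orders table is hypothetical; parameter values are bound by the client at execution time):

    -- Positional placeholder:
    SELECT * FROM orders WHERE total > $1;

    -- Named placeholder:
    SELECT * FROM orders WHERE total > :min_total;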

  • Accelerated Views: Views, not just datasets, can now be accelerated. This provides much better performance for views that perform heavy computation.

    Example spicepod.yaml:

    views:
      - name: accelerated_view
        acceleration:
          enabled: true
          engine: duckdb
          primary_key: id
          refresh_check_interval: 1h
        sql: |
          select * from dataset_a
          union all
          select * from dataset_b

    See the Data Acceleration documentation.

  • Memory Usage Metrics & Configuration: The runtime now tracks memory usage as a metric, and a new runtime memory_limit parameter is available. The memory limit applies specifically to the runtime and should be used in addition to existing memory usage configuration, such as duckdb_memory_limit. Queries whose memory usage exceeds the limit will spill to disk.

    See the Memory Reference for details.

  • New Worker Component: Workers are new configurable compute units in the Spice runtime. They help manage compute across models and tools, handle errors, and balance load. Workers are configured in the workers section of spicepod.yaml.

    Example spicepod.yaml:

    workers:
      - name: round-robin
        description: |
          Distributes requests between 'foo' and 'bar' models in a round-robin fashion.
        models:
          - from: foo
          - from: bar
      - name: fallback
        description: |
          Tries 'bar' first, then 'foo', then 'baz' if earlier models fail.
        models:
          - from: foo
            order: 2
          - from: bar
            order: 1
          - from: baz
            order: 3

    See the Workers Documentation for details.

  • Databricks Model Provider: Databricks models can now be used with from: databricks:model_name.

    Example spicepod.yaml:

    models:
      - from: databricks:llama-3_2_1_1b_instruct
        name: llama-instruct
        params:
          databricks_endpoint: dbc-46470731-42e5.cloud.databricks.com
          databricks_token: ${ secrets:SPICE_DATABRICKS_TOKEN }

See the Databricks model documentation.

  • spice chat CLI Improvements: The spice chat command now supports an optional --temperature parameter. A one-shot chat can also be sent with spice chat <message>.

  • More Type Support: Added support for Postgres JSON type and DuckDB Dictionary type.

  • Other Improvements:

    • New image tags let you pick memory allocators for different use-cases: jemalloc, sysalloc, and mimalloc.
    • Better error handling and logging for chat and model operations.

Contributors

Cookbook Updates

The Spice Cookbook now includes 68 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.2.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.2.0 image:

docker pull spiceai/spiceai:1.2.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

Spice is now built with Rust 1.85.0 and the Rust 2024 edition.

Changelog

- Update end_game.md (#5312) by @peasee in https://github.com/spiceai/spiceai/pull/5312
- feat: Add initial testoperator query validation (#5311) by @peasee in https://github.com/spiceai/spiceai/pull/5311
- Update Helm + Prepare for next release (#5317) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5317
- Update spicepod.schema.json (#5319) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5319
- add integration test for reading encrypted PDFs from S3 (#5308) by @kczimm in https://github.com/spiceai/spiceai/pull/5308
- Stop `load_components` during runtime shutdown (#5306) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5306
- Update openapi.json (#5321) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5321
- feat: Implement record batch data validation (#5331) by @peasee in https://github.com/spiceai/spiceai/pull/5331
- Update QA analytics for v1.1.1 (#5320) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5320
- fix: Update benchmark snapshots (#5337) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5337
- Enforce pulls with Spice v1.0.4 (#5339) by @lukekim in https://github.com/spiceai/spiceai/pull/5339
- Upgrade to DataFusion 45, Arrow 54, Rust 1.85 & Edition 2024 (#5334) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5334
- feat: Allow validating testoperator in benchmark workflow (#5342) by @peasee in https://github.com/spiceai/spiceai/pull/5342
- Upgrade `delta_kernel` to 0.9 (#5343) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5343
- deps: Update odbc-api (#5344) by @peasee in https://github.com/spiceai/spiceai/pull/5344
- Fix schema inference for Snowflake tables with large number of columns (#5348) by @ewgenius in https://github.com/spiceai/spiceai/pull/5348
- feat: Update testoperator dispatch for validation, version metric (#5349) by @peasee in https://github.com/spiceai/spiceai/pull/5349
- fix: validate_results not validate (#5352) by @peasee in https://github.com/spiceai/spiceai/pull/5352
- revert to previous pdf-extract; remove test for encrypted pdf support (#5355) by @kczimm in https://github.com/spiceai/spiceai/pull/5355
- Stablize the test `verify_similarity_search_chat_completion` (#5284) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5284
- Turn off `delta_kernel::log_segment` logging and refactor log filtering (#5367) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5367
- Upgrade to DuckDB 1.2.2 (#5375) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5375
- Update Readme - fix broken and outdated links (#5376) by @ewgenius in https://github.com/spiceai/spiceai/pull/5376
- Upgrade dependabot dependencies (#5385) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5385
- fix: Remove IMAP oauth (#5386) by @peasee in https://github.com/spiceai/spiceai/pull/5386
- Bump Helm chart to 1.1.2 (#5389) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5389
- Refactor accelerator registry as part of runtime. (#5318) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5318
- Include `vnd.spiceai.sql/nsql.v1+json` response examples (openapi docs) (#5388) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5388
- docs: Update endgame template with SpiceQA, update qa analytics (#5391) by @peasee in https://github.com/spiceai/spiceai/pull/5391
- Make graceful shutdown timeout configurable (#5358) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5358
- docs: Update release criteria with note on max columns (#5401) by @peasee in https://github.com/spiceai/spiceai/pull/5401
- Update openapi.json (#5392) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5392
- FinanceBench: update scorer instructions and switch scoring model to `gpt-4.1` (#5395) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5395
- feat: Write OTel metrics for testoperator (#5397) by @peasee in https://github.com/spiceai/spiceai/pull/5397
- Update nsql openapi title (#5403) by @ewgenius in https://github.com/spiceai/spiceai/pull/5403
- Track `ai_inferences_count` with used tools flag. Extensible runtime request context. (#5393) by @ewgenius in https://github.com/spiceai/spiceai/pull/5393
- Include newly detected view as changed view (#5408) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5408
- Track used_tools in ai_inferences_with_spice_count as number (#5409) by @ewgenius in https://github.com/spiceai/spiceai/pull/5409
- Update openapi.json (#5406) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5406
- Tweak enforce pulls with Spice (#5411) by @lukekim in https://github.com/spiceai/spiceai/pull/5411
- Allow `flightsql` and `spiceai` connectors to override flight max message size (#5407) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5407
- Retry model graded scorer once on successful, empty response (#5405) by @Jeadie in https://github.com/spiceai/spiceai/pull/5405
- use span task name in 'spice trace' tree, not span_id (#5412) by @Jeadie in https://github.com/spiceai/spiceai/pull/5412
- Rename to `track_ai_inferences_with_spice_count` in all places (#5410) by @ewgenius in https://github.com/spiceai/spiceai/pull/5410
- Update qa_analytics.csv (#5421) by @peasee in https://github.com/spiceai/spiceai/pull/5421
- Remove the filter for the `list_datasets` tool in the AI inferences metric count. (#5417) by @ewgenius in https://github.com/spiceai/spiceai/pull/5417
- fix: Testoperator uses an exact API key for benchmark metric submission (#5413) by @peasee in https://github.com/spiceai/spiceai/pull/5413
- feat: Enable testoperator metrics in workflow (#5422) by @peasee in https://github.com/spiceai/spiceai/pull/5422
- Upgrade mistral.rs (#5404) by @Jeadie in https://github.com/spiceai/spiceai/pull/5404
- Include all FinanceBench documents in benchmark tests (#5426) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5426
- Handle second Ctrl-C to force runtime termination (#5427) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5427
- Add optional `--temperature` parameter for `spice chat` CLI command (#5429) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5429
- Remove `with_runtime_status` from the `RuntimeBuilder` (#5430) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5430
- Fix spice chat error handling (#5433) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5433
- Add more test models to FinanceBench benchmark (#5431) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5431
- support 'from: databricks:model_name' (#5434) by @Jeadie in https://github.com/spiceai/spiceai/pull/5434
- Upgrade Pulls with Spice to v1.0.6 and add concurrency control (#5442) by @lukekim in https://github.com/spiceai/spiceai/pull/5442
- Upgrade DataFusion table providers (#5443) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5443
- Test spice chat in e2e_test_spice_cli (#5447) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5447
- Allow for one-shot chat request using `spice chat <message>` (#5444) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5444
- Enable parallel data sampling for NSQL (#5449) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5449
- Upgrade Go from v1.23.4 to v1.24.2 (#5462) by @lukekim in https://github.com/spiceai/spiceai/pull/5462
- Update PULL_REQUEST_TEMPLATE.md (#5465) by @lukekim in https://github.com/spiceai/spiceai/pull/5465
- Enable captured outputs by default when spiced is started by the CLI (spice run) (#5464) by @lukekim in https://github.com/spiceai/spiceai/pull/5464
- Parameterized queries via Flight SQL API (#5420) by @kczimm in https://github.com/spiceai/spiceai/pull/5420
- fix: Update benchmarks readme badge (#5466) by @peasee in https://github.com/spiceai/spiceai/pull/5466
- delay auth check for binding parameterized queries (#5475) by @kczimm in https://github.com/spiceai/spiceai/pull/5475
- Add support for `?` placeholder syntax in parameterized queries (#5463) by @kczimm in https://github.com/spiceai/spiceai/pull/5463
- enable task name override for non static span names (#5423) by @Jeadie in https://github.com/spiceai/spiceai/pull/5423
- Allow parameter queries with no parameters (#5481) by @kczimm in https://github.com/spiceai/spiceai/pull/5481
- Support unparsing UNION for distinct results (#5483) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5483
- add rust-toolchain.toml (#5485) by @kczimm in https://github.com/spiceai/spiceai/pull/5485
- Add parameterized query support to the HTTP API (#5484) by @kczimm in https://github.com/spiceai/spiceai/pull/5484
- E2E test for spice chat <message> behavior (#5451) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5451
- Renable and fix huggingface models integration tests (#5478) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5478
- Update openapi.json (#5488) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5488
- feat: Record memory usage as a metric (#5489) by @peasee in https://github.com/spiceai/spiceai/pull/5489
- fix: update dispatcher to run all benchmarks, rename metric, update spicepods, add scale factor (#5500) by @peasee in https://github.com/spiceai/spiceai/pull/5500
- Fix ILIKE filters support (#5502) by @ewgenius in https://github.com/spiceai/spiceai/pull/5502
- fix: Update test spicepod locations and names (#5505) by @peasee in https://github.com/spiceai/spiceai/pull/5505
- fix: Update benchmark snapshots (#5508) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5508
- fix: Update benchmark snapshots (#5512) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5512
- Fix Delta Lake bug for: Found unmasked nulls for non-nullable StructArray field "predicate" (#5515) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5515
- fix: working directory for duckdb e2e test spicepods (#5510) by @peasee in https://github.com/spiceai/spiceai/pull/5510
- Tweaks to README.md (#5516) by @lukekim in https://github.com/spiceai/spiceai/pull/5516
- Cache logical plans of SQL statements (#5487) by @kczimm in https://github.com/spiceai/spiceai/pull/5487
- Fix `content-type: application/json` (#5517) by @Jeadie in https://github.com/spiceai/spiceai/pull/5517
- Validate postgres results in testoperator dispatch (#5504) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5504
- fix: Update benchmark snapshots (#5511) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5511
- Fix results cache by SQL with prepared statements (#5518) by @kczimm in https://github.com/spiceai/spiceai/pull/5518
- Add initial support for views acceleration (#5509) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5509
- fix: Update benchmark snapshots (#5527) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5527
- Support switching the memory allocator Spice uses via `alloc-*` features. (#5528) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5528
- fix: Update benchmark snapshots (#5525) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5525
- Add test spicepod for tpch mysql-duckdb[file acceleration] (#5521) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5521
- Fix nightly arm build - change tag `-default` to `-models` (#5529) by @ewgenius in https://github.com/spiceai/spiceai/pull/5529
- LLM router via `worker` spicepod component (#5513) by @Jeadie in https://github.com/spiceai/spiceai/pull/5513
- Apply Spice advanced acceleration logic and params support to accelerated views (#5526) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5526
- Enable DatasetCheckpoint logic for accelerated views (#5533) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5533
- Fix public '.model' name for router workers (#5535) by @Jeadie in https://github.com/spiceai/spiceai/pull/5535
- feat: Add Runtime memory limit parameter (#5536) by @peasee in https://github.com/spiceai/spiceai/pull/5536
- For fallback worker, check first item in `chat/completion` stream. (#5537) by @Jeadie in https://github.com/spiceai/spiceai/pull/5537
- Move rate limit check to after parameterized query binding (#5540) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5540
- Update spicepod.schema.json (#5545) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5545
- Accelerate views: refresh_on_startup, ready_state, jitter params support (#5547) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5547
- Add integration test for accelerated views (#5550) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5550
- Don't install make or expect on spiceai-macos runners (#5554) by @lukekim in https://github.com/spiceai/spiceai/pull/5554
- `event_stream` crate for emitting events from tracing::Span; used in v1/chat/completions streaming. (#5474) by @Jeadie in https://github.com/spiceai/spiceai/pull/5474
- Fix typo in method (#5559) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5559
- Run test operator every day and current and previous commits (#5557) by @lukekim in https://github.com/spiceai/spiceai/pull/5557
- Add aws_allow_http parameter for delta lake connector (#5541) by @Sevenannn in https://github.com/spiceai/spiceai/pull/5541
- feat: Add branch name to metric dimensions in testoperator (#5563) by @peasee in https://github.com/spiceai/spiceai/pull/5563
- fix: Update the tpch benchmark snapshots for: ./test/spicepods/tpch/sf1/federated/odbc[databricks].yaml (#5565) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5565
- fix: Split scheduled dispatch into a separate job (#5567) by @peasee in https://github.com/spiceai/spiceai/pull/5567
- fix: Use outputs.SPICED_COMMIT (#5568) by @peasee in https://github.com/spiceai/spiceai/pull/5568
- fix: Use refs in testoperator dispatch instead of commits (#5569) by @peasee in https://github.com/spiceai/spiceai/pull/5569
- fix: actions/checkout ref does not take a full ref (#5571) by @peasee in https://github.com/spiceai/spiceai/pull/5571
- fix: Testoperator dispatch (#5572) by @peasee in https://github.com/spiceai/spiceai/pull/5572
- Respect `update-snapshots` when running all benchmarks manually (#5577) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5577
- Use FETCH_HEAD instead of ${{ inputs.ref }} to list commits in setup_spiced (#5579) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5579
- Add additional test scenarios for benchmarks (#5582) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5582
- fix: Update the tpch benchmark snapshots for: test/spicepods/tpch/sf1/accelerated/databricks[delta_lake]-duckdb[file].yaml (#5590) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5590
- fix: Update the tpch benchmark snapshots for: test/spicepods/tpch/sf1/accelerated/mysql-duckdb[file].yaml (#5591) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5591
- Fix Snowflake data connector rows ordering (#5599) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5599
- fix: Update benchmark snapshots (#5595) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5595
- fix: Update the tpch benchmark snapshots for: test/spicepods/tpch/sf1/accelerated/databricks[delta_lake]-arrow.yaml (#5594) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5594
- fix: Update benchmark snapshots (#5589) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5589
- fix: Update benchmark snapshots (#5583) by @app/github-actions in https://github.com/spiceai/spiceai/pull/5583
- Downgrade DuckDB to 1.1.3 (#5607) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5607
- Add prepared statement integration tests (#5544) by @kczimm in https://github.com/spiceai/spiceai/pull/5544

Full Changelog: v1.1.2...v1.2.0