Spice v1.7.1 (Sep 29, 2025)

September 30, 2025 · 6 min read

Principal Software Engineer at Spice AI

Announcing the release of Spice v1.7.1! 🔍

Spice v1.7.1 is a patch release focused on search improvements, bug fixes, and performance enhancements. This release introduces the Reciprocal Rank Fusion (RRF) user-defined table function (UDTF) for hybrid search, improves vector and text search reliability, and resolves several issues across the runtime, connectors, and query engine.

What's New in v1.7.1

Reciprocal Rank Fusion (RRF) UDTF: Spice now supports Reciprocal Rank Fusion (RRF) as a user-defined table function, enabling advanced hybrid search scenarios that combine results from multiple search methods (e.g., vector and text search) for improved relevance ranking.

Features:

Multi-search fusion: Combine results from vector_search, text_search, and other search UDTFs in a single query.
Advanced tuning: Per-query ranking weights, recency boosting, and configurable decay functions.
Performance: An optional user-specified join key (join_key) avoids computing join keys on the fly.
Automatic joining: Falls back to on-the-fly JOIN key computation when no explicit key is provided.

Example usage:

SELECT id, title, content, fused_score
FROM rrf(
  vector_search(documents, 'machine learning algorithms', rank_weight => 1.5),
  text_search(documents, 'neural networks deep learning', rank_weight => 1.2),
  join_key => 'id',    -- optional join key for optimal performance
  k => 60.0            -- optional smoothing factor
)
WHERE fused_score > 0.01
ORDER BY fused_score DESC;
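The Python sketch below makes the scoring concrete. It implements weighted Reciprocal Rank Fusion with the standard formula: each result list adds weight / (k + rank) to every document it contains. It assumes that rank_weight scales a list's contribution and that documents are matched across lists by an id, as join_key does. Spice's actual scoring, including recency boosting and decay, may differ.

```python
from collections import defaultdict


def rrf(result_lists, k=60.0, weights=None):
    """Fuse best-first lists of document ids with weighted Reciprocal Rank Fusion.

    Each list adds weight / (k + rank) to every id it contains (rank starts at 1);
    an id missing from a list gets nothing from that list. Returns (id, score)
    pairs sorted by descending fused score.
    """
    if weights is None:
        weights = [1.0] * len(result_lists)
    if len(weights) != len(result_lists):
        raise ValueError("need one weight per result list")
    scores = defaultdict(float)
    for results, weight in zip(result_lists, weights):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]  # e.g. ids from a vector search, best first
    text_hits = ["b", "d", "a"]    # e.g. ids from a text search, best first
    for doc_id, score in rrf([vector_hits, text_hits], k=60.0, weights=[1.5, 1.2]):
        print(doc_id, round(score, 4))
```

With k = 60, a document ranked first in a single list with weight 1.0 scores 1/61 ≈ 0.0164. That gives a sense of scale for thresholds such as fused_score > 0.01.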

Learn more in the RRF documentation.

Acceleration Refresh Metrics: Spice now exposes additional Prometheus metrics that provide detailed observability into dataset acceleration refreshes. These metrics help monitor data freshness and ingestion lag for accelerated datasets with a time column.

Reported metrics:

Metric Name	Description
`dataset_acceleration_max_timestamp_before_refresh_ms`	Maximum value of the dataset's time column before refresh (milliseconds).
`dataset_acceleration_max_timestamp_after_refresh_ms`	Maximum value of the dataset's time column after refresh (milliseconds).
`dataset_acceleration_refresh_lag_ms`	Difference between max timestamp after and before refresh (milliseconds).
`dataset_acceleration_ingestion_lag_ms`	Lag between current wall-clock time and max timestamp after refresh (milliseconds).

These metrics are emitted during each acceleration refresh and can be scraped by Prometheus for monitoring and alerting. For more details, see the Observability documentation.
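The two lag metrics are simple differences of the timestamps in the table. The Python sketch below shows that arithmetic as described there. The function name is hypothetical and is not part of Spice, and treating the values as Unix epoch milliseconds is an assumption.

```python
import time


def acceleration_refresh_metrics(max_ts_before_ms, max_ts_after_ms, now_ms=None):
    """Derive the refresh metrics from the time column's max before/after a refresh.

    Values are assumed to be Unix epoch milliseconds; now_ms defaults to the wall clock.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return {
        "dataset_acceleration_max_timestamp_before_refresh_ms": max_ts_before_ms,
        "dataset_acceleration_max_timestamp_after_refresh_ms": max_ts_after_ms,
        # How far the newest timestamp advanced during this refresh.
        "dataset_acceleration_refresh_lag_ms": max_ts_after_ms - max_ts_before_ms,
        # How far the newest accelerated row trails the wall clock.
        "dataset_acceleration_ingestion_lag_ms": now_ms - max_ts_after_ms,
    }
```

For example, suppose a refresh moves the max timestamp from 1,000 to 4,000 ms and is observed at 10,000 ms. It reports a refresh lag of 3,000 ms and an ingestion lag of 6,000 ms.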

Bug Fixes & Improvements

This release resolves several issues and improves reliability across search, connectors, and query planning:

Full-Text Search (FTS): FTS metadata columns can now be used in projections, a bug where JOIN-level filter columns were missing from the schema is fixed, and FTS indexes can now be persisted as file-based indexes. Queries default to a limit of 1000 results when no limit is specified.
Vector Search: Queries default to a limit of 1000 results when no limit is specified, and a bug when removing the embedding column is fixed.
Databricks SQL Warehouse: Improved error handling and support for async queries.
Other: Fixes for Anthropic model regex validation, chat model health checks that use fewer tokens for reasoning models, and improved error messages.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

Added Hybrid-Search using RRF - Combine results from multiple search methods (vector and text search) using Reciprocal Rank Fusion for improved relevance ranking.

The Spice Cookbook includes 78 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.7.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.7.1 image:

docker pull spiceai/spiceai:1.7.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Changelog

ensure FTS metadata columns can be used in projection (#7282) by @Jeadie in #7282
Fix JOIN level filters not having columns in schema (#7287) by @Jeadie in #7287
Use file-based fts index (#7024) by @Jeadie in #7024
Remove 'PostApplyCandidateGeneration' (#7288) by @Jeadie in #7288
RRF: Rank and recency boosting (#7294) by @mach-kernel in #7294
RRF: Preserve base ranking when results differ -> FULL OUTER JOIN does not produce time column (#7300) by @mach-kernel in #7300
fix removing embedding column (#7302) by @Jeadie in #7302
RRF: Fix decay for disjoint result sets (#7305) by @mach-kernel in #7305
RRF: Project top scores, do not yield duplicate results (#7306) by @mach-kernel in #7306
RRF: Case sensitive column/ident handling (#7309) by @mach-kernel in #7309
For vector_search, use a default limit of 1000 if no limit specified (#7311) by @lukekim in #7311
Fix Anthropic model regex and add validation tests (#7319) by @ewgenius in #7319
Enhancement: Implement before/after/lag metrics for acceleration refresh (#7310) by @krinart in #7310
Refactor chat model health check to lower tokens usage for reasoning models (#7317) by @ewgenius in #7317
Enable chunking in SearchIndex (#7143) by @Jeadie in #7143
Use logical plan in SearchQueryProvider. (#7314) by @Jeadie in #7314
FTS max search results 100 -> 1000 (#7331) by @Jeadie in #7331
Improve Databricks SQL Warehouse Error Handling (#7332) by @sgrebnov in #7332
use spicepod embedding model name for 'model_name' (#7333) by @Jeadie in #7333
Handle async queries for Databricks SQL Warehouse API (#7335) by @phillipleblanc in #7335
RRF: Fix ident resolution for struct fields, autohashed join key for varying types (#7339) by @mach-kernel in #7339

Spice v1.7.0 (Sep 23, 2025)

September 23, 2025 · 21 min read

Sergei Grebnov

Senior Software Engineer at Spice AI

Announcing the release of Spice v1.7.0! ⚡

Spice v1.7.0 upgrades to DataFusion v49 for improved performance and query optimization, introduces real-time full-text search indexing for CDC streams, EmbeddingGemma support for high-quality embeddings, new search table functions powering the /v1/search API, embedding request caching for faster and cost-efficient search and indexing, and OpenAI Responses API tool calls with streaming. This release also includes numerous bug fixes across CDC streams, vector search, the Kafka Data Connector, and error reporting.

What's New in v1.7.0

DataFusion v49 Highlights

[Figure: DataFusion ClickBench performance graph. Source: DataFusion 49.0.0 Release Blog.]

Performance Improvements 🚀

Equivalence System Upgrade: Faster planning for queries with many columns, enabling more sophisticated sort-based optimizations.
Dynamic Filters & TopK Pushdown: Queries with ORDER BY and LIMIT now use dynamic filters and physical filter pushdown, skipping unnecessary data reads for much faster top-k queries.
Compressed Spill Files: Intermediate files written during sort/group spill to disk are now compressed, reducing disk usage and improving performance.
WITHIN GROUP for Ordered-Set Aggregates: Support for ordered-set aggregate functions (e.g., percentile_disc) with WITHIN GROUP.
REGEXP_INSTR Function: Find regex match positions in strings.
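The Python sketch below roughly illustrates why dynamic filters help top-k queries. It evaluates ORDER BY value DESC LIMIT k over batches of rows. Once k rows are held, the smallest of them becomes a threshold, and any batch whose maximum cannot beat it is skipped. This is a simplified model, not DataFusion's implementation. Here the batch maximum is computed from the rows themselves. A real engine would read it from file or row-group statistics and would not read the rows at all.

```python
import heapq


def top_k_desc(batches, k):
    """Compute ORDER BY value DESC LIMIT k, skipping batches a dynamic filter rules out.

    Returns (top values in descending order, number of batches skipped).
    """
    if k <= 0:
        return [], 0
    heap = []  # min-heap of the k largest values seen so far
    skipped = 0
    for batch in batches:
        if not batch:
            continue
        # In a real engine this max would come from statistics, so a skipped
        # batch would never be read; here it is computed for simplicity.
        batch_max = max(batch)
        # Dynamic filter: once the heap is full, only values above heap[0] can qualify.
        if len(heap) == k and batch_max <= heap[0]:
            skipped += 1
            continue
        for value in batch:
            if len(heap) < k:
                heapq.heappush(heap, value)
            elif value > heap[0]:
                heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True), skipped
```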

Spice Runtime Highlights

EmbeddingGemma Support: Spice now supports EmbeddingGemma, Google's state-of-the-art embedding model for text and documents. EmbeddingGemma provides high-quality, efficient embeddings for semantic search, retrieval, and recommendation tasks. You can use EmbeddingGemma via HuggingFace in your Spicepod configuration:

Example spicepod.yml snippet:

embeddings:
  - from: huggingface:huggingface.co/google/embeddinggemma-300m
    name: embeddinggemma
    params:
      hf_token: ${secrets:HUGGINGFACE_TOKEN}

Learn more about EmbeddingGemma in the official documentation.

POST /v1/search Uses Search Table Functions: The /v1/search API now uses the new text_search and vector_search table functions for improved performance.

Embedding Request Caching: The runtime now supports caching embedding requests, reducing latency and cost for repeated content and search requests.

Example spicepod.yml snippet:

runtime:
  caching:
    embeddings:
      enabled: true
      max_size: 128mb
      item_ttl: 5s

See the Caching documentation for details.
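The Python sketch below shows how an embedding cache with a size bound and a per-item TTL can work. Entries are keyed by model and input text. It is an illustration, not Spice's implementation. Spice's max_size is a byte budget (such as 128mb), while this sketch bounds the number of entries for simplicity. The class and parameter names are hypothetical.

```python
import time
from collections import OrderedDict


class EmbeddingCache:
    """LRU cache with a per-item TTL for embeddings keyed by (model, text)."""

    def __init__(self, max_entries=1024, item_ttl_s=5.0, clock=time.monotonic):
        self.max_entries = max_entries  # entry bound; Spice's max_size is in bytes
        self.item_ttl_s = item_ttl_s
        self.clock = clock
        self._items = OrderedDict()  # (model, text) -> (inserted_at, vector)

    def get_or_compute(self, model, text, embed_fn):
        key = (model, text)
        now = self.clock()
        entry = self._items.get(key)
        if entry is not None and now - entry[0] < self.item_ttl_s:
            self._items.move_to_end(key)  # hit: mark as most recently used
            return entry[1]
        vector = embed_fn(text)  # miss or expired entry: call the embedding model
        self._items[key] = (now, vector)
        self._items.move_to_end(key)
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # evict the least recently used entry
        return vector
```

Repeated requests for the same text within item_ttl skip the model call entirely. This is where the latency and cost savings come from.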

Real-Time Indexing for Full Text Search: Full-text search indexing is now supported for connectors that stream real-time changes, such as Debezium CDC streams. Adding a full-text index on a column of a dataset with refresh_mode: changes works the same way as for full or append refreshes, so new data becomes searchable as soon as it is ingested.

Example spicepod.yml snippet:

datasets:
  - from: debezium:cdc.public.question
    name: questions
    acceleration:
      enabled: true
      engine: duckdb
      primary_key: id
      refresh_mode: changes # Use 'changes'
    params: *kafka_params # YAML alias; assumes an &kafka_params anchor defined elsewhere in spicepod.yml
    columns:
      - name: title
        full_text_search:
          enabled: true # Enable full-text-search indexing
          row_id:
            - id
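Conceptually, real-time indexing means applying each change event to the index as it arrives. The Python sketch below keeps a simple inverted index on the title column up to date from Debezium-style events. It uses Debezium's op codes c, r, u and d, and it uses id as the row identifier, as in the row_id setting above. The flattened event shape and the class name are assumptions for illustration. Spice's full-text index also ranks results, which this sketch does not.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real full-text analyzer."""
    return re.findall(r"\w+", text.lower())


class LiveTextIndex:
    """Inverted index on one text column, kept current from CDC change events."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of row ids
        self.docs = {}                    # row id -> currently indexed text

    def _remove(self, row_id):
        old = self.docs.pop(row_id, None)
        if old is not None:
            for token in tokenize(old):
                self.postings[token].discard(row_id)

    def apply(self, event):
        """event: {"op": "c" | "r" | "u" | "d", "id": ..., "title": ...} (hypothetical shape)."""
        row_id = event["id"]
        self._remove(row_id)  # updates and deletes drop the previous version first
        if event["op"] in ("c", "r", "u"):
            self.docs[row_id] = event["title"]
            for token in tokenize(event["title"]):
                self.postings[token].add(row_id)

    def search(self, query):
        """Return the ids of rows containing every query token."""
        sets = [self.postings.get(t, set()) for t in tokenize(query)]
        return set.intersection(*sets) if sets else set()
```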

OpenAI Responses API Tool Calls with Streaming: The OpenAI Responses API now supports tool calls with streaming, enabling advanced model interactions such as web_search and code_interpreter with real-time response streaming. This allows you to invoke OpenAI-hosted tools and receive results as they are generated.

Learn more in the OpenAI Model Provider documentation.

Runtime Output Level Configuration: You can now set the output_level parameter in the Spicepod runtime configuration to control logging verbosity in addition to the existing CLI and environment variable support. Supported values are info, verbose, and very_verbose. The value is applied in the following priority: CLI, environment variables, then YAML configuration.

Example spicepod.yml snippet:

runtime:
  output_level: info # or verbose, very_verbose

For more details on configuring output level, see the Troubleshooting documentation.
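The Python sketch below illustrates the precedence rule. The function is hypothetical and does not read Spice's actual CLI flag or environment variable. It only shows that the first value set wins, checked in CLI, environment, then YAML order. Falling back to info when nothing is set is an assumption.

```python
VALID_OUTPUT_LEVELS = ("info", "verbose", "very_verbose")


def resolve_output_level(cli_value=None, env_value=None, yaml_value=None, default="info"):
    """Return the first output level that is set, checking CLI, then env, then spicepod.yml."""
    for value in (cli_value, env_value, yaml_value):
        if value:  # None or "" means "not set" at this layer
            if value not in VALID_OUTPUT_LEVELS:
                raise ValueError(f"unsupported output_level: {value!r}")
            return value
    return default  # assumed fallback when nothing is configured
```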

Bug Fixes

Several bugs and issues have been resolved in this release, including:

CDC Streams: Fixed issues where refresh_mode: changes could prevent the Spice runtime from becoming Ready, and improved support for full-text indexing on CDC streams.
Vector Search: Fixed a bug where the vector search HTTP pipeline could not find more than one IndexedTableProvider, and resolved field mismatch errors in the vector_search UDTF.
Kafka Integration: Improved Kafka schema inference with configurable sample size, improved consumer group persistence for SQLite and Postgres accelerations, and added cooperative mode support.
Perplexity Web Search: Fixed bug where Perplexity web search sometimes used incorrect query schema (limit).
Databricks: Fixed an issue with unparsing embedded columns.
Error Reporting: ThrottlingException is now reported correctly instead of as InternalError.
Iceberg Data Connector: Added support for LIMIT pushdown.
Amazon S3 Vectors: Fixed ingestion issues with zero-vectors and improved handling when vector index is full.
Tracing: Fixed vector search tracing to correctly report SQL status.

Contributors

New Contributors

@ChrisTomAlxHitachi made their first contribution in github.com/spiceai/spiceai/pull/6932 🎉

Breaking Changes

No breaking changes.

Cookbook Updates

New Spice with Dotnet SDK Recipe - The recipe shows how to query Spice using the Dotnet SDK.

The Spice Cookbook includes 78 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.7.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.7.0 image:

docker pull spiceai/spiceai:1.7.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Dependencies

Rust: Upgraded from 1.88.0 to 1.89.0
DataFusion: Upgraded from 48.0.1 to 49.0.0
text-embeddings-inference: Upgraded from 1.7.3 to 1.8.2
twox-hash: Upgraded from 1.6.3 to 2.1.0.

Changelog

Fix parameterised query planning in DataFusion by @Jeadie in #6942
fix: Update benchmark snapshots by @app/github-actions in #6944
refactor: Decouple full text search candidate from UDTF by @peasee in #6940
fix: Re-enable search integration tests by @peasee in #6930
Update acknowledgements and spicepod.schema.json by @sgrebnov in #6948
Add enabling the responses API by @lukekim in #6949
Post-release housekeeping by @sgrebnov in #6951
Add missing param in release notes by @Advayp in #6959
Create comprehensive S3vectors test by @Jeadie in #6903
Update ROADMAP after v1.6 release by @sgrebnov in #6955
Update openapi.json by @app/github-actions in #6961
Add build step for new spiced images in end game template by @Jeadie in #6960
refactor: Use text search UDTF in v1/search by @peasee in #6962
Bump Jimver/cuda-toolkit from 0.2.26 to 0.2.27 by @app/dependabot in #6922
Bump notify from 8.0.0 to 8.2.0 by @app/dependabot in #6924
Use model2vec for search integration tests for speed by @Jeadie in #6971
feat: Add initial DuckDB regexp pushdown support by @peasee in #6966
Bump rustyline from 16.0.0 to 17.0.1 by @app/dependabot in #6976
Upgrade delta_kernel to 0.14 by @phillipleblanc in #6977
Consistent snapshots for mongodb by @krinart in #6974
Bump indexmap from 2.10.0 to 2.11.0 by @app/dependabot in #6921
Fix mongo tests: ignore container_registry() when building image name by @krinart in #6983
Implement support for s3 tables for glue DataConnector by @krinart in #6981
Bump serde_json from 1.0.142 to 1.0.143 by @app/dependabot in #6925
Update build_and_release macOS pipeline to skip updating cmake if installed by @phillipleblanc in #6998
Mark Kafka Data Connector Alpha quality by @sgrebnov in #6991
Add v1.6.1 release notes by @lukekim in #7000
Spice CLI trace: make error friendlier when task_history is disabled by @sgrebnov in #6996
Warn when runtime or management is added in spicepod dependency by @Jeadie in #6953
Enable .datasets[].vectors.params.s3_vectors_distance_metric for S3 Vectors by @Jeadie in #6982
Add s3_vectors index support for CDC and Append streams by @sgrebnov in #6986
Find all vector indexes in v1/search by @Jeadie in #7004
Fix RRF; reorder by score by @Jeadie in #7007
Fix for nested VectorScanTableProvider by @krinart in #7017
Add --sql flag to output SQL query for spice trace by @Jeadie in #7002
Make web search params engine-specific by @Advayp in #7022
Add more MTEB benchmark spicepods by @peasee in #7026
Improve error messaging in tools by @Jeadie in #6895
Add retry for exporting task history records by @sgrebnov in #7049
Increase DoPut write timeout for the next batch from 30 to 120 seconds by @sgrebnov in #7054
Avoid redundant search embedding by @peasee in #7053
Truncate text_embed task_history trace by @sgrebnov in #7050
Use the UTC offset for the start_time and end_time fields in the task history by @ewgenius in #7056
Update supported versions in SECURITY.md by @Jeadie in #7060
Add integration test for Kafka S3 Vectors by @sgrebnov in #6988
Enable parameters to enforce the value is one of several options by @Jeadie in #6984
feat(iceberg): lakekeeper catalog - add warehouse param to spicepod by @ChrisTomAlxHitachi in #6932
feat: Add HTTP query concurrency support to testoperator by @peasee in #7025
Ensure no data does not throw error in v1/search by @Jeadie in #7033
Bump github.com/spf13/cobra from 1.9.1 to 1.10.1 by @app/dependabot in #7013
Add QA analytics for 1.6.x releases by @sgrebnov in #7082
Use env variable for HF cache in model2vec by @Jeadie in #7076
chore: upgrade to Rust 1.88 by @kczimm in #7077
Kafka/Debezium: make common errors user-friendlier by @sgrebnov in #7084
Create Apache Datafusion upgrade issue template by @kczimm in #6800
No join predicate pushdown on empty results by @Jeadie in #7075
Bump tract-onnx from 0.21.10 to 0.22.0 by @app/dependabot in #7071
Bump mongodb from 3.2.4 to 3.3.0 by @app/dependabot in #7073
Bump indicatif from 0.17.11 to 0.18.0 by @app/dependabot in #7070
Bump actions/github-script from 7 to 8 by @app/dependabot in #7069
Bump actions/setup-go from 5 to 6 by @app/dependabot in #7068
Bump actions/download-artifact from 4 to 5 by @app/dependabot in #7066
Bedrock: Tool use without inputs must empty Document by @Jeadie in #7036
Bump github.com/stretchr/testify from 1.10.0 to 1.11.1 by @app/dependabot in #7015
Bump actions/setup-python from 5 to 6 by @app/dependabot in #7067
Upgrade dependabot dependencies by @phillipleblanc in #7061
Bump tempfile from 3.20.0 to 3.21.0 by @app/dependabot in #7018
Only call 'list_datasets' once, after initial system/user messages by @Jeadie in #7039
Bump github.com/spf13/pflag from 1.0.7 to 1.0.10 by @app/dependabot in #7062
Bump actions/checkout from 4 to 5 by @app/dependabot in #7065
Bump golang.org/x/mod from 0.27.0 to 0.28.0 by @app/dependabot in #7064
Bump github.com/AzureAD/microsoft-authentication-library-for-go from 1.4.1 to 1.5.0 by @app/dependabot in #7063
Add friendly message for Kafka operation timeout error, improve code by @sgrebnov in #7088
embed UDF by @mach-kernel in #6967
fix: Update benchmark snapshots by @app/github-actions in #7097
Fix SF100 benchmark tests dispatch by @sgrebnov in #7098
chore(logging): add log when iceberg rest catalog fails with ssl cert error by @ChrisTomAlxHitachi in #6909
Add xxhash support for search/sql results by @krinart in #6978
Use proper federation in max_timestamp_df during acceleration refresh by @krinart in #7055
Fix spiced_docker workflows for new actions/download-artifact@v5 behavior by @phillipleblanc in #7108
Fix spiced_docker workflow by @phillipleblanc in #7111
Add filter for zero vectors before writing to S3 Vectors by @phillipleblanc in #7110
Ensure we find vector index when it also has text search by @Jeadie in #7120
Enable unified traceparent override support for HTTP API by @sgrebnov in #7122
Fix ORDER BY: (BytesProcessedExec to avoid pruning ordered execs during physical optimization) by @mach-kernel in #7105
Fix spiced_docker_nightly workflow by @sgrebnov in #7125
Add output_level to runtime config by @krinart in #7119
Add tests for xxhash hashers by @krinart in #7124
Add input option to update snapshots in Integration tests by @Jeadie in #7127
Fix formatting to improve merges by @lukekim in #7128
Add tests to nulling logic by @Jeadie in #7113
Bump chrono from 0.4.41 to 0.4.42 by @app/dependabot in #7131
Bump ctrlc from 3.4.7 to 3.5.0 by @app/dependabot in #7132
Search: RRF UDTF by @mach-kernel in #7090
Update openapi.json by @app/github-actions in #7141
Bump packages to DF49; resolve incompatibilities by @Jeadie in #7101
fix: Don't error for chunked columns when vectors are disabled by @peasee in #7150
Allow bzip2-1.0.6 license in deny.toml by @Jeadie in #7148
Tune retry settings for Kafka/Debezium connectors by @sgrebnov in #7142
Update TEI by @Jeadie in #7152
Use twox-hash version 2.1.2 by @krinart in #7165
Revert "Use proper federation in max_timestamp_df during acceleration refresh (#7055)" by @phillipleblanc in #7156
Bump octocrab from 0.44.1 to 0.45.0 by @app/dependabot in #7158
Bump github.com/spf13/viper from 1.19.0 to 1.21.0 by @app/dependabot in #7130
Bump keyring from 3.6.2 to 3.6.3 by @app/dependabot in #7157
fix: Remove keywords from AI document search by @peasee in #7052
Bump tract-core from 0.21.10 to 0.22.0 by @app/dependabot in #7134
Update TEI by @Jeadie in #7171
Update openapi.json by @app/github-actions in #7172
fix: Ensure vector search UDTF respects the supplied projection by @peasee in #7155
Bump clap from 4.5.45 to 4.5.47 by @app/dependabot in #7135
Bump golang.org/x/sys from 0.35.0 to 0.36.0 by @app/dependabot in #7129
Include 'catalog_id' in Glue catalog parameters by @Jeadie in #7151
fix: Use head ref from merge group event in pulls-with-spice concurrency group by @peasee in #7175
Fix lint for xxhash feature by @phillipleblanc in #7176
Add Kafka-specific metrics for consumer lag and consumed records by @sgrebnov in #7146
Kafka: persist consumer between restarts with SQLite and PG acceleration by @sgrebnov in #7177
Kafka: support specifying a target consumer group ID by @sgrebnov in #7178
Fix timestamp parsing for spice trace by @krinart in #7173
Support full-text indexing on CDC/append streams by @phillipleblanc in #7180
Bump iceberg-rust version to include limit push down by @krinart in #7191
Make full text stream connector more robust by @phillipleblanc in #7193
fix: Update benchmark snapshots by @app/github-actions in #7179
Initial changes for SearchIndex by @Jeadie in #7103
Robustly handle indexing FTS for CDC streams by @phillipleblanc in #7197
Proper handling/mapping for ThrottlingException during embedding calls by @krinart in #7170
Add spicepod.yml by @lukekim in #7202
Delta Lake: Support read pruning on timestamp columns using maxValues stats by @sgrebnov in #7203
feat: Add initial embeddings cache by @peasee in #7194
Make S3vector a FixedSizeListArray by @Jeadie in #7201
Fix projection mismatch issues with RRF calling vector search / text search by @mach-kernel in #7200
feat: Add embeddings cache to all embeddings by @peasee in #7204
Revert "Make S3vector a FixedSizeListArray (#7201)" by @kczimm in #7210
Update duckdb version to make ICU statically linked by default by @krinart in #7215
Change DataType list nullability from true to false by @Jeadie in #7216
Use Instant + saturating_sub to handle time drift by @krinart in #7212
Flatten 'IndexedTableProvider' when adding full-text support by @Jeadie in #7219
Include comments in pulls by @lukekim in #7224
Add github_max_concurrent_connections = 5 by @lukekim in #7225
RRF: Fix scoring by @mach-kernel in #7226
Update RRF search integration snapshots after scoring change by @mach-kernel in #7227
Make S3vector a FixedSizeListArray by @Jeadie in #7230
Proper federation during acceleration refresh + datafusion version bump + integration tests by @krinart in #7228
Use DuckDBDialect for DuckDB non-federated queries by @krinart in #7232
Move chunking out of llms and into new crate chunking by @Jeadie in #7229
Remove duplicate pg_port configuration in test by @lukekim in #7233
Upgrade to Rust 1.89 by @phillipleblanc in #7235
Catalog connection error: fix connector name from 'iceberg' to 'spice.ai' by @sgrebnov in #7240
Create PutVectorsSink by @kczimm in #7199
Benchmark tests: fix API key reference in spicecloud catalog by @sgrebnov in #7239
Add Dotnet SDK sample to end game template by @sgrebnov in #7238
Update spicepod.schema.json by @app/github-actions in #7254
Postgres: Improve Decimals read performance and add Name type support by @sgrebnov in #7255
Add tests for hybrid search on a vector engine by @Jeadie in #7220

Spice v1.6.1 (Sep 1, 2025)

September 2, 2025 · 3 min read

Jack Eadie

Token Plumber at Spice AI

Announcing the release of Spice v1.6.1! ⚡

Spice 1.6.1 is a patch release that provides improved Kafka type inference and JSON flattening support, alongside several bug fixes.

What's New in v1.6.1

Improved Kafka Type Inference: The number of Kafka messages sampled during schema inference is now configurable. A larger sample makes inferred schemas more robust and reliable, especially when the data contains optional fields or varying structures.

Example spicepod.yml:

datasets:
  - from: kafka:orders_events
    name: orders
    params:
      schema_infer_max_records: 100 # Default 1.

For details, see the Kafka Data Connector Documentation.
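To see why a larger sample helps, consider the Python sketch below of sampling-based inference over JSON messages. With one record, a field that only appears in later messages is missed. An integer-valued first message also hides later floats. The type names mirror Arrow's, but the merge rules are assumptions for illustration, not the Kafka connector's actual algorithm. In this sketch, Int64 and Float64 widen to Float64, and any other conflict falls back to Utf8.

```python
import json

# Illustrative mapping from Python JSON values to Arrow type names;
# nested objects and arrays fall back to Utf8 in this sketch.
ARROW_TYPES = {bool: "Boolean", int: "Int64", float: "Float64", str: "Utf8"}


def infer_schema(raw_messages, max_records=1):
    """Infer {field: Arrow type name} from the first max_records JSON messages."""
    schema = {}
    for raw in raw_messages[:max_records]:
        for field, value in json.loads(raw).items():
            if value is None:
                schema.setdefault(field, "Null")  # seen, but type still unknown
                continue
            inferred = ARROW_TYPES.get(type(value), "Utf8")
            current = schema.get(field)
            if current in (None, "Null"):
                schema[field] = inferred
            elif current != inferred:
                # Assumed merge rule: Int64 + Float64 widens to Float64, else Utf8.
                schema[field] = "Float64" if {current, inferred} == {"Int64", "Float64"} else "Utf8"
    return schema
```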

Improved Kafka JSON Support: Nested JSON Kafka messages can now be flattened, so each nested field becomes its own column in the dataset schema, named with dot notation.

Example spicepod.yml:

datasets:
  - from: kafka:orders_events
    name: orders
    params:
      flatten_json: true # default false

For example, the object:

{
  "order_id": "a1f2c3d4-1111-2222-3333-444455556666",
  "customer": {
    "id": 101,
    "name": "Alice",
    "premium": true,
    "contact": {
      "email": "alice@example.com",
      "phone": "555-1234"
    }
  },
  "discount": 5.0,
  "shipped": false
}

With flatten_json: true the result is:

+------------------------+-----------+-------------+
| column_name            | data_type | is_nullable |
+------------------------+-----------+-------------+
| order_id               | Utf8      | YES         |
| customer.id            | Int64     | YES         |
| customer.name          | Utf8      | YES         |
| customer.premium       | Boolean   | YES         |
| customer.contact.email | Utf8      | YES         |
| customer.contact.phone | Utf8      | YES         |
| discount               | Float64   | YES         |
| shipped                | Boolean   | YES         |
+------------------------+-----------+-------------+

With flatten_json: false or omitted, the result is:

+-------------+--------------------------------------------------------------------------------------------+-------------+
| column_name | data_type                                                                                  | is_nullable |
+-------------+--------------------------------------------------------------------------------------------+-------------+
| order_id    | Utf8                                                                                       | YES         |
| customer    | Struct(id: Int64, name: Utf8, premium: Boolean, contact: Struct(email: Utf8, phone: Utf8)) | YES         |
| discount    | Float64                                                                                    | YES         |
| shipped     | Boolean                                                                                    | YES         |
+-------------+--------------------------------------------------------------------------------------------+-------------+

For details, see the Kafka Data Connector Documentation.
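The flattened column names follow a simple rule: each nested object key is joined to its parent's key with a dot. The Python sketch below applies that rule to the example message. Leaving arrays unflattened is an assumption, not documented connector behavior.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested JSON objects into dot-separated keys; arrays are left as-is."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested objects
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    message = json.loads('{"order_id": "a1", "customer": {"id": 101, "contact": {"email": "alice@example.com"}}}')
    print(list(flatten(message)))
```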

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

No new cookbook recipes added in this release.

The Spice Cookbook includes 77 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.6.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.6.1 image:

docker pull spiceai/spiceai:1.6.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Changelog

Fix metadata field issue by @Advayp in #6957
Update datafusion and datafusion-table-providers crates (#6985) by @Jeadie in #6985
Add flatten_json param support for Kafka connector (#6976) by @sgrebnov in #6976
Add schema_inference_sample_count param support for Kafka connector (#6969) by @sgrebnov in #6969
Add integration test for Kafka connector (#6965) by @sgrebnov in #6965
Skip dataset health check for IcebergTableProvider datasets by @phillipleblanc in #6995

Spice v1.6.0 (Aug 26, 2025)

August 27, 2025 · 22 min read

Sergei Grebnov

Senior Software Engineer at Spice AI

Announcing the release of Spice v1.6.0! 🔥

Spice 1.6.0 upgrades DataFusion to v48, reducing the memory footprint of expressions by ~50% for faster planning and lower memory usage, eliminating unnecessary projections in queries, optimizing string functions like ascii and character_length for up to 3x speedup, and accelerating unbounded aggregate window functions by 5.6x. The release adds Kafka and MongoDB connectors for real-time streaming and NoSQL data acceleration, supports OpenAI Responses API for advanced model interactions including OpenAI-hosted tools like web_search and code_interpreter, improves the OpenAI Embeddings Connector with usage tier configuration for higher throughput via increased concurrent requests, introduces Model2Vec embeddings for ultra-low-latency encoding, and improves the Amazon S3 Vectors engine to support multi-column primary keys.

What's New in v1.6.0

DataFusion v48 Highlights

Spice.ai is built on the DataFusion query engine. The v48 release brings:

Performance & Size Improvements 🚀: The memory footprint of expressions was reduced by ~50%, which speeds up planning and lowers memory usage. Planning times improved by 10-20%. Queries now contain fewer unnecessary projections. The string functions ascii and character_length were optimized, and character_length is now up to 3x faster. Queries with unbounded aggregate window functions are 5.6x faster because DataFusion no longer recomputes constant results across partitions. The Expr struct size was reduced from 272 to 144 bytes.

New Features & Enhancements ✨: Support was added for ORDER BY ALL for easy ordering of all columns in a query.

See the Apache DataFusion 48.0.0 Blog for details.

Runtime Highlights

Amazon S3 Vectors Multi-Column Primary Keys: The Amazon S3 Vectors engine now supports datasets with multi-column primary keys. This enables vector indexes for datasets where more than one column forms the primary key, such as those splitting documents into chunks for retrieval contexts. For multi-column keys, Spice serializes the keys using arrow-json format, storing them as single string keys in the vector index.
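To illustrate the idea, a composite key can be collapsed into one string and later split back into its columns. The following Python sketch assumes the serialized key is a compact JSON object that holds only the primary-key columns in declared order, which is roughly what an arrow-json row looks like. The exact string Spice stores is not documented here, and `serialize_key` and `deserialize_key` are hypothetical helpers.

```python
import json
from typing import Any, Mapping, Sequence


def serialize_key(row: Mapping[str, Any], key_columns: Sequence[str]) -> str:
    """Collapse several primary-key columns into one string key.

    Hypothetical format: a compact JSON object holding only the key columns,
    in declared order (similar in spirit to an arrow-json row).
    """
    key = {col: row[col] for col in key_columns}
    return json.dumps(key, separators=(",", ":"))


def deserialize_key(key: str) -> dict:
    """Recover the individual key columns from a stored string key."""
    return json.loads(key)


# A document split into chunks: (doc_id, chunk_id) together identify a row.
row = {"doc_id": "readme", "chunk_id": 3, "content": "..."}
key = serialize_key(row, ["doc_id", "chunk_id"])
print(key)  # {"doc_id":"readme","chunk_id":3}
print(deserialize_key(key))
```

Using a self-describing format keeps the key reversible. A search hit from the vector index can then be joined back to the source row on every key column.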

Model2Vec Embeddings: Spice now supports model2vec static embeddings through a new model2vec embeddings provider. Model2Vec models can be up to 500x faster and 15x smaller than sentence transformers, which suits scenarios that require low-latency, high-throughput encoding.

embeddings:
  - from: model2vec:minishlab/potion-base-8M # HuggingFace model
    name: potion
  - from: model2vec:path/to/my/local/model # local model
    name: local

Learn more in the Model2Vec Embeddings documentation.
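Static embeddings are fast because inference needs no transformer forward pass. Each token maps to a precomputed vector, and a sentence embedding is roughly the mean of its token vectors. The following toy Python sketch shows that idea under simplifying assumptions. The whitespace tokenizer and the three-dimensional `VOCAB` table are made up for illustration. Real Model2Vec models such as potion-base-8M use a subword tokenizer and a distilled table with many more dimensions.

```python
import math
from typing import Dict, List

# Hypothetical, tiny static embedding table (token -> precomputed vector).
VOCAB: Dict[str, List[float]] = {
    "fast": [1.0, 0.0, 0.0],
    "static": [0.0, 1.0, 0.0],
    "embeddings": [0.0, 0.0, 1.0],
}


def encode(text: str, normalize: bool = True) -> List[float]:
    """Mean-pool the static vectors of known tokens; no neural network runs."""
    vectors = [VOCAB[tok] for tok in text.lower().split() if tok in VOCAB]
    dim = len(next(iter(VOCAB.values())))
    if not vectors:
        return [0.0] * dim  # no known tokens: return a zero vector
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    if normalize:
        norm = math.sqrt(sum(x * x for x in mean))
        if norm > 0:
            mean = [x / norm for x in mean]  # unit length for cosine similarity
    return mean


print(encode("fast static embeddings"))
```

Encoding cost is a table lookup plus an average, so it grows with the number of tokens rather than with model depth.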

Kafka Data Connector: Use from: kafka:<topic> to ingest data directly from Kafka topics for integration with existing Kafka-based event streaming infrastructure, providing real-time data acceleration and query without additional middleware.

Example Spicepod.yml:

datasets:
  - from: kafka:orders_events
    name: orders
    acceleration:
      enabled: true
      refresh_mode: append
    params:
      kafka_bootstrap_servers: server:9092

Learn more in the Kafka Data Connector documentation.

MongoDB Data Connector: Use from: mongodb:<dataset> to access and accelerate data stored in MongoDB, deployed on-premises or in the cloud.

Example spicepod.yml:

datasets:
  - from: mongodb:my_dataset
    name: my_dataset
    params:
      mongodb_host: localhost
      mongodb_db: my_database
      mongodb_user: my_user
      mongodb_pass: password

Learn more in the MongoDB Data Connector documentation.

OpenAI Responses API Support: Spice now supports the OpenAI Responses API (/v1/responses), OpenAI's most advanced interface for generating model responses.

To enable the /v1/responses HTTP endpoint, set the responses_api parameter to enabled:

Example spicepod.yml:

models:
  - name: openai_model_using_responses_api
    from: openai:gpt-4.1
    params:
      openai_api_key: ${ secrets:OPENAI_API_KEY }
      responses_api: enabled # Enable the /v1/responses endpoint for this model

Example curl request:

curl http://localhost:8090/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai_model_using_responses_api",
    "input": "Tell me a three sentence bedtime story about Spice AI."
  }'
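The same request can be sent from Python using only the standard library. This minimal sketch assumes the runtime listens on localhost:8090, as in the curl example. It also assumes that `model` is the model name declared in the Spicepod. `build_responses_request` is a hypothetical helper. The sketch only constructs the request, and the commented lines show how to send it.

```python
import json
import urllib.request

SPICE_URL = "http://localhost:8090/v1/responses"  # assumed local runtime address


def build_responses_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the /v1/responses endpoint."""
    body = json.dumps({"model": model, "input": prompt}).encode("utf-8")
    return urllib.request.Request(
        SPICE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_responses_request(
    "openai_model_using_responses_api",  # model name from the Spicepod above
    "Tell me a three sentence bedtime story about Spice AI.",
)
# To send it against a running Spice runtime:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```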

To use responses in spice chat, use the --responses flag.

Example:

spice chat --responses # Use the `/v1/responses` endpoint for all completions instead of `/v1/chat/completions`

To use OpenAI-hosted tools available through OpenAI's Responses API, specify the openai_responses_tools parameter:

Example spicepod.yml:

models:
  - name: test
    from: openai:gpt-4.1
    params:
      openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY }
      tools: sql, list_datasets
      responses_api: enabled
      openai_responses_tools: web_search, code_interpreter #  'code_interpreter' or 'web_search'

These OpenAI-specific tools are only available from the /v1/responses endpoint. Any other tools specified via the tools parameter are available from both the /v1/chat/completions and /v1/responses endpoints.

Learn more in the OpenAI Model Provider documentation.

OpenAI Embeddings & Models Connectors Usage Tier: The OpenAI Embeddings and Models Connectors now let you specify your account's usage tier for embeddings and model requests. The connectors then send more concurrent requests, which speeds up generating text embeddings and calling models during dataset load and search.

Example spicepod.yml:

embeddings:
  - from: openai:text-embedding-3-small
    name: openai_embed
    params:
      openai_usage_tier: tier1

Set openai_usage_tier to the usage tier of your OpenAI account. The Embeddings and Models Connectors then raise their maximum number of concurrent requests to match that tier.

Learn more in the OpenAI Model Provider documentation.

Contributors

New Contributors

@krinart made their first contribution in github.com/spiceai/spiceai/pull/6573

Breaking Changes

No breaking changes.

Cookbook Updates

Added OpenAI Responses API - Use OpenAI's Responses API with Spice
Added Live Orders Analytics with Apache Kafka Data Connector - Combine real-time data streaming from Kafka with other datasets
Added MongoDB Data Connector - Use MongoDB as a data source with Spice

The Spice Cookbook includes 77 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.6.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.6.0 image:

docker pull spiceai/spiceai:1.6.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is also now available in the AWS Marketplace!

What's Changed

Dependencies

DataFusion: Upgraded to v48
Rust: Upgraded from 1.86.0 to 1.87.0

Changelog

Support Streaming with Tool Calls (#6941) by @Advayp in #6941
Fix parameterized query planning in DataFusion (#6942) by @Jeadie in #6942
Update the UnableToLoadCredentials error with a pointer to docs (#6937) by @phillipleblanc in #6937
Fix spicecloud benchmark (#6935) by @krinart in #6935
[Debezium] Support for VariableScaleDecimal (#6934) by @krinart in #6934
Update to DF 48 (#6665) by @mach-kernel and @kczimm in #6665
Mark append-stream and CDC datasets as ready after first message (#6914) by @sgrebnov in #6914
Model2Vec embedding model support (#6846) by @mach-kernel in #6846
Update snapshot for S3 vector search test (#6920) by @Jeadie in #6920
remove [] from queryset in spicepod path for CI (#6919) by @Jeadie in #6919
Remove verbose tracing (#6915) by @Jeadie in #6915
Refactor how models supporting the Responses API are loaded (#6912) by @Advayp in #6912
Write tests for truncate formatting in arrow_tools and fix bug. (#6900) by @Jeadie in #6900
Support using the Responses API from spice chat (#6894) by @Advayp in #6894
Include GPT-5 into Text-To-SQL and Financebench benchmarks (#6907) by @sgrebnov in #6907
Better error message when credentials aren't loaded for S3 Vectors (#6910) by @phillipleblanc in #6910
Add tracing and system prompt support for the Responses API (#6893) by @Advayp in #6893
Constraint violation check is improved to control behavior when violations occur within a batch (#6897) by @phillipleblanc in #6897
fix: Multi-column text search with v1/search (#6905) by @peasee in #6905
fix: Correctly project text search primary keys to underlying projection (#6904) by @peasee in #6904
fix: Update benchmark snapshots (#6901) by @app/github-actions in #6901
In S3vector, do not pushdown on non-filterable columns (#6884) by @Jeadie in #6884
Run E2E Test CI macOS build on bigger runners (#6896) by @phillipleblanc in #6896
Enable configuration of the Responses API for the Azure model provider (#6891) by @Advayp in #6891
fix: Update benchmark snapshots (#6888) by @app/github-actions in #6888
Update OpenAPI specification for /v1/responses (#6889) by @Advayp in #6889
Add test to ensure tools are injected correctly in the Responses API (#6886) by @Advayp in #6886
Enable embeddings for append streams (#6878) by @sgrebnov in #6878
Show correct limit for EXPLAIN plans in S3VectorsQueryExec (#6852) by @Jeadie in #6852
Responses API support for Azure Open AI (#6879) by @Advayp in #6879
fix: Update search test case structure (#6865) by @peasee in #6865
Fix mongodb benchmark (#6883) by @phillipleblanc in #6883
Support multiple column primary keys for S3 vectors. (#6775) by @Jeadie in #6775
Kafka Data Connector: persist consumer between restarts (#6870) by @sgrebnov in #6870
Fix newlines in errors added in recent PRs (#6877) by @phillipleblanc in #6877
Add override parameter to force support for the Responses API (#6871) by @Advayp in #6871
Don't use metadata columns in VectorScanTableProvider (#6854) by @Jeadie in #6854
Add non-streaming tool call support (hosted and Spice tools) via the Responses API (#6869) by @Advayp in #6869
Update error guideline to remove newlines + remove newlines from error messages. (#6866) by @phillipleblanc in #6866
Remove void acceleration engine + optional table behaviors (#6868) by @phillipleblanc in #6868
Kafka Data Connector basic support (#6856) by @sgrebnov in #6856
Federated+Accelerated TPCH Benchmarks for MongoDB (#6788) by @krinart in #6788
Pass embeddings calculated in compute_index to the acceleration (#6792) by @phillipleblanc in #6792
Add non-streaming and streaming support for OpenAI Responses API endpoint (#6830) by @Advayp in #6830
Use latest version of OpenAI crate to resolve issues with Service Tier deserialization (#6853) by @Advayp in #6853
Update openapi.json (#6799) by @app/github-actions in #6799
Improve management message (#6850) by @lukekim in #6850
fix: Include FTS search column if it is the PK (#6836) by @peasee in #6836
Refactor Health Checks (#6848) by @Advayp in #6848
Introduce a Responses trait and LLM registry for model providers that support the OpenAI Responses API (#6798) by @Advayp in #6798
fix: Update datafusion-table-providers to include constraints (#6837) by @peasee in #6837
Bump postcard from 1.1.2 to 1.1.3 (#6841) by @app/dependabot in #6841
Bump governor from 0.10.0 to 0.10.1 (#6835) by @app/dependabot in #6835
Bump ctor from 0.2.9 to 0.5.0 (#6827) by @app/dependabot in #6827
Bump azure_core from 0.26.0 to 0.27.0 (#6826) by @app/dependabot in #6826
Bump rstest from 0.25.0 to 0.26.1 (#6825) by @app/dependabot in #6825
Use latest commit in our fork of async-openai (#6829) by @Advayp in #6829
Bump rustls from 0.23.27 to 0.23.31 (#6824) by @app/dependabot in #6824
Bump async-trait from 0.1.88 to 0.1.89 (#6823) by @app/dependabot in #6823
Bump hyper from 1.6.0 to 1.7.0 (#6814) by @app/dependabot in #6814
Bump serde_json from 1.0.140 to 1.0.142 (#6812) by @app/dependabot in #6812
Add s3 vector test retrieving vectors (#6786) by @Jeadie in #6786
fix: Allow v1/search with only FTS (#6811) by @peasee in #6811
Bump tantivy from 0.24.1 to 0.24.2 (#6806) by @app/dependabot in #6806
Bump tokio-util from 0.7.15 to 0.7.16 (#6810) by @app/dependabot in #6810
fix: Improve FTS index primary key handling (#6809) by @peasee in #6809
Bump logos from 0.15.0 to 0.15.1 (#6808) by @app/dependabot in #6808
Bump hf-hub from 0.4.2 to 0.4.3 (#6807) by @app/dependabot in #6807
Bump odbc-api from 13.0.1 to 13.1.0 (#6803) by @app/dependabot in #6803
fix: Spice search CLI with FTS supports string or slice unmarshalling (#6805) by @peasee in #6805
Bump uuid from 1.17.0 to 1.18.0 (#6797) by @app/dependabot in #6797
Bump reqwest from 0.12.22 to 0.12.23 (#6796) by @app/dependabot in #6796
Bump anyhow from 1.0.98 to 1.0.99 (#6795) by @app/dependabot in #6795
Bump clap from 4.5.41 to 4.5.45 (#6794) by @app/dependabot in #6794
Respect default MAX_DECODING_MESSAGE_SIZE (100MB) in Flight API (#6802) by @sgrebnov in #6802
Fix compilation errors caused by upgrading async-openai (#6793) by @Advayp in #6793
Remove outdated vector search benchmark (replaced with testoperator) (#6791) by @sgrebnov in #6791
Handle errors in vector ingestion pipeline (#6782) by @phillipleblanc in #6782
fix: Explicitly error when chunking is defined for vector engines (#6787) by @peasee in #6787
Make VectorScanTableProvider and VectorQueryTableProvider support multi-column primary keys (#6757) by @Jeadie in #6757
Use megascience/megascience Q+A dataset for text search testing. (#6702) by @Jeadie in #6702
Flight REPL autocomplete (#6589) by @krinart in #6589
use ref: github.event.pull_request.head.sha in integration_models.yml (#6780) by @Jeadie in #6780
fix: Move search telemetry calls in UDTF to scan (#6778) by @peasee in #6778
Fix Hugging Face models and embeddings loading in Docker (#6777) by @ewgenius in #6777
feat: Migrate bedrock rate limiter (#6773) by @peasee in #6773
Run the PR checks on the DEV runners (#6769) by @phillipleblanc in #6769
feat: add OpenAI models rate controller (#6767) by @peasee in #6767
Implement MongoDB data connector (#6594) by @krinart in #6594
fix: Use head ref for concurrency group (#6770) by @peasee in #6770
fix: Run enforce pulls with spice on pull_request_target (#6768) by @peasee in #6768
feat: Add OpenAI Embeddings Rate Controller (#6764) by @peasee in #6764
Move AWS SDK credential bridge integration test to the existing AWS SDK integration test run (#6766) by @phillipleblanc in #6766
Use Spice specific errors instead of OpenAIError in embedding module (#6748) by @kczimm in #6748
Use context in Glue Catalog Provider (#6763) by @Advayp in #6763
pin cargo-deny to previous version (#6762) by @kczimm in #6762
Bump actions/download-artifact from 4 to 5 (#6720) by @app/dependabot in #6720
Upgrade dependabot dependencies (#6754) by @phillipleblanc in #6754
Set E2E Test CI models build to 90 minute timeout (#6756) by @phillipleblanc in #6756
chore: upgrade to Rust 1.87.0 (#6614) by @kczimm in #6614
feat: Add initial runtime-rate-limiter crate (#6753) by @peasee in #6753
feat: Add more embedding traces, add MiniLM MTEB spicepod (#6742) by @peasee in #6742
Update QA analytics for release (#6740) by @Advayp in #6740
Always use 'returnData: true' for s3 vector query index (#6741) by @Jeadie in #6741
feat: Add Embedding and Search anonymous telemetry (#6737) by @peasee in #6737
Add 1.5.2 to SECURITY.md (#6739) by @ewgenius in #6739
Combine the Iceberg and Object Store AWS SDK bridges into one crate (#6718) by @Advayp in #6718
Updates to v1.5.2 release notes (#6736) by @lukekim in #6736
Update end game template - move glue catalog to catalogs section (#6732) by @ewgenius in #6732
Update v1.5.2.md (#6735) by @kczimm in #6735
Add note about S3 Vectors workaround (#6734) by @phillipleblanc in #6734
feat: Avoid joining for VectorScanTableProvider if the index is sufficient (#6714) by @peasee in #6714
update changelog (#6729) by @kczimm in #6729
remove unneeded autogenerated s3 vector code (#6715) by @Jeadie in #6715
fix: Set S3 vectors default limit to 30, add more tracing (#6712) by @peasee in #6712
docs: Add Hadoop cookbook to endgame template (#6708) by @peasee in #6708
Fix testoperator append mode compilation error (#6706) by @phillipleblanc in #6706
test: Add VectorScanTableProvider snapshot tests (#6701) by @peasee in #6701
feat: Add Hadoop catalog-mode benchmark (#6684) by @peasee in #6684
Move shared AWS crates used in bridges to workspace (#6705) by @Advayp in #6705
Use installation id to group connections (#6703) by @Advayp in #6703
Add Guardrails for AWS bedrock models (#6692) by @Jeadie in #6692
Update bedrock keys for CI. (#6693) by @Jeadie in #6693
Update acknowledgements (#6690) by @app/github-actions in #6690
ROADMAP updates Aug 1, 2025 (#6667) by @lukekim in #6667
Add retry logic for OpenAI embeddings creation (#6656) by @sgrebnov in #6656
Make models E2E chat test more robust (#6657) by @sgrebnov in #6657
Update Search GH Workflow to use Test Operator (#6650) by @sgrebnov in #6650
Score and P95 latency calculation for MTEB Quora-based vector search tests in Test Operator (#6640) by @sgrebnov in #6640
Fix multiple query error being classified as an internal error (#6635) by @Advayp in #6635
Add Support for S3 Table Buckets (#6573) by @krinart in #6573
set MISTRALRS_METAL_PRECOMPILE=0 for metal (#6652) by @kczimm in #6652
Vector search to push down udtf limit argument into logical sort plan (#6636) by @mach-kernel in #6636
docs: Update qa_analytics.csv (#6643) by @peasee in #6643
Update SECURITY.md (#6642) by @Jeadie in #6642
docs: Update qa_analytics.csv (#6641) by @peasee in #6641
Separate token usage (#6619) by @Advayp in #6619
Fix typo in release notes (#6634) by @Advayp in #6634
Add environment variable for org token (#6633) by @Advayp in #6633
CDC: Compute embeddings on ingest (#6612) by @mach-kernel in #6612
Add view name to view creation errors (#6611) by @lukekim in #6611
Add core logic for running MTEB Quora-based vector search tests in Test Operator (#6607) by @sgrebnov in #6607
Revert "Update generate-openapi.yml (#6584)" (#6620) by @Jeadie in #6620
Non-accelerated views should report as ready only after all dependent datasets are ready (#6617) by @sgrebnov in #6617

Spice v1.5.2 (Aug 11, 2025)

August 12, 2025 · 7 min read

Kevin Zimmerman

Principal Software Engineer at Spice AI

Announcing the release of Spice v1.5.2! 🛠️

Spice v1.5.2 introduces a new Amazon Bedrock Models Provider for Converse API-compatible models (Nova). It also adds AWS Redshift support through the Postgres data connector and Hadoop Catalog support for Iceberg tables, along with several bug fixes and improvements.

What's New in v1.5.2

Amazon Bedrock Models Provider: Adds a new Amazon Bedrock LLM Provider. Models compatible with the Converse API (Nova) are supported.

Amazon Bedrock provides access to a range of foundation models for generative AI. Spice supports using Bedrock-hosted models by specifying the bedrock prefix in the from field and configuring the required parameters.

Supported Model IDs:

amazon.nova-lite-v1:0
amazon.nova-micro-v1:0
amazon.nova-premier-v1:0
amazon.nova-pro-v1:0

Refer to the Amazon Bedrock documentation for details on available models and cross-region inference profiles.

Example Spicepod.yaml:

models:
  - from: bedrock:us.amazon.nova-lite-v1:0
    name: novash
    params:
      aws_region: us-east-1
      aws_access_key_id: ${ secrets:AWS_ACCESS_KEY_ID }
      aws_secret_access_key: ${ secrets:AWS_SECRET_ACCESS_KEY }
      bedrock_guardrail_identifier: arn:aws:bedrock:abcdefg012927:0123456789876:guardrail/hello
      bedrock_guardrail_version: DRAFT
      bedrock_trace: enabled
      bedrock_temperature: 0.7

For more information, see the Amazon Bedrock Documentation.

AWS Redshift Support for Postgres Data Connector: Spice now supports connecting to Amazon Redshift using the PostgreSQL data connector. Redshift is a columnar OLAP database compatible with PostgreSQL, allowing you to use the same connector and configuration parameters.

To connect to Redshift, use the format postgres:schema.table in your Spicepod and set the connection parameters to match your Redshift cluster settings.

Example Spicepod.yaml:

# Example datasets for Redshift TPCH tables
datasets:
  - from: postgres:public.customer
    name: customer
    params:
      pg_host: ${secrets:PG_HOST}
      pg_port: 5439
      pg_sslmode: prefer
      pg_db: dev
      pg_user: ${secrets:PG_USER}
      pg_pass: ${secrets:PG_PASS}
  - from: postgres:public.lineitem
    name: lineitem
    params:
      pg_host: ${secrets:PG_HOST}
      pg_port: 5439
      pg_sslmode: prefer
      pg_db: dev
      pg_user: ${secrets:PG_USER}
      pg_pass: ${secrets:PG_PASS}

Redshift types are mapped to PostgreSQL types. See the PostgreSQL connector documentation for details on supported types and configuration.

Hadoop Catalog Support for Iceberg: The Iceberg Data and Catalog connectors now support connecting to Hadoop catalogs on filesystem (file://) or S3 object storage (s3://, s3a://). This enables connecting to Iceberg catalogs without a separate catalog provider service.

Example Spicepod.yaml:

catalogs:
  - from: iceberg:file:///tmp/hadoop_warehouse/
    name: local_hadoop
  - from: iceberg:s3://my-bucket/hadoop_warehouse/
    name: s3_hadoop

# Example datasets
datasets:
  - from: iceberg:file:///data/hadoop_warehouse/test/my_table_1
    name: local_hadoop
  - from: iceberg:s3://my-bucket/hadoop_warehouse/test/my_table_2
    name: s3_hadoop

For more details, see the Iceberg Data Connector documentation and the Iceberg Catalog Connector documentation.

Parquet Reader: Optional Parquet Page Index: Fixed an issue where the Parquet reader, using arrow-rs and DataFusion, errored on files missing page indexes, despite the Parquet spec allowing optional indexes. The Spice team contributed optional page index support to arrow-rs (PR #6) and configurable handling in DataFusion (PR #93). A new runtime parameter, parquet_page_index, makes Parquet Page Indexes configurable in Spice:

runtime:
  params:
    parquet_page_index: required # Options: required, skip, auto

required: (Default) Errors if page indexes are absent.
skip: Ignores page indexes, potentially reducing query performance.
auto: Uses page indexes if available; skips otherwise.

This improves compatibility and query flexibility for Parquet datasets.
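The three modes can be summarized as a small decision function. This Python sketch only restates the documented behavior of required, skip, and auto. `PageIndexMode` and `plan_page_index_use` are hypothetical names and do not mirror Spice's internal code.

```python
from enum import Enum


class PageIndexMode(Enum):
    REQUIRED = "required"  # default: missing page indexes are an error
    SKIP = "skip"          # never read page indexes
    AUTO = "auto"          # use page indexes only when the file has them


def plan_page_index_use(mode: PageIndexMode, file_has_page_index: bool) -> bool:
    """Return True if page indexes should be used to prune pages.

    Raises ValueError when mode is REQUIRED and the file lacks page indexes.
    """
    if mode is PageIndexMode.SKIP:
        return False
    if mode is PageIndexMode.AUTO:
        return file_has_page_index
    if not file_has_page_index:
        raise ValueError("Parquet file is missing page indexes (parquet_page_index: required)")
    return True


print(plan_page_index_use(PageIndexMode("auto"), file_has_page_index=False))  # False
```

auto is the forgiving choice for mixed datasets. It keeps page-level pruning for files that have indexes and still reads files that lack them.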

Contributors

Breaking Changes

Amazon S3 Vectors Vector Engine: Amazon S3 Vectors is currently a preview AWS service. A recent update to the Amazon S3 Vectors service API introduced a breaking change that affects the integration when projecting (selecting) the embedding column. This results in the following error:

Json error: whilst decoding field 'data': expected [ got nullReceived only partial JSON payload from QueryVectors

The issue is expected to be resolved in the next release of Spice. A current workaround is to limit queries to non-embedding columns.

For example, instead of:

SELECT url, title, score, body_embedding
FROM vector_search(pulls, 'bugs in DuckDB', 4)
WHERE state = 'OPEN'
ORDER BY score DESC
LIMIT 4;

Remove the *_embedding column from the projection:

SELECT url, title, score
FROM vector_search(pulls, 'bugs in DuckDB', 4)
WHERE state = 'OPEN'
ORDER BY score DESC
LIMIT 4;

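The issue and workaround also apply to SELECT * FROM vector_search(..), because SELECT * projects the embedding column implicitly. The following query fails in the same way, so list the non-embedding columns explicitly instead: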

SELECT *
FROM vector_search(pulls, 'bugs in DuckDB', 4)
WHERE state = 'OPEN'
ORDER BY score DESC
LIMIT 4;

Cookbook Updates

Added Amazon Redshift Support to the Postgres Data Connector cookbook: Connect to tables in Amazon Redshift.

The Spice Cookbook includes 75 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.5.2, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.5.2 image:

docker pull spiceai/spiceai:1.5.2

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is also now available in the AWS Marketplace!

What's Changed

Dependencies

No major dependency updates.

Changelog

fixes for databricks OpenAI compatibility (#6629) by @Jeadie in #6629
Update spicepod.schema.json (#6632) by @app/github-actions in #6632
Remove 'stream_options' from databricks LLMs (#6637) by @Jeadie in #6637
Move retry and rate limiting logic for Amazon bedrock out of embeddings. (#6626) by @Jeadie in #6626
Disable Metal precomplation in integration_llms.yml (#6649) by @Jeadie in #6649
fix: Hadoop integration test (#6660) by @peasee in #6660
feat: Add Hadoop Catalog Data Component (#6658) by @peasee in #6658
update datafusion-table-providers to latest spiceai tag (#6661) by @mach-kernel in #6661
feat: Add Hadoop Catalog connectors for Iceberg (#6659) by @peasee in #6659
Make FullTextSearchExec robust to RecordBatch column ordering. (#6675) by @Jeadie in #6675
Make 'runtime-object-store' crate (#6674) by @Jeadie in #6674
fix: Support include for Iceberg (#6663) by @peasee in #6663
feat: Add Hadoop TPCH benchmark (#6678) by @peasee in #6678
feat: Add Hadoop metadata_path parameter (#6680) by @peasee in #6680
fix: Automatically infer Hadoop warehouse scheme (#6681) by @peasee in #6681
Amazon Bedrock, specifically Nova models (#6673) by @Jeadie in #6673
fix perplexity_auth_token parameters for web_search (#6685) by @Jeadie in #6685
Fix AWS Auth issue (#6699) by @Advayp in #6699
Limit Concurrent Requests for GitHub (#6672) by @Advayp in #6672
Add runtime parameter to enable more permissive parquet reading when page indexes are missing (#6716) by @phillipleblanc in #6716
Improve Flight REPL error messages (#6696) by @lukekim in #6696
Fixes from search tests (#6710) by @Jeadie in #6710

Spice v1.5.1 (July 28, 2025)

July 29, 2025 · 5 min read

Jack Eadie

Token Plumber at Spice AI

Announcing the release of Spice v1.5.1! 🔑

Spice v1.5.1 expands the GitHub data connector to include pull request comments and adds configurable rate limiting for AWS Bedrock embedding models. It also extends partition pruning to inequality operators and adds client-supplied cache keys for granular cache control in the HTTP and Arrow Flight SQL APIs.

What's New in v1.5.1

GitHub Data Connector Pull Request Comments: Configure GitHub pulls datasets to include comments.

Example Spicepod.yaml:

datasets:
  - from: github:github.com/spiceai/spiceai/pulls
    name: spiceai.pulls
    params:
      github_include_comments: all # 'review', 'discussion', or 'none'. Defaults to 'none'.
      github_max_comments_fetched: '25' # Defaults to 100
      # ...

For details, see the GitHub Data Connector documentation.

AWS Bedrock Embedding Models Invocation Control: Improved rate limiting control for AWS Bedrock embedding models with max_concurrent_invocations configuration.

embeddings:
  - from: bedrock:cohere.embed-english-v3
    name: cohere-embeddings
    params:
      max_concurrent_invocations: '41'
      # ...

For details, see the AWS Bedrock Embeddings Model Provider documentation.
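A cap such as max_concurrent_invocations limits how many embedding calls are in flight at once. The following asyncio sketch shows that idea with a semaphore. It is not Spice's implementation. `fake_invoke_model` is a hypothetical stand-in for a Bedrock call, and the limit of 41 mirrors the example Spicepod value.

```python
import asyncio
from typing import List

MAX_CONCURRENT_INVOCATIONS = 41  # mirrors the example Spicepod value


async def fake_invoke_model(text: str, state: dict) -> int:
    """Stand-in for one embedding call; records peak concurrency."""
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.01)  # simulated network latency
    state["in_flight"] -= 1
    return len(text)


async def embed_all(texts: List[str], limit: int) -> dict:
    semaphore = asyncio.Semaphore(limit)
    state = {"in_flight": 0, "peak": 0}

    async def guarded(text: str) -> int:
        async with semaphore:  # at most `limit` calls run concurrently
            return await fake_invoke_model(text, state)

    results = await asyncio.gather(*(guarded(t) for t in texts))
    return {"results": list(results), "peak": state["peak"]}


report = asyncio.run(embed_all([f"doc {i}" for i in range(100)], MAX_CONCURRENT_INVOCATIONS))
print(report["peak"])  # never exceeds 41
```

A higher cap raises throughput until the provider starts throttling. Size the value to your Bedrock quota rather than to the dataset size.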

Improved Query Partitioning: Expanded partition pruning support with additional inequality operators (e.g. >, >=, <, <=).

For details, see the Query Partitioning documentation.
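Pruning with inequality operators compares the predicate against each partition's value range. The engine skips any partition that cannot contain a matching row. The following simplified Python sketch assumes that each partition records the min and max of the partition column. The `Partition` type and the `prune` function are illustrative and are not part of the Spice API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Partition:
    name: str
    min_value: int  # smallest value of the partition column in this partition
    max_value: int  # largest value of the partition column in this partition


# For `col <op> literal`, can any value in [lo, hi] satisfy the predicate?
MAY_MATCH: Dict[str, Callable[[int, int, int], bool]] = {
    ">": lambda lo, hi, v: hi > v,
    ">=": lambda lo, hi, v: hi >= v,
    "<": lambda lo, hi, v: lo < v,
    "<=": lambda lo, hi, v: lo <= v,
    "=": lambda lo, hi, v: lo <= v <= hi,
}


def prune(partitions: List[Partition], op: str, value: int) -> List[str]:
    """Return the names of partitions that still need to be scanned."""
    check = MAY_MATCH[op]
    return [p.name for p in partitions if check(p.min_value, p.max_value, value)]


parts = [Partition("p0", 0, 99), Partition("p1", 100, 199), Partition("p2", 200, 299)]
print(prune(parts, ">=", 150))  # ['p1', 'p2']
print(prune(parts, "<", 100))   # ['p0']
```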

Client-Supplied Cache Keys: The HTTP and Arrow Flight SQL query APIs now support a Spice-Cache-Key header/metadata key for fine-grained client-side caching control.

Example HTTP API usage:

$ curl -vvS -XPOST http://localhost:8090/v1/sql \
-H"spice-cache-key: 1851400_20170216_north_america" \
-d "select * from scihub_journals_accessed
    where user_id = '1851400'
      and date_trunc('DAY', timestamp) = '2017-02-16'
      and city = 'New York';"

Example Response:

< HTTP/1.1 200 OK
< content-type: application/json
< x-cache: Hit from spiceai
< results-cache-status: HIT
< vary: Spice-Cache-Key
< vary: origin, access-control-request-method, access-control-request-headers
< content-length: 604
< date: Wed, 23 Jul 2025 20:26:12 GMT
<
[{
"timestamp": "2017-02-16 09:55:06",
"doi": "10.1155/2012/650929",
"ip_identifier": 1000856,
"user_id": 1851400,
"country": "United States",
"city": "New York",
"longitude": 40.7830603,
"latitude": -73.9712488
},
...
]

For details, see the Cache Control documentation.
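A client can attach the same header from Python using only the standard library. This minimal sketch assumes a runtime at localhost:8090. `build_cached_query` and `make_cache_key` are hypothetical helpers, and the key scheme (user, date, region) simply follows the curl example. The client chooses which requests share a cache entry, so the key should change whenever the query's inputs change. See the Cache Control documentation for the exact semantics.

```python
import urllib.request

SQL_URL = "http://localhost:8090/v1/sql"  # assumed local runtime address


def make_cache_key(user_id: str, day: str, region: str) -> str:
    """Hypothetical key scheme mirroring the curl example."""
    return f"{user_id}_{day.replace('-', '')}_{region}"


def build_cached_query(sql: str, cache_key: str) -> urllib.request.Request:
    """Build a POST to /v1/sql tagged with a client-chosen Spice-Cache-Key."""
    return urllib.request.Request(
        SQL_URL,
        data=sql.encode("utf-8"),
        headers={"Spice-Cache-Key": cache_key},
        method="POST",
    )


sql = (
    "select * from scihub_journals_accessed "
    "where user_id = '1851400' "
    "and date_trunc('DAY', timestamp) = '2017-02-16' "
    "and city = 'New York';"
)
req = build_cached_query(sql, make_cache_key("1851400", "2017-02-16", "north_america"))
# To send it against a running Spice runtime:
# with urllib.request.urlopen(req) as resp:
#     print(resp.headers.get("results-cache-status"), resp.read()[:200])
```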

Contributors

New Contributors

@varunguleriaCodes made their first contribution in github.com/spiceai/spiceai/pull/6383

Breaking Changes

Cookbook Updates

No new recipes added in this release.

The Spice Cookbook includes 74 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.5.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.5.1 image:

docker pull spiceai/spiceai:1.5.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

No major dependency updates.

Changelog

Fix refresh via Api when dataset is already accelerated and no refresh interval is set by @sgrebnov in #6549
Add support for custom GraphQL unnesting behavior by @Advayp in #6540
Regex Update to disallow hyphens dataset names by @varunguleriaCodes in #6383
Enforce max limit on comments fetched per PR by @Advayp in #6580
Fix accelerated refresh issue by @Advayp in #6590
Enable configurations of max invocations for Bedrock models by @Advayp in #6592
Client-supplied cache keys (Spice-Cache-Key) by @mach-kernel in #6579
Improved partition pruning by @kczimm in #6582
Fix retention filter when both retention_sql and period are set by @sgrebnov in #6595
Initial support for PR comments by @Advayp in #6569
chore: Update croner by @peasee in #6547
fix databricks streaming for Claude model by @peasee in #6601
Remove FullTextUDTFAnalyzerRule and move FTS code into search crate by @jeadie in #6596
Remove download of legacy sentence transformers config by @jeadie in #6605
re-add snapshot tests by @jeadie
Embedding column config to support client-specified vector sizes by @mach-kernel in #6610
Fix mismatch in columns for the GitHub PR table type by @Advayp in #6616
bump version to 1.5.1 by @phillipleblanc
fix issues with cherry-picking by @jeadie
Add integration tests for GitHub PRs with comments by @Advayp in #6581
Add view name to view creation errors by @lukekim in #6611
CDC: Compute embeddings on ingest by @mach-kernel in #6612

Spice v1.5.0 (July 21, 2025)

July 22, 2025 · 14 min read

Evgenii Khramkov

Senior Software Engineer at Spice AI

Announcing the release of Spice v1.5.0! 🔍

Spice v1.5.0 brings major upgrades to search and retrieval. It introduces native support for Amazon S3 Vectors, enabling petabyte-scale vector search directly from S3 vector buckets. The release also adds SQL-integrated vector search, tantivy-powered full-text search, partitioning for DuckDB acceleration, and automated refreshes for search indexes and views. It also includes the AWS Bedrock Embeddings Model Provider, the Oracle Database connector, the now-stable Spice.ai Cloud Data Connector, and an upgrade to DuckDB v1.3.2.

What's New in v1.5.0

Amazon S3 Vectors Support: Spice.ai now integrates with Amazon S3 Vectors, which launched in public preview on July 15, 2025. S3 Vectors provides vector-native object storage with built-in indexing and querying. The integration supports semantic search, recommendation systems, and retrieval-augmented generation (RAG) at petabyte scale with S3’s durability and elasticity. Spice.ai manages the vector lifecycle. It ingests data, creates embeddings with models such as Amazon Titan or Cohere via AWS Bedrock (or others available on Hugging Face), and stores them in S3 vector buckets.

Spice integration with Amazon S3 Vectors

Example Spicepod.yml configuration for S3 Vectors:

datasets:
  - from: s3://my_data_bucket/data/
    name: my_vectors
    params:
      file_format: parquet
    acceleration:
      enabled: true
    vectors:
      engine: s3_vectors
      params:
        s3_vectors_aws_region: us-east-2
        s3_vectors_bucket: my-s3-vectors-bucket
    columns:
      - name: content
        embeddings:
          - from: bedrock_titan
            row_id:
              - id

Example SQL query using S3 Vectors:

SELECT *
FROM vector_search(my_vectors, 'Cricket bats', 10)
WHERE price < 100
ORDER BY score DESC

For more details, refer to the S3 Vectors Documentation.

SQL-integrated Search: Vector and BM25-scored full-text search capabilities are now natively available in SQL queries, extending the power of the POST v1/search endpoint to all SQL workflows.

Example Vector-Similarity-Search (VSS) using the vector_search UDTF on the table reviews for the search term "Cricket bats":

SELECT review_id, review_text, review_date, score
FROM vector_search(reviews, 'Cricket bats')
WHERE country_code = 'AUS'
LIMIT 3

Example Full-Text-Search (FTS) using the text_search UDTF on the table reviews for the search term "Cricket bats":

SELECT review_id, review_text, review_date, score
FROM text_search(reviews, 'Cricket bats')
LIMIT 3
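The score returned by text_search is described as BM25-based. The sketch below computes textbook Okapi BM25 with a Lucene-style IDF and the common defaults k1 = 1.2 and b = 0.75, using a naive whitespace tokenizer. These parameters and the tokenizer are assumptions; tantivy's analyzers and exact scoring details may differ.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25 (Lucene-style IDF)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    doc_freq = Counter()  # number of documents containing each term
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            f = term_freq[term]
            if f == 0:
                continue
            n = doc_freq[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            # k1 saturates repeated terms; b normalizes for document length
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(tokens) / avg_len))
        scores.append(score)
    return scores
```

For the documents "cricket bats for sale", "tennis rackets", and "cricket balls and cricket bats", the query "Cricket bats" scores the third document highest because "cricket" appears twice, while saturation keeps the second occurrence from doubling its contribution.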

DuckDB v1.3.2 Upgrade: Upgraded the DuckDB engine from v1.1.3 to v1.3.2. Key improvements include support for adding primary keys to existing tables, resolution of over-eager unique constraint checking for smoother inserts, and a 13% reduction in runtime on TPC-H SF100 queries through extensive optimizer refinements. DuckDB v1.2.x was skipped due to an index regression.

Read the DuckDB v1.2.0 announcement.
Read the DuckDB v1.3.0 announcement.

Partitioned Acceleration: DuckDB file-based accelerations now support partition_by expressions, enabling queries to scale to large datasets through automatic data partitioning and query predicate pruning. New UDFs, bucket and truncate, simplify partition logic.

New UDFs useful for partition_by expressions:

bucket(num_buckets, col): Partitions a column into a specified number of buckets based on a hash of the column value.
truncate(width, col): Truncates a column to a specified width, aligning values to the nearest lower multiple (e.g., truncate(10, 101) = 100).
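The two UDFs and the pruning they enable can be sketched as follows. truncate follows the definition above (truncate(10, 101) = 100). For bucket, the hash function is not specified in these notes, so CRC32 is a hypothetical stand-in and the bucket numbers will not match Spice's. buckets_to_scan is also hypothetical; it illustrates why an equality or IN-list predicate on the partition column only needs to read the buckets its values hash to.

```python
import zlib


def truncate(width: int, value: int) -> int:
    """Align an integer down to the nearest lower multiple of `width`."""
    # Python's % takes the sign of the divisor, so negative values also align downward.
    return value - (value % width)


def bucket(num_buckets: int, value: object) -> int:
    """Map a value to one of `num_buckets` partitions by hashing it.

    HYPOTHETICAL: CRC32 of the value's string form stands in for Spice's real hash.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def buckets_to_scan(num_buckets: int, predicate_values: list[object]) -> set[int]:
    """Pruning for `col = v` or `col IN (v1, v2, ...)`: only these buckets can hold matches."""
    return {bucket(num_buckets, v) for v in predicate_values}
```

With partition_by: bucket(100, account_id), a query filtered on WHERE account_id = 42 would need to read one of the 100 partitions rather than all of them.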

Example Spicepod.yml configuration:

datasets:
  - from: s3://my_bucket/some_large_table/
    name: my_table
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      partition_by: bucket(100, account_id) # Partition account_id into 100 buckets

Full-Text-Search (FTS) Index Refresh: Accelerated datasets with search indexes maintain up-to-date results with configurable refresh intervals.

Example: refresh the search index on the body column every 10 seconds:

datasets:
  - from: github:github.com/spiceai/docs/pulls
    name: spiceai.doc.pulls
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true
      refresh_mode: full
      refresh_check_interval: 10s
    columns:
      - name: body
        full_text_search:
          enabled: true
          row_id:
            - id

Scheduled View Refresh: Accelerated Views now support cron-based refresh schedules using refresh_cron, automating updates for accelerated data.

Example Spicepod.yml configuration:

views:
  - name: my_view
    sql: SELECT 1
    acceleration:
      enabled: true
      refresh_cron: '0 * * * *' # Every hour

For more details, refer to Scheduled Refreshes.

Multi-column Vector Search: For datasets configured with embeddings on more than one column, POST v1/search and similarity_search perform parallel vector search on each column, aggregating results using reciprocal rank fusion.

Example Spicepod.yml for multi-column search:

datasets:
  - from: github:github.com/apache/datafusion/issues
    name: datafusion.issues
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    columns:
      - name: title
        embeddings:
          - from: hf_minilm
      - name: body
        embeddings:
          - from: openai_embeddings
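Reciprocal rank fusion combines the per-column result lists by rank rather than by raw score, so scores from different embedding models never need to be compared directly. The sketch below uses the standard formulation, in which each row receives weight / (k + rank) from every list it appears in. The smoothing constant k = 60 (the commonly used default) and the optional per-list weights are assumptions; the notes do not state the constants Spice uses here.

```python
from collections import defaultdict
from typing import Optional


def reciprocal_rank_fusion(
    ranked_lists: list[list[str]],
    k: float = 60.0,
    weights: Optional[list[float]] = None,
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of row IDs; ranks are 1-based, best first."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    fused = defaultdict(float)
    for ranked_ids, weight in zip(ranked_lists, weights):
        for rank, row_id in enumerate(ranked_ids, start=1):
            # Contribution shrinks with rank; k damps the gap between the top ranks.
            fused[row_id] += weight / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

For example, fusing title hits ["a", "b", "c"] with body hits ["b", "c", "d"] ranks "b" first, because it scores well in both lists even though it tops only one.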

AWS Bedrock Embeddings Model Provider: Added support for AWS Bedrock embedding models, including Amazon Titan Text Embeddings and Cohere Text Embeddings.

Example Spicepod.yml:

embeddings:
  - from: bedrock:cohere.embed-english-v3
    name: cohere-embeddings
    params:
      aws_region: us-east-1
      input_type: search_document
      truncate: END
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: titan-embeddings
    params:
      aws_region: us-east-1
      dimensions: '256'

For more details, refer to the AWS Bedrock Embedding Models Documentation.

Oracle Data Connector: Use from: oracle: to access and accelerate data stored in Oracle databases, deployed on-premises or in the cloud.

Example Spicepod.yml:

datasets:
  - from: oracle:"SH"."PRODUCTS"
    name: products
    params:
      oracle_host: 127.0.0.1
      oracle_username: scott
      oracle_password: tiger

See the Oracle Data Connector documentation.

GitHub Data Connector: The GitHub Data Connector now supports querying and accelerating members, the users of an organization.

Example Spicepod.yml configuration:

datasets:
  - from: github:github.com/spiceai/members # General format: github.com/[org-name]/members
    name: spiceai.members
    params:
      # With GitHub Apps (recommended)
      github_client_id: ${secrets:GITHUB_SPICEHQ_CLIENT_ID}
      github_private_key: ${secrets:GITHUB_SPICEHQ_PRIVATE_KEY}
      github_installation_id: ${secrets:GITHUB_SPICEHQ_INSTALLATION_ID}
      # With GitHub Tokens
      # github_token: ${secrets:GITHUB_TOKEN}

See the GitHub Data Connector Documentation.

Spice.ai Cloud Data Connector: Graduated to Stable.

spice-rs SDK Release: The Spice Rust SDK has been updated to v3.0.0. This release optimizes the Spice client API and adds robust query retries and custom metadata configuration for Spice queries.

Contributors

Breaking Changes

Search HTTP API Response: POST v1/search response payload has changed. See the new API documentation for details.
Model Provider Parameter Prefixes: Model Provider parameters use provider-specific prefixes instead of openai_ prefixes (e.g., hf_temperature for HuggingFace, anthropic_max_completion_tokens for Anthropic, perplexity_tool_choice for Perplexity). The openai_ prefix remains supported for backward compatibility but is deprecated and will be removed in a future release.
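The backward-compatible prefix lookup can be pictured as a preferred-then-legacy resolution. This is a hypothetical sketch: resolve_model_param is not a Spice API, and both the precedence when both spellings are present and the warning behavior are assumptions used only to illustrate the migration path.

```python
import warnings
from typing import Any, Optional


def resolve_model_param(params: dict[str, Any], provider_prefix: str, name: str) -> Optional[Any]:
    """HYPOTHETICAL: return `<provider>_<name>`, else fall back to deprecated `openai_<name>`."""
    preferred = f"{provider_prefix}_{name}"  # e.g. hf_temperature
    if preferred in params:
        return params[preferred]  # assumption: the provider-specific spelling wins
    legacy = f"openai_{name}"
    if legacy in params:
        warnings.warn(f"'{legacy}' is deprecated; use '{preferred}'", DeprecationWarning, stacklevel=2)
        return params[legacy]
    return None
```

Under this model, a Spicepod that still sets openai_temperature on a HuggingFace model keeps working, and renaming the key to hf_temperature removes the deprecation path before the prefix is dropped.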

Cookbook Updates

Added Oracle Data Connector cookbook: Connect to tables in Oracle databases.
Added Hashed Partitioning with DuckDB cookbook: Accelerate data on large datasets by partitioning data into a fixed number of buckets.

The Spice Cookbook now includes 72 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.5.0, download and install the specific binary from github.com/spiceai/spiceai/releases/tag/v1.5.0 or pull the v1.5.0 Docker image (spiceai/spiceai:1.5.0).

What's Changed

Dependencies

delta_kernel: Upgraded to v0.12.1
DuckDB: Upgraded from v1.1.3 to v1.3.2
iceberg-rust: Upgraded from v0.4.0 to v0.5.1

Changelog

fix: openai model endpoint (#6394) by @Sevenannn in #6394
Enable configuring otel endpoint from spice run (#6360) by @Advayp in #6360
Enable Oracle connector in default build configuration (#6395) by @sgrebnov in #6395
fix llm integration test (#6398) by @Sevenannn in #6398
Promote spice cloud connector to stable quality (#6221) by @Sevenannn in #6221
v1.5.0-rc.1 release notes (#6397) by @lukekim in #6397
Fix model nsql integration tests (#6365) by @Sevenannn in #6365
Fix incorrect UDTF name and SQL query (#6404) by @lukekim in #6404
Update v1.5.0-rc.1.md (#6407) by @sgrebnov in #6407
Improve error messages (#6405) by @lukekim in #6405
build(deps): bump Jimver/cuda-toolkit from 0.2.25 to 0.2.26 (#6388) by @app/dependabot in #6388
Upgrade dependabot dependencies (#6411) by @phillipleblanc in #6411
Fix projection pushdown issues for document based file connector (#6362) by @Advayp in #6362
Add a PartitionedDuckDB Accelerator (#6338) by @kczimm in #6338
Use vector_search() UDTF in HTTP APIs (#6417) by @Jeadie in #6417
add supported types (#6409) by @kczimm in #6409
Enable session time zone override for MySQL (#6426) by @sgrebnov in #6426
Acceleration-like indexing for full text search indexes. (#6382) by @Jeadie in #6382
Provide error message when partition by expression changes (#6415) by @kczimm in #6415
Add support for Oracle Autonomous Database connections (Oracle Cloud) (#6421) by @sgrebnov in #6421
prune partitions for exact and in list with and without UDFs (#6423) by @kczimm in #6423
Fixes and reenable FTS tests (#6431) by @Jeadie in #6431
Upgrade DuckDB to 1.3.2 (#6434) by @phillipleblanc in #6434
Fix issue in limit clause for the Github Data connector (#6443) by @Advayp in #6443
Upgrade iceberg-rust to 0.5.1 (#6446) by @phillipleblanc in #6446
v1.5.0-rc.2 release notes (#6440) by @lukekim in #6440
Oracle: add automated TPC-H SF1 benchmark tests (#6449) by @sgrebnov in #6449
fix: Update benchmark snapshots (#6455) by @app/github-actions in #6455
Preserve ArrowError in arrow_tools::record_batch (#6454) by @mach-kernel in #6454
fix: Update benchmark snapshots (#6465) by @app/github-actions in #6465
Add option to preinstall Oracle ODPI-C library in Docker image (#6466) by @sgrebnov in #6466
Include Oracle connector (federated mode) in automated benchmarks (#6467) by @sgrebnov in #6467
Update crates/llms/src/bedrock/embed/mod.rs by @lukekim in #6468
v1.5.0-rc.3 release notes (#6474) by @lukekim in #6474
Add integration tests for S3 Vectors filters pushdown (#6469) by @sgrebnov in #6469
check for indexedtableprovider when finding tables to search on (#6478) by @Jeadie in #6478
Parse fully qualified table names in UDTFs (#6461) by @Jeadie in #6461
Add integration test for S3 Vectors to cover data update (overwrite) (#6480) by @sgrebnov in #6480
Add 'Run all tests' option for models tests and enable Bedrock tests (#6481) by @sgrebnov in #6481
Add support for a members table type for the GitHub Data Connector (#6464) by @Advayp in #6464
S3 vector data cannot be null (#6483) by @Jeadie in #6483
Don't infer FixedSizeList size during indexing vectors. (#6487) by @Jeadie in #6487
Add support for retention_sql acceleration param (#6488) by @sgrebnov in #6488
Make dataset refresh progress tracing less verbose (#6489) by @sgrebnov in #6489
Use RwLock on tantivy index in FullTextDatabaseIndex for update concurrency (#6490) by @Jeadie in #6490
Add tests for dataset retention logic and refactor retention code (#6495) by @sgrebnov in #6495
Upgrade dependabot dependencies (#6497) by @phillipleblanc in #6497
Add periodic tracing of data loading progress during dataset refresh (#6499) by @sgrebnov in #6499
Promote Oracle Data Connector to Alpha (#6503) by @sgrebnov in #6503
Use AWS SDK to provide credentials for Iceberg connectors (#6498) by @phillipleblanc in #6498
Add integration tests for partitioning (#6463) by @kczimm in #6463
Use top-level table in full-text search JOIN ON (#6491) by @Jeadie in #6491
Use accelerated table in vector_search JOIN operations when appropriate (#6516) by @Jeadie in #6516
Fix 'additional_column' for quoted columns (fix for qualified columns broke it) (#6512) by @Jeadie in #6512
Also use AWS SDK for inferring credentials for S3/Delta/Databricks Delta data connectors (#6504) by @phillipleblanc in #6504
Add per-dataset availability monitor configuration (#6482) by @phillipleblanc in #6482
Suppress the warning from the AWS SDK if it can't load credentials (#6533) by @phillipleblanc in #6533
Change default value of check_availability from default to auto (#6534) by @lukekim in #6534
README.md improvements for v1.5.0 (#6539) by @lukekim in #6539
Temporary disable s3_vectors_basic (#6537) by @sgrebnov in #6537
Ensure binder errors show before query and other (#6374) by @suhuruli in #6374
Update spiceai/duckdb-rs -> DuckDB 1.3.2 + index fix (#6496) by @mach-kernel in #6496
Update table-providers to latest version with DuckDB fixes (#6535) by @phillipleblanc in #6535
S3: default to public access if no auth is provided (#6532) by @sgrebnov in #6532

Spice v1.4.0 (June 18, 2025)

June 19, 2025 · 19 min read

William Croxson

Senior Software Engineer at Spice AI

Announcing the release of Spice v1.4.0! ⚡

This release upgrades DataFusion to v47 and Arrow to v55 for faster queries, more efficient Parquet/CSV handling, and improved reliability. It introduces the AWS Glue Catalog and Data Connectors for native access to Glue-managed data on S3, and adds support for Databricks U2M OAuth for secure Databricks user authentication.

New cron-based dataset refreshes and worker schedules enable automated task management, while improvements to dataset and search results caching further optimize query, search, and RAG performance.

What's New in v1.4.0

DataFusion v47 Highlights

Spice.ai is built on the DataFusion query engine. The v47 release brings:

Performance Improvements 🚀: This release delivers major query speedups through specialized GroupsAccumulator implementations for first_value, last_value, and min/max on Duration types, eliminating unnecessary sorting and computation. TopK operations are now up to 10x faster thanks to early exit optimizations, while sort performance is further enhanced by reusing row converters, removing redundant clones, and optimizing sort-preserving merge streams. Logical operations benefit from short-circuit evaluation for AND/OR, reducing overhead, and additional enhancements address high latency from sequential metadata fetching, improve int/string comparison efficiency, and simplify logical expressions for better execution.
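The "early exit" idea behind faster TopK can be illustrated in miniature. If the input is already sorted on the leading sort column, then once k rows are held and an incoming row's leading value exceeds that of the worst kept row, no later row can qualify and reading can stop. This is a simplified sketch with a hypothetical function name; DataFusion's actual TopK operator works on Arrow record batches and its early-exit conditions are more involved.

```python
import heapq
from typing import Iterable


def topk_sorted_prefix(rows: Iterable[tuple[int, int]], k: int) -> list[tuple[int, int]]:
    """Return the k smallest (a, b) rows, assuming `rows` arrives sorted by `a` ascending."""
    if k <= 0:
        return []
    heap: list[tuple[int, int]] = []  # max-heap of the current top k, stored as (-a, -b)
    for a, b in rows:
        if len(heap) < k:
            heapq.heappush(heap, (-a, -b))
            continue
        worst = (-heap[0][0], -heap[0][1])  # largest row currently kept
        if a > worst[0]:
            # Every later row has a >= this a > worst a, so none can enter the top k: stop.
            break
        if (a, b) < worst:
            heapq.heapreplace(heap, (-a, -b))
    return sorted((-neg_a, -neg_b) for neg_a, neg_b in heap)
```

The payoff grows with input size: for ORDER BY a, b LIMIT k over data sorted by a, the scan can end shortly after the first k rows instead of reading everything.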

Bug Fixes & Compatibility Improvements 🛠️: The release addresses issues with external sort, aggregation, and window functions, improves handling of NULL values and type casting in arrays and binary operations, and corrects problems with complex joins and nested window expressions. It also addresses SQL unparsing for subqueries, aliases, and UNION BY NAME.

See the Apache DataFusion 47.0.0 Changelog for details.

Arrow v55 Highlights

Arrow v55 delivers faster Parquet gzip compression, improved array concatenation, and better support for large files (4GB+) and modular encryption. Parquet metadata reads are now more efficient, with support for range requests and enhanced compatibility for INT96 timestamps and timezones. CSV parsing is more robust, with clearer error messages. These updates boost performance, compatibility, and reliability.

See the Arrow 55.0.0 Changelog and Arrow 55.1.0 Changelog for details.

Runtime Highlights

Search Result Caching: Spice now supports runtime caching for search results, improving performance for subsequent searches and chat completion requests that use the document_similarity LLM tool. Caching is configurable with options like maximum size, item TTL, eviction policy, and hashing algorithm.

Example spicepod.yml configuration:

runtime:
  caching:
    search_results:
      enabled: true
      max_size: 128mb
      item_ttl: 5s
      eviction_policy: lru
      hashing_algorithm: siphash

For more information, refer to the Caching documentation.
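The item_ttl and eviction_policy: lru options combine two rules: an entry stops being served once it is older than the TTL, and when the cache is full the least recently used entry is dropped. The sketch below bounds the cache by item count for simplicity, whereas Spice's max_size is a byte budget (128mb above); the class name and the injectable clock are illustrative assumptions.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class LruTtlCache:
    """Bounded cache: entries expire after `ttl_seconds`; least recently used is evicted first."""

    def __init__(self, max_items: int, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.max_items = max_items
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: OrderedDict = OrderedDict()  # key -> (inserted_at, value), least recent first

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None  # miss
        inserted_at, value = entry
        if self.clock() - inserted_at > self.ttl_seconds:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._entries[key] = (self.clock(), value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_items:
            self._entries.popitem(last=False)  # evict the least recently used entry
```

A short TTL such as 5s bounds how stale a cached search result can be, while LRU keeps the most frequently repeated searches resident.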

AWS Glue Catalog Connector Alpha: Connect to AWS Glue Data Catalogs to query Iceberg, Parquet, or CSV tables in S3.

Example spicepod.yml configuration:

catalogs:
  - from: glue
    name: my_glue_catalog
    params:
      glue_key: <your-access-key-id>
      glue_secret: <your-secret-access-key>
      glue_region: <your-region>
    include:
      - 'testdb.hive_*'
      - 'testdb.iceberg_*'

sql> show tables;
+-----------------+--------------+-------------------+------------+
| table_catalog   | table_schema | table_name        | table_type |
+-----------------+--------------+-------------------+------------+
| my_glue_catalog | testdb       | hive_table_001    | BASE TABLE |
| my_glue_catalog | testdb       | iceberg_table_001 | BASE TABLE |
| spice           | runtime      | task_history      | BASE TABLE |
+-----------------+--------------+-------------------+------------+

For more information, refer to the Glue Catalog Connector documentation.
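The include patterns above read as shell-style globs over schema.table names, where * matches any run of characters. Assuming glob semantics like Python's fnmatch (Spice's exact matching rules may differ), the filtering in the example behaves like this hypothetical helper:

```python
from fnmatch import fnmatchcase


def is_included(qualified_table: str, patterns: list[str]) -> bool:
    """True if `schema.table` matches any include pattern ('*' matches any run of characters)."""
    return any(fnmatchcase(qualified_table, pattern) for pattern in patterns)


# Example: keep only the Hive- and Iceberg-prefixed tables in testdb.
patterns = ["testdb.hive_*", "testdb.iceberg_*"]
tables = ["testdb.hive_table_001", "testdb.iceberg_table_001", "testdb.delta_table_001"]
registered = [t for t in tables if is_included(t, patterns)]
```

This is consistent with the show tables output above, where only hive_table_001 and iceberg_table_001 appear under my_glue_catalog.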

AWS Glue Data Connector Alpha: Connect to specific tables in AWS Glue Data Catalogs to query Iceberg, Parquet, or CSV tables in S3.

Example spicepod.yml configuration:

datasets:
  - from: glue:my_database.my_table
    name: my_table
    params:
      glue_auth: key
      glue_region: us-east-1
      glue_key: ${secrets:AWS_ACCESS_KEY_ID}
      glue_secret: ${secrets:AWS_SECRET_ACCESS_KEY}

For more information, refer to the Glue Data Connector documentation.

Databricks U2M OAuth: Spice now supports User-to-Machine (U2M) authentication for Databricks when called with a compatible client, such as the Spice Cloud Platform.

datasets:
  - from: databricks:spiceai_sandbox.default.messages
    name: messages
    params:
      databricks_endpoint: ${secrets:DATABRICKS_ENDPOINT}
      databricks_cluster_id: ${secrets:DATABRICKS_CLUSTER_ID}
      databricks_client_id: ${secrets:DATABRICKS_CLIENT_ID}

Dataset Refresh Schedules: Accelerated datasets now support a refresh_cron parameter, automatically refreshing the dataset on a defined cron schedule. Cron scheduled refreshes respect the global dataset_refresh_parallelism parameter.

Example spicepod.yml configuration:

datasets:
  - name: my_dataset
    from: s3://my-bucket/my_file.parquet
    acceleration:
      refresh_cron: 0 0 * * * # Daily refresh at midnight

For more information, refer to the Dataset Refresh Schedules documentation.
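The five cron fields are minute, hour, day of month, month, and day of week, so 0 0 * * * fires at minute 0 of hour 0 every day. The sketch below checks whether a given time matches an expression. It supports only *, single numbers, and comma lists (no ranges or steps), and the function name is hypothetical; it is meant to explain the syntax, not to reproduce Spice's scheduler.

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', a single integer, or a comma list such as '0,30'."""
    return field == "*" or value in {int(part) for part in field.split(",")}


def cron_matches(expr: str, when: datetime) -> bool:
    """True if `when` falls on the 5-field cron expression `expr` (subset of cron syntax)."""
    minute, hour, day_of_month, month, day_of_week = expr.split()
    cron_weekday = (when.weekday() + 1) % 7  # cron counts Sunday as 0; Python counts Monday as 0
    if not (_field_matches(minute, when.minute)
            and _field_matches(hour, when.hour)
            and _field_matches(month, when.month)):
        return False
    dom_ok = _field_matches(day_of_month, when.day)
    dow_ok = _field_matches(day_of_week, cron_weekday)
    if day_of_month != "*" and day_of_week != "*":
        return dom_ok or dow_ok  # classic cron: either restricted day field may match
    return dom_ok and dow_ok
```

Under this reading, 0 * * * * (used in the view and SQL worker examples) fires at the top of every hour, and 0 2 * * * fires daily at 02:00.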

Worker Execution Schedules: Workers now support a cron parameter and automatically execute an LLM prompt or SQL query on the defined cron schedule; for LLM workers, the prompt is supplied in params.prompt.

Example spicepod.yml configuration:

workers:
  - name: email_reporter
    models:
      - from: gpt-4o
    params:
      prompt: 'Inspect the latest emails, and generate a summary report for them. Post the summary report to the connected Teams channel'
    cron: 0 2 * * * # Daily at 2am

For more information, refer to the Worker Execution Schedules documentation.

SQL Worker Actions: Spice now supports workers with sql actions for automated SQL query execution on a cron schedule:

workers:
  - name: my_worker
    cron: 0 * * * *
    sql: 'SELECT * FROM lineitem'

For more information, refer to the Workers with a SQL action documentation.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

Added Glue Catalog Connector and Data Connector cookbooks: Connect to tables and databases in the AWS Glue Data Catalog.
Added Cron-based Dataset Refresh: Refresh datasets on defined schedules.

The Spice Cookbook now includes 70 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.4.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.4.0 image:

docker pull spiceai/spiceai:1.4.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

DataFusion: Upgraded to v47
arrow-rs: Upgraded to v55.1.0
delta_kernel: Upgraded to v0.11.0

Changelog

Update trunk to 1.4.0-unstable (#5878) by @phillipleblanc in #5878
Update openapi.json (#5885) by @app/github-actions in #5885
feat: Testoperator reports benchmark failure summary (#5889) by @peasee in #5889
fix: Publish binaries to dev when platform option is all (#5905) by @peasee in #5905
feat: Print dispatch current test count of total (#5906) by @peasee in #5906
Include multiple duckdb files acceleration scenarios into testoperator dispatch (#5913) by @sgrebnov in #5913
feat: Support building testoperator on dev (#5915) by @peasee in #5915
Update spicepod.schema.json (#5927) by @app/github-actions in #5927
Update ROADMAP & SECURITY for 1.3.0 (#5926) by @phillipleblanc in #5926
docs: Update qa_analytics.csv (#5928) by @peasee in #5928
fix: Properly publish binaries to dev on push (#5931) by @peasee in #5931
Load request context extensions on every flight incoming call (#5916) by @ewgenius in #5916
Fix deferred loading for datasets with embeddings (#5932) by @ewgenius in #5932
Schedule AI benchmarks to run every Mon and Thu evening PST (#5940) by @sgrebnov in #5940
Fix explain plan snapshots for TPCDS queries Q36, Q70 & Q86 not being deterministic after DF 46 upgrade (#5942) by @phillipleblanc in #5942
chore: Upgrade to Rust 1.86 (#5945) by @peasee in #5945
Standardise HTTP settings across CLI (#5769) by @Jeadie in #5769
Fix deferred flag for Databricks SQL warehouse mode (#5958) by @ewgenius in #5958
Add deferred catalog loading (#5950) by @ewgenius in #5950
Refactor deferred_load using ComponentInitialization enum for better clarity (#5961) by @ewgenius in #5961
Post-release housekeeping (#5964) by @phillipleblanc in #5964
add LTO for release builds (#5709) by @kczimm in #5709
Fix dependabot/192 (#5976) by @Jeadie in #5976
Fix Text-to-SQL benchmark scheduled run (#5977) by @sgrebnov in #5977
Fix JSON to ScalarValue type conversion to match DataFusion behavior (#5979) by @sgrebnov in #5979
Add v1.3.1 release notes (#5978) by @lukekim in #5978
Regenerate nightly build workflow (#5995) by @ewgenius in #5995
Fix DataFusion dependency loading in Databricks request context extension (#5987) by @ewgenius in #5987
Update spicepod.schema.json (#6000) by @app/github-actions in #6000
feat: Run MySQL SF100 on dev runners (#5986) by @peasee in #5986
fix: Remove caching RwLock (#6001) by @peasee in #6001
1.3.1 Post-release housekeeping (#6002) by @phillipleblanc in #6002
feat: Add initial scheduler crate (#5923) by @peasee in #5923
fix flight request context scope (#6004) by @ewgenius in #6004
fix: Ensure snapshots on different scale factors are retained (#6009) by @peasee in #6009
fix: Allow dev runners in dispatch files (#6011) by @peasee in #6011
refactor: Deprecate results_cache for caching.sql_results (#6008) by @peasee in #6008
Fix models benchmark results reporting (#6013) by @sgrebnov in #6013
fix: Run PR checks for tools/ changes (#6014) by @peasee in #6014
feat: Add a CronRequestChannel for scheduler (#6005) by @peasee in #6005
feat: Add refresh_cron acceleration parameter, start scheduler on table load (#6016) by @peasee in #6016
Update license check to allow dual license crates (#6021) by @sgrebnov in #6021
Initial worker concept (#5973) by @Jeadie in #5973
Don't fail if cargo-deny already installed (license check) (#6023) by @sgrebnov in #6023
Upgrade to DataFusion 47 and Arrow 55 (#5966) by @sgrebnov in #5966
Read Iceberg tables from Glue Catalog Connector (#5965) by @kczimm in #5965
Handle multiple highlights in v1/search UX (#5963) by @Jeadie in #5963
feat: Add cron scheduler configurations for workers (#6033) by @peasee in #6033
feat: Add search cache configuration and results wrapper (#6020) by @peasee in #6020
Fix GitHub Actions Ubuntu for more workflows (#6040) by @phillipleblanc in #6040
Fix Actions for testoperator dispatch manual (#6042) by @phillipleblanc in #6042
refactor: Remove worker type (#6039) by @peasee in #6039
feat: Support cron dataset refreshes (#6037) by @peasee in #6037
Upgrade datafusion-federation to 0.4.2 (#6022) by @phillipleblanc in #6022
Define SearchPipeline and use in runtime/vector_search.rs. (#6044) by @Jeadie in #6044
fix: Scheduler test when scheduler is running (#6051) by @peasee in #6051
doc: Spice Cloud Connector Limitation (#6035) by @Sevenannn in #6035
Add support for on_conflict:upsert for Arrow MemTable (#6059) by @sgrebnov in #6059
Enhance Arrow Flight DoPut operation tracing (#6053) by @sgrebnov in #6053
Update openapi.json (#6032) by @app/github-actions in #6032
Add tools enabled to MCP server capabilities (#6060) by @Jeadie in #6060
Upgrade to delta_kernel 0.11 (#6045) by @phillipleblanc in #6045
refactor: Replace refresh oneshot with notify (#6050) by @peasee in #6050
Enable Upsert OnConflictBehavior for runtime.task_history table (#6068) by @sgrebnov in #6068
feat: Add a workers integration test (#6069) by @peasee in #6069
Fix DuckDB acceleration ORDER BY rand() and ORDER BY NULL (#6071) by @phillipleblanc in #6071
Update Models Benchmarks to report unsuccessful evals as errors (#6070) by @sgrebnov in #6070
Revert: fix: Use HTTPS ubuntu sources (#6082) by @Sevenannn in #6082
Add initial support for Spice Cloud Platform management (#6089) by @sgrebnov in #6089
Run spiceai cloud connector TPC tests using spice dev apps (#6049) by @Sevenannn in #6049
feat: Add SQL worker action (#6093) by @peasee in #6093
Post-release housekeeping (#6097) by @phillipleblanc in #6097
Fix search bench (#6091) by @Jeadie in #6091
fix: Update benchmark snapshots (#6094) by @app/github-actions in #6094
fix: Update benchmark snapshots (#6095) by @app/github-actions in #6095
Glue catalog connector for hive style parquet (#6054) by @kczimm in #6054
Update openapi.json (#6100) by @app/github-actions in #6100
Improve Flight Client DoPut / Publish error handling (#6105) by @sgrebnov in #6105
Define PostApplyCandidateGeneration to handle all filters & projections. (#6096) by @Jeadie in #6096
refactor: Update the tracing task names for scheduled tasks (#6101) by @peasee in #6101
task: Switch GH runners in PR and testoperator (#6052) by @peasee in #6052
feat: Connect search caching for HTTP and tools (#6108) by @peasee in #6108
test: Add multi-dataset cron test (#6102) by @peasee in #6102
Sanitize the ListingTableURL (#6110) by @phillipleblanc in #6110
Avoid partial writes by FlightTableWriter (#6104) by @sgrebnov in #6104
fix: Update the TPCDS postgres acceleration indexes (#6111) by @peasee in #6111
Make Glue Catalog refreshable (#6103) by @kczimm in #6103
Refactor Glue catalog to use a new Glue data connector (#6125) by @kczimm in #6125
Emit retry error on flight transient connection failure (#6123) by @Sevenannn in #6123
Update Flight DoPut implementation to send single final PutResult (#6124) by @sgrebnov in #6124
feat: Add metrics for search results cache (#6129) by @peasee in #6129
update MCP crate (#6130) by @Jeadie in #6130
feat: Add search cache status header, respect cache control (#6131) by @peasee in #6131
fix: Allow specifying individual caching blocks (#6133) by @peasee in #6133
Update openapi.json (#6132) by @app/github-actions in #6132
Add CSV support to Glue data connector (#6138) by @kczimm in #6138
Update Spice Cloud Platform management UX (#6140) by @sgrebnov in #6140
Add TPCH bench for Glue catalog (#6055) by @kczimm in #6055
Enforce max_tokens_per_request limit in OpenAI embedding logic (#6144) by @sgrebnov in #6144
Enable Spice Cloud Control Plane connect (management) for FinanceBench (#6147) by @sgrebnov in #6147
Add integration test for Spice Cloud Platform management (#6150) by @sgrebnov in #6150
fix: Invalidate search cache on refresh (#6137) by @peasee in #6137
fix: Prevent registering cron schedule with change stream accelerations (#6152) by @peasee in #6152
test: Add an append cron integration test (#6151) by @peasee in #6151
fix: Cache search results with no-cache directive (#6155) by @peasee in #6155
fix: Glue catalog dispatch runner type (#6157) by @peasee in #6157
Fix: Glue S3 location for directories and Iceberg credentials (#6174) by @kczimm in #6174
Support multiple columns in FTS (#6156) by @Jeadie in #6156
fix: Add --cache-control flag for search CLI (#6158) by @peasee in #6158
Add Glue data connector tpch bench test for parquet and csv (#6170) by @kczimm in #6170
fix: Apply results cache deprecation correctly (#6177) by @peasee in #6177
Fix regression in Parquet pushdown (#6178) by @phillipleblanc in #6178
Fix CUDA build (use candle-core 0.8.4 and cudarc v0.12) (#6181) by @sgrebnov in #6181
return empty stream if no external_links present (#6192) by @kczimm in #6192
Use arrow pretty print util instead of init dataframe / logical plan in display_records (#6191) by @Sevenannn in #6191
task: Enable additional TPCDS test scenarios in dispatcher (#6160) by @peasee in #6160
chore: Update dependencies (#6196) by @peasee in #6196
Fix FlightSQL GetDbSchemas and GetTables schemas to fully match the protocol (#6197) by @sgrebnov in #6197
Use spice-rs in test operator and retry on connection reset error (#6136) by @Sevenannn in #6136
Fix load status metric description (#6219) by @phillipleblanc in #6219
Run extended tests on PRs against release branch, update glue_iceberg_integration_test_catalog test (#6204) by @Sevenannn in #6204
query schema for is_nullable (#6229) by @kczimm in #6229
fix: use the query error message when queries fail (#6228) by @kczimm in #6228
fix glue iceberg catalog integration test (#6249) by @Sevenannn in #6249
cache table providers in glue catalog (#6252) by @kczimm in #6252
fix: databricks sql_warehouse schema contains duplicate fields (#6255) by @phillipleblanc in #6255

Full Changelog: v1.3.2...v1.4.0

Spice v1.3.2 (June 2, 2025)

June 2, 2025 · 2 min read

Phillip LeBlanc

Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.3.2! ❄️

Spice v1.3.2 is a patch release with fixes to the DuckDB data accelerator and Snowflake data connector.

Changes:

DuckDB Data Accelerator: Supports ORDER BY rand() for randomized result ordering and ORDER BY NULL for SQL compatibility.
Snowflake Data Connector: Adds TIMESTAMP_NTZ(0) type for timestamps with seconds precision.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

No new cookbook recipes.

The Spice Cookbook now includes 67 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.3.2, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.3.2 image:

docker pull spiceai/spiceai:1.3.2

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

No major dependency changes.

Changelog

Handle Snowflake Timestamp NTZ with seconds precision (#6084) by @kczimm in #6084
Fix DuckDB acceleration ORDER BY rand() and ORDER BY NULL (#6071) by @phillipleblanc in #6071

Full Changelog: https://github.com/spiceai/spiceai/compare/v1.3.1...v1.3.2

Spice v1.3.1 (May 26, 2025)

May 26, 2025 · 3 min read

Luke Kim

Founder and CEO of Spice AI

Announcing the release of Spice v1.3.1! 🛡️

Spice v1.3.1 includes improvements to Databricks SQL Warehouse support and parameterized query handling, along with several bug fixes.

What's New in v1.3.1

Databricks SQL Warehouse: Added support for the STRUCT type, enabled join pushdown for queries within the same SQL Warehouse, and added a projection to logical plans to force federation with the correct SQL dialect.
SQL Improvements: Fixed an issue where ILike was incorrectly optimized to string equality in DataFusion/Arrow and aliased the random() function to rand() for better compatibility.
Parameterized Queries: Fixed parameter schema ordering for queries with more than 10 parameters and resolved placeholder inference issues in CASE expressions.
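The notes do not describe root causes, but two small examples show why these fixes matter. First, ILIKE 'aus' with no wildcards is a case-insensitive comparison, so rewriting it as = 'aus' would wrongly drop a row containing 'AUS'. Second, sorting placeholder names $1 through $11 as strings puts $10 before $2, which is one plausible way parameter order can break past 10 parameters (an assumption, not the confirmed cause). The helper names are hypothetical, and the ILIKE matcher ignores escape characters.

```python
import re


def ilike(value: str, pattern: str) -> bool:
    """SQL ILIKE: case-insensitive LIKE; % matches any run of characters, _ matches exactly one.

    Escape characters are not handled in this sketch.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


def sort_placeholders_lexicographically(names: list[str]) -> list[str]:
    return sorted(names)  # string order: "$1" < "$10" < "$11" < "$2"


def sort_placeholders_numerically(names: list[str]) -> list[str]:
    return sorted(names, key=lambda name: int(name.lstrip("$")))  # "$1" < "$2" < ... < "$10"
```

Both illustrate the same principle: an optimization or ordering shortcut is only safe when it preserves the exact SQL semantics.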

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

No new cookbook recipes.

The Spice Cookbook now includes 67 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.3.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.3.1 image:

docker pull spiceai/spiceai:1.3.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

No major dependency changes.

Changelog

Bump Helm chart to 1.3.0 by @phillipleblanc in #5925
Fix Databricks SQL Warehouse benchmark test by @phillipleblanc, @lukekim, @kczimm, @Spice Benchmark Snapshot Update Bot in #5924
Add support for STRUCT type in Databricks SQL Warehouse by @kczimm, @lukekim in #5936
Add projection to logical plan to force federation and correct dialect by @kczimm, @lukekim in #5946
Allow join push down for same SQL Warehouse by @kczimm, @lukekim in #5947
Avoid mistaken ILike to string equality optimization (DataFusion / Arrow) by @sgrebnov, @lukekim in #5939
Make spill_to_disk_and_rehydration test more robust by @sgrebnov, @lukekim in #5929
Alias the random() function to rand() by @phillipleblanc, @lukekim in #5967
Fix parameter schema ordering with > 10 parameters for parameterized queries by @phillipleblanc, @lukekim in #5962
Rev version to v1.3.1 by @lukekim in #5975
Fix placeholder inference in CASE expressions by @phillipleblanc, @lukekim in #5968

Full Changelog: github.com/spiceai/spiceai/compare/v1.3.0...v1.3.1
