Spice v1.7.1 (Sep 29, 2025)
Announcing the release of Spice v1.7.1!
Spice v1.7.1 is a patch release focused on search improvements, bug fixes, and performance enhancements. This release introduces the Reciprocal Rank Fusion (RRF) user-defined table function (UDTF) for hybrid search, improves vector and text search reliability, and resolves several issues across the runtime, connectors, and query engine.
What's New in v1.7.1
Reciprocal Rank Fusion (RRF) UDTF: Spice now supports Reciprocal Rank Fusion (RRF) as a user-defined table function, enabling advanced hybrid search scenarios that combine results from multiple search methods (e.g., vector and text search) for improved relevance ranking.
Features:
- Multi-search fusion: Combine results from vector_search, text_search, and other search UDTFs in a single query.
- Advanced tuning: Per-query ranking weights, recency boosting, and configurable decay functions.
- Performance: Optional user-specified join key for optimal performance.
- Automatic joining: Falls back to on-the-fly JOIN key computation when no explicit key is provided.
Example usage:
SELECT id, title, content, fused_score
FROM rrf(
vector_search(documents, 'machine learning algorithms', rank_weight => 1.5),
text_search(documents, 'neural networks deep learning', rank_weight => 1.2),
join_key => 'id', -- optional join key for optimal performance
k => 60.0 -- optional smoothing factor
)
WHERE fused_score > 0.01
ORDER BY fused_score DESC;
Learn more in the RRF documentation.
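For readers new to RRF, the sketch below shows the textbook form of the calculation in Python: each result contributes weight / (k + rank) for every list it appears in, and the contributions are summed. It is an illustration only, not Spice's implementation. The function name rrf_fuse, the sample document ids, and the treatment of rank_weight as a plain per-list multiplier are assumptions, and recency boosting and decay are not modeled.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, weights=None, k=60.0):
    """Fuse ranked result lists with textbook Reciprocal Rank Fusion.

    ranked_lists: list of lists of document ids, best match first.
    weights: optional per-list multipliers (assumed analogue of rank_weight).
    k: smoothing constant; larger values flatten the gap between ranks.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)

    scores = defaultdict(float)
    for ids, weight in zip(ranked_lists, weights):
        # Ranks are 1-based: the top hit of each list contributes weight / (k + 1).
        for rank, doc_id in enumerate(ids, start=1):
            scores[doc_id] += weight / (k + rank)

    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists standing in for vector and text search output.
vector_hits = ["a", "b", "c"]
text_hits = ["b", "d", "a"]

fused = rrf_fuse([vector_hits, text_hits], weights=[1.5, 1.2], k=60.0)
for doc_id, score in fused:
    print(doc_id, round(score, 6))
```

With k = 60.0 and these weights, document b ranks first because it places near the top of both lists, while c and d, which appear in only one list each, fall to the bottom.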
Acceleration Refresh Metrics: Spice now exposes additional Prometheus metrics that provide detailed observability into dataset acceleration refreshes. These metrics help monitor data freshness and ingestion lag for accelerated datasets with a time column.
Reported metrics:
Metric Name | Description |
---|---|
dataset_acceleration_max_timestamp_before_refresh_ms | Maximum value of the dataset's time column before refresh (milliseconds). |
dataset_acceleration_max_timestamp_after_refresh_ms | Maximum value of the dataset's time column after refresh (milliseconds). |
dataset_acceleration_refresh_lag_ms | Difference between max timestamp after and before refresh (milliseconds). |
dataset_acceleration_ingestion_lag_ms | Lag between current wall-clock time and max timestamp after refresh (milliseconds). |
These metrics are emitted during each acceleration refresh and can be scraped by Prometheus for monitoring and alerting. For more details, see the Observability documentation.
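The two lag metrics follow directly from the timestamps, as the descriptions above indicate. The Python sketch below reproduces only that arithmetic. The function name refresh_lag_metrics and the sample timestamps are hypothetical, and this is not how the Spice runtime computes or emits the values.

```python
import time


def refresh_lag_metrics(max_ts_before_ms, max_ts_after_ms, now_ms=None):
    """Derive the lag values from the max time-column timestamps (milliseconds)."""
    if now_ms is None:
        # Wall-clock time in milliseconds since the Unix epoch.
        now_ms = int(time.time() * 1000)

    return {
        "dataset_acceleration_max_timestamp_before_refresh_ms": max_ts_before_ms,
        "dataset_acceleration_max_timestamp_after_refresh_ms": max_ts_after_ms,
        # How far the newest row advanced during this refresh.
        "dataset_acceleration_refresh_lag_ms": max_ts_after_ms - max_ts_before_ms,
        # How far the newest row trails the current time.
        "dataset_acceleration_ingestion_lag_ms": now_ms - max_ts_after_ms,
    }


# Hypothetical values: the refresh advanced the data by 60 s, and the newest
# row is 15 s behind the clock when the refresh completes.
example = refresh_lag_metrics(1_000_000, 1_060_000, now_ms=1_075_000)
for name, value in example.items():
    print(name, value)
```

A steadily growing ingestion lag would suggest that the source has stopped producing new rows or that refreshes are falling behind, which makes it a natural candidate for alerting.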
Bug Fixes & Improvements
This release resolves several issues and improves reliability across search, connectors, and query planning:
- Full-Text Search (FTS): Ensures FTS metadata columns can be used in projections, fixes JOIN-level filters not having columns in the schema, and adds support for persistent file-based FTS indexes. Searches now default to a limit of 1000 results when no limit is specified.
- Vector Search: Defaults to a limit of 1000 results when no limit is specified, and fixes an issue with removing the embedding column.
- Databricks SQL Warehouse: Improved error handling and support for async queries.
- Other: Fixes for Anthropic model regex validation, a chat model health check refactored to use fewer tokens with reasoning models, and improved error messages.
Contributors
Breaking Changes
No breaking changes.
Cookbook Updates
- Added Hybrid-Search using RRF - Combine results from multiple search methods (vector and text search) using Reciprocal Rank Fusion for improved relevance ranking.
The Spice Cookbook includes 78 recipes to help you get started with Spice quickly and easily.
Upgrading
To upgrade to v1.7.1, use one of the following methods:
CLI:
spice upgrade
Homebrew:
brew upgrade spiceai/spiceai/spice
Docker:
Pull the spiceai/spiceai:1.7.1 image:
docker pull spiceai/spiceai:1.7.1
For available tags, see DockerHub.
Helm:
helm repo update
helm upgrade spiceai spiceai/spiceai
AWS Marketplace:
Spice is now available in the AWS Marketplace!
What's Changed
Changelog
- ensure FTS metadata columns can be used in projection (#7282) by @Jeadie in #7282
- Fix JOIN level filters not having columns in schema (#7287) by @Jeadie in #7287
- Use file-based fts index (#7024) by @Jeadie in #7024
- Remove 'PostApplyCandidateGeneration' (#7288) by @Jeadie in #7288
- RRF: Rank and recency boosting (#7294) by @mach-kernel in #7294
- RRF: Preserve base ranking when results differ -> FULL OUTER JOIN does not produce time column (#7300) by @mach-kernel in #7300
- fix removing embedding column (#7302) by @Jeadie in #7302
- RRF: Fix decay for disjoint result sets (#7305) by @mach-kernel in #7305
- RRF: Project top scores, do not yield duplicate results (#7306) by @mach-kernel in #7306
- RRF: Case sensitive column/ident handling (#7309) by @mach-kernel in #7309
- For vector_search, use a default limit of 1000 if no limit specified (#7311) by @lukekim in #7311
- Fix Anthropic model regex and add validation tests (#7319) by @ewgenius in #7319
- Enhancement: Implement before/after/lag metrics for acceleration refresh (#7310) by @krinart in #7310
- Refactor chat model health check to lower tokens usage for reasoning models (#7317) by @ewgenius in #7317
- Enable chunking in SearchIndex (#7143) by @Jeadie in #7143
- Use logical plan in SearchQueryProvider. (#7314) by @Jeadie in #7314
- FTS max search results 100 -> 1000 (#7331) by @Jeadie in #7331
- Improve Databricks SQL Warehouse Error Handling (#7332) by @sgrebnov in #7332
- use spicepod embedding model name for 'model_name' (#7333) by @Jeadie in #7333
- Handle async queries for Databricks SQL Warehouse API (#7335) by @phillipleblanc in #7335
- RRF: Fix ident resolution for struct fields, autohashed join key for varying types (#7339) by @mach-kernel in #7339