Spice v1.7.1 (Sep 29, 2025)

September 30, 2025 · 6 min read

Principal Software Engineer at Spice AI

Announcing the release of Spice v1.7.1! 🔍

Spice v1.7.1 is a patch release focused on search improvements, bug fixes, and performance enhancements. This release introduces the Reciprocal Rank Fusion (RRF) user-defined table function (UDTF) for hybrid search, improves vector and text search reliability, and resolves several issues across the runtime, connectors, and query engine.

What's New in v1.7.1

Reciprocal Rank Fusion (RRF) UDTF: Spice now supports Reciprocal Rank Fusion (RRF) as a user-defined table function, enabling advanced hybrid search scenarios that combine results from multiple search methods (e.g., vector and text search) for improved relevance ranking.

Features:

Multi-search fusion: Combine results from vector_search, text_search, and other search UDTFs in a single query.
Advanced tuning: Per-query ranking weights, recency boosting, and configurable decay functions.
Performance: Accepts an optional user-specified join key, which gives the best performance.
Automatic joining: Falls back to on-the-fly JOIN key computation when no explicit key is provided.

Example usage:

SELECT id, title, content, fused_score
FROM rrf(
  vector_search(documents, 'machine learning algorithms', rank_weight => 1.5),
  text_search(documents, 'neural networks deep learning', rank_weight => 1.2),
  join_key => 'id',    -- optional join key for optimal performance
  k => 60.0            -- optional smoothing factor
)
WHERE fused_score > 0.01
ORDER BY fused_score DESC;

Learn more in the RRF documentation.
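
For intuition about what a fused score represents, here is a minimal Python sketch of weighted Reciprocal Rank Fusion in its commonly published form, where each result list contributes `weight / (k + rank)` for every document it contains. This illustrates the general technique and is not Spice's implementation: the function name `rrf_fuse` and its `weights` parameter are hypothetical, the mapping of `weights` onto `rank_weight` is an assumption, and recency boosting and decay are not modeled. It also assumes each input list contains a given id at most once.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, weights=None, k=60.0):
    """Fuse best-first lists of document ids with weighted RRF.

    ranked_lists: list of lists, each ordered from most to least relevant.
    weights: optional per-list multipliers (hypothetical analogue of rank_weight).
    k: smoothing constant; larger values shrink the gap between adjacent ranks.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    if len(weights) != len(ranked_lists):
        raise ValueError("need exactly one weight per ranked list")

    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        # Ranks are 1-based: the top hit of each list contributes weight / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)

    # Highest fused score first; ties are broken by id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up result lists standing in for a vector search and a text search.
vector_hits = ["a", "b", "c"]
text_hits = ["b", "d", "a"]
fused = rrf_fuse([vector_hits, text_hits], weights=[1.5, 1.2], k=60.0)
```

Documents that appear in both lists ("a" and "b") accumulate two contributions and so rank above documents found by only one method. With `k` at 60, scores stay small (a few hundredths here), which is worth keeping in mind when choosing a threshold such as `fused_score > 0.01`.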

Acceleration Refresh Metrics: Spice now exposes additional Prometheus metrics that provide detailed observability into dataset acceleration refreshes. These metrics help monitor data freshness and ingestion lag for accelerated datasets with a time column.

Reported metrics:

Metric Name	Description
`dataset_acceleration_max_timestamp_before_refresh_ms`	Maximum value of the dataset's time column before refresh (milliseconds).
`dataset_acceleration_max_timestamp_after_refresh_ms`	Maximum value of the dataset's time column after refresh (milliseconds).
`dataset_acceleration_refresh_lag_ms`	Max timestamp after refresh minus max timestamp before refresh (milliseconds).
`dataset_acceleration_ingestion_lag_ms`	Lag between current wall-clock time and max timestamp after refresh (milliseconds).

These metrics are emitted during each acceleration refresh and can be scraped by Prometheus for monitoring and alerting. For more details, see the Observability documentation.
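
The relationship between the four metrics can be sketched with a little arithmetic. The Python below follows only the descriptions in the table above; the function name `refresh_lag_metrics` and its arguments are hypothetical, and the runtime computes and emits these values itself, so this is a way to reason about the numbers and not something you need to run.

```python
import time


def refresh_lag_metrics(max_ts_before_ms, max_ts_after_ms, now_ms=None):
    """Derive the two lag values from the before/after max timestamps.

    All arguments are epoch milliseconds. now_ms defaults to the current
    wall-clock time and is a parameter only so the example is reproducible.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return {
        # How far the newest row advanced during this refresh.
        "refresh_lag_ms": max_ts_after_ms - max_ts_before_ms,
        # How far the newest accelerated row trails the wall clock.
        "ingestion_lag_ms": now_ms - max_ts_after_ms,
    }


# Made-up timestamps: the newest row advanced by 60 s during the refresh
# and is 15 s behind the wall clock afterwards.
example = refresh_lag_metrics(1_000_000, 1_060_000, now_ms=1_075_000)
```

An ingestion lag that keeps growing across refreshes suggests the accelerated dataset is falling behind its source, which makes it a natural candidate for alerting.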

Bug Fixes & Improvements

This release resolves several issues and improves reliability across search, connectors, and query planning:

Full-Text Search (FTS): FTS metadata columns can now be used in projections, JOIN-level filters no longer fail on columns missing from the schema, and persistent file-based FTS indexes are supported. Searches default to a limit of 1000 results when no limit is specified.
Vector Search: Searches default to a limit of 1000 results when no limit is specified, and a bug affecting removal of the embedding column is fixed.
Databricks SQL Warehouse: Improved error handling and support for async queries.
Other: Fixes for Anthropic model regex validation, tweaked AI-model health checks, and improved error messages.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

Added Hybrid-Search using RRF - Combine results from multiple search methods (vector and text search) using Reciprocal Rank Fusion for improved relevance ranking.

The Spice Cookbook includes 78 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.7.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.7.1 image:

docker pull spiceai/spiceai:1.7.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Changelog

ensure FTS metadata columns can be used in projection (#7282) by @Jeadie in #7282
Fix JOIN level filters not having columns in schema (#7287) by @Jeadie in #7287
Use file-based fts index (#7024) by @Jeadie in #7024
Remove 'PostApplyCandidateGeneration' (#7288) by @Jeadie in #7288
RRF: Rank and recency boosting (#7294) by @mach-kernel in #7294
RRF: Preserve base ranking when results differ -> FULL OUTER JOIN does not produce time column (#7300) by @mach-kernel in #7300
fix removing embedding column (#7302) by @Jeadie in #7302
RRF: Fix decay for disjoint result sets (#7305) by @mach-kernel in #7305
RRF: Project top scores, do not yield duplicate results (#7306) by @mach-kernel in #7306
RRF: Case sensitive column/ident handling (#7309) by @mach-kernel in #7309
For vector_search, use a default limit of 1000 if no limit specified (#7311) by @lukekim in #7311
Fix Anthropic model regex and add validation tests (#7319) by @ewgenius in #7319
Enhancement: Implement before/after/lag metrics for acceleration refresh (#7310) by @krinart in #7310
Refactor chat model health check to lower tokens usage for reasoning models (#7317) by @ewgenius in #7317
Enable chunking in SearchIndex (#7143) by @Jeadie in #7143
Use logical plan in SearchQueryProvider. (#7314) by @Jeadie in #7314
FTS max search results 100 -> 1000 (#7331) by @Jeadie in #7331
Improve Databricks SQL Warehouse Error Handling (#7332) by @sgrebnov in #7332
use spicepod embedding model name for 'model_name' (#7333) by @Jeadie in #7333
Handle async queries for Databricks SQL Warehouse API (#7335) by @phillipleblanc in #7335
RRF: Fix ident resolution for struct fields, autohashed join key for varying types (#7339) by @mach-kernel in #7339
