Skip to main content

One post tagged with "rrf"

Reciprocal Rank Fusion related topics and usage

View All Tags

Spice v1.7.1 (Sep 29, 2025)

ยท 6 min read
Kevin Zimmerman
Principal Software Engineer at Spice AI

Announcing the release of Spice v1.7.1! ๐Ÿ”

Spice v1.7.1 is a patch release focused on search improvements, bug fixes, and performance enhancements. This release introduces the Reciprocal Rank Fusion (RRF) user-defined table function (UDTF) for hybrid search, improves vector and text search reliability, and resolves several issues across the runtime, connectors, and query engine.

What's New in v1.7.1โ€‹

Reciprocal Rank Fusion (RRF) UDTF: Spice now supports Reciprocal Rank Fusion (RRF) as a user-defined table function, enabling advanced hybrid search scenarios that combine results from multiple search methods (e.g., vector and text search) for improved relevance ranking.

Features:

  • Multi-search fusion: Combine results from vector_search, text_search, and other search UDTFs in a single query.
  • Advanced tuning: Per-query ranking weights, recency boosting, and configurable decay functions.
  • Performance: Optional user-specified join key for optimal performance.
  • Automatic joining: Falls back to on-the-fly JOIN key computation when no explicit key is provided.

Example usage:

SELECT id, title, content, fused_score
FROM rrf(
vector_search(documents, 'machine learning algorithms', rank_weight => 1.5),
text_search(documents, 'neural networks deep learning', rank_weight => 1.2),
join_key => 'id', -- optional join key for optimal performance
k => 60.0 -- optional smoothing factor
)
WHERE fused_score > 0.01
ORDER BY fused_score DESC;

Learn more in the RRF documentation.

Acceleration Refresh Metrics: Spice now exposes additional Prometheus metrics that provide detailed observability into dataset acceleration refreshes. These metrics help monitor data freshness and ingestion lag for accelerated datasets with a time column.

Reported metrics:

Metric NameDescription
dataset_acceleration_max_timestamp_before_refresh_msMaximum value of the dataset's time column before refresh (milliseconds).
dataset_acceleration_max_timestamp_after_refresh_msMaximum value of the dataset's time column after refresh (milliseconds).
dataset_acceleration_refresh_lag_msDifference between max timestamp after and before refresh (milliseconds).
dataset_acceleration_ingestion_lag_msLag between current wall-clock time and max timestamp after refresh (milliseconds).

These metrics are emitted during each acceleration refresh and can be scraped by Prometheus for monitoring and alerting. For more details, see the Observability documentation.

Bug Fixes & Improvementsโ€‹

This release resolves several issues and improves reliability across search, connectors, and query planning:

  • Full-Text Search (FTS): Ensure FTS metadata columns can be used in projection, fix JOIN-level filters not having columns in schema, and adds support for persistent file-based FTS indexes. Default limit of 1000 results if no limit specified.
  • Vector Search: Default limit of 1000 results if no limit specified, and fix removing embedding column.
  • Databricks SQL Warehouse: Improved error handling and support for async queries.
  • Other: Fixes for Anthropic model regex validation, tweaked AI-model health checks, and improved error messages.

Contributorsโ€‹

Breaking Changesโ€‹

No breaking changes.

Cookbook Updatesโ€‹

  • Added Hybrid-Search using RRF - Combine results from multiple search methods (vector and text search) using Reciprocal Rank Fusion for improved relevance ranking.

The Spice Cookbook includes 78 recipes to help you get started with Spice quickly and easily.

Upgradingโ€‹

To upgrade to v1.7.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.7.1 image:

docker pull spiceai/spiceai:1.7.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

๐ŸŽ‰ Spice is now available in the AWS Marketplace!

What's Changedโ€‹

Changelogโ€‹

  • ensure FTS metadata columns can be used in projection (#7282) by @Jeadie in #7282
  • Fix JOIN level filters not having columns in schema (#7287) by @Jeadie in #7287
  • Use file-based fts index (#7024) by @Jeadie in #7024
  • Remove 'PostApplyCandidateGeneration' (#7288) by @Jeadie in #7288
  • RRF: Rank and recency boosting (#7294) by @mach-kernel in #7294
  • RRF: Preserve base ranking when results differ -> FULL OUTER JOIN does not produce time column (#7300) by @mach-kernel in #7300
  • fix removing embedding column (#7302) by @Jeadie in #7302
  • RRF: Fix decay for disjoint result sets (#7305) by @mach-kernel in #7305
  • RRF: Project top scores, do not yield duplicate results (#7306) by @mach-kernel in #7306
  • RRF: Case sensitive column/ident handling (#7309) by @mach-kernel in #7309
  • For vector_search, use a default limit of 1000 if no limit specified (#7311) by @lukekim in #7311
  • Fix Anthropic model regex and add validation tests (#7319) by @ewgenius in #7319
  • Enhancement: Implement before/after/lag metrics for acceleration refresh (#7310) by @krinart in #7310
  • Refactor chat model health check to lower tokens usage for reasoning models (#7317) by @ewgenius in #7317
  • Enable chunking in SearchIndex (#7143) by @Jeadie in #7143
  • Use logical plan in SearchQueryProvider. (#7314) by @Jeadie in #7314
  • FTS max search results 100 -> 1000 (#7331) by @Jeadie in #7331
  • Improve Databricks SQL Warehouse Error Handling (#7332) by @sgrebnov in #7332
  • use spicepod embedding model name for 'model_name' (#7333) by @Jeadie in #7333
  • Handle async queries for Databricks SQL Warehouse API (#7335) by @phillipleblanc in #7335
  • RRF: Fix ident resolution for struct fields, autohashed join key for varying types (#7339) by @mach-kernel in #7339