Full-Text Search

Spice provides full-text search with BM25 scoring. A dataset can be augmented with a full-text search index that enables efficient keyword search. A column is included in the index when full_text_search is enabled in that column's configuration. For example:

datasets:
  - from: github:github.com/spiceai/docs/pulls
    name: doc.pulls
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true
    columns:
      - name: title
        full_text_search:
          enabled: true
          row_id:
            - id
      - name: body
        full_text_search:
          enabled: true

Searches made through /v1/search return results ranked by keyword relevance across the title and body columns. For more details, see the API reference for /v1/search.
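As a rough illustration, a request to /v1/search could be assembled as in the sketch below. This is not an official client. The base URL `http://localhost:8090` and the body fields `datasets`, `text`, and `limit` are assumptions made for this example, and the helper names are hypothetical. Check them against the /v1/search API reference before relying on them.

```python
import json
import urllib.request

# Assumption: the Spice HTTP endpoint listens here; adjust to your runtime.
SPICE_HTTP_URL = "http://localhost:8090"


def build_search_request(text, datasets, limit=5, base_url=SPICE_HTTP_URL):
    """Build (but do not send) a POST request for /v1/search.

    The body field names (`datasets`, `text`, `limit`) are assumptions;
    confirm them against the /v1/search API reference.
    """
    body = {"datasets": list(datasets), "text": text, "limit": limit}
    return urllib.request.Request(
        url=f"{base_url}/v1/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(text, datasets, limit=5):
    """Send the request and return the decoded JSON response."""
    request = build_search_request(text, datasets, limit)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a runtime serving the dataset configured above, `search("search terms", ["doc.pulls"])` would send the request. Building the request separately from sending it makes the payload easy to inspect without a running server.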

SQL UDTF

The full-text search index can also be queried in SQL through the text_search user-defined table function (UDTF).

SELECT id, extra_column, score
FROM text_search(doc.pulls, 'search terms', body)
ORDER BY score DESC
LIMIT 5

The function signature of text_search is:

text_search(
  table STRING,          -- Dataset name (required)
  query STRING,          -- Search query expression (required)
  col STRING,            -- Column name to search in (required if multiple full-text indices exist)
  limit INTEGER,         -- Maximum number of results to return (optional, default: all)
  include_score BOOLEAN  -- Whether to include relevance score (optional, default: TRUE)
)
RETURNS TABLE            -- The original table and an additional FLOAT column `score` (if `include_score`).
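To show how the optional arguments might fit together, the sketch below builds a text_search query string from Python. It assumes that arguments are passed positionally in the order of the signature above, as in the earlier example, so a later argument cannot be given without the earlier ones. That assumption, and the helper name `text_search_sql`, are illustrative and not part of Spice.

```python
def text_search_sql(table, query, col=None, limit=None, include_score=None,
                    select="*"):
    """Return a SELECT over text_search(...) using positional arguments.

    Assumption: optional arguments are positional, in signature order,
    so a later argument requires every earlier one.
    Table and column names are inserted as-is; pass trusted values only.
    """
    optional = [col, limit, include_score]
    # Drop trailing unset arguments, then reject gaps in what remains.
    while optional and optional[-1] is None:
        optional.pop()
    if any(value is None for value in optional):
        raise ValueError("positional arguments cannot be skipped")

    escaped = query.replace("'", "''")  # standard SQL quote escaping
    args = [table, f"'{escaped}'"]
    for value in optional:
        if isinstance(value, bool):
            args.append("TRUE" if value else "FALSE")
        else:
            args.append(str(value))
    return f"SELECT {select} FROM text_search({', '.join(args)})"


# Mirrors the query shown earlier, minus ORDER BY and LIMIT.
print(text_search_sql("doc.pulls", "search terms", "body", select="id, score"))
```

The resulting string can be run through any SQL interface to the runtime. Appending `ORDER BY score DESC` only makes sense when the score column is included.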