Version: v1.10

Full-Text Search

Spice provides full-text search with BM25 relevance scoring. A dataset can be augmented with a full-text search index that enables efficient keyword search. Columns are included in the index according to their column configuration.

To enable full-text search, configure your dataset columns within your dataset definition as follows:

datasets:
  - from: github:github.com/spiceai/docs/pulls
    name: doc.pulls
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true
    columns:
      - name: title
        full_text_search:
          enabled: true
          row_id:
            - id
      - name: body
        full_text_search:
          enabled: true

In this example, full-text search indexing is enabled on both the title and body columns. The row_id specifies a unique identifier for referencing search results and retrieving additional data.

Searching with the HTTP API

After enabling indexing, you can search through the HTTP API endpoint /v1/search. Results are ranked by relevance to your keyword query across the indexed columns (title and body in this example).

For details on using this endpoint, see the [API reference for /v1/search](../../api/HTTP/post-search).
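As a rough illustration of calling the endpoint from a client, the following Python sketch builds a POST request for /v1/search using only the standard library. The base URL (`http://localhost:8090`) and the JSON body fields (`text`, `datasets`, `limit`) are assumptions made for this example and are not taken from this page; confirm the actual request schema in the API reference before relying on them.

```python
import json
import urllib.request

# Assumption: the Spice runtime's HTTP API is reachable at this address.
SPICE_HTTP_URL = "http://localhost:8090"


def build_search_request(text, datasets, limit=5, base_url=SPICE_HTTP_URL):
    """Build (but do not send) a POST request for /v1/search."""
    # Assumed body fields; check them against the API reference.
    payload = {"text": text, "datasets": list(datasets), "limit": limit}
    return urllib.request.Request(
        url=f"{base_url}/v1/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(text, datasets, limit=5):
    """Send the request and return the decoded JSON response."""
    request = build_search_request(text, datasets, limit)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `search("search keywords", ["doc.pulls"])` against a running instance would then return the parsed response body. Building the request separately from sending it makes the payload easy to inspect without a live server.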

Searching with SQL

Spice also provides full-text search through SQL using a user-defined table function (UDTF), text_search().

Example SQL Query

Here's how you can query using SQL:

SELECT id, title, score
FROM text_search(doc.pulls, 'search keywords', body)
ORDER BY score DESC
LIMIT 5;

This returns the top 5 results from the doc.pulls dataset that best match your search keywords within the body column.

Function Signature

The text_search() function has the following signature:

text_search(
  table STRING,          -- Dataset name (required)
  query STRING,          -- Keyword or phrase to search (required)
  col STRING,            -- Specific column to search (required if dataset has multiple indexed columns)
  limit INTEGER,         -- Maximum results returned (optional, defaults to 1000)
  include_score BOOLEAN  -- Include relevance scores in results (optional, defaults to TRUE)
)
RETURNS TABLE            -- Original table columns plus an optional FLOAT column `score`

By default, text_search retrieves up to 1000 results. To adjust this, specify the limit parameter in the function call.

Use this function to integrate robust full-text search directly into your data workflows with minimal setup.
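When queries are assembled in application code, the signature can be wrapped in a small helper. The Python sketch below is a hypothetical illustration, not part of Spice: `build_text_search_sql` and `sql_literal` are names invented here, and the sketch assumes arguments are passed positionally in the documented order (table, query, col, limit, include_score), as in the SQL example on this page.

```python
from typing import Optional


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_text_search_sql(
    table: str,
    query: str,
    col: Optional[str] = None,
    limit: Optional[int] = None,
    include_score: Optional[bool] = None,
) -> str:
    """Return a SELECT over text_search() with arguments in documented order."""
    # Assumption: arguments are positional, so a later argument can only be
    # supplied when every earlier one is supplied too.
    if limit is not None and col is None:
        raise ValueError("limit requires col")
    if include_score is not None and limit is None:
        raise ValueError("include_score requires limit")

    # table and col are identifiers and are inserted verbatim,
    # so they must come from trusted input only.
    args = [table, sql_literal(query)]
    if col is not None:
        args.append(col)
    if limit is not None:
        args.append(str(int(limit)))
    if include_score is not None:
        args.append("TRUE" if include_score else "FALSE")
    return f"SELECT * FROM text_search({', '.join(args)})"
```

For example, `build_text_search_sql("doc.pulls", "search keywords", "body", 5)` produces a query that caps the search at 5 results inside the function call itself. Only the search text is escaped; the dataset and column names are identifiers and are passed through unchanged.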