Search in SQL
This section documents search capabilities in Spice SQL, including vector search, full-text search, and lexical filtering methods. These features help retrieve relevant data using semantic similarity, keyword matching, and pattern-based filtering.
Table of Contents​
- Table of Contents
- Vector Search (
vector_search) - Full-Text Search (
text_search) - Reciprocal Rank Fusion (
rrf) - Reranking (
rerank) - Lexical Search: LIKE, =, and Regex
Vector Search (vector_search)​
Vector search retrieves records by semantic similarity using embeddings. It is ideal for finding related content even when exact keywords differ.
Usage​
SELECT id, score
FROM vector_search(table, 'search query')
ORDER BY score DESC
LIMIT 5;
table: Dataset name (required)query: Search text, or an array of strings for multi-query search (required)column: Column name (optional if only one embedding column; required when the table has multiple embedded columns)limit: Maximum results (optional). When omitted, the engine-defined maximum is used.include_score: Include relevance scores (optional, defaultTRUE)distance_metric: Similarity metric used to rank candidate vectors (optional, named argument). Supported values:'cosine'(default) and'l2'(negated Euclidean distance).'dot'is parsed but not yet wired through the scan path.rank_weight: Per-query ranking weight (optional, named argument). Only meaningful whenvector_searchis passed as a subquery torrf.
Filter Pushdown​
WHERE predicates on base table columns (e.g., created_at, product_category) are pushed down as pre-filters — they are applied before the similarity ranking, so only matching rows are scored and returned. This means results reflect the top-K within the filtered set, not the top-K of the entire table filtered afterward.
Predicates on computed columns like score are applied as post-filters after ranking.
Example​
-- Filters on created_at are pushed down before ranking
SELECT review_id, rating, customer_id, body, score
FROM vector_search(reviews, 'issues with same day shipping', 1500)
WHERE created_at >= to_unixtime(now() - INTERVAL '7 days')
ORDER BY score DESC
LIMIT 2;
To override the similarity metric, pass distance_metric as a named argument:
SELECT id, body, score
FROM vector_search(reviews, 'issues with shipping', distance_metric => 'l2')
ORDER BY score DESC
LIMIT 10;
See Vector-Based Search for configuration and advanced usage.
Multi-Query (Late-Interaction) Form​
When the target column is a multi-vector column, vector_search also accepts an array of query strings. Each query is embedded independently and the per-row score is Σ_q max_e cos(q, e) — ColBERT-style late interaction. Passing an array to a scalar or chunked column returns an error. At most 32 query strings are accepted per call.
SELECT product_id, name, score
FROM vector_search(products, ['hiking', 'waterproof', 'lightweight'], tags)
ORDER BY score DESC
LIMIT 10;
Full-Text Search (text_search)​
Full-text search uses BM25 scoring to retrieve records matching keywords in indexed columns.
Usage​
SELECT id, score
FROM text_search(table, 'search terms', col)
ORDER BY score DESC
LIMIT 5;
table: Dataset name (required)query: Keyword or phrase (required)column: Column to search (optional if the table has a single full-text index; required when multiple columns are indexed)limit: Maximum results (optional). Defaults to 1000, which is the maximum supported.include_score: Include relevance scores (optional, defaultTRUE)rank_weight: Per-query ranking weight (optional, named argument). Only meaningful whentext_searchis passed as a subquery torrf.
By default, text_search retrieves up to 1000 results. To request fewer, specify a smaller limit.
Example​
SELECT id, title, score
FROM text_search(doc.pulls, 'search keywords', body)
ORDER BY score DESC
LIMIT 5;
See Full-Text Search for configuration and details.
Reciprocal Rank Fusion (rrf)​
Reciprocal Rank Fusion (RRF) combines results from multiple search queries to improve relevance by merging rankings from different search methods. Advanced features include per-query ranking weights, recency boosting, and flexible decay functions.
Usage​
rrf is variadic and takes two or more search UDTF calls as arguments. Named parameters provide advanced control over ranking, recency, and fusion behavior.
The rrf function automatically adds a fused_score column to the result set, which contains the combined relevance score from all input search queries. Results are sorted by fused_score DESC by default when no explicit ORDER BY clause is specified.
SELECT id, content, fused_score
FROM rrf(
vector_search(table, 'search query', rank_weight => 20),
text_search(table, 'search terms', column),
join_key => 'id', -- explicit join key for performance
k => 60.0 -- smoothing parameter
)
ORDER BY fused_score DESC
LIMIT 10;
Arguments:
Note that rank_weight is specified as the last argument to either a text_search or vector_search UDTF call (as shown above). All other arguments can be specified in any order after the search calls (within an rrf invocation).
| Parameter | Type | Required | Description |
|---|---|---|---|
query_1 | Search UDTF call | Yes | First search query (e.g., vector_search, text_search) |
query_2 | Search UDTF call | Yes | Second search query. rrf requires at least two subqueries. |
... | Search UDTF call | No | Additional search queries (variadic) |
join_key | String | No | Column name to use for joining subquery results. If omitted, the primary key is inferred from the underlying tables; otherwise rows are auto-hashed. |
k | Float | No | Smoothing parameter for RRF scoring (default: 60.0) |
limit | Integer | No | Upper bound on the fused result set. Also propagated as a default limit to any nested search subquery that does not specify its own. |
time_column | String | No | Column name containing timestamps for recency boosting |
recency_decay | String | No | Decay function: 'linear' or 'exponential' (default: 'exponential') |
decay_constant | Float | No | Decay rate for exponential decay (default: 0.01) |
decay_scale_secs | Float | No | Time scale in seconds for decay (default: 86400) |
decay_window_secs | Float | No | Window size for linear decay in seconds (default: 86400) |
rank_weight | Float | No | Per-query ranking weight (specified within the individual search subquery call) |
Filter Pushdown​
WHERE predicates on base table columns (e.g., review_date, product_category) are pushed down into each nested search subquery as pre-filters — they are applied before ranking and fusion, so each subquery only considers matching rows. This means the fused results reflect the top-K within the filtered set, not a post-filtered slice of unfiltered rankings.
Predicates on computed columns like fused_score are applied as post-filters after fusion.
-- review_date and product_category are pushed into each vector_search before ranking
SELECT review_id, review_headline
FROM rrf(
vector_search(amazon_reviews, 'cannot exit the app', rank_weight => 20),
vector_search(amazon_reviews, 'app not working', rank_weight => 10),
join_key => 'review_id',
k => 60.0
)
WHERE review_date > '2015-06-15' AND product_category = 'Mobile_Apps'
LIMIT 10;
Examples​
Basic Hybrid Search:
-- Combine vector and text search for enhanced relevance
SELECT id, title, content, fused_score
FROM rrf(
vector_search(documents, 'machine learning algorithms'),
text_search(documents, 'neural networks deep learning', content),
join_key => 'id' -- explicit join key for performance
)
WHERE fused_score > 0.01
ORDER BY fused_score DESC
LIMIT 5;
Weighted Ranking:
-- Boost semantic search over exact text matching
SELECT fused_score, title, content
FROM rrf(
text_search(posts, 'artificial intelligence', rank_weight => 50.0),
vector_search(posts, 'AI machine learning', rank_weight => 200.0)
)
ORDER BY fused_score DESC
LIMIT 10;
Recency-Boosted Search:
-- Exponential decay favoring recent content
SELECT fused_score, title, created_at
FROM rrf(
text_search(news, 'breaking news'),
vector_search(news, 'latest updates'),
time_column => 'created_at',
recency_decay => 'exponential',
decay_constant => 0.05,
decay_scale_secs => 3600 -- 1 hour scale
)
ORDER BY fused_score DESC
LIMIT 10;
Linear Decay:
-- Linear decay over 24 hours
SELECT fused_score, content
FROM rrf(
text_search(posts, 'trending'),
vector_search(posts, 'viral popular'),
time_column => 'created_at',
recency_decay => 'linear',
decay_window_secs => 86400
)
ORDER BY fused_score DESC;
How RRF works:
- Each input query is ranked independently by score
- Rankings are combined using the formula:
RRF Score = Σ(rank_weight / (k + rank)) - Documents appearing in multiple result sets receive higher scores
- The
kparameter controls ranking sensitivity (lower = more sensitive to rank position)
Advanced query tuning:
- Rank weighting: Individual queries can be weighted using
rank_weightparameter - Recency boosting: When
time_columnis specified, scores are multiplied by a decay factor- Exponential decay:
e^(-decay_constant * age_in_units)where age is indecay_scale_secs - Linear decay:
max(0, 1 - (age_in_units / decay_window_secs))
- Exponential decay:
- Auto-join: When no
join_keyis specified,rrfinfers the primary key from the underlying tables; if none is available, rows are joined by an auto-generated row identifier
Reranking (rerank)​
Reranking reorders candidate results using a dedicated reranker model or an LLM-as-reranker for improved relevance. The input can be any search UDTF (vector_search, text_search, rrf) or a plain table.
Usage​
SELECT *
FROM rerank(
<input>,
document => 'column_name',
model => 'reranker_name',
limit => 10
)
Arguments:
| Parameter | Type | Required | Description |
|---|---|---|---|
input | Table or UDTF | Yes | Input rows to rerank. Can be a search UDTF call (vector_search, text_search, rrf) or a table name. |
model | String | Yes | Name of a registered reranker or chat model. |
document | String | Yes | Column containing the text to send to the reranker for scoring. |
query | String | No | Query string for relevance scoring. Auto-extracted from nested search UDTFs when omitted; required for bare-table inputs. |
limit | Integer | No | Maximum number of results to return. |
strategy | String | No | LLM reranking strategy: 'listwise' (default) or 'pointwise'. Only applies when the model resolves to a chat model. |
prompt_template | String | No | Custom prompt template for LLM-as-reranker. Use {query} and {document} placeholders. Only applies when the model resolves to a chat model. |
Query Auto-Propagation​
When the input is a search UDTF (vector_search, text_search, or rrf wrapping search UDTFs), the query string is automatically extracted from the nested call. Single-string, make_array(...), and ARRAY[...] query forms are all supported. For multi-query inputs, the first query string is used.
For bare-table inputs, query must be provided explicitly.
Examples​
Rerank hybrid search results:
SELECT * FROM rerank(
rrf(
vector_search(docs, 'delta lake time travel', limit => 50),
text_search(docs, 'delta lake time travel', limit => 50)
),
document => 'content',
model => 'cohere_rr',
limit => 10
);
Rerank a plain table with an explicit query:
SELECT * FROM rerank(
tickets,
query => 'auth failures',
document => 'body',
model => 'voyage_rr',
limit => 5
);
LLM-as-reranker with custom prompt:
SELECT * FROM rerank(
vector_search(kb, 'onboarding checklist', limit => 40),
document => 'content',
model => 'gpt_mini',
strategy => 'pointwise',
prompt_template => 'Rate 0-1: is this useful for a new hire?\nQuery: {query}\nDoc: {document}',
limit => 10
);
See Reranking for configuration, provider setup, and additional examples.
Lexical Search: LIKE, =, and Regex​
Spice SQL supports traditional filtering for exact and pattern-based matches:
LIKE (Pattern Matching)​
SELECT * FROM my_table WHERE column LIKE '%substring%';
%matches any sequence of characters._matches a single character.
= (Keyword/Exact Match)​
SELECT * FROM my_table WHERE column = 'exact value';
Returns rows where the column exactly matches the value.
Regex Filtering​
Spice SQL supports the PostgreSQL regex operators ~ (match), ~* (case-insensitive match), !~ (not match), and !~* (case-insensitive not match) — see Operators. Alternatively, use scalar functions such as regexp_like, regexp_match, and regexp_replace. For details and examples, see the Scalar Functions documentation.
Example​
SELECT * FROM my_table WHERE column ~ '^spice.*ai$';
-- Or, equivalently:
SELECT * FROM my_table WHERE regexp_like(column, '^spice.*ai$');
For more on hybrid and advanced search, see Search Functionality and Vector-Based Search
