Embedding Datasets
Learn how to define and augment datasets with embedding columns for advanced search capabilities.
Overview
Spice provides three distinct methods for handling embedding columns in datasets:
- Just-in-Time (JIT) Embeddings: Computes embeddings on demand at query time, without precomputing or storing any data.
- Accelerated Embeddings: Precomputes embeddings by transforming and augmenting the source dataset for faster query and search performance.
- Passthrough Embeddings: Utilizes pre-existing embeddings directly from the underlying source datasets, bypassing any additional computation.
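For example, both JIT and accelerated embeddings are configured by attaching an embedding model to a dataset column; enabling acceleration precomputes the vectors. In this sketch, the dataset, column, and model names are illustrative assumptions:

```yaml
datasets:
  - from: postgres:reviews    # hypothetical source dataset
    name: reviews
    acceleration:
      enabled: true           # precompute embeddings (Accelerated); omit for JIT
    columns:
      - name: body            # hypothetical column to embed
        embeddings:
          - from: local_embedding_model   # a model defined under `embeddings:`
```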
Configuring Embedding Models
Before configuring dataset embeddings, define the embedding models in the `spicepod.yaml`, for example:
```yaml
embeddings:
  - name: local_embedding_model
    from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2
  - from: openai
    name: remote_service
    params:
      openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY }
```
See Embedding components for more information on embedding models.
Vector Searches
Spice supports similarity-based search by utilizing embeddings. Both local and remote embedding models can be used for vector searches.
To run a vector search, embeddings must be defined for the relevant columns in your dataset. Once configured, similarity searches can be performed using the defined embeddings.
For detailed instructions and examples on running vector searches, refer to the Vector-Based Search documentation.
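As a rough sketch, a vector search can be issued against a running Spice runtime over HTTP. The endpoint path, port, and request-body fields below are assumptions for illustration; consult the Vector-Based Search documentation for the exact API:

```shell
# Hypothetical search request against a locally running Spice runtime.
# The endpoint, port (8090), and body fields are assumptions for illustration.
SEARCH_BODY='{"datasets":["reviews"],"text":"products with great battery life","limit":3}'

# Uncomment to send the request to a running runtime:
# curl -s -X POST http://localhost:8090/v1/search \
#   -H "Content-Type: application/json" \
#   -d "$SEARCH_BODY"

echo "$SEARCH_BODY"
```

The search text is embedded with the same model configured for the target column, so results are ranked by vector similarity to the query.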