Large Language Models
Spice provides a high-performance, OpenAI API-compatible AI Gateway optimized for managing and scaling large language models (LLMs). It offers tools for Enterprise Retrieval-Augmented Generation (RAG), such as SQL queries across federated datasets and an advanced search feature (see Search).
Spice supports full OpenTelemetry observability, enabling detailed tracking of model tool use, recursion, data flows, and requests for full transparency and easier debugging.
Quickstart
Add a language model to your spicepod.yaml to start using AI capabilities:
models:
  - from: openai:gpt-4o-mini
    name: my_model
    params:
      openai_api_key: ${ env:OPENAI_API_KEY }
      tools: auto # Gives the model access to datasets for data-grounded responses
Start the runtime and use the chat REPL:
spice run
# In another terminal:
spice chat
chat> What tables are available?
Or call the OpenAI-compatible HTTP API directly:
curl -X POST http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my_model",
    "messages": [{"role": "user", "content": "What tables are available?"}]
  }'
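The same request can be made from application code. The sketch below uses only the Python standard library to build and send the chat completion request; the endpoint and port match the curl example above, and "my_model" is the model name from the spicepod.yaml. Because the endpoint is OpenAI-compatible, any OpenAI client pointed at this base URL works the same way.

```python
import json
import urllib.request

SPICE_URL = "http://localhost:8090/v1/chat/completions"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the POST request; the body follows the OpenAI chat schema."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        SPICE_URL, data=body, headers={"Content-Type": "application/json"}
    )


def chat(model: str, prompt: str) -> str:
    """Send the request to a running `spice run` instance and return the reply."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

With the runtime started, `chat("my_model", "What tables are available?")` returns the model's answer as a string.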
Configuring Language Models
Spice supports a variety of LLMs (see Model Providers).
Core Features
- SQL Integration: Invoke LLMs directly within SQL queries using the ai() function for text generation tasks. See SQL Reference: ai function.
- Custom Tools: Provide models with tools to interact with the Spice runtime. See Tools.
- System Prompts: Customize system prompts and override defaults for v1/chat/completions. See Parameter Overrides.
- Memory: Provide LLMs with memory persistence tools to store and retrieve information across conversations. See Memory.
- Vector Search: Perform advanced vector-based searches using embeddings. See Vector Search.
- Evals: Evaluate, track, compare, and improve language model performance for specific tasks. See Evals.
- Local Models: Load and serve models locally from various sources, including local filesystems and Hugging Face. See Local Models.
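As a hedged sketch of the SQL integration above: an ai() call can be embedded in a query and submitted to the runtime's SQL endpoint. The /v1/sql endpoint path, the plain-text POST body, and the exact ai() signature used here are assumptions drawn from the examples above and the SQL Reference; my_table and its body column are hypothetical.

```python
import urllib.request


def build_sql_request(sql: str) -> urllib.request.Request:
    # Assumes the runtime accepts the SQL statement as the raw POST body.
    return urllib.request.Request(
        "http://localhost:8090/v1/sql",
        data=sql.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
    )


# Hypothetical query: ask the default model to summarize each row's text.
query = "SELECT ai(concat('Summarize: ', body)) AS summary FROM my_table LIMIT 3"
req = build_sql_request(query)
```

Sending `req` with `urllib.request.urlopen` against a running runtime returns the query results, with one model-generated summary per row.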
For API usage, refer to the API Documentation.
📄️ Tools
Learn how LLMs interact with the Spice runtime.
📄️ MCP
Learn how to use the Model Context Protocol (MCP) with Spice.
📄️ Memory
Learn how to provide LLMs with memory.
📄️ Evals
Learn how Spice evaluates, tracks, compares, and improves language model performance for specific tasks.
📄️ Parameter Overrides
Learn how to override default LLM hyperparameters in Spice.
📄️ Local Models
Learn how to load and serve large language models locally.
📄️ Parameterized Prompts
Learn how to update system prompts for each request with Jinja-styled templating.
