
Large Language Models

Spice provides a high-performance, OpenAI API-compatible AI Gateway optimized for managing and scaling large language models (LLMs). It offers tools for Enterprise Retrieval-Augmented Generation (RAG), such as SQL queries across federated datasets and an advanced search feature (see Search).


Spice supports full OpenTelemetry observability, providing detailed tracking of model tool use, recursion, data flows, and requests for full transparency and easier debugging.

Quickstart

Add a language model to your spicepod.yaml to start using AI capabilities:

models:
  - from: openai:gpt-4o-mini
    name: my_model
    params:
      openai_api_key: ${ env:OPENAI_API_KEY }
      tools: auto # Gives the model access to datasets for data-grounded responses

Start the runtime and use the chat REPL:

spice run
# In another terminal:
spice chat
chat> What tables are available?

Or call the OpenAI-compatible HTTP API directly:

curl -X POST http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my_model",
    "messages": [{"role": "user", "content": "What tables are available?"}]
  }'
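Because the endpoint is OpenAI-compatible, any HTTP client can call it. A minimal Python sketch using only the standard library, assuming the runtime is listening on its default port 8090 and a model named my_model is configured as in the spicepod above:

```python
import json
import urllib.request

# Assumption: Spice runtime running locally on its default HTTP port 8090,
# with a model named "my_model" (see the spicepod.yaml quickstart above).
SPICE_URL = "http://localhost:8090/v1/chat/completions"

def build_request(prompt: str, model: str = "my_model") -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str) -> str:
    """POST the prompt to the Spice gateway and return the assistant's reply."""
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        SPICE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The response follows the standard OpenAI chat completion shape, so the reply text lives at `choices[0].message.content`.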

Configuring Language Models

Spice supports a variety of LLMs (see Model Providers).

Core Features

  • SQL Integration: Invoke LLMs directly within SQL queries using the ai() function for text generation tasks. See SQL Reference: ai function.
  • Custom Tools: Provide models with tools to interact with the Spice runtime. See Tools.
  • System Prompts: Customize system prompts and override defaults for v1/chat/completions. See Parameter Overrides.
  • Memory: Provide LLMs with memory persistence tools to store and retrieve information across conversations. See Memory.
  • Vector Search: Perform advanced vector-based searches using embeddings. See Vector Search.
  • Evals: Evaluate, track, compare, and improve language model performance for specific tasks. See Evals.
  • Local Models: Load and serve models locally from various sources, including local filesystems and Hugging Face. See Local Models.
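As an illustration of the SQL Integration above, a query invoking the ai() function might look like the following. This is a sketch: it assumes a hypothetical taxi_trips dataset and that ai() accepts a prompt plus an optional model name (as in the quickstart, my_model); see SQL Reference: ai function for the authoritative signature.

-- Hypothetical dataset "taxi_trips"; "my_model" as configured in the quickstart.
SELECT
  pickup_city,
  ai(CONCAT('Summarize this city in one sentence: ', pickup_city), 'my_model') AS summary
FROM taxi_trips
LIMIT 5;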

For API usage, refer to the API Documentation.