Large Language Models
Spice provides a high-performance, OpenAI API-compatible AI Gateway optimized for managing and scaling large language models (LLMs). It offers tools for Enterprise Retrieval-Augmented Generation (RAG), such as SQL queries across federated datasets and an advanced search feature (see Search).
Spice supports full OpenTelemetry observability, enabling detailed tracking of model tool use, recursion, data flows, and requests for full transparency and easier debugging.
Quickstart
Add a language model to your spicepod.yaml to start using AI capabilities:
models:
  - from: openai:gpt-4o-mini
    name: my_model
    params:
      openai_api_key: ${ env:OPENAI_API_KEY }
      tools: auto # Gives the model access to datasets for data-grounded responses
Start the runtime and use the chat REPL:
spice run
# In another terminal:
spice chat
chat> What tables are available?
Or call the OpenAI-compatible HTTP API directly:
curl -X POST http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my_model",
    "messages": [{"role": "user", "content": "What tables are available?"}]
  }'
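The same request can be made from application code. The sketch below uses only the Python standard library to build and send the chat completion request; the endpoint and port match the curl example above, and "my_model" is the model name from the spicepod.yaml. Because the endpoint is OpenAI-compatible, any OpenAI client pointed at this base URL works the same way.

```python
import json
import urllib.request

SPICE_URL = "http://localhost:8090/v1/chat/completions"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the POST request; the body follows the OpenAI chat schema."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        SPICE_URL, data=body, headers={"Content-Type": "application/json"}
    )


def chat(model: str, prompt: str) -> str:
    """Send the request to a running `spice run` instance and return the reply."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

With the runtime started, `chat("my_model", "What tables are available?")` returns the model's answer as a string.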
Configuring Language Models
Spice supports a variety of LLMs (see Model Providers).
Core Features
- SQL Integration: Invoke LLMs directly within SQL queries using the ai() function for text generation tasks. See SQL Reference: ai function.
- Custom Tools: Provide models with tools to interact with the Spice runtime. See Tools.
- System Prompts: Customize system prompts and override defaults for v1/chat/completions. See Parameter Overrides.
- Memory: Provide LLMs with memory persistence tools to store and retrieve information across conversations. See Memory.
- Vector Search: Perform advanced vector-based searches using embeddings. See Vector Search.
- Evals: Evaluate, track, compare, and improve language model performance for specific tasks. See Evals.
- Local Models: Load and serve models locally from various sources, including local filesystems and Hugging Face. See Local Models.
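As a hedged sketch of the SQL integration above: an ai() call can be embedded in a query and submitted to the runtime's SQL endpoint. The /v1/sql endpoint path, the plain-text POST body, and the exact ai() signature used here are assumptions drawn from the examples above and the SQL Reference; my_table and its body column are hypothetical.

```python
import urllib.request


def build_sql_request(sql: str) -> urllib.request.Request:
    # Assumes the runtime accepts the SQL statement as the raw POST body.
    return urllib.request.Request(
        "http://localhost:8090/v1/sql",
        data=sql.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
    )


# Hypothetical query: ask the default model to summarize each row's text.
query = "SELECT ai(concat('Summarize: ', body)) AS summary FROM my_table LIMIT 3"
req = build_sql_request(query)
```

Sending `req` with `urllib.request.urlopen` against a running runtime returns the query results, with one model-generated summary per row.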
For API usage, refer to the API Documentation.
📄️ Tools
Learn how LLMs interact with the Spice runtime.
📄️ MCP
Learn how to use the Model Context Protocol (MCP) with Spice.
📄️ Memory
Learn how to provide LLMs with memory.
📄️ Evals
Learn how Spice evaluates, tracks, compares, and improves language model performance for specific tasks.
📄️ Parameter Overrides
Learn how to override default LLM hyperparameters in Spice.
📄️ Local Models
Learn how to load and serve large language models locally.
📄️ Parameterized Prompts
Learn how to update system prompts for each request with Jinja-styled templating.
