chat

Start an interactive or one-shot chat with a model registered in the Spice runtime.

Requirements

  • Spice runtime must be running
  • At least one model defined in spicepod.yaml, with that model in the ready state
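A model entry in spicepod.yaml might look like the following minimal sketch; the provider path, model name, and secret reference shown here are illustrative, not required values:

```yaml
models:
  - from: openai:gpt-4o   # provider:model-id (example value)
    name: openai          # name referenced by `spice chat --model`
    params:
      openai_api_key: ${ secrets:OPENAI_API_KEY }
```

Once the runtime loads the spicepod and the model reports ready, it becomes available to spice chat.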

Usage

Interactive Chat: Invoke the command without arguments to open a REPL

spice chat [flags]

One-shot Chat: Pass a single message as the argument to send a one-shot chat request and print the response

spice chat [flags] [<message>]

Flags

  • --cloud Send requests to a Spice Cloud instance instead of the local instance. Default: false.
  • --http-endpoint <string> Runtime HTTP endpoint. Default: http://localhost:8090.
  • --model <string> Target model for the chat request. When omitted, the CLI uses the single ready model or prompts for a choice if several models are ready.
  • --temperature <float32> Model temperature used for chat requests. Default: 1.
  • --user-agent <string> Custom User-Agent header sent with every request.
  • --responses Direct all chats to the /v1/responses endpoint, which exposes configured models that support OpenAI's Responses API and enables access to OpenAI-hosted tools. For details on Spice's support for the Responses API, see the OpenAI model provider documentation or the Azure OpenAI model provider documentation.
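The flags above can be combined in a single invocation. For example, a one-shot request that targets a specific model at an explicit endpoint with a lower temperature might look like this (the endpoint, model name, and prompt are example values):

```shell
# One-shot request with explicit endpoint, model, and temperature
spice chat --http-endpoint http://localhost:8090 --model openai --temperature 0.2 "hello"
```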

Examples

When exactly one model is ready, spice chat opens a REPL that uses that model automatically:

> spice chat
Using model: openai
chat> hello
Hello! How can I assist you today?

Time: 0.57s (first token 0.53s). Tokens: 18. Prompt: 8. Completion: 10 (325.04/s).

When multiple models are ready, the command prompts for a selection before starting the REPL:

> spice chat
Use the arrow keys to navigate: ↓ ↑ → ←
? Select model:
▸ openai
llama
Using model: openai
chat> hello
Hello! How can I assist you today?

Time: 0.55s (first token 0.43s). Tokens: 18. Prompt: 8. Completion: 10 (80.09/s).

Passing --model skips the prompt and directs the request to the specified model. The flag works in both REPL and one-shot mode:

# REPL
spice chat --model openai
chat> hello
Hello! How can I assist you today?

Time: 0.61s (first token 0.58s). Tokens: 18. Prompt: 8. Completion: 10 (285.90/s).

Single prompt:

# One-shot
spice chat --model openai "hello"
Hello! How can I assist you today?

Time: 1.10s (first token 0.80s). Tokens: 18. Prompt: 8. Completion: 10 (33.74/s).