# chat
Start an interactive or one-shot chat with a model registered in the Spice runtime.
## Requirements

- The Spice runtime must be running.
- At least one model is defined in `spicepod.yaml` and is ready.
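For reference, a minimal `spicepod.yaml` model entry might look like the sketch below. The provider path, model name, and secret reference are illustrative and depend on your provider and configuration:

```yaml
models:
  - name: openai                # model name the CLI will target
    from: openai:gpt-4o-mini    # provider:model path (illustrative)
    params:
      openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY }  # illustrative secret reference
```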
## Usage

Interactive chat: invoke the command without arguments to open a REPL.

```bash
spice chat [flags]
```

One-shot chat: pass a single message as the argument to send a one-shot chat request and print the response.

```bash
spice chat [flags] [<message>]
```
## Flags

| Flag | Description | Default |
| --- | --- | --- |
| `--cloud` | Send requests to a Spice Cloud instance instead of the local instance. | `false` |
| `--http-endpoint <string>` | Runtime HTTP endpoint. | `http://localhost:8090` |
| `--model <string>` | Target model for the chat request. When omitted, the CLI uses the single ready model, or prompts for a choice if several models are ready. | |
| `--temperature <float32>` | Model temperature used for the chat request. | `1` |
| `--user-agent <string>` | Custom `User-Agent` header sent with every request. | |
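Flags can be combined. As a sketch, the invocation below targets a runtime listening on a non-default port with a lower sampling temperature; the endpoint and model name are illustrative, not defaults:

```bash
# Target a runtime on a non-default port with a lower sampling temperature.
# The endpoint and model name are illustrative.
spice chat --http-endpoint http://localhost:3000 --model llama --temperature 0.2
```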
## Examples

When exactly one model is ready, `spice chat` opens a REPL that uses that model automatically:

```bash
> spice chat
Using model: openai
chat> hello
Hello! How can I assist you today?

Time: 0.57s (first token 0.53s). Tokens: 18. Prompt: 8. Completion: 10 (325.04/s).
```
When multiple models are ready, the command prompts for a selection before starting the REPL:

```bash
> spice chat
Use the arrow keys to navigate: ↓ ↑ → ←
? Select model:
  ▸ openai
    llama
Using model: openai
chat> hello
Hello! How can I assist you today?

Time: 0.55s (first token 0.43s). Tokens: 18. Prompt: 8. Completion: 10 (80.09/s).
```
Passing `--model` skips the prompt and directs the request to the specified model. The flag works in both REPL mode and one-shot mode:

```bash
# REPL
spice chat --model openai
chat> hello
Hello! How can I assist you today?

Time: 0.61s (first token 0.58s). Tokens: 18. Prompt: 8. Completion: 10 (285.90/s).
```
Passing a single message argument sends a one-shot request instead:

```bash
# One-shot
spice chat --model openai "hello"
Hello! How can I assist you today?

Time: 1.10s (first token 0.80s). Tokens: 18. Prompt: 8. Completion: 10 (33.74/s).
```
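Because one-shot mode prints the response to stdout, it composes with ordinary shell tooling. A small sketch, where the file name and prompt wording are illustrative rather than part of the CLI:

```bash
# Embed file contents in a one-shot prompt; notes.txt is illustrative.
spice chat --model openai "Summarize the following notes: $(cat notes.txt)"
```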