Workers Overview
Workers in the Spice runtime are configurable units of compute that coordinate and manage interactions between models and tools. Each worker is defined as a component in the spicepod.yaml file, which specifies its behavior and interaction logic.
Configuration
Workers are configured in the workers section of the spicepod.yaml file. Each worker definition includes a name, description, and a list of models or tools it encapsulates.
Example spicepod.yaml configuration:
workers:
  - name: round-robin
    description: |
      Distributes requests between 'foo' and 'bar' models in a round-robin fashion.
    models:
      - from: foo
      - from: bar
  - name: fallback
    description: |
      Attempts 'bar' first, then 'foo', then 'baz' if previous models fail.
    models:
      - from: foo
        order: 2
      - from: bar
        order: 1
      - from: baz
        order: 3
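To make the two strategies in the example concrete, the following is a minimal Python sketch of the selection logic the descriptions imply. It is not the Spice runtime's implementation: the function names round_robin and call_with_fallback, and the call_model callable, are hypothetical and exist only for illustration.

```python
from itertools import cycle
from typing import Callable, Iterator


def round_robin(models: list[str]) -> Iterator[str]:
    """Yield model names in turn, wrapping around indefinitely."""
    return cycle(models)


def call_with_fallback(
    models: list[dict], call_model: Callable[[str], str]
) -> str:
    """Try models in ascending `order`; return the first successful response.

    `call_model` is a hypothetical callable that sends a request to the
    named model and raises an exception on failure.
    """
    errors = []
    for entry in sorted(models, key=lambda m: m["order"]):
        try:
            return call_model(entry["from"])
        except Exception as exc:  # a real runtime would likely be more selective
            errors.append((entry["from"], exc))
    raise RuntimeError(f"all models failed: {errors}")


if __name__ == "__main__":
    picker = round_robin(["foo", "bar"])
    print([next(picker) for _ in range(3)])  # ['foo', 'bar', 'foo']
```

With the fallback worker's configuration, sorting by order yields 'bar', then 'foo', then 'baz', which matches the worker's description.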
Use-Cases
Workers currently help implement:
- Model fallback and error handling
- Load balancing across multiple models
Usage
Workers can be invoked using the same API endpoints as individual models. For example, to call a worker named fallback using the OpenAI-compatible HTTP API:
curl http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fallback",
    "messages": [{ "role": "user", "content": "Tell me a joke"}]
  }'
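The same request can also be sent from code. The sketch below uses only the Python standard library and assumes the runtime is listening on localhost:8090, as in the curl example. The helper names build_chat_request and ask are hypothetical, and the response parsing assumes the usual OpenAI chat completions shape (choices[0].message.content), which this page does not spell out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8090/v1"  # host and port taken from the curl example


def build_chat_request(
    worker: str, prompt: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a chat completions request that addresses a worker by name."""
    payload = {
        "model": worker,  # the worker name goes where a model name normally would
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(worker: str, prompt: str) -> str:
    """Send the request and return the first choice's text.

    Assumes an OpenAI-style response body.
    """
    with urllib.request.urlopen(build_chat_request(worker, prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(ask("fallback", "Tell me a joke"))
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a running runtime.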
Roadmap
The vision for workers includes support for dynamic serverless compute: executing user-defined functions within the Spice runtime. The aim is to let developers define custom logic and orchestration patterns directly in the worker configuration, supporting more advanced workflows and automation. Implementation details and timelines will be provided in future updates; for ongoing progress, refer to the project repository and documentation.
Further Reading
For a complete specification of worker configuration, routing rules, and available options, refer to the Spicepod Workers Reference.