Microservice Deployment (Single or Multiple Replicas)
The Spice Runtime operates as an independent microservice. Multiple replicas may be deployed behind a load balancer to achieve high availability and handle spikes in demand.
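As a rough illustration of how a load balancer keeps the service available when a replica fails, the sketch below simulates round-robin routing that skips unhealthy replicas. It is a toy model, not Spice or load balancer code: the `Replica` and `RoundRobinBalancer` names and the `spice-N` replica names are hypothetical, and a real deployment would rely on the platform's own load balancer and health checks.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One Spice Runtime replica as seen by the balancer (hypothetical model)."""
    name: str
    healthy: bool = True
    served: int = 0


@dataclass
class RoundRobinBalancer:
    """Toy round-robin balancer that skips replicas marked unhealthy."""
    replicas: list
    _next: int = 0

    def route(self) -> Replica:
        """Return the next healthy replica and count the request against it."""
        # Try each replica at most once per request.
        for _ in range(len(self.replicas)):
            replica = self.replicas[self._next % len(self.replicas)]
            self._next += 1
            if replica.healthy:
                replica.served += 1
                return replica
        raise RuntimeError("no healthy replicas available")


if __name__ == "__main__":
    replicas = [Replica("spice-0"), Replica("spice-1"), Replica("spice-2")]
    balancer = RoundRobinBalancer(replicas)

    for _ in range(6):
        balancer.route()
    print([r.served for r in replicas])  # [2, 2, 2]

    # Simulate one replica failing: traffic continues on the other two.
    replicas[1].healthy = False
    for _ in range(4):
        balancer.route()
    print([r.served for r in replicas])  # [4, 2, 4]
```

With a single replica, the same failure would make every request fail, which is the availability trade-off this deployment model is meant to address.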
Benefits
- Loose coupling between the application and the Spice Runtime.
- Independent scaling and upgrades.
- Can serve multiple applications or services within an organization.
- Helps achieve high availability and redundancy.
Considerations
- An additional network hop adds latency compared to the sidecar deployment.
- More complex infrastructure, requiring service discovery and load balancing.
- Potentially higher cost due to additional infrastructure components.
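Because every call now crosses the network, callers usually need to plan for transient failures as well as added latency. The following is a minimal sketch of client-side retry with exponential backoff, assuming read-only queries that are safe to repeat. The `query_with_retry` helper and its `send` parameter are hypothetical; `send` stands in for whatever function the application uses to reach the Spice Runtime.

```python
import time


def query_with_retry(send, sql, attempts=3, backoff_s=0.1, sleep=time.sleep):
    """Call send(sql), retrying on connection errors and timeouts.

    send      -- hypothetical callable that sends the query over the network
    attempts  -- total number of tries, including the first
    backoff_s -- initial delay; doubled after each failed attempt
    sleep     -- injectable for testing
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(attempts):
        try:
            return send(sql)
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
            # Back off before the next try, but not after the final one.
            if attempt < attempts - 1:
                sleep(backoff_s * 2 ** attempt)
    raise last_error
```

Retries add to worst-case latency, so the attempt count and backoff should fit within the caller's own latency budget.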
Use This Approach When
- A loosely coupled architecture is desired, with the Spice Runtime scaled independently of the application.
- Multiple services or teams need to share the same Spice Runtime.
- Heavy or varying traffic is anticipated, requiring independent scaling of the Spice Runtime.
- Resiliency and redundancy are prioritized over simplicity.
Example Use Case
A large organization where multiple services (recommendations, analytics, etc.) need to share AI insights. A centralized Spice Runtime microservice cluster lets each team consume AI outputs without duplicating effort.
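One way to picture this setup is a thin client that every service points at the same load-balanced address. The sketch below is illustrative only. The hostname `spice.internal.example`, port `8090`, the `/v1/sql` path, the plain-text request body, and the JSON response are all assumptions here and should be checked against the Spice HTTP API reference for the deployed version. The table names are hypothetical.

```python
import json
import urllib.request

# Assumption: the load balancer exposes the Spice HTTP API at this address.
SPICE_URL = "http://spice.internal.example:8090"


def build_sql_request(sql, base_url=SPICE_URL):
    """Build a POST request carrying a SQL query (assumed /v1/sql endpoint)."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/sql",
        data=sql.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def run_sql(sql, base_url=SPICE_URL, timeout_s=5.0):
    """Send the query through the load balancer and decode an assumed JSON reply."""
    request = build_sql_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read())


# Two different services reuse the same client and the same endpoint.
def recommendations_query(limit=10):
    # "product_scores" is a hypothetical dataset name.
    return f"SELECT product_id, score FROM product_scores LIMIT {int(limit)}"


def analytics_query():
    # "events" is a hypothetical dataset name.
    return "SELECT COUNT(*) AS total_events FROM events"
```

Each service would call `run_sql(recommendations_query())` or `run_sql(analytics_query())`. Neither needs to know how many replicas sit behind the address.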