Spicepods
Overview
A Spicepod is a configuration package that defines application-specific datasets, catalogs, machine learning (ML) models, and secrets. It functions similarly to a code packaging system (such as npm or pip), but is designed for data and AI components rather than code libraries.
Spicepods are defined in a YAML manifest file, typically named spicepod.yaml, and can be shared, versioned, and reused across projects.
Structure
The manifest includes the following key sections:
- Metadata: Basic information about the Spicepod, such as its name and version.
- Datasets: Definitions of datasets that are used or produced within the Spicepod.
- Catalogs: Definitions of catalogs that are used within the Spicepod.
- Models: Definitions of language or traditional ML models that the Spicepod manages, including their sources and associated datasets.
- Secrets: Configuration for any secret stores used within the Spicepod.
Example Manifest
version: v1
kind: Spicepod
name: my_spicepod
datasets:
  - from: spice.ai/spiceai/quickstart/datasets/taxi_trips
    name: taxi_trips
    acceleration:
      enabled: true
models:
  - from: openai:gpt-4o-mini
    name: openai_model
    params:
      openai_api_key: ${ env:OPENAI_API_KEY }
      tools: auto
secrets:
  - from: env
    name: env
Additional Example
version: v1
kind: Spicepod
name: another_spicepod
datasets:
  - from: databricks:spiceai_demo.public.dataset
    name: sample_ds
    params:
      mode: delta_lake
      databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com
      databricks_token: ${secrets:my_token}
      databricks_aws_access_key_id: ${secrets:aws_access_key_id}
      databricks_aws_secret_access_key: ${secrets:aws_secret_access_key}
    acceleration:
      enabled: true
      refresh_mode: full
models:
  - from: huggingface.co/microsoft/Phi-3.5-mini-instruct
    name: phi
secrets:
  - from: env
    name: env
Key Components
Datasets
Datasets in a Spicepod can be sourced from various locations, including local files or remote databases. They can be materialized and accelerated using different engines such as DuckDB, SQLite, or PostgreSQL to optimize performance.
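As a minimal sketch of choosing an acceleration engine, the fragment below extends the taxi_trips dataset from the example manifest. The `engine` and `mode` keys and their values are assumptions here, not taken from this page; check the Datasets reference for the exact names your Spice.ai version supports.

```yaml
datasets:
  - from: spice.ai/spiceai/quickstart/datasets/taxi_trips
    name: taxi_trips
    acceleration:
      enabled: true
      # Assumption: `engine` selects the accelerator (DuckDB here).
      engine: duckdb
      # Assumption: `mode: file` persists accelerated data to disk
      # instead of keeping it only in memory.
      mode: file
```

With no engine specified, as in the example manifests, the runtime falls back to its default accelerator.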
Learn more at Datasets.
Catalogs
Catalogs in a Spicepod can contain multiple schemas. Each schema, in turn, contains multiple tables where the actual data is stored.
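A catalog entry might look like the following sketch. It reuses the Databricks parameters from the dataset example in this document; the catalog source name `my_uc_catalog`, the catalog name `db_catalog`, and the `include` filter are hypothetical and shown only to illustrate the catalog, schema, and table hierarchy.

```yaml
catalogs:
  # Hypothetical catalog source; replace with a real catalog name.
  - from: databricks:my_uc_catalog
    name: db_catalog
    # Assumption: `include` restricts which schema.table pairs are exposed.
    include:
      - "public.*"
    params:
      mode: delta_lake
      databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com
      databricks_token: ${secrets:my_token}
```

Under these assumptions, a table would be addressed by its three-part name, for example db_catalog.public.dataset.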
Learn more at Catalogs.
Models
ML models are integrated into the Spicepod similarly to datasets. The models can be specified using paths to local files or remote locations. ML inference can be performed using the models and datasets defined within the Spicepod.
Learn more at Models.
Secrets
Spice.ai supports various secret stores to manage sensitive information such as API keys or database credentials. Supported secret store types include environment variables, files, AWS Secrets Manager, Kubernetes secrets, and keyrings.
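The fragment below sketches how a secret store and a secret reference fit together, using only the store and model already shown in the example manifest. The reading of the `${ store:KEY }` syntax given in the comments is an assumption based on those examples.

```yaml
secrets:
  # An environment-variable store, registered under the name `env`.
  - from: env
    name: env

models:
  - from: openai:gpt-4o-mini
    name: openai_model
    params:
      # Assumption: the prefix before the colon names the store (`env`),
      # and the remainder is the key looked up in that store.
      openai_api_key: ${ env:OPENAI_API_KEY }
```

Keeping the key in a secret store rather than in the manifest lets the spicepod.yaml file be shared and versioned without exposing credentials.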
Learn more at Secret Stores.
