Observability & Monitoring
Spice can be monitored using the Spice Prometheus-compatible Metrics Endpoint.
Monitoring clients configuration:
Spice Metrics Endpoint Configuration​
The metrics endpoint uses port 9090
by default. The metrics endpoint configuration is logged at startup.
2024-11-28T19:48:10.942003Z INFO runtime::metrics_server: Spice Runtime Metrics listening on 127.0.0.1:9090
Pass the --metrics
parameter to bind to a specific port. For example, to bind to port 9091
:
spiced --metrics 0.0.0.0:9091
or when using Docker:
FROM spiceai/spiceai:latest
# Docker configuration ...
# Configure the metrics endpoint on port 9090
CMD ["--metrics", "0.0.0.0:9090"]
EXPOSE 9090
Configuration of the metrics endpoint can be verified using a HTTP GET request, for example:
curl http://localhost:9090/metrics
# HELP runtime_flight_server_started Indicates the runtime Flight server has started.
# TYPE runtime_flight_server_started counter
runtime_flight_server_started 1
# HELP runtime_http_server_started Indicates the runtime HTTP server has started.
# TYPE runtime_http_server_started counter
runtime_http_server_started 1
# HELP dataset_load_state Status of the dataset. 0=Initializing, 1=Ready, 2=Disabled, 3=Error, 4=Refreshing, 5=ShuttingDown.
# TYPE dataset_load_state gauge
dataset_load_state{dataset="taxi_trips"} 2
dataset_load_state{dataset="taxi_trips_accelerated"} 2
# HELP dataset_active_count Number of currently loaded datasets.
# TYPE dataset_active_count gauge
dataset_active_count{engine="None"} 1
dataset_active_count{engine="duckdb"} 1
...
Metrics​
| Metric | Description |
| ------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------- | --- |
| accelerated_ready_state_federated_fallback
(count) | Number of times the federated table was queried due to the accelerated table loading the initial data. |
| catalog_load_errors
(count) | Number of errors loading the catalog provider. |
| catalog_load_state
(gauge) | Status of the catalog provider. 0=Initializing, 1=Ready, 2=Disabled, 3=Error, 4=Refreshing, 5=ShuttingDown. |
| dataset_acceleration_last_refresh_time_ms
(gauge) | Unix timestamp in seconds when the last refresh completed. |
| dataset_acceleration_refresh_duration_ms
(histogram) | Duration in milliseconds to load a full or appended refresh data. |
| dataset_acceleration_refresh_errors
(count) | Number of errors refreshing the dataset. |
| dataset_active_count
(gauge) | Number of currently loaded datasets. |
| dataset_load_state
(gauge) | Status of the dataset. 0=Initializing, 1=Ready, 2=Disabled, 3=Error, 4=Refreshing, 5=ShuttingDown. |
| dataset_unavailable_time_ms
(gauge) | Time dataset went offline in milliseconds. |
| embeddings_active_count
(gauge) | Number of currently loaded embeddings. |
| embeddings_load_errors
(count) | Number of errors loading the embedding. |
| embeddings_load_state
(gauge) | Status of the embedding. 0=Initializing, 1=Ready, 2=Disabled, 3=Error, 4=Refreshing, 5=ShuttingDown. |
| flight_request_duration_ms
(histogram) | Measures the duration of Flight requests in milliseconds. |
| flight_requests
(count) | Total number of Flight requests. |
| http_requests_duration_ms
(histogram) | Measures the duration of HTTP requests in milliseconds. |
| http_requests
(count) | Number of HTTP requests. | |
| llm_load_state
(gauge) | Status of the LLM model. 0=Initializing, 1=Ready, 2=Disabled, 3=Error, 4=Refreshing, 5=ShuttingDown. |
| model_active_count
(gauge) | Number of currently loaded models. |
| model_load_duration_ms
(histogram) | Duration in milliseconds to load the model. |
| model_load_errors
(count) | Number of errors loading the model. |
| model_load_state
(gauge) | Status of the model. 0=Initializing, 1=Ready, 2=Disabled, 3=Error, 4=Refreshing, 5=ShuttingDown. |
| query_duration_ms
(histogram) | The total amount of time spent planning and executing queries in milliseconds. |
| query_execution_duration_ms
(histogram) | The total amount of time spent only executing queries (0 for cached queries). |
| query_executions
(count) | Number of query executions. |
| query_failures
(count) | Number of query failures. |
| query_processed_bytes
(count) | Number of bytes processed by the runtime. |
| query_returned_bytes
(count) | Number of bytes returned to query clients. |
| results_cache_max_size_bytes
(gauge) | Maximum allowed size of the cache in bytes. |
| results_cache_requests
(count) | Number of requests to get a key from the cache. |
| results_cache_hits
(count) | Cache hit count. |
| results_cache_items_count
(gauge) | Number of items currently in the cache. |
| results_cache_size_bytes
(gauge) | Size of the cache in bytes. |
| search_results_cache_max_size_bytes
(gauge) | Maximum allowed size of the search cache in bytes. |
| search_results_cache_requests
(count) | Number of requests to get a key from the search cache. |
| search_results_cache_hits
(count) | Search cache hit count. |
| search_results_cache_items_count
(gauge) | Number of items currently in the search cache. |
| search_results_cache_size_bytes
(gauge) | Size of the search cache in bytes. |
| runtime_flight_server_started
(count) | Indicates the runtime Flight server has started. |
| runtime_http_server_started
(count) | Indicates the runtime HTTP server has started. |
| secrets_store_load_duration_ms
(histogram) | Duration in milliseconds to load the secret stores. |
| tool_active_count
(gauge) | Number of currently loaded LLM tools. |
| tool_load_errors
(count) | Number of errors loading the LLM tool. |
| tool_load_state
(gauge) | Status of the LLM tools. 0=Initializing, 1=Ready, 2=Disabled, 3=Error, 4=Refreshing, 5=ShuttingDown. |
| view_load_errors
(count) | Number of errors loading the view. |
| view_load_state
(gauge) | Status of the views. 0=Initializing, 1=Ready, 2=Disabled, 3=Error, 4=Refreshing, 5=ShuttingDown. |
In addition to these core metrics, individual components can expose their own metrics. For example, the MySQL data connector exposes connection pool metrics. See Component Metrics for more information.