Data Ingestion
The Spice runtime can ingest data and write it to a Data Connector using the following methods:
- SQL Statements – Write data directly to write-capable connectors using standard SQL syntax.
- OpenTelemetry (OTEL) Ingestion – Stream OTEL data for real-time processing.
SQL Statements
Spice supports writing data to compatible data connectors using standard SQL INSERT INTO
syntax.
Write-Capable Connectors
Data connectors that support write operations are tagged as write:
- Apache Iceberg - Write to Iceberg tables via data connector or catalog connector
- AWS Glue - Write to Glue Data Catalog tables via data connector or catalog connector
Configuration for Write Operations
To enable write operations, configure your dataset or catalog with read_write access:
datasets:
  - from: glue:my_catalog.my_schema.my_table
    name: my_table
    access: read_write
    params:
      # ... connector-specific parameters
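The `access` field is what gates writes. The sketch below illustrates that gate in Python, assuming datasets default to read-only when `access` is omitted; the field names mirror the YAML above, and `is_writable` is illustrative, not a Spice API:

```python
# Illustrative check (not part of Spice): a dataset accepts writes only
# when it is explicitly configured with access: read_write.
def is_writable(dataset: dict) -> bool:
    # Assumption: omitting `access` leaves the dataset read-only.
    return dataset.get("access", "read") == "read_write"

dataset = {
    "from": "glue:my_catalog.my_schema.my_table",
    "name": "my_table",
    "access": "read_write",
}
assert is_writable(dataset)
assert not is_writable({"name": "other_table"})  # access omitted -> read-only
```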
Example SQL
INSERT INTO my_table (column1, column2)
VALUES ('value1', 'value2');
INSERT INTO my_table (column1, column2)
SELECT source_column1, source_column2
FROM source_table
WHERE condition = 'filter';
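Statements in the literal `VALUES` form above can also be generated programmatically. The sketch below is a hypothetical helper (not part of Spice) that renders a statement shaped like the first example, with basic single-quote escaping for string values:

```python
# Hypothetical helper (not a Spice API): render a simple
# INSERT INTO ... VALUES statement from columns and rows.
def insert_sql(table: str, columns: list[str], rows: list[tuple]) -> str:
    def quote(value):
        if isinstance(value, str):
            # Escape embedded single quotes by doubling them.
            return "'" + value.replace("'", "''") + "'"
        return str(value)

    cols = ", ".join(columns)
    values = ",\n       ".join(
        "(" + ", ".join(quote(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({cols})\nVALUES {values};"

print(insert_sql("my_table", ["column1", "column2"], [("value1", "value2")]))
# INSERT INTO my_table (column1, column2)
# VALUES ('value1', 'value2');
```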
For more details on the INSERT
statement syntax, see the SQL INSERT documentation.
OpenTelemetry Data Ingestion
By default, the runtime exposes an OpenTelemetry (OTEL) endpoint at grpc://127.0.0.1:50052 for OTEL data ingestion.
OTEL metrics will be inserted into datasets with matching names (metric name = dataset name) and optionally replicated to the dataset source.
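The name-matching rule above can be pictured as a simple lookup. This sketch is illustrative only (not Spice internals), and it assumes metrics without a matching dataset are simply ignored:

```python
# Illustrative routing (not Spice internals): an OTEL metric lands in the
# dataset whose name exactly matches the metric name. Metrics with no
# matching dataset are ignored in this sketch.
def route_metrics(metrics: dict[str, float], datasets: set[str]) -> dict[str, float]:
    return {name: value for name, value in metrics.items() if name in datasets}

datasets = {"cpu_usage", "smart_attribute_raw_value"}
incoming = {"cpu_usage": 83.5, "memory_usage": 41.2}
routed = route_metrics(incoming, datasets)
assert routed == {"cpu_usage": 83.5}  # memory_usage has no matching dataset
```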
Benefits
Spice.ai OSS includes built-in data ingestion support, allowing the collection of the latest data from edge nodes for use in subsequent queries. This feature eliminates the need for additional ETL pipelines and enhances the speed of the feedback loop.
For example, consider CPU usage anomaly detection. When CPU metrics are sent to the Spice OpenTelemetry endpoint, the loaded machine learning model can use the most recent observations for inferencing and provide recommendations to the edge node. This process occurs quickly on the edge itself, within milliseconds, and without generating network traffic.
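A minimal stand-in for the edge-side check described above, using a rolling z-score over the most recent observations in place of a loaded ML model (the model itself is whatever you deploy to Spice; the window size and threshold here are illustrative):

```python
from collections import deque
from statistics import mean, pstdev

# Sketch of an edge-side anomaly check over recent CPU observations.
# A rolling z-score stands in for the ML model mentioned in the text.
class CpuAnomalyDetector:
    def __init__(self, window: int = 60, threshold: float = 3.0):
        self.recent = deque(maxlen=window)  # most recent observations
        self.threshold = threshold

    def observe(self, cpu_pct: float) -> bool:
        """Return True if the new reading is anomalous vs. the window."""
        anomalous = False
        if len(self.recent) >= 2:
            mu, sigma = mean(self.recent), pstdev(self.recent)
            if sigma > 0 and abs(cpu_pct - mu) / sigma > self.threshold:
                anomalous = True
        self.recent.append(cpu_pct)
        return anomalous

detector = CpuAnomalyDetector(window=30)
for i in range(30):
    detector.observe(19.0 if i % 2 else 21.0)  # steady load around 20%
assert detector.observe(95.0)      # sudden spike is flagged
assert not detector.observe(20.5)  # ordinary reading is not
```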
Additionally, Spice will periodically replicate the data to the data connector for further use.
Considerations
- Data Quality: Use Spice SQL capabilities to transform and cleanse ingested edge data, ensuring high-quality inputs.
- Data Security: Evaluate data sensitivity and secure network connections between the edge and the data connector when replicating data for further use. Implement encryption, access controls, and secure protocols.
Example
Disk SMART
Start Spice with the following dataset:
datasets:
  - from: spice.ai/coolorg/smart/datasets/drive_stats
    name: smart_attribute_raw_value
    access: read_write
    replication:
      enabled: true
    acceleration:
      enabled: true
Start telegraf with the following config:
[[inputs.smart]]
attributes = true
[[outputs.opentelemetry]]
service_address = "localhost:50052"
[agent]
interval = "1s"
flush_interval = "1s"
SMART data will be available in the smart_attribute_raw_value
dataset in Spice.ai OSS and replicated to the coolorg.smart.drive_stats
dataset in Spice.ai Cloud.
Limitations
- Write Support: Only selected write-capable connectors and catalogs support write operations.
- OpenTelemetry Replication: Only Spice.ai replication is supported for OpenTelemetry ingestion.