Skip to main content

3 posts tagged with "snapshots"

Topics related to database and acceleration snapshots

View All Tags

Spice v1.10.2 (Dec 22, 2025)

Β· 5 min read
Sergei Grebnov
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.10.2! πŸ”₯

v1.10.2 introduces Tiered Caching Acceleration with Localpod for multi-layer acceleration architectures, Periodic Acceleration Snapshots with configurable intervals, DynamoDB JSON Nesting for column consolidation, and Kafka/Debezium Batching for faster data ingestion. This release also includes fixes for SQLite accelerator decimal/date handling and real-time status reporting for the /v1/datasets and /v1/models API endpoints.

What's New in v1.10.2​

Tiered Caching with Localpod​

Multi-Layer Acceleration Architecture: The Localpod connector now supports caching refresh mode, enabling tiered acceleration where a persistent cache (e.g., file-mode DuckDB) feeds a fast in-memory cache (e.g., Arrow, memory-mode DuckDB).

Key Features:

  • Automatic Cache Propagation: New cache entries automatically propagate from parent to child accelerators
  • Warm Startup: Child accelerators initialize from existing parent data on startup, eliminating cold-start latency
  • Flexible Tiering: Combine any accelerator engines (DuckDB, SQLite, Cayenne) across tiers

Example spicepod.yaml configuration:

datasets:
# Parent: persistent file-mode cache
- from: https://api.example.com
name: api_cache
acceleration:
enabled: true
refresh_mode: caching
engine: duckdb
mode: file

# Child: fast in-memory cache fed by parent
- from: localpod:api_cache
name: api_cache_memory
acceleration:
enabled: true
refresh_mode: caching
engine: arrow
mode: memory

For more details, refer to the Localpod Data Connector Documentation.

Periodic Acceleration Snapshots​

Configurable Snapshot Intervals: A new snapshots_create_interval parameter enables periodic snapshot creation for accelerated datasets across all refresh modes. This provides better control over snapshot frequency and ensures consistent recovery points for accelerated data.

Example spicepod.yaml configuration:

datasets:
- from: s3://my-bucket/data.parquet
name: my_data
acceleration:
enabled: true
engine: duckdb
mode: file
refresh_mode: caching
snapshots: enabled
params:
snapshots_create_interval: 60s # Write a snapshot every 60 seconds

For more details, refer to the Data Acceleration Documentation.

DynamoDB JSON Nesting​

Consolidate Columns into JSON: The DynamoDB Data Connector now supports consolidating columns into a single JSON column using the json_object: "*" metadata option. This is useful when only a few columns are needed as discrete fields while the rest can be accessed as nested JSON.

Example spicepod.yaml configuration:

datasets:
- from: dynamodb:my_table
name: my_table
columns:
- name: PK
- name: SK
- name: data_json
metadata:
json_object: '*' # Captures all other columns as JSON

Example Output: Given a DynamoDB table with columns PK, SK, name, email, and status, the resulting table schema consolidates all non-specified columns into the data_json column:

PKSKdata_json
pk_1sort_1{"name": "Alice", "email": "[email protected]", "status": "active"}
pk_2sort_2{"name": "Bob", "email": "[email protected]", "status": "inactive"}

For more details, refer to the DynamoDB JSON Nesting Documentation.

Kafka/Debezium Batching​

Faster Data Ingestion: Configure message batching for Kafka and Debezium connectors to improve data ingestion throughput. Batching reduces processing overhead by grouping multiple messages together before insertion.

Key Features:

  • Configurable Batch Size: Control the maximum number of records per batch (default: 10,000)
  • Configurable Batch Duration: Set the maximum wait time before flushing a partial batch (default: 1s)

Example spicepod.yaml configuration:

datasets:
- from: debezium:kafka-server.public.my_table
name: my_table
params:
batch_max_size: 10000 # Max records per batch (default: 10000)
batch_max_duration: 1s # Max wait time per batch (default: 1s)

For more details, refer to the Kafka Data Connector Documentation and Debezium Data Connector Documentation.

Additional Improvements & Bug Fixes​

  • Reliability: Fixed SQLite accelerator decimal and date type handling for improved data type accuracy.
  • Reliability: Fixed real-time status reporting for /v1/datasets and /v1/models API endpoints.
  • Reliability: Fixed Kafka warning when security.protocol is set to PLAINTEXT.

Contributors​

Breaking Changes​

No breaking changes.

Cookbook Updates​

New Cayenne Data Accelerator Recipe: New recipe demonstrating how to accelerate a local copy of the taxi trips dataset using Cayenne as the data accelerator engine. See Cayenne Data Accelerator Recipe for details.

New Dataset Partitioning Recipe: New recipe demonstrating how to partition accelerated datasets to improve query performance. See Dataset Partitioning for details.

The Spice Cookbook includes 84 recipes to help you get started with Spice quickly and easily.

Upgrading​

To upgrade to v1.10.2, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.10.2 image:

docker pull spiceai/spiceai:1.10.2

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

πŸŽ‰ Spice is now available in the AWS Marketplace!

What's Changed​

Changelog​

Spice v1.8.1 (Oct 13, 2025)

Β· 5 min read
Viktor Yershov
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.8.1! πŸš€

Spice v1.8.1 is a patch release that adds Acceleration Snapshots Indexes, and includes a number of bug fixes and performance improvements.

What's New in v1.8.1​

Acceleration Snapshot Indexes​

  • Management of Acceleration Snapshots has been improved by adopting an Iceberg-inspired metadata.json, which now encodes pointer IDs, schema serialization, and robust checksum and size, which is validate before loading the snapshot.

  • Acceleration Snapshot Metrics: The following metrics are now available for Acceleration Snapshots:

  • dataset_acceleration_snapshot_bootstrap_duration_ms: The time it took the runtime to download the snapshot - only emitted when it initially downloads the snapshot.

  • dataset_acceleration_snapshot_bootstrap_bytes: The number of bytes downloaded to bootstrap the acceleration from the snapshot.

  • dataset_acceleration_snapshot_bootstrap_checksum: The checksum of the snapshot used to bootstrap the acceleration.

  • dataset_acceleration_snapshot_failure_count: Number of failures encountered when writing a new snapshot at the end of the refresh cycle. A snapshot failure does not prevent the refresh from completing.

  • dataset_acceleration_snapshot_write_timestamp: Unix timestamp in seconds when the last snapshot was completed.

  • dataset_acceleration_snapshot_write_duration_ms: The time it took to write the snapshot to object storage.

  • dataset_acceleration_snapshot_write_bytes: The number of bytes written on the last snapshot write.

  • dataset_acceleration_snapshot_write_checksum: The SHA256 checksum of the last snapshot write.

To learn more, see the Acceleration Snapshots Documentation and the Metrics Documentation.

Improved Regular Expression for DuckDB acceleration​

Regular expression support has been expanded when using DuckDB acceleration for functions like regexp-like and regexp_match.

For more details, refer to the SQL Reference for the list of available regular expression functions.

Additional Improvements & Bugfixes​

  • Reliability: Resolved an issue with partitioning on empty partition sets.
  • Validation: Added better validation for incorrectly configured Spicepods.
  • Reliability: Fixed partition_by accelerations when a projection is applied on empty partition sets.
  • Performance: Ensured ListingTable partitions are pruned when filters are not used.
  • Performance: Don't download acceleration snapshots if the acceleration is already present.
  • Performance: Refactored some blocking I/O and synchronization in the async codebase by moving operations to tokio::task::spawn_blocking, replacing blocking locks with async-friendly variants.
  • Bugfix: Nullable fields are now supported for S3 Vectors index columns.

Contributors​

Breaking Changes​

No breaking changes.

Cookbook Updates​

  • New Accelerated Snapshots Recipe - The recipe shows how to bootstrap DuckDB accelerations from object storage to skip cold starts.

The Spice Cookbook includes 81 recipes to help you get started with Spice quickly and easily.


Upgrading​

To upgrade to v1.8.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.8.1 image:

docker pull spiceai/spiceai:1.8.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

πŸŽ‰ Spice is now available in the AWS Marketplace!

What's Changed​

Changelog​

Spice v1.8.0 (Oct 6, 2025)

Β· 20 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.8.0! 🧊

Spice v1.8.0 delivers major advances in data writes, scalable vector search, and now in previewβ€”managed acceleration snapshots for fast cold starts. This release introduces write support for Iceberg tables using standard SQL INSERT INTO, partitioned S3 Vector indexes for petabyte-scale vector search, and preview of the AI SQL function for direct LLM integration in SQL. Additional improvements include improved reliability, and the v3.0.3 release of the Spice.js Node.js SDK.

What's New in v1.8.0​

Iceberg Table Write Support (Preview)​

Append Data to Iceberg Tables with SQL INSERT INTO: Spice now supports writing to Iceberg tables and catalogs using standard SQL INSERT INTO statements. This enables data ingestion, transformation, and pipeline use casesβ€”no Spark or external writer required.

  • Append-only: Initial version targets appends; no overwrite or delete.
  • Schema validation: Inserted data must match the target table schema.
  • Secure by default: Writes are only enabled for datasets or catalogs explicitly marked with access: read_write.

Example Spicepod configuration:

catalogs:
- from: iceberg:https://glue.ap-northeast-3.amazonaws.com/iceberg/v1/catalogs/111111/namespaces
name: ice
access: read_write

datasets:
- from: iceberg:https://iceberg-catalog-host.com/v1/namespaces/my_namespace/tables/my_table
name: iceberg_table
access: read_write

Example SQL usage:

-- Insert from another table
INSERT INTO iceberg_table
SELECT * FROM existing_table;

-- Insert with values
INSERT INTO iceberg_table (id, name, amount)
VALUES (1, 'John', 100.0), (2, 'Jane', 200.0);

-- Insert into catalog table
INSERT INTO ice.sales.transactions
VALUES (1001, '2025-01-15', 299.99, 'completed');

Note: Only Iceberg datasets and catalogs with access: read_write support writes. Internal Spice tables and other connectors remain read-only.

Learn more in the Iceberg Data Connector documentation.

Acceleration Snapshots for Fast Cold Starts (Preview)​

Bootstrap Managed Accelerations from Object Storage: Spice now supports managed acceleration snapshots in preview, enabling datasets accelerated with file-based engines (DuckDB or SQLite) to bootstrap from a snapshot stored in object storage (such as S3) if the local acceleration file does not exist on startup. This dramatically reduces cold start times and enables ephemeral storage for accelerations with persistent recovery.

Key features:

  • Rapid readiness: Datasets can become ready in seconds by downloading a pre-built snapshot, skipping lengthy initial acceleration.
  • Hive-style partitioning: Snapshots are organized by month, day, and dataset for easy retention and management.
  • Flexible bootstrapping: Configurable fallback and retry behavior if a snapshot is missing or corrupted.

Example Spicepod configuration:

snapshots:
enabled: true
location: s3://some_bucket/some_folder/ # Folder for storing snapshots
bootstrap_on_failure_behavior: warn # Options: warn, retry, fallback
params:
s3_auth: iam_role # All S3 dataset params accepted here

datasets:
- from: s3://some_bucket/some_table/
name: some_table
params:
file_format: parquet
s3_auth: iam_role
acceleration:
enabled: true
snapshots: enabled # Options: enabled, disabled, bootstrap_only, create_only
engine: duckdb
mode: file
params:
duckdb_file: /nvme/some_table.db

How it works:

  • On startup, if the acceleration file does not exist, Spice checks the snapshot location for the latest snapshot and downloads it.
  • Snapshots are stored as: s3://some_bucket/some_folder/month=2025-09/day=2025-09-30/dataset=some_table/some_table_<timestamp>.db
  • If no snapshot is found, a new acceleration file is created as usual.
  • Snapshots are written after each refresh (unless configured otherwise).

Supported snapshot modes:

  • enabled: Download and write snapshots.
  • bootstrap_only: Only download on startup, do not write new snapshots.
  • create_only: Only write snapshots, do not download on startup.
  • disabled: No snapshotting.

Note: This feature is only supported for file-based accelerations (DuckDB or SQLite) with dedicated files.

Why use acceleration snapshots?

  • Faster cold starts: Skip waiting for full acceleration on startup.
  • Ephemeral storage: Use fast local disks (e.g., NVMe) for acceleration, with persistent recovery from object storage.
  • Disaster recovery: Recover from federated source outages by bootstrapping from the latest snapshot.

Partitioned S3 Vector Indexes​

Efficient, Scalable Vector Search with Partitioning: Spice now supports partitioning Amazon S3 Vector indexes and scatter-gather queries using a partition_by expression in the dataset vector engine configuration. Partitioned indexes enable faster ingestion, lower query latency, and scale to billions of vectors.

Example Spicepod configuration:

datasets:
- name: reviews
vectors:
enabled: true
engine: s3_vectors
params:
s3_vectors_bucket: my-bucket
s3_vectors_index: base-embeddings
partition_by:
- 'bucket(50, PULocationID)'
columns:
- name: body
embeddings:
from: bedrock_titan
- name: title
embeddings:
from: bedrock_titan

See the Amazon S3 Vectors documentation for details.

AI SQL function for LLM Integration (Preview)​

LLMs Directly In SQL: A new asynchronous ai SQL function enables direct calls to LLMs from SQL queries for text generation, translation, classification, and more. This feature is released in preview and supports both default and model-specific invocation.

Example Spicepod model configuration:

models:
- name: gpt-4o
from: openai:gpt-4o
params:
openai_api_key: ${secrets:openai_key}

Example SQL usage:

-- basic usage with default model
SELECT ai('hi, this prompt is directly from SQL.');
-- basic usage with specified model
SELECT ai('hi, this prompt is directly from SQL.', 'gpt-4o');
-- Using row data as input to the prompt
SELECT ai(concat_ws(' ', 'Categorize the zone', Zone, 'in a single word. Only return the word.')) AS category
FROM taxi_zones
LIMIT 10;

Learn more in the SQL Reference AI documentation.

Remote Endpoint Support for Spice CLI​

Run CLI Commands Remotely: The Spice CLI now supports connecting to remote Spice instances, enabling you to run spice sql, spice search, and spice chat commands from your local machine against a remote spiced daemon or to Spice Cloud. Previously, these commands required running on the same machine as the runtime. Now, new flags allow remote execution:

  • --cloud: Connect to a Spice Cloud instance (requires --api-key).
  • --endpoint <endpoint>: Connect to a remote Spice instance via HTTP or Arrow Flight SQL (gRPC). Supports http://, https://, grpc://, or grpc+tls:// schemes.

Examples:

# Run SQL queries against a remote Spice instance
spice sql --endpoint http://remote-host:8090

# Use Spice Cloud for chat or search
spice chat --cloud --api-key <your-api-key>
spice search --cloud --api-key <your-api-key>

Supported CLI Commands:

  • spice sql --cloud / spice sql --endpoint <endpoint>
  • spice search --cloud / spice search --endpoint <endpoint>
  • spice chat --cloud / spice chat --endpoint <endpoint>

Additional Flags:

  • --headers: Pass custom HTTP headers to the remote endpoint.
  • --tls-root-certificate-file: Specify a root certificate for TLS verification.
  • --user-agent: Set a custom user agent for requests.

For more details, see the Spice CLI Command Reference.

Spice.js v3.0.3 SDK​

Spice.js v3.0.3 Released: The official Spice.ai Node.js/JavaScript SDK has been updated to v3.0.3, bringing cross-platform support, new APIs, and improved reliability for both Node.js and browser environments.

  • Modern Query Methods: Use sql(), sqlJson(), and nsql() for flexible querying, streaming, and natural language to SQL.
  • Browser Support: SDK now works in browsers and web applications, automatically selecting the optimal transport (gRPC or HTTP).
  • Health Checks & Dataset Refresh: Easily monitor Spice runtime health and trigger dataset refreshes on demand.
  • Automatic HTTP Fallback: If gRPC/Flight is unavailable, the SDK falls back to HTTP automatically.
  • Migration Guidance: v3 requires Node.js 20+, uses camelCase parameters, and introduces a new package structure.

Example usage:

import { SpiceClient } from '@spiceai/spice'

const client = new SpiceClient(apiKey)
const table = await client.sql('SELECT * FROM my_table LIMIT 10')
console.table(table.toArray())

See Spice.js SDK documentation for full details, migration tips, and advanced usage.

Additional Improvements​

  • Reliability: Improved logging, error handling, and network readiness checks across connectors (Iceberg, Databricks, etc.).
  • Vector search durability and scale: Refined logging, stricter default limits, safeguards against index-only scans and duplicate results, and always-accessible metadata for robust queryability at scale.
  • Cache behavior: Tightened cache logic for modification queries.
  • Full-Text Search: FTS metadata columns now usable in projections; max search results increased to 1000.
  • RRF Hybrid Search: Reciprocal Rank Fusion (RRF) UDTF enhancements for advanced hybrid search scenarios.

Contributors​

Breaking Changes​

This release introduces two breaking changes associated with the search observability and tooling.

Firstly, the document_similarity tool has been renamed to search. This has the equivalent change to tracing of these tool calls:

## Old: v1.7.1
>> spice trace tool_use::document_similarity
>> curl -XPOST http://localhost:8090/v1/tools/document_similarity \
-d '{
"datasets": ["my_tbl"],
"text": "Welcome to another Spice release"
}'

## New: v1.8.0
>> spice trace tool_use::search
>> curl -XPOST http://localhost:8090/v1/tools/search \
-d '{
"datasets": ["my_tbl"],
"text": "Welcome to another Spice release"
}'

Secondly, the vector_search task in runtime.task_history has been renamed to search.

Cookbook Updates​

The Spice Cookbook now includes 80 recipes to help you get started with Spice quickly and easily.


Upgrading​

To upgrade to v1.8.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.8.0 image:

docker pull spiceai/spiceai:1.8.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

πŸŽ‰ Spice is now available in the AWS Marketplace!

What's Changed​

Dependencies​

  • iceberg-rust: Upgraded to v0.7.0-rc.1
  • mimalloc: Upgraded from 0.1.47 to 0.1.48
  • azure_core: Upgraded from 0.27.0 to 0.28.0
  • Jimver/cuda-toolkit: Upgraded from 0.2.27 to 0.2.28

Changelog​