
16 posts tagged with "duckdb"

DuckDB database topics and usage


Spice v1.5.0 (July 21, 2025)

· 14 min read
Evgenii Khramkov
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.5.0! 🔍

Spice v1.5.0 brings major upgrades to search and retrieval. It introduces native support for Amazon S3 Vectors, enabling petabyte-scale vector search directly from S3 vector buckets, alongside SQL-integrated vector and tantivy-powered full-text search, partitioning for DuckDB acceleration, and automated refreshes for search indexes and views. It also includes the AWS Bedrock Embeddings Model Provider, the Oracle Database connector, the now-stable Spice.ai Cloud Data Connector, and an upgrade to DuckDB v1.3.2.

What's New in v1.5.0

Amazon S3 Vectors Support: Spice.ai now integrates with Amazon S3 Vectors, launched in public preview on July 15, 2025, enabling vector-native object storage with built-in indexing and querying. This integration supports semantic search, recommendation systems, and retrieval-augmented generation (RAG) at petabyte scale with S3’s durability and elasticity. Spice.ai manages the vector lifecycle—ingesting data, creating embeddings with models like Amazon Titan or Cohere via AWS Bedrock, or others available on HuggingFace, and storing it in S3 Vector buckets.

Spice integration with Amazon S3 Vectors

Example Spicepod.yml configuration for S3 Vectors:

datasets:
  - from: s3://my_data_bucket/data/
    name: my_vectors
    params:
      file_format: parquet
    acceleration:
      enabled: true
    vectors:
      engine: s3_vectors
      params:
        s3_vectors_aws_region: us-east-2
        s3_vectors_bucket: my-s3-vectors-bucket
    columns:
      - name: content
        embeddings:
          - from: bedrock_titan
            row_id:
              - id

Example SQL query using S3 Vectors:

SELECT *
FROM vector_search(my_vectors, 'Cricket bats', 10)
WHERE price < 100
ORDER BY score

For more details, refer to the S3 Vectors Documentation.

SQL-integrated Search: Vector and BM25-scored full-text search capabilities are now natively available in SQL queries, extending the power of the POST v1/search endpoint to all SQL workflows.

Example Vector-Similarity-Search (VSS) using the vector_search UDTF on the table reviews for the search term "Cricket bats":

SELECT review_id, review_text, review_date, score
FROM vector_search(reviews, 'Cricket bats')
WHERE country_code = 'AUS'
LIMIT 3

Example Full-Text-Search (FTS) using the text_search UDTF on the table reviews for the search term "Cricket bats":

SELECT review_id, review_text, review_date, score
FROM text_search(reviews, 'Cricket bats')
LIMIT 3
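
To give a feel for what "BM25-scored" means, the following is a minimal Python sketch of BM25 ranking over a handful of in-memory strings. It is not Spice's or tantivy's implementation: the whitespace tokenizer, the function name bm25_scores, the sample reviews, and the parameter values k1=1.2 and b=0.75 (common defaults) are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document (hypothetical helper, not a Spice API)."""
    if not docs:
        return []
    # Naive lowercase/whitespace tokenization; real engines use richer analyzers.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by document length via b.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


reviews = [
    "great cricket bats for juniors",
    "tennis rackets on sale",
    "cricket ball and cricket bats",
]
scores = bm25_scores("Cricket bats", reviews)
```

Documents that contain none of the query terms score 0.0, which is why the second review would never appear in a result set ordered by score.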

DuckDB v1.3.2 Upgrade: Upgraded DuckDB engine from v1.1.3 to v1.3.2. Key improvements include support for adding primary keys to existing tables, resolution of over-eager unique constraint checking for smoother inserts, and 13% reduced runtime on TPC-H SF100 queries through extensive optimizer refinements. The v1.2.x release of DuckDB was skipped due to a regression in indexes.

Partitioned Acceleration: DuckDB file-based accelerations now support partition_by expressions, enabling queries to scale to large datasets through automatic data partitioning and query predicate pruning. New UDFs, bucket and truncate, simplify partition logic.

New UDFs useful for partition_by expressions:

  • bucket(num_buckets, col): Partitions a column into a specified number of buckets based on a hash of the column value.
  • truncate(width, col): Truncates a column to a specified width, aligning values to the nearest lower multiple (e.g., truncate(10, 101) = 100).

Example Spicepod.yml configuration:

datasets:
  - from: s3://my_bucket/some_large_table/
    name: my_table
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      partition_by: bucket(100, account_id) # Partition account_id into 100 buckets
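
The semantics of bucket and truncate, and why they allow predicate pruning, can be sketched in a few lines of Python. This is an illustration only: the hash used here (CRC32 over the value's string form) is a stand-in chosen for the example, not the hash Spice uses, and partitions_to_scan is a hypothetical helper.

```python
import zlib


def truncate(width, value):
    """Align value to the nearest lower multiple of width, e.g. truncate(10, 101) == 100."""
    if width <= 0:
        raise ValueError("width must be positive")
    return value - (value % width)


def bucket(num_buckets, value):
    """Map a value to one of num_buckets partitions using a hash.

    Assumption: CRC32 is only a placeholder for the real hash function.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    digest = zlib.crc32(str(value).encode("utf-8"))
    return digest % num_buckets


def partitions_to_scan(num_buckets, account_ids):
    """Partitions a query with `account_id IN (...)` would need to read."""
    return sorted({bucket(num_buckets, account_id) for account_id in account_ids})


# An equality predicate touches exactly one of the 100 partitions.
single = partitions_to_scan(100, [42])
# An IN list touches at most one partition per listed value.
several = partitions_to_scan(100, [42, 43, 44])
```

Because the partition for a given value is deterministic, a filter on the partitioned column lets the engine skip every partition the predicate cannot match.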

Full-Text-Search (FTS) Index Refresh: Accelerated datasets with search indexes maintain up-to-date results with configurable refresh intervals.

Example refreshing search indexes on body every 10 seconds:

datasets:
  - from: github:github.com/spiceai/docs/pulls
    name: spiceai.doc.pulls
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true
      refresh_mode: full
      refresh_check_interval: 10s
    columns:
      - name: body
        full_text_search:
          enabled: true
          row_id:
            - id

Scheduled View Refresh: Accelerated Views now support cron-based refresh schedules using refresh_cron, automating updates for accelerated data.

Example Spicepod.yml configuration:

views:
  - name: my_view
    sql: SELECT 1
    acceleration:
      enabled: true
      refresh_cron: '0 * * * *' # Every hour

For more details, refer to Scheduled Refreshes.

Multi-column Vector Search: For datasets configured with embeddings on more than one column, POST v1/search and similarity_search perform parallel vector search on each column, aggregating results using reciprocal rank fusion.

Example Spicepod.yml for multi-column search:

datasets:
  - from: github:github.com/apache/datafusion/issues
    name: datafusion.issues
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    columns:
      - name: title
        embeddings:
          - from: hf_minilm
      - name: body
        embeddings:
          - from: openai_embeddings
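
Reciprocal rank fusion merges several ranked lists by summing 1 / (k + rank) for each item across the lists. The Python sketch below shows the idea for two per-column result lists. The constant k=60 is a widely used default and is an assumption here, not a confirmed Spice setting. The function name and the sample row ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids into one list of (id, fused_score), best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit of each list contributes 1 / (k + 1).
        for rank, row_id in enumerate(ranking, start=1):
            scores[row_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable result.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-column results, e.g. one list from `title` and one from `body`.
title_hits = ["a", "b", "c"]
body_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([title_hits, body_hits])
fused_ids = [row_id for row_id, _ in fused]
```

Rows that rank well in both columns ("b" and "a" above) rise to the top, while rows found by only one column still appear lower in the fused list.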

AWS Bedrock Embeddings Model Provider: Added support for AWS Bedrock embedding models, including Amazon Titan Text Embeddings and Cohere Text Embeddings.

Example Spicepod.yml:

embeddings:
  - from: bedrock:cohere.embed-english-v3
    name: cohere-embeddings
    params:
      aws_region: us-east-1
      input_type: search_document
      truncate: END
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: titan-embeddings
    params:
      aws_region: us-east-1
      dimensions: '256'

For more details, refer to the AWS Bedrock Embedding Models Documentation.

Oracle Data Connector: Use from: oracle: to access and accelerate data stored in Oracle databases, deployed on-premises or in the cloud.

Example Spicepod.yml:

datasets:
  - from: oracle:"SH"."PRODUCTS"
    name: products
    params:
      oracle_host: 127.0.0.1
      oracle_username: scott
      oracle_password: tiger

See the Oracle Data Connector documentation.

GitHub Data Connector: The GitHub data connector supports query and acceleration of members, the users of an organization.

Example Spicepod.yml configuration:

datasets:
  - from: github:github.com/spiceai/members # General format: github.com/[org-name]/members
    name: spiceai.members
    params:
      # With GitHub Apps (recommended)
      github_client_id: ${secrets:GITHUB_SPICEHQ_CLIENT_ID}
      github_private_key: ${secrets:GITHUB_SPICEHQ_PRIVATE_KEY}
      github_installation_id: ${secrets:GITHUB_SPICEHQ_INSTALLATION_ID}
      # With GitHub Tokens
      # github_token: ${secrets:GITHUB_TOKEN}

See the GitHub Data Connector documentation.

Spice.ai Cloud Data Connector: Graduated to Stable.

spice-rs SDK Release: The Spice Rust SDK has been updated to v3.0.0. This release optimizes the Spice client API and adds robust query retries and custom metadata configuration for Spice queries.

Contributors

Breaking Changes

  • Search HTTP API Response: POST v1/search response payload has changed. See the new API documentation for details.
  • Model Provider Parameter Prefixes: Model Provider parameters use provider-specific prefixes instead of openai_ prefixes (e.g., hf_temperature for HuggingFace, anthropic_max_completion_tokens for Anthropic, perplexity_tool_choice for Perplexity). The openai_ prefix remains supported for backward compatibility but is deprecated and will be removed in a future release.
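
As a rough illustration of the prefix change, the Python sketch below renames deprecated openai_-prefixed keys in a params mapping to the provider-specific prefix. It is a hypothetical migration helper, not part of Spice. The prefix table only covers the three providers named above, and the sketch assumes every openai_ key has a one-to-one provider-specific counterpart, which should be checked against the provider documentation.

```python
# Prefixes taken from the examples in the breaking-change note above.
PROVIDER_PREFIXES = {
    "huggingface": "hf_",
    "anthropic": "anthropic_",
    "perplexity": "perplexity_",
}

LEGACY_PREFIX = "openai_"


def migrate_params(provider, params):
    """Return a copy of params with legacy openai_ keys renamed for the provider."""
    prefix = PROVIDER_PREFIXES.get(provider)
    if prefix is None:
        # Unknown provider (or OpenAI itself): leave the parameters untouched.
        return dict(params)

    # Keep keys that are already provider-specific or unprefixed.
    migrated = {k: v for k, v in params.items() if not k.startswith(LEGACY_PREFIX)}
    for key, value in params.items():
        if key.startswith(LEGACY_PREFIX):
            new_key = prefix + key[len(LEGACY_PREFIX):]
            # An explicit provider-specific value wins over the legacy one.
            migrated.setdefault(new_key, value)
    return migrated


hf_params = migrate_params("huggingface", {"openai_temperature": "0.1"})
anthropic_params = migrate_params(
    "anthropic",
    {"openai_max_completion_tokens": "512", "anthropic_max_completion_tokens": "1024"},
)
```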

Cookbook Updates

The Spice Cookbook now includes 72 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.5.0, download and install the specific binary from github.com/spiceai/spiceai/releases/tag/v1.5.0 or pull the v1.5.0 Docker image (spiceai/spiceai:1.5.0).

What's Changed

Dependencies

Changelog

  • fix: openai model endpoint (#6394) by @Sevenannn in #6394
  • Enable configuring otel endpoint from spice run (#6360) by @Advayp in #6360
  • Enable Oracle connector in default build configuration (#6395) by @sgrebnov in #6395
  • fix llm integraion test (#6398) by @Sevenannn in #6398
  • Promote spice cloud connector to stable quality (#6221) by @Sevenannn in #6221
  • v1.5.0-rc.1 release notes (#6397) by @lukekim in #6397
  • Fix model nsql integration tests (#6365) by @Sevenannn in #6365
  • Fix incorrect UDTF name and SQL query (#6404) by @lukekim in #6404
  • Update v1.5.0-rc.1.md (#6407) by @sgrebnov in #6407
  • Improve error messages (#6405) by @lukekim in #6405
  • build(deps): bump Jimver/cuda-toolkit from 0.2.25 to 0.2.26 (#6388) by @app/dependabot in #6388
  • Upgrade dependabot dependencies (#6411) by @phillipleblanc in #6411
  • Fix projection pushdown issues for document based file connector (#6362) by @Advayp in #6362
  • Add a PartitionedDuckDB Accelerator (#6338) by @kczimm in #6338
  • Use vector_search() UDTF in HTTP APIs (#6417) by @Jeadie in #6417
  • add supported types (#6409) by @kczimm in #6409
  • Enable session time zone override for MySQL (#6426) by @sgrebnov in #6426
  • Acceleration-like indexing for full text search indexes. (#6382) by @Jeadie in #6382
  • Provide error message when partition by expression changes (#6415) by @kczimm in #6415
  • Add support for Oracle Autonomous Database connections (Oracle Cloud) (#6421) by @sgrebnov in #6421
  • prune partitions for exact and in list with and without UDFs (#6423) by @kczimm in #6423
  • Fixes and reenable FTS tests (#6431) by @Jeadie in #6431
  • Upgrade DuckDB to 1.3.2 (#6434) by @phillipleblanc in #6434
  • Fix issue in limit clause for the Github Data connector (#6443) by @Advayp in #6443
  • Upgrade iceberg-rust to 0.5.1 (#6446) by @phillipleblanc in #6446
  • v1.5.0-rc.2 release notes (#6440) by @lukekim in #6440
  • Oracle: add automated TPC-H SF1 benchmark tests (#6449) by @sgrebnov in #6449
  • fix: Update benchmark snapshots (#6455) by @app/github-actions in #6455
  • Preserve ArrowError in arrow_tools::record_batch (#6454) by @mach-kernel in #6454
  • fix: Update benchmark snapshots (#6465) by @app/github-actions in #6465
  • Add option to preinstall Oracle ODPI-C library in Docker image (#6466) by @sgrebnov in #6466
  • Include Oracle connector (federated mode) in automated benchmarks (#6467) by @sgrebnov in #6467
  • Update crates/llms/src/bedrock/embed/mod.rs by @lukekim in #6468
  • v1.5.0-rc.3 release notes (#6474) by @lukekim in #6474
  • Add integration tests for S3 Vectors filters pushdown (#6469) by @sgrebnov in #6469
  • check for indexedtableprovider when finding tables to search on (#6478) by @Jeadie in #6478
  • Parse fully qualified table names in UDTFs (#6461) by @Jeadie in #6461
  • Add integration test for S3 Vectors to cover data update (overwrite) (#6480) by @sgrebnov in #6480
  • Add 'Run all tests' option for models tests and enable Bedrock tests (#6481) by @sgrebnov in #6481
  • Add support for a members table type for the GitHub Data Connector (#6464) by @Advayp in #6464
  • S3 vector data cannot be null (#6483) by @Jeadie in #6483
  • Don't infer FixedSizeList size during indexing vectors. (#6487) by @Jeadie in #6487
  • Add support for retention_sql acceleration param (#6488) by @sgrebnov in #6488
  • Make dataset refresh progress tracing less verbose (#6489) by @sgrebnov in #6489
  • Use RwLock on tantivy index in FullTextDatabaseIndex for update concurrency (#6490) by @Jeadie in #6490
  • Add tests for dataset retention logic and refactor retention code (#6495) by @sgrebnov in #6495
  • Upgade dependabot dependencies (#6497) by @phillipleblanc in #6497
  • Add periodic tracing of data loading progress during dataset refresh (#6499) by @sgrebnov in #6499
  • Promote Oracle Data Connector to Alpha (#6503) by @sgrebnov in #6503
  • Use AWS SDK to provide credentials for Iceberg connectors (#6498) by @phillipleblanc in #6498
  • Add integration tests for partitioning (#6463) by @kczimm in #6463
  • Use top-level table in full-text search JOIN ON (#6491) by @Jeadie in #6491
  • Use accelerated table in vector_search JOIN operations when appropriate (#6516) by @Jeadie in #6516
  • Fix 'additional_column' for quoted columns (fix for qualified columns broke it) (#6512) by @Jeadie in #6512
  • Also use AWS SDK for inferring credentials for S3/Delta/Databricks Delta data connectors (#6504) by @phillipleblanc in #6504
  • Add per-dataset availability monitor configuration (#6482) by @phillipleblanc in #6482
  • Suppress the warning from the AWS SDK if it can't load credentials (#6533) by @phillipleblanc in #6533
  • Change default value of check_availability from default to auto (#6534) by @lukekim in #6534
  • README.md improvements for v1.5.0 (#6539) by @lukekim in #6539
  • Temporary disable s3_vectors_basic (#6537) by @sgrebnov in #6537
  • Ensure binder errors show before query and other (#6374) by @suhuruli in #6374
  • Update spiceai/duckdb-rs -> DuckDB 1.3.2 + index fix (#6496) by @mach-kernel in #6496
  • Update table-providers to latest version with DuckDB fixes (#6535) by @phillipleblanc in #6535
  • S3: default to public access if no auth is provided (#6532) by @sgrebnov in #6532

Spice v1.3.2 (June 2, 2025)

· 2 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.3.2! ❄️

Spice v1.3.2 is a patch release with fixes to the DuckDB data accelerator and Snowflake data connector.

Changes:

  • DuckDB Data Accelerator: Supports ORDER BY rand() for randomized result ordering and ORDER BY NULL for SQL compatibility.

  • Snowflake Data Connector: Adds TIMESTAMP_NTZ(0) type for timestamps with seconds precision.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

No new cookbook recipes.

The Spice Cookbook now includes 67 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.3.2, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.3.2 image:

docker pull spiceai/spiceai:1.3.2

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

No major dependency changes.

Changelog

  • Handle Snowflake Timestamp NTZ with seconds precision (#6084) by @kczimm in #6084
  • Fix DuckDB acceleration ORDER BY rand() and ORDER BY NULL (#6071) by @phillipleblanc in #6071

Full Changelog: https://github.com/spiceai/spiceai/compare/v1.3.1...v1.3.2

Spice v1.3.0 (May 19, 2025)

· 9 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.3.0! 🏎️

Spice v1.3.0 accelerates data and AI applications with significantly improved query performance, reliability, and expanded Databricks integration. New support for the Databricks SQL Statement Execution API enables direct SQL queries on Databricks SQL Warehouses, complementing Mosaic AI model serving and embeddings (introduced in v1.2.2) and existing Databricks catalog and dataset integrations. This release upgrades to DataFusion v46, optimizes results caching performance, and strengthens security with least-privilege sandboxing improvements.

What's New in v1.3.0

  • Databricks SQL Statement Execution API Support: Added support for the Databricks SQL Statement Execution API, enabling direct SQL queries against Databricks SQL Warehouses for optimized performance in analytics and reporting workflows.

    Example spicepod.yml configuration:

    datasets:
      - from: databricks:spiceai.datasets.my_awesome_table
        name: my_awesome_table
        params:
          mode: sql_warehouse
          databricks_endpoint: ${env:DATABRICKS_ENDPOINT}
          databricks_sql_warehouse_id: ${env:DATABRICKS_SQL_WAREHOUSE_ID}
          databricks_token: ${env:DATABRICKS_TOKEN}

    For details, see the Databricks Data Connector documentation.

  • Improved Results Cache Performance & Hashing Algorithm: Spice now supports an alternative results cache hashing algorithm, ahash, in addition to siphash, which remains the default. Configure it via:

    runtime:
      results_cache:
        hashing_algorithm: ahash # or siphash

    The hashing algorithm determines how cache keys are hashed before being stored, which affects both lookup speed and protection against potential denial-of-service (DoS) attacks.

    Using ahash improves performance for large queries or query plans. Combined with results cache optimizations, it reduces 99th percentile request latency and increases total requests/second for queries with large result sets (100k+ cached rows). The following charts show performance tested against the TPCH Query #17 on a scale factor 5 dataset (30+ million rows, 5GB):

    [Chart: Latency. Improvements for the 99th percentile query latency, compared against 1.2.2 with cache key type and hashing algorithm.]
    [Chart: Req/sec. Improvements for the requests/second, compared against 1.2.2 with cache key type and hashing algorithm.]

    Note: ahash was not available in v1.2.2, so it is excluded from comparisons.

    To learn more, refer to the Results Cache Hashing Algorithm documentation.

  • SQL Query Performance: Optimized the critical SQL query path, reducing overhead and improving response times for simple queries by 10-20%.

  • DuckDB Acceleration: Fixed a bug in the DuckDB acceleration engine causing query failures under high concurrency when querying datasets accelerated into multiple DuckDB files.

  • Container Security: The container image now runs as a non-root user with enhanced sandboxing and includes only essential dependencies for a slimmer, more secure image.

DataFusion v46 Highlights

Spice.ai is built on the DataFusion query engine. The v46 release brings:

  • Faster Performance 🚀: DataFusion 46 introduces significant performance enhancements, including a 2x faster median() function for large datasets without grouping, 10–100% speed improvements in FIRST_VALUE and LAST_VALUE window functions by avoiding sorting, and a 40x faster uuid() function. Additional optimizations, such as a 50% faster repeat() string function, accelerated chr() and to_hex() functions, improved grouping algorithms, and Parquet row group pruning with NOT LIKE filters, further boost overall query efficiency.

  • New range() Table Function: A new table-valued function range(start, stop, step) has been added to make it easy to generate integer sequences — similar to PostgreSQL’s generate_series() or Spark’s range(). Example: SELECT * FROM range(1, 10, 2);

  • UNION [ALL | DISTINCT] BY NAME Support: DataFusion now supports UNION BY NAME and UNION ALL BY NAME, which align columns by name instead of position. This matches functionality found in systems like Spark and DuckDB and simplifies combining heterogeneously ordered result sets.

    Example:

    SELECT col1, col2 FROM t1
    UNION ALL BY NAME
    SELECT col2, col1 FROM t2;
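
The effect of aligning by name rather than position can be sketched in Python with rows modeled as dictionaries. This is only an illustration of the semantics described above, not DataFusion code. Filling columns missing on one side with None (NULL) is an assumption borrowed from how DuckDB behaves, and union_all_by_name is a hypothetical helper.

```python
def union_all_by_name(left, right):
    """Combine two row sets, matching columns by name instead of position.

    Rows are dicts of column name to value. Returns (columns, rows), where
    rows are tuples laid out in the order of `columns`.
    """
    columns = []
    for row in left + right:
        for name in row:
            if name not in columns:
                columns.append(name)
    # Assumption: a column absent from a row is filled with None (NULL).
    rows = [tuple(row.get(name) for name in columns) for row in left + right]
    return columns, rows


# t1 selects (col1, col2); t2 selects (col2, col1), as in the SQL example.
t1 = [{"col1": 1, "col2": "a"}]
t2 = [{"col2": "b", "col1": 2}]
columns, rows = union_all_by_name(t1, t2)
```

A positional UNION ALL over the same inputs would have put "b" under col1 and 2 under col2. Matching by name keeps each value in its own column.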

See the DataFusion 46.0.0 release notes for details.

Spice.ai adopts the latest minus one DataFusion release for quality assurance and stability. The upgrade to DataFusion v47 is planned for Spice v1.4.0 in June.

Contributors

Breaking Changes

The container image now always runs as a non-root user (UID/GID 65534) with minimal dependencies, resulting in a smaller, more secure image. Standard Linux tools, including bash, are no longer included.

Kubernetes Deployments:

  • Use of the v1.3.0+ Helm chart is required, which includes a securityContext ensuring the sandbox user has required file access.

  • For deployments using a lower version than the v1.3.0 Helm chart, add the following securityContext to the pod specification:

securityContext:
  runAsUser: 65534
  runAsGroup: 65534
  fsGroup: 65534

See the Docker Sandbox Guide for details on how to update custom Docker images to restore the previous behavior.

Cookbook Updates

  • Added Accelerated Views: Pre-calculate and materialize data derived from one or more underlying datasets.

The Spice Cookbook now includes 67 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.3.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.3.0 image:

docker pull spiceai/spiceai:1.3.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

Changelog

See the full list of changes at: v1.2.2...v1.3.0

Spice v1.1.0 (Mar 31, 2025)

· 20 min read
Luke Kim
Founder and CEO of Spice AI

Model-Context-Protocol (MCP) support in Spice.ai Open Source

Announcing the release of Spice v1.1.0! 🤖

Spice v1.1.0 introduces full support for the Model-Context-Protocol (MCP), expanding how models and tools connect. Spice can now act as both an MCP Server, with the new /v1/mcp/sse API, and an MCP Client, supporting stdio and SSE-based servers. This release also introduces a new Web Search tool with Perplexity model support, advanced evaluation workflows with custom eval scorers, including LLM-as-a-judge, and an IMAP Data Connector for federated SQL queries across email servers. Alongside these features, v1.1.0 includes automatic NSQL query retries, expanded task tracing, and request draining on HTTP server shutdown, delivering improved reliability, flexibility, and observability.

Highlights in v1.1.0

  • Spice as an MCP Server and Client: Spice now supports the Model Context Protocol (MCP), for expanded tool discovery and connectivity. Spice can:

    1. Run stdio-based MCP servers internally.
    2. Connect to external MCP servers over SSE protocol (Streamable HTTP is coming soon!)

    For more details, see the MCP documentation.

    Usage

    tools:
      - name: google_maps
        from: mcp:npx
        params:
          mcp_args: -y @modelcontextprotocol/server-google-maps

    Spice as an MCP Server

    Tools in Spice can be accessed via MCP. For example, to connect an IDE like Cursor or Windsurf to Spice, set the MCP Server URL to http://localhost:8090/v1/mcp/sse.

  • Perplexity Model Support: Spice now supports Perplexity-hosted models, enabling advanced web search and retrieval capabilities. Example configuration:

    models:
      - name: webs
        from: perplexity:sonar
        params:
          perplexity_auth_token: ${ secrets:SPICE_PERPLEXITY_AUTH_TOKEN }
          perplexity_search_domain_filter:
            - docs.spiceai.org
            - huggingface.co

    For more details, see the Perplexity documentation.

  • Web Search Tool: The new Web Search Tool enables Spice models to search the web for information using search engines like Perplexity. Example configuration:

    tools:
      - name: the_internet
        from: websearch
        description: 'Search the web for information.'
        params:
          engine: perplexity
          perplexity_auth_token: ${ secrets:SPICE_PERPLEXITY_AUTH_TOKEN }

    For more details, see the Web Search Tool documentation.

  • Eval Scorers: Eval scorers assess model performance on evaluation cases. Spice includes built-in scorers:

    • match: Exact match.
    • json_match: JSON equivalence.
    • includes: Checks if actual output includes expected output.
    • fuzzy_match: Normalized subset matching.
    • levenshtein: Levenshtein distance.

    Custom scorers can use embedding models or LLMs as judges. Example:

    evals:
      - name: australia
        dataset: cricket_questions
        scorers:
          - hf_minilm
          - judge
          - match
    embeddings:
      - name: hf_minilm
        from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2
    models:
      - name: judge
        from: openai:gpt-4o
        params:
          openai_api_key: ${ secrets:OPENAI_API_KEY }
          system_prompt: |
            Compare these stories and score their similarity (0.0 to 1.0).
            Story A: {{ .actual }}
            Story B: {{ .ideal }}

    For more details, see the Eval Scorers documentation.

  • IMAP Data Connector: Query emails stored in IMAP servers using federated SQL. Example:

    datasets:
      - from: imap:[email protected]
        name: emails
        params:
          imap_access_token: ${secrets:IMAP_ACCESS_TOKEN}

    For more details, see the IMAP Data Connector documentation.

  • Automatic NSQL Query Retries: Failed NSQL queries are now automatically retried, improving reliability for federated queries. For more details, see the NSQL documentation.

  • Enhanced Task Tracing: Task history now includes chat completion IDs, and runtime readiness is traced for better observability. Use the runtime.task_history table to query task details. See the Task History documentation.

  • Vector Search with Keyword Filtering: The vector search API now includes an optional list of keywords as a parameter, to pre-filter SQL results before performing a vector search. When vector searching via a chat completion, models will automatically generate keywords relevant to the search. See the Vector Search API documentation.

  • Improved Refresh Behavior on Startup: Spice won't automatically refresh an accelerated dataset on startup if it doesn't need to. See the Refresh on Startup documentation.

  • Graceful Shutdown for HTTP Server: The HTTP server now drains requests for graceful shutdowns, ensuring smoother runtime termination.
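
For readers unfamiliar with the built-in eval scorers listed in the highlights above, the Python sketch below shows what scorers like match, json_match, includes, and levenshtein could compute. These are illustrative stand-ins, not Spice's implementations. The function names, the 0.0 to 1.0 score range, and the way Levenshtein distance is normalized into a score are all assumptions.

```python
import json


def exact_match(actual, ideal):
    """1.0 when the output equals the expected value exactly, else 0.0."""
    return 1.0 if actual == ideal else 0.0


def json_match(actual, ideal):
    """1.0 when both strings parse to equal JSON values (key order ignored)."""
    try:
        return 1.0 if json.loads(actual) == json.loads(ideal) else 0.0
    except json.JSONDecodeError:
        return 0.0


def includes(actual, ideal):
    """1.0 when the actual output contains the expected output."""
    return 1.0 if ideal in actual else 0.0


def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b), # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_score(actual, ideal):
    """Assumed normalization: 1.0 for identical strings, approaching 0.0 as they diverge."""
    longest = max(len(actual), len(ideal))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(actual, ideal) / longest


distance = levenshtein_distance("kitten", "sitting")
```

Exact and JSON matching suit tasks with a single correct answer, while includes and edit distance tolerate extra or slightly different wording in the model output.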

New Contributors 🎉

Contributors

  • @sgrebnov
  • @phillipleblanc
  • @peasee
  • @Jeadie
  • @lukekim
  • @benrussell
  • @Sevenannn
  • @sergey-shandar
  • @Garamda
  • @johnnynunez

Breaking Changes

No breaking changes.

Cookbook Updates

The Spice Cookbook now has 74 recipes that make it easy to get started with Spice!

Upgrading

To upgrade to v1.1.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.1.0 image:

docker pull spiceai/spiceai:1.1.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

  • No major dependency changes.

Changelog

- release: Bump chart, and versions for next release by @peasee in <https://github.com/spiceai/spiceai/pull/4464>
- feat: Schedule testoperator by @peasee in <https://github.com/spiceai/spiceai/pull/4503>
- fix: Remove on zero results arguments from benchmarks by @peasee in <https://github.com/spiceai/spiceai/pull/4533>
- fix: Don't snapshot clickbench benchmarks by @peasee in <https://github.com/spiceai/spiceai/pull/4534>
- docs: v1.0.1 release note by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4529>
- Update acknowledgements by @github-actions in <https://github.com/spiceai/spiceai/pull/4535>
- In spiced_docker, propagate setup to publish-cuda by @Jeadie in <https://github.com/spiceai/spiceai/pull/4543>
- Upgrade Rust to 1.84 by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4541>
- Upgrade dependencies by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4546>
- Revert "Use OpenAI golang client in `spice chat` (#4491)" by @Jeadie in <https://github.com/spiceai/spiceai/pull/4564>
- feat: add schema inference for the Spice.ai Data Connector by @peasee in <https://github.com/spiceai/spiceai/pull/4579>
- Remove 'tools: builtin' by @Jeadie in <https://github.com/spiceai/spiceai/pull/4607>
- feat: Add initial IMAP connector by @peasee in <https://github.com/spiceai/spiceai/pull/4587>
- feat: Add email content loading by @peasee in <https://github.com/spiceai/spiceai/pull/4616>
- feat: Add SSL and Auth parameters for IMAP by @peasee in <https://github.com/spiceai/spiceai/pull/4613>
- Change /v1/models to be OpenAI compatible by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4624>
- Use `pdf-extract` crate to extract text from PDF documents by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4615>
- Update openapi.json by @github-actions in <https://github.com/spiceai/spiceai/pull/4628>
- Add 1.0.2 release notes by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4627>
- Fix cuda::ffi by @Jeadie in <https://github.com/spiceai/spiceai/pull/4649>
- Update spicepod.schema.json by @github-actions in <https://github.com/spiceai/spiceai/pull/4654>
- fix: Spice.ai schema inference by @peasee in <https://github.com/spiceai/spiceai/pull/4674>
- Add SQL Benchmark with sample eval configuration based on TPCH by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4549>
- Update Helm chart to Spice v1.0.2 by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4655>
- Update v1.0.2 release notes by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4639>
- Fix E2E AI release install test on self-hosted runners (macos) by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4675>
- Main performance metrics calculation for Text to SQL Benchmark by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4681>
- Add eval datasets / test scripts for model grading criteria by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4663>
- Update openapi.json by @github-actions in <https://github.com/spiceai/spiceai/pull/4684>
- Add testoperator for `evals` running by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4688>
- Add GH Workflow to run Text to SQL benchmark by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4689>
- Add 1.0.2 as supported version to SECURITY.md by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4695>
- Text-To-SQL benchmark: trace failed tests by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4705>
- Text-To-SQL benchmark: extend list of benchmarking models by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4707>
- Text-To-SQL: increase sql coverage, add more advanced tests by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4713>
- Use model that supports tools in hf_test by @Jeadie in <https://github.com/spiceai/spiceai/pull/4712>
- Fix Spice.ai E2E test by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4723>
- Return non-existing model for v1/chat endpoint by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4718>
- Update Helm chart for 1.0.3 by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4742>
- Update dependencies by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4740>
- Update spicepod.schema.json by @github-actions in <https://github.com/spiceai/spiceai/pull/4744>
- Update SECURITY.md with 1.0.3 by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4745>
- Add basic smoke test of perplexity LLM to llm integration tests. by @Jeadie in <https://github.com/spiceai/spiceai/pull/4735>
- Don't run integration tests on PRs when only CLI is changed by @Jeadie in <https://github.com/spiceai/spiceai/pull/4751>
- Prompt user to upgrade through brew / do another clean install when spice is installed through homebrew / at non-standard path by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4746>
- feat: Search with keyword filtering by @peasee in <https://github.com/spiceai/spiceai/pull/4759>
- Fix search benchmark by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4765>
- feat: Add IMAP access token parameter by @peasee in <https://github.com/spiceai/spiceai/pull/4769>
- Update openapi.json by @github-actions in <https://github.com/spiceai/spiceai/pull/4774>
- Mark trunk builds as unstable by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4776>
- feat: Release Spice.ai RC by @peasee in <https://github.com/spiceai/spiceai/pull/4753>
- fix: Validate columns and keywords in search by @peasee in <https://github.com/spiceai/spiceai/pull/4775>
- Run models E2E tests on PR by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4798>
- fix: models runtime not required for cloud chat by @peasee in <https://github.com/spiceai/spiceai/pull/4781>
- Only open one PR for openapi.json by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4807>
- docs: Release IMAP Alpha by @peasee in <https://github.com/spiceai/spiceai/pull/4797>
- Add Results-Cache-Status to indicate query result came from cache by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4809>
- Initial spice cli e2e tests with spice upgrade tests by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4764>
- Log CLI and Runtime Versions on startup by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4816>
- Sort keys for openai by @Jeadie in <https://github.com/spiceai/spiceai/pull/4766>
- Remove docs index trigger from the endgame template by @ewgenius in <https://github.com/spiceai/spiceai/pull/4832>
- Release notes for v1.0.4 by @Jeadie in <https://github.com/spiceai/spiceai/pull/4827>
- Update SECURITY.md by @Jeadie in <https://github.com/spiceai/spiceai/pull/4829>
- Update spicepod.schema.json by @github-actions in <https://github.com/spiceai/spiceai/pull/4831>
- Don't print URL by @lukekim in <https://github.com/spiceai/spiceai/pull/4838>
- add 'eval_run' to 'spice trace' by @Jeadie in <https://github.com/spiceai/spiceai/pull/4841>
- Run benchmark tests w/o uploading test results (pending improvements) by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4843>
- Fix 'actual" and "output" columns in `eval.results`. by @Jeadie in <https://github.com/spiceai/spiceai/pull/4835>
- Fix string escaping of system prompt by @Jeadie in <https://github.com/spiceai/spiceai/pull/4844>
- update helm chart to v1.0.4 by @Jeadie in <https://github.com/spiceai/spiceai/pull/4828>
- Update openapi.json by @github-actions in <https://github.com/spiceai/spiceai/pull/4806>
- fix: Skip sccache in PR for external users by @peasee in <https://github.com/spiceai/spiceai/pull/4851>
- fix: Return BAD_REQUEST when not embeddings are configured by @peasee in <https://github.com/spiceai/spiceai/pull/4804>
- Debug log cuda detection failure in spice by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4852>
- fix: Set RUSTC wrapper explicitly by @peasee in <https://github.com/spiceai/spiceai/pull/4854>
- Improve trace UX for `ai_completion`, fix infinite tool calls by @Jeadie in <https://github.com/spiceai/spiceai/pull/4853>
- Allow homebrew spice cli to upgrade the runtime by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4811>
- Add support for MCP tools by @Jeadie in <https://github.com/spiceai/spiceai/pull/4808>
- fix: Rustc wrapper actions by @peasee in <https://github.com/spiceai/spiceai/pull/4867>
- Provide link to supported OS list when user platform is not supported by @Garamda in <https://github.com/spiceai/spiceai/pull/4840>
- Always download spice runtime version matched with spice cli version by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4761>
- Disable flaky integration test by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4871>
- fix: sccache actions setup by @peasee in <https://github.com/spiceai/spiceai/pull/4873>
- Fixing Go installation in the setup script for Linux Arm64 by @sergey-shandar in <https://github.com/spiceai/spiceai/pull/4868>
- Update openapi.json by @github-actions in <https://github.com/spiceai/spiceai/pull/4864>
- DuckDB acceleration: Use temp table only for append with conflict resolution by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4874>
- Trace the output of streamed `chat/completions` to runtime.task_history. by @Jeadie in <https://github.com/spiceai/spiceai/pull/4845>
- Always pass `X-API-Key` in spice api calls header if detected in env by @ewgenius in <https://github.com/spiceai/spiceai/pull/4878>
- Revert "DuckDB acceleration: Use temp table only for append with conflict resolution" by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4886>
- Allow overriding spicerack base url in the CLI by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4892>
- Add test Spicepod for DuckDB full acceleration with constraints by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4891>
- Refactor Parameter Handling by @Advayp in <https://github.com/spiceai/spiceai/pull/4833>
- Add test Spicepod for DuckDB append acceleration with constraints by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4898>
- Update to latest async-openai fork. Update secrecy by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4911>
- Fix mcp tools build by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4916>
- Add more test spicepods by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4923>
- task: Add more dispatch files by @peasee in <https://github.com/spiceai/spiceai/pull/4933>
- run spiceai benchmark test using test operator by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4920>
- Convert sequential search code block to parallel async by @Garamda in <https://github.com/spiceai/spiceai/pull/4936>
- fix: Throughput metric calculation by @peasee in <https://github.com/spiceai/spiceai/pull/4938>
- Update dependabot dependencies & `cargo update` by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4872>
- Improve servers shutdown sequence during runtime termination by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4942>
- Semantic model for views. Views visible in `table_schema` & `list_datasets` tools. by @Jeadie in <https://github.com/spiceai/spiceai/pull/4946>
- update openai-async by @Jeadie in <https://github.com/spiceai/spiceai/pull/4948>
- Update openapi.json by @github-actions in <https://github.com/spiceai/spiceai/pull/4961>
- fix: Redundant results snapshotting by @peasee in <https://github.com/spiceai/spiceai/pull/4956>
- Create schema for views if not exist by @Jeadie in <https://github.com/spiceai/spiceai/pull/4957>
- Bump Jimver/cuda-toolkit from 0.2.21 to 0.2.22 by @dependabot in <https://github.com/spiceai/spiceai/pull/4969>
- List available operations in `spice trace <operation>` by @Jeadie in <https://github.com/spiceai/spiceai/pull/4953>
- Initial commit of release analytics by @lukekim in <https://github.com/spiceai/spiceai/pull/4975>
- Remove spaces from CSV by @lukekim in <https://github.com/spiceai/spiceai/pull/4977>
- Fix Spice pods watcher by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4984>
- feat: Add appendable data sources for the testoperator by @peasee in <https://github.com/spiceai/spiceai/pull/4949>
- Omit timestamp when warning regarding datasets with hyphens by @Advayp in <https://github.com/spiceai/spiceai/pull/4987>
- Update helm chart to v1.0.5 by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4990>
- docs: Update qa_analytics.csv by @peasee in <https://github.com/spiceai/spiceai/pull/4989>
- Update end_game template by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4991>
- Update spicepod.schema.json by @github-actions in <https://github.com/spiceai/spiceai/pull/4993>
- Add v1.0.5 release notes by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4994>
- Supported Versions: include v1.0.5 by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4995>
- Dependabot updates by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4992>
- Switch to basic markdown formatting for vector search by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4934>
- docs: Update qa_analytics.csv by @peasee in <https://github.com/spiceai/spiceai/pull/5001>
- feat: Add TPCDS FileAppendableSource for testoperator by @peasee in <https://github.com/spiceai/spiceai/pull/5002>
- Update `ring` by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/5003>
- docs: Update qa_analytics.csv by @peasee in <https://github.com/spiceai/spiceai/pull/5006>
- feat: Add ClickBench FileAppendableSource for testoperator by @peasee in <https://github.com/spiceai/spiceai/pull/5004>
- feat: Validate append test table counts by @peasee in <https://github.com/spiceai/spiceai/pull/5008>
- feat: Add append spicepods by @peasee in <https://github.com/spiceai/spiceai/pull/5009>
- Improve Vector Search performance for large content w/o primary key defined by @sgrebnov in <https://github.com/spiceai/spiceai/pull/5010>
- Don't try to downgrade Arc in test_acceleration_duckdb_single_instance by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/5014>
- feat: Add an initial testoperator vector search command by @peasee in <https://github.com/spiceai/spiceai/pull/5011>
- feat: Update testoperator workflows for automatic snapshot updates by @peasee in <https://github.com/spiceai/spiceai/pull/5018>
- Fix Vector Search when additional columns include embedding column by @sgrebnov in <https://github.com/spiceai/spiceai/pull/5022>
- Include test for primary key passed as additional column in Vector Search by @sgrebnov in <https://github.com/spiceai/spiceai/pull/5024>
- fix: Update benchmark snapshots by @github-actions in <https://github.com/spiceai/spiceai/pull/5020>
- upgrade mistral.rs by @Jeadie in <https://github.com/spiceai/spiceai/pull/4952>
- fix: Indexes for TPCDS SQLite Spicepod by @peasee in <https://github.com/spiceai/spiceai/pull/5038>
- fix: Update benchmark snapshots by @github-actions in <https://github.com/spiceai/spiceai/pull/5035>
- Include local files in generated Spicepod package by @sgrebnov in <https://github.com/spiceai/spiceai/pull/5041>
- update mistral.rs to 'spiceai' branch rev by @Jeadie in <https://github.com/spiceai/spiceai/pull/5029>
- Configure spiced as an MCP SSE server by @Jeadie in <https://github.com/spiceai/spiceai/pull/5039>
- Update openapi.json by @github-actions in <https://github.com/spiceai/spiceai/pull/5052>
- fix: Disable benchmarks schedule, enable testoperator schedule by @peasee in <https://github.com/spiceai/spiceai/pull/5058>
- fix: Update benchmark snapshots by @github-actions in <https://github.com/spiceai/spiceai/pull/5060>
- Update ROADMAP.md March 2025 by @lukekim in <https://github.com/spiceai/spiceai/pull/5061>
- fix: Testoperator data setup by @peasee in <https://github.com/spiceai/spiceai/pull/5068>
- fix: All HTTP endpoints to hang when adding an invalid dataset with --pods-watcher-enabled by @sgrebnov in <https://github.com/spiceai/spiceai/pull/5050>
- fix: Update benchmark snapshots by @github-actions in <https://github.com/spiceai/spiceai/pull/5073>
- Integration tests for MCP tooling by @Jeadie in <https://github.com/spiceai/spiceai/pull/5053>
- OpenAPI docs for MCP by @Jeadie in <https://github.com/spiceai/spiceai/pull/5057>
- fix: Acceleration federation test by @peasee in <https://github.com/spiceai/spiceai/pull/5090>
- fix: Allow spiced commit in testoperator dispatch by @peasee in <https://github.com/spiceai/spiceai/pull/5098>
- fix: Use RefreshOverrides for the refresh API definition by @peasee in <https://github.com/spiceai/spiceai/pull/5095>
- Update openapi.json by @github-actions in <https://github.com/spiceai/spiceai/pull/5094>
- fix: Increase tries for refresh_status_change_to_ready test by @peasee in <https://github.com/spiceai/spiceai/pull/5099>
- feat: Testoperator reports on max and median memory usage by @peasee in <https://github.com/spiceai/spiceai/pull/5101>
- Update openapi.json by @github-actions in <https://github.com/spiceai/spiceai/pull/5105>
- fix: Fail testoperator on failed queries by @peasee in <https://github.com/spiceai/spiceai/pull/5106>
- Update Helm chart to 1.0.6 by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/5107>
- Update SECURITY.md to include 1.0.6 by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/5109>
- Update spicepod.schema.json by @github-actions in <https://github.com/spiceai/spiceai/pull/5108>
- Add QA analytics for 1.0.6 by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/5110>
- add env variables to tools, usable in MCP stdio by @Jeadie in <https://github.com/spiceai/spiceai/pull/5097>
- HF downloads obey SIGTERM by @Jeadie in <https://github.com/spiceai/spiceai/pull/5044>
- Add v1.0.6 release notes into trunk by @sgrebnov in <https://github.com/spiceai/spiceai/pull/5111>
- Remove redundant mod name for iceberg integration tests by @Sevenannn in <https://github.com/spiceai/spiceai/pull/5112>
- Use fixed data directory for test operator by @Sevenannn in <https://github.com/spiceai/spiceai/pull/5103>
- Improvements for evals by @Jeadie in <https://github.com/spiceai/spiceai/pull/5040>
- Make McpProxy trait for MCP passthrough by @Jeadie in <https://github.com/spiceai/spiceai/pull/5115>
- Properly handle '/' for tool names. by @Jeadie in <https://github.com/spiceai/spiceai/pull/5116>
- Use retry logic when loading tools by @Jeadie in <https://github.com/spiceai/spiceai/pull/5120>
- Exclude slow tests from regular pr runs by @Sevenannn in <https://github.com/spiceai/spiceai/pull/5119>
- Fix test operator snapshot update by @Sevenannn in <https://github.com/spiceai/spiceai/pull/5130>
- spice init: Fixes windows bug where full path is used for spicepod name by @benrussell in <https://github.com/spiceai/spiceai/pull/5126>
- fix: Update benchmark snapshots by @github-actions in <https://github.com/spiceai/spiceai/pull/5131>
- Implement graceful shutdown for HTTP server by @sgrebnov in <https://github.com/spiceai/spiceai/pull/5102>
- Update enhancement.md by @lukekim in <https://github.com/spiceai/spiceai/pull/5142>
- Add GitHub Workflow and PoC Spicepod configuration to run FinanceBench tests by @sgrebnov in <https://github.com/spiceai/spiceai/pull/5145>
- Fix Postgres and MySQL installation on macos14-runner (E2E CI) by @sgrebnov in <https://github.com/spiceai/spiceai/pull/5155>
- De-duplicate attachments in DuckDBAttachments by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/5156>
- v1.0.7 release note by @Sevenannn in <https://github.com/spiceai/spiceai/pull/5153>
- Update spicepod.schema.json by @github-actions in <https://github.com/spiceai/spiceai/pull/5160>
- Update Helm chart to 1.0.7 by @Sevenannn in <https://github.com/spiceai/spiceai/pull/5159>
- Add github token to macos test release download tasks by @Sevenannn in <https://github.com/spiceai/spiceai/pull/5161>
- update security.md for 1.0.7 by @Sevenannn in <https://github.com/spiceai/spiceai/pull/5162>
- Update roadmap.md by @Sevenannn in <https://github.com/spiceai/spiceai/pull/5163>
- Add a performance comparison section for 1.0.7 by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/5164>
- docs: Add snafu error variant point to style guide by @peasee in <https://github.com/spiceai/spiceai/pull/5167>
- Fix 1.0.7 release note by @Sevenannn in <https://github.com/spiceai/spiceai/pull/5168>
- Adjust DuckDB connection pool size based on DuckDB accelerator instances usage by @Sevenannn in <https://github.com/spiceai/spiceai/pull/5117>
- Add automatic retry for NSQL queries by @sgrebnov in <https://github.com/spiceai/spiceai/pull/5169>
- Include chat completion id to task history by @sgrebnov in <https://github.com/spiceai/spiceai/pull/5170>
- Trace when all runtime components are ready by @sgrebnov in <https://github.com/spiceai/spiceai/pull/5171>
- Update qa_analytics.csv for 1.0.7 by @Sevenannn in <https://github.com/spiceai/spiceai/pull/5165>
- Set default tool recursion limit to 10 to prevent infinite loops by @sgrebnov in <https://github.com/spiceai/spiceai/pull/5173>
- Add support for `schema_source_path` param for object-store data connectors by @sgrebnov in <https://github.com/spiceai/spiceai/pull/5178>
- Run license check and check changes on self-hosted macOS runners by @lukekim in <https://github.com/spiceai/spiceai/pull/5179>
- Add MCP by @lukekim in <https://github.com/spiceai/spiceai/pull/5183>

Full Changelog: github.com/spiceai/spiceai/compare/v1.0.0...release/1.1

Spice v1.0.7 (Mar 26, 2025)

· 4 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.0.7 🏎️

Spice v1.0.7 improves memory usage when using DuckDB, improves schema inference performance when using object-store based data connectors, and fixes a bug in Dremio schema inference.

Highlights in v1.0.7

  • DuckDB Memory Usage: Memory usage when using DuckDB has been significantly improved for data loads and refreshes through expanded use of zero-copy Arrow and multi-threading. When a duckdb_memory_limit is specified, disk spilling has been improved for larger-than-memory workloads. In addition, a new temp_directory runtime parameter supports storing temporary files in a different location from the DuckDB data file for higher throughput. For example, temp_directory could be set to a high-IOPS io2 EBS volume that is separate from the volume holding duckdb_file_path.

    Automated end-to-end test coverage for the DuckDB Accelerator has been significantly expanded.

    For configuration details, see the documentation for runtime parameters and the DuckDB Data Accelerator.

  • Schema Inference Performance for Object-Store Data Connectors: Schema inference is now faster when using object-store based data connectors, especially for large numbers of objects (1M+), because object listing and selection have been made more efficient.
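
As a rough sketch of how the DuckDB settings above might fit together in a Spicepod: the parameter names duckdb_memory_limit, duckdb_file_path, and temp_directory come from these notes, while the dataset source, dataset name, file paths, the 8GB limit, and the exact nesting of temp_directory under runtime are illustrative assumptions. The runtime parameters and DuckDB Data Accelerator documentation remain the authoritative reference for the layout.

```yaml
# Hypothetical Spicepod fragment; paths, names, and values are placeholders.
runtime:
  # Assumed placement of the new runtime parameter. Pointing it at a
  # separate fast volume keeps spill files off the DuckDB data volume.
  temp_directory: /mnt/fast-scratch/spice-tmp

datasets:
  - from: s3://my_data_bucket/data/   # placeholder source
    name: my_dataset                  # placeholder name
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      params:
        # DuckDB data file, kept on a different volume from temp_directory.
        duckdb_file_path: /mnt/data/my_dataset.db
        # With a limit set, larger-than-memory loads spill to disk.
        duckdb_memory_limit: 8GB
```

Keeping temp_directory and duckdb_file_path on separate volumes follows the example in the highlight above, where spill traffic goes to its own high-IOPS volume.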

Performance

Compared to previous versions, Spice v1.0.7 loads DuckDB-accelerated datasets significantly faster. The results below use the TPC-H lineitem dataset at Scale Factor 100 (600M rows):

Without Indexes

5x faster, 28% less memory usage.


| Version | Load Time | Peak Memory Usage |
| ------- | --------- | ----------------- |
| v1.0.6  | 16m 3s    | 32GB              |
| v1.0.7  | 3m 149ms  | 24.4GB            |

With Indexes

2.5x faster. Higher memory usage in v1.0.7 is due to better resource utilization to achieve faster load times. Use the duckdb_memory_limit parameter to control memory usage.

| Version | Load Time | Peak Memory Usage |
| ------- | --------- | ----------------- |
| v1.0.6  | 27m 9s    | 50GB              |
| v1.0.7  | 11m 30s   | 77GB              |


Documentation

  • DuckDB Data Accelerator: Expanded with additional resource usage guidance.
  • Memory: A new section for memory considerations has been added to the Reference section.

Contributors

  • @phillipleblanc
  • @sgrebnov
  • @peasee
  • @Sevenannn

Breaking Changes

No breaking changes.

Upgrading

To upgrade to v1.0.7, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.0.7 image:

docker pull spiceai/spiceai:1.0.7

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

Changelog

- fix: Remove on zero results arguments from benchmarks by @peasee in https://github.com/spiceai/spiceai/pull/4533
- Run benchmark tests w/o uploading test results (pending improvements) by @sgrebnov in https://github.com/spiceai/spiceai/pull/4843
- fix: Return BAD_REQUEST when not embeddings are configured by @peasee in https://github.com/spiceai/spiceai/pull/4804
- Fix Dremio schema inference by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5114
- Improve performance of schema inference for object-store data connectors by @sgrebnov in https://github.com/spiceai/spiceai/pull/5124
- Always download spice runtime version matched with spice cli version by @Sevenannn in https://github.com/spiceai/spiceai/pull/4761
- Fix go lint errors by @sgrebnov in https://github.com/spiceai/spiceai/pull/5147
- Make DuckDB acceleration E2E tests more comprehensive by @sgrebnov in https://github.com/spiceai/spiceai/pull/5146
- Enable Spice to load larger than memory datasets into DuckDB accelerations by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5149
- Add `temp_directory` runtime parameter and insert it for DuckDB accelerations by @phillipleblanc in https://github.com/spiceai/spiceai/pull/5152
- Fix Postgres and MySQL installation on macos14-runner (E2E CI) by @sgrebnov in https://github.com/spiceai/spiceai/pull/5155
- Enable E2E for DuckDB full mode acceleration with indexes only in CI by @sgrebnov in https://github.com/spiceai/spiceai/pull/5154

Full Changelog: github.com/spiceai/spiceai/compare/v1.0.6...v1.0.7