Spice v1.5.0 (July 21, 2025)

July 22, 2025

Senior Software Engineer at Spice AI

Announcing the release of Spice v1.5.0! 🔍

Spice v1.5.0 brings major upgrades to search and retrieval. It introduces native support for Amazon S3 Vectors, enabling petabyte-scale vector search directly from S3 vector buckets, alongside SQL-integrated vector and Tantivy-powered full-text search, partitioning for DuckDB acceleration, and automated refreshes for search indexes and views. The release also includes the AWS Bedrock Embeddings Model Provider, the Oracle Database connector, the now-stable Spice.ai Cloud Data Connector, and an upgrade to DuckDB v1.3.2.

What's New in v1.5.0

Amazon S3 Vectors Support: Spice.ai now integrates with Amazon S3 Vectors, launched in public preview on July 15, 2025, enabling vector-native object storage with built-in indexing and querying. This integration supports semantic search, recommendation systems, and retrieval-augmented generation (RAG) at petabyte scale with S3's durability and elasticity. Spice.ai manages the full vector lifecycle: ingesting data, creating embeddings with models such as Amazon Titan or Cohere via AWS Bedrock (or others available on HuggingFace), and storing the resulting vectors in S3 vector buckets.

Spice integration with Amazon S3 Vectors

Example Spicepod.yml configuration for S3 Vectors:

datasets:
  - from: s3://my_data_bucket/data/
    name: my_vectors
    params:
      file_format: parquet
    acceleration:
      enabled: true
    vectors:
      engine: s3_vectors
      params:
        s3_vectors_aws_region: us-east-2
        s3_vectors_bucket: my-s3-vectors-bucket
    columns:
      - name: content
        embeddings:
          - from: bedrock_titan
            row_id:
              - id

Example SQL query using S3 Vectors:

SELECT *
FROM vector_search(my_vectors, 'Cricket bats', 10)
WHERE price < 100
ORDER BY score

For more details, refer to the S3 Vectors Documentation.

SQL-integrated Search: Vector and BM25-scored full-text search capabilities are now natively available in SQL queries, extending the power of the POST v1/search endpoint to all SQL workflows.

Example Vector-Similarity-Search (VSS) using the vector_search UDTF on the table reviews for the search term "Cricket bats":

SELECT review_id, review_text, review_date, score
FROM vector_search(reviews, 'Cricket bats')
WHERE country_code='AUS'
LIMIT 3

Example Full-Text-Search (FTS) using the text_search UDTF on the table reviews for the search term "Cricket bats":

SELECT review_id, review_text, review_date, score
FROM text_search(reviews, 'Cricket bats')
LIMIT 3

DuckDB v1.3.2 Upgrade: Upgraded DuckDB engine from v1.1.3 to v1.3.2. Key improvements include support for adding primary keys to existing tables, resolution of over-eager unique constraint checking for smoother inserts, and 13% reduced runtime on TPC-H SF100 queries through extensive optimizer refinements. The v1.2.x release of DuckDB was skipped due to a regression in indexes.

Read the DuckDB v1.2.0 announcement.
Read the DuckDB v1.3.0 announcement.

Partitioned Acceleration: DuckDB file-based accelerations now support partition_by expressions, enabling queries to scale to large datasets through automatic data partitioning and query predicate pruning. New UDFs, bucket and truncate, simplify partition logic.

New UDFs useful for partition_by expressions:

bucket(num_buckets, col): Partitions a column into a specified number of buckets based on a hash of the column value.
truncate(width, col): Truncates a column to a specified width, aligning values to the nearest lower multiple (e.g., truncate(10, 101) = 100).
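
As a rough illustration of how these two UDFs behave, the following Python sketch mirrors the semantics described above. The CRC32 hash is a stand-in chosen for this example; these notes do not state which hash function Spice uses for bucket, so real bucket numbers will differ.

```python
import zlib


def truncate(width: int, value: int) -> int:
    """Align value down to the nearest lower multiple of width."""
    # Python's % is non-negative for a positive width, so this floors
    # negative values as well (for example, -1 maps to -10 when width is 10).
    return value - (value % width)


def bucket(num_buckets: int, value) -> int:
    """Map a value to one of num_buckets buckets based on a hash of the value.

    Assumption: CRC32 is only an illustrative hash, not the one Spice uses.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets
```

Here truncate(10, 101) returns 100, matching the example in the description, and bucket(100, account_id) always returns the same value in the range 0 to 99 for a given account_id.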

Example Spicepod.yml configuration:

datasets:
  - from: s3://my_bucket/some_large_table/
    name: my_table
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      partition_by: bucket(100, account_id) # Partition account_id into 100 buckets
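
To show why hashed partitioning helps queries scale, here is a minimal Python sketch of predicate pruning for equality and IN-list filters on the partition column. It is a conceptual model only, not Spice's implementation: the helper names and the CRC32 hash are assumptions made for the example.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 100  # mirrors bucket(100, account_id) in the configuration above


def bucket(num_buckets, value):
    # Assumption: CRC32 stands in for whatever hash the engine really uses.
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def build_partitions(rows):
    """Group rows into partitions keyed by the bucket of account_id."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[bucket(NUM_BUCKETS, row["account_id"])].append(row)
    return partitions


def query_by_account(partitions, account_ids):
    """Answer account_id = x or account_id IN (...) by scanning only the
    partitions whose bucket could contain the requested ids."""
    wanted = set(account_ids)
    candidates = {bucket(NUM_BUCKETS, a) for a in wanted}
    hits = [
        row
        for b in candidates
        for row in partitions.get(b, [])
        if row["account_id"] in wanted
    ]
    return hits, len(candidates)
```

With 1,000 rows spread over 100 buckets, a lookup for two account ids touches at most two partitions instead of all of them.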

Full-Text-Search (FTS) Index Refresh: Accelerated datasets with search indexes maintain up-to-date results with configurable refresh intervals.

Example refreshing search indexes on body every 10 seconds:

datasets:
  - from: github:github.com/spiceai/docs/pulls
    name: spiceai.doc.pulls
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true
      refresh_mode: full
      refresh_check_interval: 10s
    columns:
      - name: body
        full_text_search:
          enabled: true
          row_id:
            - id

Scheduled View Refresh: Accelerated Views now support cron-based refresh schedules using refresh_cron, automating updates for accelerated data.

Example Spicepod.yml configuration:

views:
  - name: my_view
    sql: SELECT 1
    acceleration:
      enabled: true
      refresh_cron: '0 * * * *' # Every hour

For more details, refer to Scheduled Refreshes.
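
For readers less familiar with cron syntax, the sketch below shows how a five-field expression such as '0 * * * *' translates into run times. It is a deliberately simplified matcher written for this example (it supports only * and single integers, and treats all fields as ANDed), not the scheduler Spice uses.

```python
from datetime import datetime, timedelta


def matches(expr: str, dt: datetime) -> bool:
    """Return True if dt satisfies a cron expression made of '*' or integers."""
    fields = expr.split()  # minute, hour, day of month, month, day of week
    # Cron counts Sunday as 0; Python's weekday() counts Monday as 0.
    actual = [dt.minute, dt.hour, dt.day, dt.month, (dt.weekday() + 1) % 7]
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


def next_run(expr: str, after: datetime) -> datetime:
    """Find the first matching minute strictly after the given time."""
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # search at most one year of minutes
        if matches(expr, candidate):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError("no matching time within one year")
```

Under these assumptions, next_run('0 * * * *', datetime(2025, 7, 21, 10, 15)) is 11:00 on the same day, which is the "every hour" behaviour noted in the configuration comment.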

Multi-column Vector Search: For datasets configured with embeddings on more than one column, POST v1/search and similarity_search perform parallel vector search on each column, aggregating results using reciprocal rank fusion.

Example Spicepod.yml for multi-column search:

datasets:
  - from: github:github.com/apache/datafusion/issues
    name: datafusion.issues
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    columns:
      - name: title
        embeddings:
          - from: hf_minilm
      - name: body
        embeddings:
          - from: openai_embeddings
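
Reciprocal rank fusion combines several ranked result lists by summing 1 / (k + rank) for each document across the lists. The Python sketch below illustrates the general technique; the constant k = 60 is the value commonly used in the literature and is an assumption here, as are the document ids, since these notes do not describe Spice's exact parameters.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids into a single ranking.

    rankings: one list per searched column, best match first.
    k: smoothing constant (assumed value; not taken from Spice).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-column results for the title and body embeddings.
title_hits = ["issue-3", "issue-1", "issue-2"]
body_hits = ["issue-1", "issue-4", "issue-3"]
fused = reciprocal_rank_fusion([title_hits, body_hits])
```

In this example issue-1 ranks first because it appears near the top of both lists, followed by issue-3, issue-4, and issue-2.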

AWS Bedrock Embeddings Model Provider: Added support for AWS Bedrock embedding models, including Amazon Titan Text Embeddings and Cohere Text Embeddings.

Example Spicepod.yml:

embeddings:
  - from: bedrock:cohere.embed-english-v3
    name: cohere-embeddings
    params:
      aws_region: us-east-1
      input_type: search_document
      truncate: END
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: titan-embeddings
    params:
      aws_region: us-east-1
      dimensions: '256'

For more details, refer to the AWS Bedrock Embedding Models Documentation.

Oracle Data Connector: Use from: oracle: to access and accelerate data stored in Oracle databases, deployed on-premises or in the cloud.

Example Spicepod.yml:

datasets:
  - from: oracle:"SH"."PRODUCTS"
    name: products
    params:
      oracle_host: 127.0.0.1
      oracle_username: scott
      oracle_password: tiger

See the Oracle Data Connector documentation.

GitHub Data Connector: The GitHub data connector now supports querying and accelerating members (the users of an organization).

Example Spicepod.yml configuration:

datasets:
  - from: github:github.com/spiceai/members # General format: github.com/[org-name]/members
    name: spiceai.members
    params:
      # With GitHub Apps (recommended)
      github_client_id: ${secrets:GITHUB_SPICEHQ_CLIENT_ID}
      github_private_key: ${secrets:GITHUB_SPICEHQ_PRIVATE_KEY}
      github_installation_id: ${secrets:GITHUB_SPICEHQ_INSTALLATION_ID}
      # With GitHub Tokens
      # github_token: ${secrets:GITHUB_TOKEN}

See the GitHub Data Connector documentation.

Spice.ai Cloud Data Connector: Graduated to Stable.

spice-rs SDK Release: The Spice Rust SDK has been updated to v3.0.0. This release optimizes the Spice client API and adds robust query retries and custom metadata configuration for Spice queries.

Contributors

Breaking Changes

Search HTTP API Response: POST v1/search response payload has changed. See the new API documentation for details.
Model Provider Parameter Prefixes: Model Provider parameters use provider-specific prefixes instead of openai_ prefixes (e.g., hf_temperature for HuggingFace, anthropic_max_completion_tokens for Anthropic, perplexity_tool_choice for Perplexity). The openai_ prefix remains supported for backward compatibility but is deprecated and will be removed in a future release.
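
When updating existing configurations for the new prefixes, a small script can do the renaming. The Python sketch below uses a hypothetical helper, migrate_params, and assumes each openai_-prefixed parameter maps one-to-one onto the same name under the provider prefix, as in the hf_temperature example; check the provider documentation before relying on that for every parameter.

```python
def migrate_params(params: dict, provider_prefix: str) -> dict:
    """Rename openai_-prefixed keys to a provider-specific prefix.

    Hypothetical helper: assumes a one-to-one mapping of parameter names.
    Keys without the openai_ prefix are left untouched.
    """
    old_prefix = "openai_"
    migrated = {}
    for key, value in params.items():
        if key.startswith(old_prefix):
            key = provider_prefix + key[len(old_prefix):]
        migrated[key] = value
    return migrated


# Example: parameters for a HuggingFace model written with the old prefix.
legacy = {"openai_temperature": 0.1, "hf_token": "example"}
updated = migrate_params(legacy, "hf_")
```

After the call, updated contains hf_temperature in place of openai_temperature, and hf_token is unchanged.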

Cookbook Updates

Added Oracle Data Connector cookbook: Connect to tables in Oracle databases.
Added Hashed Partitioning with DuckDB cookbook: Accelerate data on large datasets by partitioning data into a fixed number of buckets.

The Spice Cookbook now includes 72 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.5.0, download and install the specific binary from github.com/spiceai/spiceai/releases/tag/v1.5.0 or pull the v1.5.0 Docker image (spiceai/spiceai:1.5.0).

What's Changed

Dependencies

delta_kernel: Upgraded to v0.12.1
DuckDB: Upgraded from v1.1.3 to v1.3.2
iceberg-rust: Upgraded from v0.4.0 to v0.5.1

Changelog

fix: openai model endpoint (#6394) by @Sevenannn in #6394
Enable configuring otel endpoint from spice run (#6360) by @Advayp in #6360
Enable Oracle connector in default build configuration (#6395) by @sgrebnov in #6395
fix llm integraion test (#6398) by @Sevenannn in #6398
Promote spice cloud connector to stable quality (#6221) by @Sevenannn in #6221
v1.5.0-rc.1 release notes (#6397) by @lukekim in #6397
Fix model nsql integration tests (#6365) by @Sevenannn in #6365
Fix incorrect UDTF name and SQL query (#6404) by @lukekim in #6404
Update v1.5.0-rc.1.md (#6407) by @sgrebnov in #6407
Improve error messages (#6405) by @lukekim in #6405
build(deps): bump Jimver/cuda-toolkit from 0.2.25 to 0.2.26 (#6388) by @app/dependabot in #6388
Upgrade dependabot dependencies (#6411) by @phillipleblanc in #6411
Fix projection pushdown issues for document based file connector (#6362) by @Advayp in #6362
Add a PartitionedDuckDB Accelerator (#6338) by @kczimm in #6338
Use vector_search() UDTF in HTTP APIs (#6417) by @Jeadie in #6417
add supported types (#6409) by @kczimm in #6409
Enable session time zone override for MySQL (#6426) by @sgrebnov in #6426
Acceleration-like indexing for full text search indexes. (#6382) by @Jeadie in #6382
Provide error message when partition by expression changes (#6415) by @kczimm in #6415
Add support for Oracle Autonomous Database connections (Oracle Cloud) (#6421) by @sgrebnov in #6421
prune partitions for exact and in list with and without UDFs (#6423) by @kczimm in #6423
Fixes and reenable FTS tests (#6431) by @Jeadie in #6431
Upgrade DuckDB to 1.3.2 (#6434) by @phillipleblanc in #6434
Fix issue in limit clause for the Github Data connector (#6443) by @Advayp in #6443
Upgrade iceberg-rust to 0.5.1 (#6446) by @phillipleblanc in #6446
v1.5.0-rc.2 release notes (#6440) by @lukekim in #6440
Oracle: add automated TPC-H SF1 benchmark tests (#6449) by @sgrebnov in #6449
fix: Update benchmark snapshots (#6455) by @app/github-actions in #6455
Preserve ArrowError in arrow_tools::record_batch (#6454) by @mach-kernel in #6454
fix: Update benchmark snapshots (#6465) by @app/github-actions in #6465
Add option to preinstall Oracle ODPI-C library in Docker image (#6466) by @sgrebnov in #6466
Include Oracle connector (federated mode) in automated benchmarks (#6467) by @sgrebnov in #6467
Update crates/llms/src/bedrock/embed/mod.rs by @lukekim in #6468
v1.5.0-rc.3 release notes (#6474) by @lukekim in #6474
Add integration tests for S3 Vectors filters pushdown (#6469) by @sgrebnov in #6469
check for indexedtableprovider when finding tables to search on (#6478) by @Jeadie in #6478
Parse fully qualified table names in UDTFs (#6461) by @Jeadie in #6461
Add integration test for S3 Vectors to cover data update (overwrite) (#6480) by @sgrebnov in #6480
Add 'Run all tests' option for models tests and enable Bedrock tests (#6481) by @sgrebnov in #6481
Add support for a members table type for the GitHub Data Connector (#6464) by @Advayp in #6464
S3 vector data cannot be null (#6483) by @Jeadie in #6483
Don't infer FixedSizeList size during indexing vectors. (#6487) by @Jeadie in #6487
Add support for retention_sql acceleration param (#6488) by @sgrebnov in #6488
Make dataset refresh progress tracing less verbose (#6489) by @sgrebnov in #6489
Use RwLock on tantivy index in FullTextDatabaseIndex for update concurrency (#6490) by @Jeadie in #6490
Add tests for dataset retention logic and refactor retention code (#6495) by @sgrebnov in #6495
Upgade dependabot dependencies (#6497) by @phillipleblanc in #6497
Add periodic tracing of data loading progress during dataset refresh (#6499) by @sgrebnov in #6499
Promote Oracle Data Connector to Alpha (#6503) by @sgrebnov in #6503
Use AWS SDK to provide credentials for Iceberg connectors (#6498) by @phillipleblanc in #6498
Add integration tests for partitioning (#6463) by @kczimm in #6463
Use top-level table in full-text search JOIN ON (#6491) by @Jeadie in #6491
Use accelerated table in vector_search JOIN operations when appropriate (#6516) by @Jeadie in #6516
Fix 'additional_column' for quoted columns (fix for qualified columns broke it) (#6512) by @Jeadie in #6512
Also use AWS SDK for inferring credentials for S3/Delta/Databricks Delta data connectors (#6504) by @phillipleblanc in #6504
Add per-dataset availability monitor configuration (#6482) by @phillipleblanc in #6482
Suppress the warning from the AWS SDK if it can't load credentials (#6533) by @phillipleblanc in #6533
Change default value of check_availability from default to auto (#6534) by @lukekim in #6534
README.md improvements for v1.5.0 (#6539) by @lukekim in #6539
Temporary disable s3_vectors_basic (#6537) by @sgrebnov in #6537
Ensure binder errors show before query and other (#6374) by @suhuruli in #6374
Update spiceai/duckdb-rs -> DuckDB 1.3.2 + index fix (#6496) by @mach-kernel in #6496
Update table-providers to latest version with DuckDB fixes (#6535) by @phillipleblanc in #6535
S3: default to public access if no auth is provided (#6532) by @sgrebnov in #6532
