Spice v2.0-rc.4 (Apr 30, 2026)

May 1, 2026 · 22 min read

William Croxson

Senior Software Engineer at Spice AI

Announcing the release of Spice v2.0-rc.4! 🚀

v2.0.0-rc.4 is the fourth release candidate for advanced testing of v2.0, building on v2.0.0-rc.3.

Highlights in this release candidate include:

Elasticsearch Data Connector (Alpha) with native hybrid search (BM25 full-text + kNN vector + RRF)
PostgreSQL Native CDC via WAL logical replication, eliminating the need for Debezium or Kafka
Multi-vector Embeddings with MaxSim for ColBERT-style late-interaction retrieval
Rerank UDTF for hybrid search pipelines with automatic query propagation
HashiCorp Vault and Azure Key Vault Secret Stores for enterprise secret management
DuckDB Vector Engine with HNSW index support
Azure Cosmos DB Connector (RC), Git Connector promoted to RC
MCP Streamable HTTP transport
Read-only API Key Enforcement on Flight DoGet and async query paths

What's New in v2.0.0-rc.4

Elasticsearch Data Connector (Alpha, Spice.ai Enterprise)

The new Elasticsearch data connector enables querying Elasticsearch indexes as SQL tables with full hybrid search support. Currently available in Spice.ai Enterprise.

Key capabilities:

SQL Table Access: Query any Elasticsearch index with standard SQL via a native DataFusion TableProvider.
kNN Vector Search: Use the vector_search() UDTF against Elasticsearch-backed vector fields.
BM25 Full-Text Search: Use the text_search() UDTF for native Elasticsearch full-text queries.
Hybrid Search: Combine kNN and BM25 results with the rrf() UDTF for reciprocal rank fusion.
Elasticsearch as a Vector Engine: Accelerated datasets can use Elasticsearch as the backing vector engine for embedding storage and retrieval.
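
As a rough illustration of what reciprocal rank fusion computes, the Python sketch below fuses two ranked lists of document IDs. It is not the Spice or Elasticsearch implementation: the function name rrf_fuse, the sample IDs, and the constant k = 60 (a value commonly used for RRF) are assumptions made for this example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of IDs: each list adds 1 / (k + rank) to an ID's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from a kNN search and a BM25 search over the same index.
vector_hits = ["a", "b", "c"]
text_hits = ["b", "d", "a"]

fused = rrf_fuse([vector_hits, text_hits])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)  # ['b', 'a', 'd', 'c']
```

Documents that appear near the top of both lists ("b" and "a" here) outrank documents that appear in only one, which is the behavior hybrid search relies on.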

Example configuration:

datasets:
  - from: elasticsearch:my_index
    name: my_data
    params:
      elasticsearch_endpoint: https://my-cluster.es.io:9200
      elasticsearch_username: ${secrets:es_user}
      elasticsearch_password: ${secrets:es_password}

PostgreSQL Native Replication via WAL

Postgres datasets configured with refresh_mode: changes can now stream changes directly from PostgreSQL logical replication (WAL) into any local accelerator, with no Debezium or Kafka required.

Key capabilities:

Native Logical Replication: Uses pgoutput decoding to stream INSERT/UPDATE/DELETE events.
Automatic Slot Management: Each Spice replica creates a distinct replication slot (spice_<dataset>_<hash>), so multi-replica deployments work automatically. Publications are shared.
Bootstrap Snapshot: An initial REPEATABLE READ snapshot seeds the accelerator before replication begins.
LSN Acknowledgement: The LsnCommitter sends durable LSN back to Postgres so WAL segments are reclaimed.
All Accelerators Supported: Works with DuckDB, SQLite, Postgres, Cayenne, and Arrow accelerators.
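
The flow described above (seed from a snapshot, apply streamed changes, then acknowledge an LSN) can be pictured with a small Python sketch. This is a conceptual model only, not Spice's code: the event dictionaries, the integer LSNs, and the apply_changes helper are hypothetical, and real pgoutput messages are binary protocol messages rather than Python dicts.

```python
def apply_changes(table, events, durable_lsn=0):
    """Apply change events in LSN order; return the highest LSN applied."""
    for event in sorted(events, key=lambda e: e["lsn"]):
        if event["lsn"] <= durable_lsn:
            continue  # already reflected in the snapshot or a previous run
        op = event["op"]
        if op in ("insert", "update"):
            table[event["key"]] = event["row"]
        elif op == "delete":
            table.pop(event["key"], None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
        durable_lsn = event["lsn"]
    return durable_lsn


# Bootstrap snapshot, assumed to be consistent as of LSN 100.
table = {1: {"name": "ada"}, 2: {"name": "bob"}}

events = [
    {"lsn": 90, "op": "insert", "key": 2, "row": {"name": "stale"}},
    {"lsn": 101, "op": "update", "key": 1, "row": {"name": "ada l."}},
    {"lsn": 102, "op": "delete", "key": 2},
    {"lsn": 103, "op": "insert", "key": 3, "row": {"name": "cy"}},
]

# The returned LSN is what a consumer would report back as durable.
ack_lsn = apply_changes(table, events, durable_lsn=100)
print(table, ack_lsn)
```

Skipping events at or below the last durable LSN is what makes replay after a restart safe in this model, and acknowledging only applied LSNs is what lets the source reclaim WAL.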

Example configuration:

datasets:
  - from: postgres:my_table
    name: my_table
    params:
      pg_host: localhost
      pg_port: 5432
      pg_db: mydb
      pg_publication: my_publication   # optional; auto-created if omitted
    acceleration:
      enabled: true
      engine: duckdb
      refresh_mode: changes

Multi-vector Embeddings with MaxSim (Late Interaction)

Column-level embeddings now support list-of-string columns, producing one embedding vector per list element and enabling ColBERT-style late-interaction retrieval.

Key capabilities:

Multi-vector per Row: Columns of type List<String> produce List<FixedSizeList<F32, D>> — one embedding per list element.
MaxSim / Mean / Sum Scoring: Per-row score is the max, mean, or sum cosine over the list elements. Default is MaxSim (ColBERT).
_match Column: Returns the specific list element that produced the highest cosine similarity.
No Schema Changes Required: Works with existing embedding configurations; activates automatically for list-type columns.
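
To make the scoring modes concrete, here is a minimal pure-Python sketch of per-row scoring over a list of element embeddings. It follows the description above (max, mean, or sum of cosine similarities, plus the index of the best-matching element), but the function names, the tiny 2-dimensional vectors, and the single query vector are assumptions for illustration; it is not the Spice implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_row(query_vec, element_vecs, mode="max"):
    """Return (score, index of the best-matching element) for one row."""
    sims = [cosine(query_vec, vec) for vec in element_vecs]
    best = max(range(len(sims)), key=sims.__getitem__)
    if mode == "max":
        score = sims[best]
    elif mode == "mean":
        score = sum(sims) / len(sims)
    elif mode == "sum":
        score = sum(sims)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return score, best


# One row whose list column has three elements, each with its own embedding.
row_vectors = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
query = [1.0, 0.0]

max_score, match_index = score_row(query, row_vectors, mode="max")
sum_score, _ = score_row(query, row_vectors, mode="sum")
```

In this toy row the second element is identical in direction to the query, so it determines the max score and is the element a _match-style column would report.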

Rerank UDTF for Hybrid Search

A new rerank() table-valued function reorders scored results from vector_search, text_search, or rrf by a reranker model's relevance judgements. See Search Functionality for an overview of search UDTFs.

Key capabilities:

Auto Query Propagation: The query string is automatically inherited from a nested search UDTF — no repetition required.
Any Chat Model as Reranker: Any registered chat/completion model can serve as a reranker via the built-in LlmRerank adapter (listwise prompt by default; pointwise available).
Filter and Projection Pushdown: The RerankExec physical node supports pushdown, reducing data movement.
Extensible: A new RerankerModelStore sits alongside ChatModelStore and EmbeddingModelStore; native providers (Cohere, Voyage, BGE) can be added without runtime plumbing changes.

SELECT * FROM rerank(
    rrf(vector_search('my_table', 'query text'), text_search('my_table', 'query text')),
    document => content
) LIMIT 10;

New Secret Stores: HashiCorp Vault and Azure Key Vault

Two new enterprise-grade Secret Stores are now available.

HashiCorp Vault (hashicorp_vault):

KV v2 (default) and KV v1 mount support.
Auth methods: token, approle, kubernetes, jwt.
Token leases are cached and automatically re-acquired on expiry.

secrets:
  - from: hashicorp_vault:secret/my-app
    name: my_secrets
    params:
      hashicorp_vault_addr: https://vault.example.com
      hashicorp_vault_auth_method: approle
      hashicorp_vault_role_id: ${env:VAULT_ROLE_ID}
      hashicorp_vault_secret_id: ${secrets:vault_secret_id}
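
The lease behavior noted above (cache the token, log in again once it expires) can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the Spice or Vault client code: TokenCache, the login callable returning (token, ttl_seconds), and the injectable clock are all hypothetical names chosen for the example.

```python
import time


class TokenCache:
    """Cache a token and re-acquire it once its lease has expired."""

    def __init__(self, login, clock=time.monotonic):
        self._login = login  # callable returning (token, ttl_seconds)
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._token = None
        self._expires_at = 0.0

    def token(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token, ttl = self._login()
            self._expires_at = now + ttl
        return self._token


# Usage with a fake login and a fake clock.
fake_now = [0.0]
login_calls = []


def fake_login():
    login_calls.append(fake_now[0])
    return f"token-{len(login_calls)}", 60.0


cache = TokenCache(fake_login, clock=lambda: fake_now[0])
first = cache.token()   # logs in
fake_now[0] = 30.0
second = cache.token()  # still within the lease, served from cache
fake_now[0] = 61.0
third = cache.token()   # lease expired, logs in again
```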

Azure Key Vault (azure_keyvault):

Per-key caching with single-flight fetch coalescing.
Auth methods: service principal, managed identity, workload identity, Azure CLI, or auto-detect.
Supports sovereign clouds via endpoint parameter.

secrets:
  - from: azure_keyvault:my-vault
    name: my_secrets
    params:
      azure_keyvault_auth_method: managed_identity
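
Single-flight fetch coalescing means that concurrent requests for the same key share one in-flight fetch instead of each calling the vault. The asyncio sketch below shows the general pattern; it is illustrative only, and SingleFlightCache, fake_fetch, and the key name are hypothetical rather than part of Spice or the Azure SDK.

```python
import asyncio


class SingleFlightCache:
    """Per-key cache in which concurrent misses share a single fetch."""

    def __init__(self, fetch):
        self._fetch = fetch  # async callable: key -> value
        self._values = {}
        self._inflight = {}

    async def get(self, key):
        if key in self._values:
            return self._values[key]
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key starts the fetch; later callers await it.
            task = asyncio.ensure_future(self._load(key))
            self._inflight[key] = task
        return await task

    async def _load(self, key):
        try:
            value = await self._fetch(key)
            self._values[key] = value
            return value
        finally:
            self._inflight.pop(key, None)


fetch_calls = []


async def fake_fetch(key):
    fetch_calls.append(key)
    await asyncio.sleep(0)  # stand-in for a network round trip
    return f"value-for-{key}"


async def main():
    cache = SingleFlightCache(fake_fetch)
    return await asyncio.gather(*(cache.get("db-password") for _ in range(5)))


results = asyncio.run(main())
```

Five concurrent lookups produce one call to fake_fetch, and all five callers receive the same value.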

DuckDB Vector Engine

DuckDB-accelerated tables can now use DuckDB's HNSW index for vector search via the vector_engine: duckdb option, enabling fast approximate nearest-neighbor search without an external vector store.

Example configuration:

datasets:
  - from: postgres:public.documents
    name: documents
    columns:
      - name: content
        embeddings:
          - from: hf_minilm
            row_id: id
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
    vectors:
      enabled: true
      engine: duckdb
      params:
        duckdb_distance_metric: cosine
        duckdb_hnsw_m: 16
        duckdb_hnsw_ef_construction: 64
        duckdb_hnsw_ef_search: 32

embeddings:
  - from: huggingface:huggingface.co/minishlab/potion-base-2M
    name: hf_minilm

New and Promoted Connectors

Azure Cosmos DB (RC):

A new read-only Azure Cosmos DB NoSQL / Core SQL API connector built on the azure_data_cosmos 0.30 SDK. Supports cross-partition scans, schema inference from document samples, and key-based auth (connection string or account endpoint + key).

Git Connector (RC):

The Git data connector is promoted to RC status with HTTPS/SSH auth (git_token, git_username/git_password, git_ssh_key), Git LFS support (enable_lfs), and per-repo connection resilience (semaphore, bounded retries with exponential backoff, permanent-error circuit breaking).
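
The resilience pattern named above (bounded retries with exponential backoff, and no retries for permanent errors) looks roughly like the following Python sketch. It is a generic illustration, not the connector's code: PermanentError, with_retries, the attempt limit, and the base delay are assumptions made for the example.

```python
import time


class PermanentError(Exception):
    """An error that retrying cannot fix, such as rejected credentials."""


def with_retries(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run operation, retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except PermanentError:
            raise  # fail fast: do not retry permanent errors
        except Exception:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...


# Usage: an operation that fails twice with a transient error, then succeeds.
attempts = {"count": 0}
delays = []


def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient network failure")
    return "fetched"


outcome = with_retries(flaky_fetch, sleep=delays.append)
```

Passing sleep as a parameter keeps the example testable without real waiting; here the recorded delays are 0.5 and 1.0 seconds before the third attempt succeeds.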

DynamoDB Write Support (DML)

DynamoDB datasets now support write-back via INSERT, UPDATE, and DELETE operations, complementing the existing read and CDC streaming capabilities.

MCP Streamable HTTP Transport

The MCP server has been upgraded to rmcp 1.5.0 and switched to the Streamable HTTP transport (/v1/mcp), replacing the previous SSE-based endpoint. The client-side transport is updated to StreamableHttpClientTransport.

Security Improvements

Read-only API Key Enforcement: API keys with read-only scope are now strictly enforced on the Flight DoGet path and on async query endpoints, preventing write operations from being issued under a read-only key.

GitHub Workflow Hardening: CI workflows have been hardened with improved security posture to reduce supply-chain risk.

Developer Experience Improvements

Actionable Config Errors: Parameter typos, missing secret references, and unknown engine names now produce specific, actionable error messages with Levenshtein-based suggestions, rather than silent drops or generic "missing required parameter" messages.
spice init Improvements: Written spicepods now include a yaml-language-server: $schema=... directive for IDE completions. Creation messages print regardless of log level.
REPL Improvements: Log filter honors RUST_LOG when -v is not passed; version banner moves to stderr and prints only on an interactive TTY.
403 / 401 Routing: HTTP 403 responses route to a new PermissionDenied variant; 401 messages point at spice login / SPICE_API_KEY.
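
For readers unfamiliar with Levenshtein-based suggestions, the sketch below shows the general idea: compute the edit distance from the mistyped name to each known name and suggest the closest one if it is close enough. The helper names, the candidate list, and the distance threshold of 2 are assumptions for illustration and do not reflect Spice's actual thresholds or error text.

```python
def levenshtein(a, b):
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete from a
                current[j - 1] + 1,                    # insert into a
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def suggest(name, candidates, max_distance=2):
    """Return the closest candidate within max_distance edits, or None."""
    best = min(candidates, key=lambda candidate: levenshtein(name, candidate))
    return best if levenshtein(name, best) <= max_distance else None


# Hypothetical list of known engine names.
engines = ["duckdb", "sqlite", "postgres", "arrow"]

typo_suggestion = suggest("duckbd", engines)
unrelated_suggestion = suggest("mongo", engines)
print(typo_suggestion, unrelated_suggestion)  # duckdb None
```

Capping the distance matters: without it, every unknown name would get a suggestion, including ones that are clearly not typos.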

OpenTelemetry Improvements

See Observability & Monitoring and the runtime.telemetry reference for full configuration details.

Metric Name Prefix: Configure a prefix for all exported OTLP metric names via runtime.telemetry.metric_prefix.
Delta Temporality Default: The OTLP push exporter now defaults to delta temporality, matching Prometheus and most backends.
Resource Attributes: runtime.telemetry.properties are applied as OTLP resource attributes on exported metrics.
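
The difference between cumulative and delta temporality is easiest to see with numbers. The Python sketch below converts a series of cumulative counter readings into per-interval deltas; it is a conceptual illustration with made-up readings, assumes the counter never resets, and is not how the exporter itself is implemented.

```python
def to_deltas(cumulative_readings):
    """Convert cumulative counter readings into per-interval deltas."""
    deltas = []
    previous = 0
    for reading in cumulative_readings:
        deltas.append(reading - previous)
        previous = reading
    return deltas


# Total queries served, sampled at four consecutive export intervals.
cumulative = [5, 12, 12, 20]
deltas = to_deltas(cumulative)
print(deltas)  # [5, 7, 0, 8]
```

With cumulative temporality each export carries the running total; with delta temporality each export carries only the change since the previous export, so the receiving backend sums the points.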

Full-text Search Performance

Tantivy full-text search ingestion performance is significantly improved with better batch handling and a rollback-on-error path.

SQL and Query Engine

DataFusion Upgrade: Updated to a newer DataFusion revision with additional bug fixes and performance improvements.
Views on DDL Catalogs: DDL-defined catalogs (e.g., Unity Catalog) can now expose and query views.
flatten_json / json_tree / expand_maps UDTFs: New table-valued functions for JSON transformation, map expansion, and schema decomposition in query pipelines. See JSON Functions and Operators.
cosine_distance Pushdown to DuckDB: cosine_distance is now pushed down to DuckDB accelerators via array_cosine_distance.
Snowflake Type Support: Added support for OBJECT, MAP, GEOGRAPHY, GEOMETRY, VECTOR, and TIMESTAMP_LTZ types in the Snowflake connector.
MySQL Zero-Date Behavior: The MySQL connector adds a new mysql_zero_date_behavior parameter (null or error) controlling how MySQL zero-date values (0000-00-00) are handled.
Databricks Timeouts: The Databricks connector adds new connect_timeout and client_timeout parameters for sql_warehouse mode.

Dependency Updates

Dependency / Component	Version / Update
DataFusion	Updated
rmcp	v1.5.0 (from fork pin)
mistral.rs	Updated
openssl	0.10.78

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

No new cookbook recipes.

The Spice Cookbook includes 86 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v2.0.0-rc.4, use one of the following methods:

CLI:

spice upgrade v2.0.0-rc.4

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:2.0.0-rc.4 image:

docker pull spiceai/spiceai:2.0.0-rc.4

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai --version 2.0.0-rc.4

AWS Marketplace:

Spice is available in the AWS Marketplace.

What's Changed

Changelog

Integrate spiceio and makefile_targets into pr.yml by @lukekim in #10357
ci: skip artifact compression for test binaries/archives by @lukekim in #10381
chore(deps): bump spiceai/candle, spiceai/mistral.rs, aws-lc-rs, tantivy, rand by @lukekim in #10379
Bump datafusion-table-providers (#10375) by @lukekim in #10384
fix: Update Search integration test snapshots by @app/github-actions in #10376
v2.0.0-rc.3 preparation by @ewgenius in #10382
fix(spicepod): JSON schema accepts string or {name: expr} for partition_by by @lukekim in #10352
fix: Use ROUND for Turso decimal BETWEEN comparisons (fixes #9872) by @claudespice in #10360
Revert "v2.0.0-rc.3 preparation" from trunk by @ewgenius in #10386
Add on_schema_resolved dataset ready state by @lukekim in #10368
feat: Add Elasticsearch data connector with hybrid search support by @lukekim in #10258
ci: bump test archive upload compression-level to 1 by @lukekim in #10388
feat(git-connector): promote Git connector to RC status by @lukekim in #10385
feat(postgres): stream WAL directly to Spice accelerators by @lukekim in #10364
Add schema decomposition to the HTTP connector by @lukekim in #10393
fix(cayenne): Skip catalog refresh state reload for existing providers by @sgrebnov in #10396
Make cayenne-flightsql tool by @Jeadie in #10356
build(deps): bump the github-actions-dependencies group with 2 updates by @app/dependabot in #10398
Update openapi.json by @app/github-actions in #10272
Merge develop to trunk — 2026-04-19 by @claudespice in #10407
feat(otel): default OTLP push exporter to delta temporality by @phillipleblanc in #10412
fix: Restore analyzer rule ordering to run federation before type coercion by @sgrebnov in #10415
fix: Map Utf8/LargeUtf8 to STRING in Databricks/Spark SQL dialects by @sgrebnov in #10420
feat(otel): add metric name prefix at runtime.telemetry.metric_prefix by @phillipleblanc in #10418
fix: Map LargeUtf8 to VARCHAR in Athena ODBC dialect by @sgrebnov in #10419
feat(cluster): connector-driven object store registration on executors by @phillipleblanc in #10414
build(deps): bump ubuntu from 22.04 to 24.04 in the docker-dependencies group by @app/dependabot in #10397
fix: Update benchmark snapshots Apr 20 by @app/github-actions in #10417
feat(otel): apply runtime.telemetry.properties as resource attributes on exported metrics by @phillipleblanc in #10416
Publish RC releases to DockerHub; upgrade runners to ubuntu-24.04 by @lukekim in #10428
feat: Add Azure Cosmos DB (NoSQL) data connector (RC) by @lukekim in #10392
feat(datafusion): flatten_json_properties + json_tree UDTFs by @lukekim in #10406
Harden /v1/tools and /v1/nsql against unauthenticated / LLM-driven SQL by @lukekim in #10365
feat(embeddings): multi-vector embeddings with MaxSim + late-interaction by @lukekim in #10408
Update GH runners for CUDA builds by @ewgenius in #10432
fix(delta_lake): register object stores on cluster executors by @phillipleblanc in #10436
DF-native DML by @krinart in #10327
ci: run Build and Test on spiceai-macos; split install jobs by profile by @lukekim in #10434
Improve search UDTFs: text_search, vector_search, rrf by @lukekim in #10387
fix(model2vec): Improve robustness of model loading for sentence-transformers layouts by @sgrebnov in #10444
Merge develop to trunk — 2026-04-21 by @claudespice in #10448
Enable filter pushdown for vector_search UDTF by @sgrebnov in #10447
Support Snowflake OBJECT, MAP, GEOGRAPHY, GEOMETRY, VECTOR, TIMESTAMP_LTZ types by @lukekim in #10451
Fix Databricks tests by @krinart in #10449
fix(cluster): forward register_object_stores through connector wrappers by @phillipleblanc in #10460
Fixes for vector-search by @krinart in #10455
Add expand_maps option and flatten_json UDTF by @lukekim in #10452
fix: Update Search integration test snapshots by @app/github-actions in #10458
Fix physical codec decode ambiguity for empty protobuf messages by @sgrebnov in #10466
chore(logging): demote s3_single_file_cached skip refresh log to debug by @phillipleblanc in #10467
Enable filter pushdown for rrf UDTF by @sgrebnov in #10465
feat(cluster): consolidate distributed state into cluster.json by @phillipleblanc in #10463
feat(cayenne): Add column statistics and data inlining by @lukekim in #10314
docs(copilot): flag missing wrapper delegation when adding default trait methods by @phillipleblanc in #10461
Wire Elasticsearch vector engine write path through acceleration by @lukekim in #10453
Add helm lint CI by @ewgenius in #10468
Fix Azure and GCS acceleration snapshot object store credential handling by @phillipleblanc in #10486
Update spicepod.schema.json by @app/github-actions in #10485
fix(secrets): harden AWS Secrets Manager secret store by @lukekim in #10478
Update datafusion-ballista crate by @sgrebnov in #10488
feat(secrets): add ParameterSpec and more params for AWS secrets manager by @phillipleblanc in #10487
Add rerank UDTF for hybrid search with query auto-propagation by @lukekim in #10469
Fix flatten_json_properties by @krinart in #10475
fix: preserve field and schema metadata in expand_views_schema by @claudespice in #10494
Upgrade rmcp to upstream 1.5.0; switch MCP server to Streamable HTTP by @lukekim in #10491
fix: handle Snowflake TIMESTAMP_LTZ wire format and prevent nanosecond overflow by @claudespice in #10493
Lint parity in Makefile by @krinart in #10492
Add connect_timeout/client_timeout params to Databricks sql_warehouse mode by @lukekim in #10495
fix(tracing): suppress opentelemetry INFO logs at all verbosity levels by @lukekim in #10497
DynamoDB DML by @krinart in #10470
feat(cayenne): native vector search via SIMD similarity UDFs by @lukekim in #10456
fix(cli): suppress banner for all JSON-producing cloud subcommands (fixes #10498) by @claudespice in #10510
fix(deps): bump openssl to 0.10.78 by @phillipleblanc in #10509
fix(s3): quiet AWS SDK credential probe when no region is configured by @phillipleblanc in #10506
fix(cdc): emit ready signal on caught-up Kafka/Debezium streams (#5201) by @phillipleblanc in #10504
runtime-cluster crate + Run partition discovery before forwarding refresh to executors by @krinart in #10490
Update lint-rust target to use --keep-going by @Jeadie in #10508
Add TPC-H SF100 s3[parquet]-duckdb[file] benchmark spicepod by @lukekim in #10524
Remove dev-profile install steps from pr.yml by @Jeadie in #10507
fix: add missing NULL check on Timestamp path in append refresh by @claudespice in #10518
fix: return error on Decimal128/256 overflow instead of silently dropping scale by @claudespice in #10519
fix: delegate update and delete_from in IndexedTableProvider and EmbeddingTable by @claudespice in #10520
feat(devx): make config errors, CLI, and REPL lead users to success by @lukekim in #10489
fix(rerank): defer execution to RerankExec, enable filters and projection pushdown by @sgrebnov in #10514
fix(llms): support Gemma models with missing attention_bias config field by @lukekim in #10523
Fix vector_search silently ignoring named limit/column/include_score args by @sgrebnov in #10527
fix: split unsupported filters locally in scan() for UseSource mode by @ewgenius in #10528
feat(secrets): add Azure Key Vault secret store by @lukekim in #10496
Bump mistralrs by @krinart in #10532
Fix benchmark configurations and CI build issues by @sgrebnov in #10535
Fix catalog query overrides for MySQL and MSSQL benchmarks by @sgrebnov in #10543
For Cayenne, preserve matched columns for MERGE ... ON <cols> by @Jeadie in #10340
build(deps): bump the aws-sdk group across 1 directory with 5 updates by @app/dependabot in #10538
docs: update AI agent instructions (git workflow + Rust 1.94) by @lukekim in #10544
fix: Update tpch benchmark snapshots by @app/github-actions in #10529
fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-duckdb[file].yaml by @app/github-actions in #10525
Extract runtime-datafusion from runtime by @krinart in #10545
Use generic DML extension planner for Cayenne by @Jeadie in #10437
fix: Update Search integration test snapshots by @app/github-actions in #10552
Fix security and correctness audit issues by @lukekim in #10526
fix(MySQL): revert MySQL result column reorder to fix federated query failures by @sgrebnov in #10557
Fix protoc installation by @krinart in #10566
fix: Disable Ballista dynamic filters on HashJoinExec by @peasee in #10548
Support views on DDL catalogs by @Jeadie in #10554
Update datafusion by @Jeadie in #10422
Improve full-text search indexing performance by @sgrebnov in #10464
feat(mysql): add mysql_zero_date_behavior parameter (null|error) by @phillipleblanc in #10573
fix(snowflake): declare private_key in connector PARAMETERS (fixes #10517) by @claudespice in #10559
Honour CARGO_TARGET_DIR in Makefiles by @Jeadie in #10569
Enable cosine_distance pushdown to DuckDB accelerator via array_cosine_distance by @sgrebnov in #10564
fix: Update test snapshots by @app/github-actions in #10570
fix: Update tpch benchmark snapshots by @app/github-actions in #10560
feat(snapshots): make snapshots an optional feature by @phillipleblanc in #10574
Enforce read-only API key restrictions on Flight DoGet and async query paths by @Jeadie in #10551
Improved security posture on Github workflows by @Jeadie in #10556
fix: Update datafusion-table-providers to improve SqlTable filter pushdown by @sgrebnov in #10595
feat(secrets): add HashiCorp Vault secret store by @phillipleblanc in #10561
fix: delegate update() in UpsertDedupTableProvider to inner provider by @claudespice in #10593
Add DuckDB vector engine support by @lukekim in #10562
Sharepoint - add object-store listing connector with expanded auth and write support by @lukekim in #10473
fix: Install protoc from source by @peasee in #10597

Full Changelog: https://github.com/spiceai/spiceai/compare/v2.0.0-rc.3...v2.0.0-rc.4
