Skip to main content

6 posts tagged with "s3"

Amazon S3 storage service topics and usage

View All Tags

Spice v1.10.3 (Dec 29, 2025)

Β· 2 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.10.3! πŸš€

v1.10.3 is a patch release with improved startup reliability, fixes for Azure BlobFS versioned containers, S3 custom endpoint query resolution, and a fix for the OpenAI Responses API.

What's New in v1.10.3​

Additional Improvements & Bug Fixes​

  • Reliability: Telemetry exporter initialization now runs asynchronously, preventing blocked startup in environments with network restrictions (e.g., Kubernetes with restrictive network policies).
  • Reliability: Fixed an issue where queries on Azure Blob containers with versioning enabled would fail with "Azure does not support suffix range requests" error in distributed query mode.
  • Reliability: Fixed S3 location-based queries against custom S3 endpoints (e.g., MinIO, LocalStack). Queries with location predicates on datasets using s3_endpoint and s3_region parameters now correctly route to the configured endpoint instead of defaulting to AWS S3.
  • Reliability: Fixed "project index out of bounds" errors in the query optimizer when union children have mismatched schemas. The optimizer now validates schema compatibility before applying projection pushdown.
  • Reliability: Fixed an issue where the OpenAI Responses API (/v1/responses) was not working correctly.

Contributors​

Breaking Changes​

No breaking changes.

Cookbook Updates​

No major cookbook updates.

The Spice Cookbook includes 84 recipes to help you get started with Spice quickly and easily.

Upgrading​

To upgrade to v1.10.3, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.10.3 image:

docker pull spiceai/spiceai:1.10.3

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

πŸŽ‰ Spice is now available in the AWS Marketplace!

What's Changed​

Changelog​

Spice v1.5.0 (July 21, 2025)

Β· 14 min read
Evgenii Khramkov
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.5.0! πŸ”

Spice v1.5.0 brings major upgrades to search and retrieval. It introduces native support for Amazon S3 Vectors, enabling petabyte scale vector search directly from S3 vector buckets, alongside SQL-integrated vector and tantivy-powered full-text search, partitioning for DuckDB acceleration, and automated refreshes for search indexes and views. It includes the AWS Bedrock Embeddings Model Provider, the Oracle Database connector, and the now-stable Spice.ai Cloud Data Connector, and the upgrade to DuckDB v1.3.2.

What's New in v1.5.0​

Amazon S3 Vectors Support: Spice.ai now integrates with Amazon S3 Vectors, launched in public preview on July 15, 2025, enabling vector-native object storage with built-in indexing and querying. This integration supports semantic search, recommendation systems, and retrieval-augmented generation (RAG) at petabyte scale with S3’s durability and elasticity. Spice.ai manages the vector lifecycleβ€”ingesting data, creating embeddings with models like Amazon Titan or Cohere via AWS Bedrock, or others available on HuggingFace, and storing it in S3 Vector buckets.

Spice integration with Amazon S3 Vectors

Example Spicepod.yml configuration for S3 Vectors:

datasets:
- from: s3://my_data_bucket/data/
name: my_vectors
params:
file_format: parquet
acceleration:
enabled: true
vectors:
engine: s3_vectors
params:
s3_vectors_aws_region: us-east-2
s3_vectors_bucket: my-s3-vectors-bucket
columns:
- name: content
embeddings:
- from: bedrock_titan
row_id:
- id

Example SQL query using S3 Vectors:

SELECT *
FROM vector_search(my_vectors, 'Cricket bats', 10)
WHERE price < 100
ORDER BY score

For more details, refer to the S3 Vectors Documentation.

SQL-integrated Search: Vector and BM25-scored full-text search capabilities are now natively available in SQL queries, extending the power of the POST v1/search endpoint to all SQL workflows.

Example Vector-Similarity-Search (VSS) using the vector_search UDTF on the table reviews for the search term "Cricket bats":

SELECT review_id, review_text, review_date, score
FROM vector_search(reviews, "Cricket bats")
WHERE country_code="AUS"
LIMIT 3

Example Full-Text-Search (FTS) using the text_search UDTF on the table reviews for the search term "Cricket bats":

SELECT review_id, review_text, review_date, score
FROM text_search(reviews, "Cricket bats")
LIMIT 3

DuckDB v1.3.2 Upgrade: Upgraded DuckDB engine from v1.1.3 to v1.3.2. Key improvements include support for adding primary keys to existing tables, resolution of over-eager unique constraint checking for smoother inserts, and 13% reduced runtime on TPC-H SF100 queries through extensive optimizer refinements. The v1.2.x release of DuckDB was skipped due to a regression in indexes.

Partitioned Acceleration: DuckDB file-based accelerations now support partition_by expressions, enabling queries to scale to large datasets through automatic data partitioning and query predicate pruning. New UDFs, bucket and truncate, simplify partition logic.

New UDFs useful for partition_by expressions:

  • bucket(num_buckets, col): Partitions a column into a specified number of buckets based on a hash of the column value.
  • truncate(width, col): Truncates a column to a specified width, aligning values to the nearest lower multiple (e.g., truncate(10, 101) = 100).

Example Spicepod.yml configuration:

datasets:
- from: s3://my_bucket/some_large_table/
name: my_table
params:
file_format: parquet
acceleration:
enabled: true
engine: duckdb
mode: file
partition_by: bucket(100, account_id) # Partition account_id into 100 buckets

Full-Text-Search (FTS) Index Refresh: Accelerated datasets with search indexes maintain up-to-date results with configurable refresh intervals.

Example refreshing search indexes on body every 10 seconds:

datasets:
- from: github:github.com/spiceai/docs/pulls
name: spiceai.doc.pulls
params:
github_token: ${secrets:GITHUB_TOKEN}
acceleration:
enabled: true
refresh_mode: full
refresh_check_interval: 10s
columns:
- name: body
full_text_search:
enabled: true
row_id:
- id

Scheduled View Refresh: Accelerated Views now support cron-based refresh schedules using refresh_cron, automating updates for accelerated data.

Example Spicepod.yml configuration:

views:
- name: my_view
sql: SELECT 1
acceleration:
enabled: true
refresh_cron: '0 * * * *' # Every hour

For more details, refer to Scheduled Refreshes.

Multi-column Vector Search: For datasets configured with embeddings on more than one column, POST v1/search and similarity_search perform parallel vector search on each column, aggregating results using reciprocal rank fusion.

Example Spicepod.yml for multi-column search:

datasets:
- from: github:github.com/apache/datafusion/issues
name: datafusion.issues
params:
github_token: ${secrets:GITHUB_TOKEN}
columns:
- name: title
embeddings:
- from: hf_minilm
- name: body
embeddings:
- from: openai_embeddings

AWS Bedrock Embeddings Model Provider: Added support for AWS Bedrock embedding models, including Amazon Titan Text Embeddings and Cohere Text Embeddings.

Example Spicepod.yml:

embeddings:
- from: bedrock:cohere.embed-english-v3
name: cohere-embeddings
params:
aws_region: us-east-1
input_type: search_document
truncate: END
- from: bedrock:amazon.titan-embed-text-v2:0
name: titan-embeddings
params:
aws_region: us-east-1
dimensions: '256'

For more details, refer to the AWS Bedrock Embedding Models Documentation.

Oracle Data Connector: Use from: oracle: to access and accelerate data stored in Oracle databases, deployed on-premises or in the cloud.

Example Spicepod.yml:

datasets:
- from: oracle:"SH"."PRODUCTS"
name: products
params:
oracle_host: 127.0.0.1
oracle_username: scott
oracle_password: tiger

See the Oracle Data Connector documentation.

GitHub Data Connector: The GitHub data connector supports query and acceleration of members, the users of an organization.

Example Spicepod.yml configuration:

datasets:
- from: github:github.com/spiceai/members # General format: github.com/[org-name]/members
name: spiceai.members
params:
# With GitHub Apps (recommended)
github_client_id: ${secrets:GITHUB_SPICEHQ_CLIENT_ID}
github_private_key: ${secrets:GITHUB_SPICEHQ_PRIVATE_KEY}
github_installation_id: ${secrets:GITHUB_SPICEHQ_INSTALLATION_ID}
# With GitHub Tokens
# github_token: ${secrets:GITHUB_TOKEN}

See the GitHub Data Connector Documentation

Spice.ai Cloud Data Connector: Graduated to Stable.

spice-rs SDK Release: The Spice Rust SDK has updated to v3.0.0. This release includes optimizations for the Spice client API, adds robust query retries, and custom metadata configurations for spice queries.

Contributors​

Breaking Changes​

  • Search HTTP API Response: POST v1/search response payload has changed. See the new API documentation for details.
  • Model Provider Parameter Prefixes: Model Provider parameters use provider-specific prefixes instead of openai_ prefixes (e.g., hf_temperature for HuggingFace, anthropic_max_completion_tokens for Anthropic, perplexity_tool_choice for Perplexity). The openai_ prefix remains supported for backward compatibility but is deprecated and will be removed in a future release.

Cookbook Updates​

The Spice Cookbook now includes 72 recipes to help you get started with Spice quickly and easily.

Upgrading​

To upgrade to v1.5.0, download and install the specific binary from github.com/spiceai/spiceai/releases/tag/v1.5.0 or pull the v1.5.0 Docker image (spiceai/spiceai:1.5.0).

What's Changed​

Dependencies​

Changelog​

  • fix: openai model endpoint (#6394) by @Sevenannn in #6394
  • Enable configuring otel endpoint from spice run (#6360) by @Advayp in #6360
  • Enable Oracle connector in default build configuration (#6395) by @sgrebnov in #6395
  • fix llm integraion test (#6398) by @Sevenannn in #6398
  • Promote spice cloud connector to stable quality (#6221) by @Sevenannn in #6221
  • v1.5.0-rc.1 release notes (#6397) by @lukekim in #6397
  • Fix model nsql integration tests (#6365) by @Sevenannn in #6365
  • Fix incorrect UDTF name and SQL query (#6404) by @lukekim in #6404
  • Update v1.5.0-rc.1.md (#6407) by @sgrebnov in #6407
  • Improve error messages (#6405) by @lukekim in #6405
  • build(deps): bump Jimver/cuda-toolkit from 0.2.25 to 0.2.26 (#6388) by @app/dependabot in #6388
  • Upgrade dependabot dependencies (#6411) by @phillipleblanc in #6411
  • Fix projection pushdown issues for document based file connector (#6362) by @Advayp in #6362
  • Add a PartitionedDuckDB Accelerator (#6338) by @kczimm in #6338
  • Use vector_search() UDTF in HTTP APIs (#6417) by @Jeadie in #6417
  • add supported types (#6409) by @kczimm in #6409
  • Enable session time zone override for MySQL (#6426) by @sgrebnov in #6426
  • Acceleration-like indexing for full text search indexes. (#6382) by @Jeadie in #6382
  • Provide error message when partition by expression changes (#6415) by @kczimm in #6415
  • Add support for Oracle Autonomous Database connections (Oracle Cloud) (#6421) by @sgrebnov in #6421
  • prune partitions for exact and in list with and without UDFs (#6423) by @kczimm in #6423
  • Fixes and reenable FTS tests (#6431) by @Jeadie in #6431
  • Upgrade DuckDB to 1.3.2 (#6434) by @phillipleblanc in #6434
  • Fix issue in limit clause for the Github Data connector (#6443) by @Advayp in #6443
  • Upgrade iceberg-rust to 0.5.1 (#6446) by @phillipleblanc in #6446
  • v1.5.0-rc.2 release notes (#6440) by @lukekim in #6440
  • Oracle: add automated TPC-H SF1 benchmark tests (#6449) by @sgrebnov in #6449
  • fix: Update benchmark snapshots (#6455) by @app/github-actions in #6455
  • Preserve ArrowError in arrow_tools::record_batch (#6454) by @mach-kernel in #6454
  • fix: Update benchmark snapshots (#6465) by @app/github-actions in #6465
  • Add option to preinstall Oracle ODPI-C library in Docker image (#6466) by @sgrebnov in #6466
  • Include Oracle connector (federated mode) in automated benchmarks (#6467) by @sgrebnov in #6467
  • Update crates/llms/src/bedrock/embed/mod.rs by @lukekim in #6468
  • v1.5.0-rc.3 release notes (#6474) by @lukekim in #6474
  • Add integration tests for S3 Vectors filters pushdown (#6469) by @sgrebnov in #6469
  • check for indexedtableprovider when finding tables to search on (#6478) by @Jeadie in #6478
  • Parse fully qualified table names in UDTFs (#6461) by @Jeadie in #6461
  • Add integration test for S3 Vectors to cover data update (overwrite) (#6480) by @sgrebnov in #6480
  • Add 'Run all tests' option for models tests and enable Bedrock tests (#6481) by @sgrebnov in #6481
  • Add support for a members table type for the GitHub Data Connector (#6464) by @Advayp in #6464
  • S3 vector data cannot be null (#6483) by @Jeadie in #6483
  • Don't infer FixedSizeList size during indexing vectors. (#6487) by @Jeadie in #6487
  • Add support for retention_sql acceleration param (#6488) by @sgrebnov in #6488
  • Make dataset refresh progress tracing less verbose (#6489) by @sgrebnov in #6489
  • Use RwLock on tantivy index in FullTextDatabaseIndex for update concurrency (#6490) by @Jeadie in #6490
  • Add tests for dataset retention logic and refactor retention code (#6495) by @sgrebnov in #6495
  • Upgade dependabot dependencies (#6497) by @phillipleblanc in #6497
  • Add periodic tracing of data loading progress during dataset refresh (#6499) by @sgrebnov in #6499
  • Promote Oracle Data Connector to Alpha (#6503) by @sgrebnov in #6503
  • Use AWS SDK to provide credentials for Iceberg connectors (#6498) by @phillipleblanc in #6498
  • Add integration tests for partitioning (#6463) by @kczimm in #6463
  • Use top-level table in full-text search JOIN ON (#6491) by @Jeadie in #6491
  • Use accelerated table in vector_search JOIN operations when appropriate (#6516) by @Jeadie in #6516
  • Fix 'additional_column' for quoted columns (fix for qualified columns broke it) (#6512) by @Jeadie in #6512
  • Also use AWS SDK for inferring credentials for S3/Delta/Databricks Delta data connectors (#6504) by @phillipleblanc in #6504
  • Add per-dataset availability monitor configuration (#6482) by @phillipleblanc in #6482
  • Suppress the warning from the AWS SDK if it can't load credentials (#6533) by @phillipleblanc in #6533
  • Change default value of check_availability from default to auto (#6534) by @lukekim in #6534
  • README.md improvements for v1.5.0 (#6539) by @lukekim in #6539
  • Temporary disable s3_vectors_basic (#6537) by @sgrebnov in #6537
  • Ensure binder errors show before query and other (#6374) by @suhuruli in #6374
  • Update spiceai/duckdb-rs -> DuckDB 1.3.2 + index fix (#6496) by @mach-kernel in #6496
  • Update table-providers to latest version with DuckDB fixes (#6535) by @phillipleblanc in #6535
  • S3: default to public access if no auth is provided (#6532) by @sgrebnov in #6532

Spice v1.0-stable (Jan 20, 2025)

Β· 11 min read
William Croxson
Senior Software Engineer at Spice AI

πŸŽ‰ After 47 releases, Spice.ai OSS has reached production readiness with the 1.0-stable milestone!

The core runtime and features such as query federation, query acceleration, catalog integration, search and AI-inference have all graduated to stable status along with key component graduations across data connectors, data accelerators, catalog connectors, and AI model providers.

Highlights in v1.0-stable​

Breaking Changes​

  • Default Runtime Version: The CLI will install the GPU accelerated AI-capable Runtime by default (if supported), when running spice install or spice run. To force-install the non-GPU version, run spice install ai --cpu.

  • Default OpenAI Model: The default OpenAI model has updated to gpt-4o-mini.

  • Identifier Normalization: Unquoted identifiers such as table names are no longer normalized to lowercase. Identifiers will now retain their exact case as provided.

  • Sandboxed Docker Image: The Runtime Docker Image now runs the spiced process as the nobody user in a minimal chroot sandbox.

  • Insecure S3 and ABFS endpoints: The S3 and ABFS connectors now enforce insecure endpoint checks, preventing HTTP endpoints unless allow_http is explicitly enabled. Refer to the documentation for details.

Dependencies​

No major dependency changes.

Upgrading​

To upgrade to v1.0.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.0.0 image:

docker pull spiceai/spiceai:1.0.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

Contributors​

  • @peasee
  • @ewgenius
  • @Jeadie
  • @Sevenannn
  • @lukekim
  • @phillipleblanc
  • @sgrebnov

What's Changed​

- feat: Update load test criteria, testoperator updates by @peasee in <https://github.com/spiceai/spiceai/pull/4311>
- Update helm for v1.0.0-rc.5 by @ewgenius in <https://github.com/spiceai/spiceai/pull/4313>
- Update spicepod.schema.json by @github-actions in <https://github.com/spiceai/spiceai/pull/4318>
- Bump version to v1.0.0, update SECURITY.md by @ewgenius in <https://github.com/spiceai/spiceai/pull/4314>
- Initial criteria for models, embeddings by @Jeadie in <https://github.com/spiceai/spiceai/pull/4223>
- Update benchmark snapshots by @github-actions in <https://github.com/spiceai/spiceai/pull/4321>
- Add dremio param for running load test by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4315>
- Promote Databricks (mode: delta_lake) connector to stable by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4328>
- Handle failed query in load test by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4327>
- feat: Use load test hours for baseline query sets by @peasee in <https://github.com/spiceai/spiceai/pull/4334>
- Fix typo in 1.0.0-rc.5 release notes by @ewgenius in <https://github.com/spiceai/spiceai/pull/4329>
- feat: add testoperator data consistency by @peasee in <https://github.com/spiceai/spiceai/pull/4319>
- docs: Release DuckDB connector stable by @peasee in <https://github.com/spiceai/spiceai/pull/4335>
- Fix DocumentDB -> DynamoDB by @lukekim in <https://github.com/spiceai/spiceai/pull/4339>
- Update benchmark snapshots by @github-actions in <https://github.com/spiceai/spiceai/pull/4337>
- fix: Download hits.parquet from MinIO for benchmark by @peasee in <https://github.com/spiceai/spiceai/pull/4338>
- Update openapi.json by @github-actions in <https://github.com/spiceai/spiceai/pull/4341>
- Remove evil averages by @lukekim in <https://github.com/spiceai/spiceai/pull/4343>
- Don't run builds on non-code changes by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4344>
- Remove streaming requirement from Databricks spark Beta and Spark connector Beta by @ewgenius in <https://github.com/spiceai/spiceai/pull/4345>
- Update s3 tpcds spicepods by @ewgenius in <https://github.com/spiceai/spiceai/pull/4346>
- Explicitly set required scale factor for throughput and load tests by @ewgenius in <https://github.com/spiceai/spiceai/pull/4347>
- Fix s3 tpcds dataset name by @ewgenius in <https://github.com/spiceai/spiceai/pull/4348>
- Promote Iceberg Catalog Connector to Beta by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4350>
- Update s3 clickbench benchmark snapshots by @ewgenius in <https://github.com/spiceai/spiceai/pull/4351>
- fix: DuckDB clickbench on zero results by @peasee in <https://github.com/spiceai/spiceai/pull/4349>
- Add integration test with snapshots for databricks catalog connector by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4353>
- refactor: Remove on zero results from benchmarks, add data consistency workflow by @peasee in <https://github.com/spiceai/spiceai/pull/4354>
- Fix Bug: No field named body_embedding when do vector search with refresh sql containing subset of columns by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4297>
- docs: Update roadmap by @peasee in <https://github.com/spiceai/spiceai/pull/4364>
- feat: Release accelerators stable by @peasee in <https://github.com/spiceai/spiceai/pull/4361>
- Add TPCH/TPCDS test spicepods for MySQL by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4365>
- Catch when an insecure (http) S3 and ABFS data connectors endpoint is used without specifying the `allow_http` parameter by @ewgenius in <https://github.com/spiceai/spiceai/pull/4363>
- Update ROADMAP - Iceberg catalog alpha for v1.0 by @ewgenius in <https://github.com/spiceai/spiceai/pull/4367>
- Promote databricks catalog and databricks (spark_connect) connector to beta by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4369>
- Update Roadmap - Iceberg beta by @ewgenius in <https://github.com/spiceai/spiceai/pull/4373>
- Build CUDA binaries for Linux by @Jeadie in <https://github.com/spiceai/spiceai/pull/4320>
- Promote Nvidia NIM as Alpha by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4380>
- Promote xai to alpha by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4381>
- Update stable criteria for object store based connectors by @ewgenius in <https://github.com/spiceai/spiceai/pull/4383>
- Testoperator: http consistency and overhead tests, fixes and ci by @ewgenius in <https://github.com/spiceai/spiceai/pull/4382>
- Promote S3 Data Connector to Stable by @ewgenius in <https://github.com/spiceai/spiceai/pull/4385>
- Download platform-supported CUDA binary version on Linux by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4356>
- Fix http consistency test workflow, add overhead workflow by @ewgenius in <https://github.com/spiceai/spiceai/pull/4387>
- feat: Add Postgres test spicepods by @peasee in <https://github.com/spiceai/spiceai/pull/4388>
- Fix typos + specific in model criteria; Make explicit alpha/beta tests for LLMS in `crates/llms/tests`. by @Jeadie in <https://github.com/spiceai/spiceai/pull/4377>
- Fix federation bug for correlated subqueries of deeply nested Dremio tables by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4389>
- Fix http overhead workflow by @ewgenius in <https://github.com/spiceai/spiceai/pull/4390>
- Tweak model tests, fix embedding input by @ewgenius in <https://github.com/spiceai/spiceai/pull/4391>
- Promote Dremio to Stable quality by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4392>
- Add beta functionality tests for embedding models. by @Jeadie in <https://github.com/spiceai/spiceai/pull/4352>
- docs: Release postgres connector stable by @peasee in <https://github.com/spiceai/spiceai/pull/4398>
- Increase timeout for model response in E2E tests by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4399>
- Disable ident normalization (i.e. `SELECT MyColumn from table` works) by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4400>
- Preserve schema metadata by @ewgenius in <https://github.com/spiceai/spiceai/pull/4402>
- Make models integration tests tracing less verbose by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4403>
- Fix `cuda` feature build on Windows by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4404>
- Promote MySQL to Stable by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4406>
- docs: Release Delta Lake and Unity catalog by @peasee in <https://github.com/spiceai/spiceai/pull/4405>
- Use `gpt-4o-mini` as a default model for openai provider by @ewgenius in <https://github.com/spiceai/spiceai/pull/4410>
- Fix streaming for Openai and Anthropic by @Jeadie in <https://github.com/spiceai/spiceai/pull/4409>
- Tweak model loading and missing tool errors messages by @ewgenius in <https://github.com/spiceai/spiceai/pull/4412>
- Spice CLI: fallback to CPU build for unsupported GPU Compute Capability by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4407>
- Build Windows CUDA binaries as part of `build_and_release` workflow by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4386>
- Update docs link by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4416>
- feat: Add CPU models install escape hatch by @peasee in <https://github.com/spiceai/spiceai/pull/4419>
- Handle OpenAI API Errors by @ewgenius in <https://github.com/spiceai/spiceai/pull/4417>
- Update spice cli to use `GH_TOKEN` or `GITHUB_TOKEN` env variables when calling releases api by @ewgenius in <https://github.com/spiceai/spiceai/pull/4175>
- Implement secure sandboxing for Docker image by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4411>
- Automatically install supported CUDA binary on Windows by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4420>
- Metrics for LLMs+ embeddings by @Jeadie in <https://github.com/spiceai/spiceai/pull/4418>
- Jeadie/25 01 17/beta perf by @Jeadie in <https://github.com/spiceai/spiceai/pull/4397>
- Pass GitHub token to all CI steps calling spice run by @ewgenius in <https://github.com/spiceai/spiceai/pull/4423>
- Run the models integration tests on PRs by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4421>
- Run CUDA builds in a separate workflow by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4430>
- Promote OpenAI models and embeddings providers to RC by @ewgenius in <https://github.com/spiceai/spiceai/pull/4432>
- Update link to retrieval-augmented generation (RAG) details by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4433>
- Unity catalog should strip parameter prefix before passing parameters to delta lake factory by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4436>
- Update quickstart traces to match current version by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4435>
- Update Supported Embeddings Providers Readme section by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4434>
- Local models can stream tools by @Jeadie in <https://github.com/spiceai/spiceai/pull/4429>
- fix: Use MetricsCollector::show() for HTTP testoperator commands by @peasee in <https://github.com/spiceai/spiceai/pull/4442>
- Fix run query action by @ewgenius in <https://github.com/spiceai/spiceai/pull/4444>
- Default to AI-enabled runtime for `spice run`/`spice install` by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4443>
- Change no spicepod.yaml log to warning by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4447>
- refactor: Update Catalog Connector error messages by @peasee in <https://github.com/spiceai/spiceai/pull/4441>
- Fix panic when converting OTel metrics by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4449>
- refactor: Update model errors by @peasee in <https://github.com/spiceai/spiceai/pull/4446>
- Update spiceai/mistral.rs to silence metadata logs by @ewgenius in <https://github.com/spiceai/spiceai/pull/4452>
- fix xAI; don't use openai defaults by @Jeadie in <https://github.com/spiceai/spiceai/pull/4450>
- Improves the UX of using huggingface models by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4451>
- Add GH Workflow to test `spice ai` runtime installation by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4448>
- fix: Use specific model errors where available by @peasee in <https://github.com/spiceai/spiceai/pull/4454>
- Detect and report unsupported embedding column type during dataset registration by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4456>
- Handle Errors by @Jeadie in <https://github.com/spiceai/spiceai/pull/4455>
- Catch and report negative openai_temperature error by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4453>
- Clarify release check error message if it is caused by wrong GH token by @ewgenius in <https://github.com/spiceai/spiceai/pull/4458>

**Full Changelog**: <https://github.com/spiceai/spiceai/compare/v1.0.0-rc.5...v1.0.0>

Resources​

Community​

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Slack or by email to get involved.

Spice v0.20-beta (Nov 4, 2024)

Β· 4 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v0.20-beta 🧩

Spice v0.20.0-beta improves federated query performance with column pruning and adds support for Metal (Apple Silicon) and CUDA (NVidia) accelerators. The S3, PostgreSQL, MySQL, and GitHub Data Connectors have graduated from Beta to Release Candidates. The Arrow, DuckDB, and SQLite Data Accelerators have graduated from Alpha to Beta.

Highlights in v0.20.0-beta​

Data Connectors: The S3, PostgreSQL, MySQL, and GitHub Data Connectors have graduated from beta to release candidate.

Data Accelerators: The Arrow, DuckDB, and SQLite Data Accelerators have graduated from alpha to beta.

Metal and CUDA Support: Added support for Metal (Apple Silicon) and CUDA (NVidia) for AI/ML workloads including embeddings and local LLM inference.

For instructions on compiling a Meta or CUDA binary, see the Installation Docs.

Breaking Changes​

  • The ODBC Data Connector now requires ODBC drivers specified in connection strings are registered in the system ODBC driver manager.

Example invalid connection string:

DRIVER={/path/to/driver.so};SERVER=localhost;DATABASE=master

Example valid connection string:

DRIVER={My ODBC Driver};SERVER=localhost;DATABASE=master

Where My ODBC Driver is the name of an ODBC driver registered in the ODBC driver manager.

Contributors​

  • @ewgenius
  • @peasee
  • @phillipleblanc
  • @sgrebnov
  • @Jeadie
  • @barracudarin
  • @Sevenannn

What's Changed​

- Update Helm for v0.19.4-beta and add release notes by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3310>
- Update spicepod.schema.json by @github-actions in <https://github.com/spiceai/spiceai/pull/3311>
- `metal` & `cuda` flags for spice by @Jeadie in <https://github.com/spiceai/spiceai/pull/3212>
- Promote postgres connector to RC quality by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3305>
- docs: Update ROADMAP.md by @peasee in <https://github.com/spiceai/spiceai/pull/3322>
- feat: Enable federation for in-memory accelerators by @peasee in <https://github.com/spiceai/spiceai/pull/3325>
- fix: Only allow env files from the current dir by @peasee in <https://github.com/spiceai/spiceai/pull/3327>
- Always read TimezoneTZ from PostgreSQL as UTC by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3330>
- For multi-sink acceleration refreshes, ensure parent table completes before the children. by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3329>
- Update TPC-DS Q49 (Decimal to Float) to match SQLite's type system by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3323>
- Enable parquet pushdown in Spice by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3245>
- Use spice object_store fork to fix S3 ambiguous error by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3304>
- Don't mix commented out queries for s3 connectors and accelerators by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3331>
- Allow only valid WHERE conditions in vector searches by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3335>
- fix: Allow only ODBC profiles by @peasee in <https://github.com/spiceai/spiceai/pull/3324>
- Track how many times an acceleration falls back during initialization by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3339>
- Anthropic model regex and fix tool parsing aggregation bug by @Jeadie in <https://github.com/spiceai/spiceai/pull/3334>
- Upgrade runtime along with CLI on `spice upgrade` by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3341>
- Update upcoming Roadmap by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3343>
- fix: Prevent acceleration files outside of working directory by @peasee in <https://github.com/spiceai/spiceai/pull/3340>
- Document S3 connector limitations by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3333>
- Update Object Store Patch by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3361>
- Promote SQLite Data Accelerator to Beta by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3365>
- Promote S3 connector to RC quality by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3362>
- Revert "fix: Only allow env files from the current dir" by @peasee in <https://github.com/spiceai/spiceai/pull/3368>
- docs: Fix typo for S3 release status in README.md by @peasee in <https://github.com/spiceai/spiceai/pull/3370>
- Include unnecessary columns pruning step during federated plan creation by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3363>

**Full Changelog**: <https://github.com/spiceai/spiceai/compare/v0.19.4-beta...v0.20.0-beta>

Resources​

Community​

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Slack or by email to get involved.

Spice v0.18-beta (Sep 16, 2024)

Β· 7 min read
Sergei Grebnov
Senior Software Engineer at Spice AI

Announcing the release of Spice v0.18-beta.

The v0.18.0-beta release adds new Sharepoint and File data connectors, introduces AWS Identity and Access Management (IAM) support for the S3 Data Connector, improves performance of the GitHub connector, and increases the overall reliability of all data accelerators. The /ready API endpoint was enhanced to report as ready only when all components, including loaded data, have successfully reported readiness.

Highlights in v0.18.0-beta​

Sharepoint Data Connector: Use from: sharepoint: to access and accelerate documents stored in Microsoft 365 OneDrive for Business (Sharepoint). The CLI also includes a new spice login sharepoint to aid in local development and testing.

Example spicepod.yml:

datasets:
- from: sharepoint:drive:Documents/path:/important_documents/
name: important_documents
params:
sharepoint_client_id: ${secrets:SPICE_SHAREPOINT_CLIENT_ID}
sharepoint_tenant_id: ${secrets:SPICE_SHAREPOINT_TENANT_ID}
sharepoint_client_secret: ${secrets:SPICE_SHAREPOINT_CLIENT_SECRET}

See the Sharepoint Data Connector documentation.

AWS Identity and Access Management (IAM) for S3: A new s3_auth parameter for the s3 data connector to configure the authentication method to use when connecting to S3. Supported values are public, key, and iam_role. Use s3_auth: iam_role to assume the instance IAM role.

Example spicepod.yml:

datasets:
- from: s3://my-bucket
name: bucket
params:
s3_auth: iam_role # Assume IAM role of instance

See the S3 Data Connector documentation.

File Data Connector Use from: file: to query files stored by locally accessible filesystems.

Example spicepod.yml:

datasets:
- from: file://path/to/customer.parquet
name: customer
params:
file_format: parquet

See the File Data Connector documentation.

Improved /ready Api Now includes the initial data load for accelerated datasets in addition to component readiness to ensure readiness is only reported when data has loaded and can be successfully queried.

Breaking Changes​

  • GitHub Data Connector: The data type for time-related columns has changed from Utf8 to Timestamp. To upgrade, data type references to timestamp. For example, if using time_format:, change uses of time_format: ISO8601 to time_format: timestamp.

  • Ready API: The /ready API reports ready only when all components have reported ready and data is fully loaded. To upgrade, evaluate uses of the Ready API (such as Kubernetes readiness probes) and consider how it might affect system behavior.

Dependencies​

No major dependencies updates.

Contributors​

  • @phillipleblanc
  • @Jeadie
  • @lukekim
  • @sgrebnov
  • @peasee
  • @eltociear
  • @Sevenannn
  • @ewgenius
  • @karifabri

New Contributors​

What's Changed​

- Update spicepod.schema.json by @github-actions in https://github.com/spiceai/spiceai/pull/2585
- Set helm to v0.17.4-beta by @ewgenius in https://github.com/spiceai/spiceai/pull/2595
- Bump to next v0.18.0-beta version by @ewgenius in https://github.com/spiceai/spiceai/pull/2596
- Add snapshot test docs / Update beta criteria for data accelerators by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2594
- Enable federation for accelerated queries (sqlite, duckdb, postgres) by @sgrebnov in https://github.com/spiceai/spiceai/pull/2598
- spelling updates on v0.17.4 release notes by @karifabri in https://github.com/spiceai/spiceai/pull/2601
- Update endgame template by @ewgenius in https://github.com/spiceai/spiceai/pull/2591
- fix: Re-attach DuckDB attachments on each query by @peasee in https://github.com/spiceai/spiceai/pull/2602
- Speed up sqlite accelerator benchmark test with indexes by @Sevenannn in https://github.com/spiceai/spiceai/pull/2597
- Fix refresh API using `refresh_mode: append` by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2609
- Tweak `/ready` to only report ready when components have all reported Ready by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2600
- Add `s3_auth` parameter to configure IAM role authentication by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2611
- Bump fundu from 2.0.0 to 2.0.1 by @dependabot in https://github.com/spiceai/spiceai/pull/2576
- fix: Remove comments from SQL files by @peasee in https://github.com/spiceai/spiceai/pull/2627
- Utilize runtime.status().is_ready() to check acceleration dataset readiness in benchmark test by @Sevenannn in https://github.com/spiceai/spiceai/pull/2614
- Allow for prefix to be kept in internal Parameters by @Jeadie in https://github.com/spiceai/spiceai/pull/2603
- Bump itertools from 0.12.1 to 0.13.0 by @dependabot in https://github.com/spiceai/spiceai/pull/2572
- Bump golang.org/x/mod from 0.20.0 to 0.21.0 by @dependabot in https://github.com/spiceai/spiceai/pull/2571
- Add initial threat model using OWASP Threat Dragon by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2599
- fix: Explicitly error for duplicate duckdb file accelerators by @peasee in https://github.com/spiceai/spiceai/pull/2628
- Benchmark test binary can parse command line option by @Sevenannn in https://github.com/spiceai/spiceai/pull/2626
- Snapshot tests shouldn't crash the Spice benchmark test by @Sevenannn in https://github.com/spiceai/spiceai/pull/2613
- Bump anyhow from 1.0.86 to 1.0.87 by @dependabot in https://github.com/spiceai/spiceai/pull/2573
- Upgrade datafusion to improve SQLite subquery tables aliasing support by @sgrebnov in https://github.com/spiceai/spiceai/pull/2634
- Run benchmark separately using workflow by @Sevenannn in https://github.com/spiceai/spiceai/pull/2631
- Sharepoint UX changes by @Jeadie in https://github.com/spiceai/spiceai/pull/2633
- Improve `/ready` to only mark a dataset ready iff the initial refresh completed by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2630
- Support relative paths for file connector by @Jeadie in https://github.com/spiceai/spiceai/pull/2637
- Fix `error decoding response body` GitHub file connector bug by @sgrebnov in https://github.com/spiceai/spiceai/pull/2645
- GraphQL pagination and robustness. by @Jeadie in https://github.com/spiceai/spiceai/pull/2632
- docs: Update bug template by @peasee in https://github.com/spiceai/spiceai/pull/2629
- Define GitHub `issues` data connector schema upfront by @sgrebnov in https://github.com/spiceai/spiceai/pull/2646
- Add support for loading from Sharepoint Group's default drive. by @Jeadie in https://github.com/spiceai/spiceai/pull/2642
- Fix typo in workflow, fix the postgres connector container readiness check by @Sevenannn in https://github.com/spiceai/spiceai/pull/2654
- Fix check all features by @Sevenannn in https://github.com/spiceai/spiceai/pull/2653
- Enable Warn/Error traces from dependency components by @sgrebnov in https://github.com/spiceai/spiceai/pull/2655
- Use lower case iso8601 for time_column by @Sevenannn in https://github.com/spiceai/spiceai/pull/2551
- Add basic integration test for Spice spill-to-disk and re-hydration scenario by @sgrebnov in https://github.com/spiceai/spiceai/pull/2643
- Add 'RefreshOverrides::max_jitter' to 'POST /v1/datasets/:name/acceleration/refresh' by @Jeadie in https://github.com/spiceai/spiceai/pull/2641
- Bump rustls-pemfile from 1.0.4 to 2.1.3 by @dependabot in https://github.com/spiceai/spiceai/pull/2575
- Update dependencies to support querying postgres enum types by @Sevenannn in https://github.com/spiceai/spiceai/pull/2657
- Upgrade table-providers by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2659
- Improve `spill_to_disk_and_rehydration` integration test by @sgrebnov in https://github.com/spiceai/spiceai/pull/2658
- Enhance GitHub connector robustness with explicit table schema definitions by @sgrebnov in https://github.com/spiceai/spiceai/pull/2661
- Rename sharepoint fields by @Jeadie in https://github.com/spiceai/spiceai/pull/2668
- Disable dataset checkpoint for DuckDB acceleration by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2676
- Revert "Enable federation for accelerated queries (sqlite, duckdb, postgres) (#2598) by @Sevenannn in https://github.com/spiceai/spiceai/pull/2683

**Full Changelog**: https://github.com/spiceai/spiceai/compare/v0.17.4-beta...v0.18.0-beta

Resources​

Community​

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Slack or by email to get involved.

Spice.ai v0.10-alpha

Β· 2 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v0.10-alpha! πŸ§™β€β™‚οΈ

The Spice.ai v0.10-alpha release focused on additions and updates to improve stability, usability, and the overall Spice developer experience.

Highlights in v0.10-alpha​

Public Bucket Support for S3 Data Connector: The S3 Data Connector now supports public buckets in addition to buckets requiring an access id and key.

JDBC-Client Connectivity: Improved connectivity for JDBC clients, like Tableau.

User Experience Improvements:

  • Friendlier error messages across the board to make debugging and development better.
  • Added a spice login postgres command, streamlining the process for connecting to PostgreSQL databases.
  • Added PostgreSQL connection verification and connection string support, enhancing usability for PostgreSQL users.

Grafana Dashboard: Improving the ability to monitor Spice deployments, a standard Grafana dashboard is now available.

Contributors​

  • @phillipleblanc
  • @mitchdevenport
  • @Jeadie
  • @ewgenius
  • @sgrebnov
  • @y-f-u
  • @lukekim
  • @digadeesh

New in this release​

  • Fixes Gracefully handle Arrow Flight DoExchange connection resets
  • Adds Grafana Dashboard
  • Adds Flight SQL CommandGetTableTypes Command support (improves JDBC-client connectivity)
  • Adds Friendlier error messages
  • Adds spice login postgres command
  • Adds PostgreSQL connection verification
  • Adds PostgreSQL connection string support
  • Adds Linux aarch64 build
  • Updates Improves spice status with dataset metrics
  • Updates CLI REPL improved show tables output
  • Updates CLI REPL limit output to 500 rows
  • Updates Improved README.md with architecture diagram updates
  • Updates Improved CI run time.
  • Updates Use macOS hosted Actions runner

Resources​

Community​

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Slack or by email to get involved.