Skip to main content
Luke Kim
Founder and CEO of Spice AI
View all authors

Announcing Spice.ai Open Source 1.0-stable: A Portable Compute Engine for Data-Grounded AI — Now Ready for Production

· 12 min read
Luke Kim
Founder and CEO of Spice AI

🎉 Today marks the 1.0-stable release of Spice.ai Open Source—purpose-built to help enterprises ground AI in data. By unifying federated data query, retrieval, and AI inference into a single engine, Spice mitigates AI hallucinations, accelerates data access for mission-critical workloads, and makes it simple and easy for developers to build fast and accurate data-intensive applications across cloud, edge, or on-prem.

Spice.ai Open Source

Enterprise AI systems are only as good as the context they’re provided. When data is inaccessible, incomplete, or outdated, even the most advanced models can generate outputs that are inaccurate, misleading, or worse, potentially harmful. In one example, a chatbot was tricked into selling a 2024 Chevy Tahoe for $1 due to a lack of contextual safeguards. For enterprises, errors like these are unacceptable—it’s the difference between success and failure.

Retrieval-Augmented Generation (RAG) is part of the answer — but traditional RAG is only as good as the data it has access to. If data is locked away in disparate, often legacy data systems, or cannot be stitched together for accurate retrieval, you get, as Benioff puts it, "Clippy 2.0".

Benioff Post

And often, after initial Python-scripted pilots, you’re left with a new set of problems: How do you deploy AI that meets enterprise requirements for performance, security, and compliance while being cost efficient? Directly querying large datasets for retrieval is slow and expensive. Building and maintaining complex ETL pipelines requires expensive data teams that most organizations don’t have. And because enterprise data is highly sensitive, you need secure access and auditable observability—something many RAG setups don’t even consider.

Developers need a platform at the intersection of data and AI—one specifically designed to ground AI in data. A solution that unifies data query, search, retrieval, and model inference—ensuring performance, security, and accuracy so you can build AI that you and your customers can trust.

Spice.ai OSS: A portable data, AI, and retrieval engine​

In March of 2024, we introduced Spice.ai Open Source, a SQL query engine to materialize and accelerate data from any database, data warehouse, or data lake so that data can be accessed wherever it lives across the enterprise — consistently fast. But that was only the start.

Building on this foundation, Spice.ai OSS unifies data, retrieval, and AI, to provide current, relevant context to mitigate AI “hallucinations” and significantly reduce incorrect outputs-just one of the many mission-critical use cases Spice.ai addresses.

Spice is a portable, single-node, compute engine built in Rust. It embeds the fastest single-node SQL query engine, DataFusion, to serve secure, virtualized data views to data-intensive apps, AI, and agents. Sub-second data query is accelerated locally using Apache Arrow, DuckDB, or SQLite.

Now at version 1.0-stable, Spice is ready for production. It’s already deployed in enterprise use at Twilio, Barracuda Networks, and NRC Health, and can be deployed anywhere—cloud-hosted, BYOC, edge, on-prem.

A diagram of the Spice.ai OSS compute-engine

Data-grounded AI​

Data-grounded AI anchors models in accurate, current, and domain-specific data, rather than relying solely on pre-trained knowledge. By unifying enterprise data—across databases, data lakes, and APIs—and applying advanced ingestion and retrieval techniques, these systems dynamically incorporate real-world context at inference time without leaking sensitive information. This approach helps developers minimize hallucinations, reduce operational risk, and build trust in AI by delivering reliable, relevant outputs.

Comparison of AI without contextual data and with data-grounded AI

How does Spice.ai OSS solve data-grounding?

With Spice, models always have access to materializations of low-latency, real-time data for near-instant retrieval, minimizing data movement while enabling AI feedback so apps and agents can learn and adapt over time. For example, you can join customer records from PostgreSQL with sales data in Snowflake and logs stored in S3—all with a single SQL query or LLM function call.

Secure Compute Engine for Data-grounded AI

Spice includes an advanced suite of LLM tools including vector and hybrid search, text-to-SQL, SQL query and retrieval, data sampling, and context formatting—all purpose-built for accurate outputs.

The latest research is continually incorporated so that teams can focus on business objectives rather than trying to keep up with the incredibly fast-moving and often overwhelming space of AI.

Spice.ai OSS: The engine that makes AI work​

Spice.ai OSS is a lightweight, portable runtime (single ~140 MB binary) with the capabilities of a high-speed cloud data warehouse built into a self-hostable AI inference engine, all in a single, run-anywhere package.

It's designed to be distributed and integrated at the application level, rather than being a bulky, centralized system to manage, and is often deployed as a sidecar. Whether running one Spice instance per service or one for each customer, Spice is flexible enough to fit your application architecture.

Apps and agents integrate with Spice.ai OSS via three industry-standard APIs, so that it can be adopted incrementally with minimal changes to applications.

  1. SQL Query APIs: HTTP, Arrow Flight, Arrow Flight SQL, ODBC, JDBC, and ADBC.

  2. OpenAI-Compatible APIs: HTTP APIs compatible with the OpenAI SDK, AI SDK with local model serving (CUDA/Metal accelerated), and gateway to hosted models.

  3. Iceberg Catalog REST APIs: A unified Iceberg Catalog REST API.

Spice.ai OSS architecture

Key features of Spice.ai OSS include:​

  • Federated SQL Query Across Data Sources: Perform SQL queries across disparate data sources with over 25 open-source data connectors, including catalogs (Unity Catalog, Iceberg Catalog, etc), databases (PostgreSQL, MySQL, etc.), data warehouses (Snowflake, Databricks, etc.), and data lakes (e.g., S3, ABFS, MinIO, etc.).

  • Data Materialization and Acceleration: Locally materialize and accelerate data using Arrow, DuckDB, SQLite, and PostgreSQL, enabling low-latency and high-speed transactional and analytical queries. Data can be ingested via Change-Data-Capture (CDC) using Debezium, Catalog integrations, on an interval, or by trigger.

  • AI Inference, Gateway, and LLM toolset: Load and serve models like Llama3 locally, or use Spice as a gateway to hosted AI platforms including OpenAI, Anthropic, xAI, and NVidia NIM. Automatically use a purpose-built LLM toolset for data-grounded AI.

  • Enterprise Search and Retrieval: Advanced search capabilities for LLM applications, including vector-based similarity search and hybrid search across structured and unstructured data. Real-time retrieval grounds AI applications in dynamic, contextually relevant information, enabling state-of-the-art RAG.

  • LLM Memory: Enable long-term memory for LLMs by efficiently storing, retrieving, and updating context across interactions. Support real-time contextual continuity and grounding for applications that require persistent and evolving understanding.

  • LLM Evaluations: Test and boost model reliability and accuracy with integrated LLM-powered evaluation tools to assess and refine AI outputs against business objectives and user expectations.

  • Monitoring and Observability: Ensure operational excellence with telemetry, distributed tracing, query/task history, and metrics, that provide end-to-end visibility into data flows and model performance in production.

  • Deploy Anywhere; Edge-to-Cloud Flexibility: Deploy Spice as a standalone instance, Kubernetes sidecar, microservice, or scalable cluster, with the flexibility to run distributed across edge, on-premises, or any cloud environment. Spice AI offers managed, cloud-hosted deployments of Spice.ai OSS through the Spice Cloud Platform (SCP).

Real-world use-cases​

Spice delivers data readiness for teams like Twilio and Barracuda, and accelerates time-to-market of data-grounded AI, such as with developers on GitHub and at NRC Health.

Here are some examples of how Spice.ai OSS solves real problems for these teams.


Twilio Logo

CDN for Databases — Twilio​

Twilio datalake and database acceleration using Spice.ai Open Source

A core requirement for many applications is consistently fast data access, with or without AI. Twilio uses Spice.ai OSS as a data acceleration framework or Database CDN, staging data in object-storage that's accelerated with Spice for sub-second query to improve the reliability of critical services in its messaging pipelines. Before Spice, a database outage could result in a service outage.

"Spice opened the door to take these critical control-plane datasets and move them next to our services in the runtime path."

Peter Janovsky's profile
Peter Janovsky
Software Architect at Twilio

With Spice, Twilio has achieved:

  • Significantly Improved Query Performance: Used Spice to co-locate control-plane data in the messaging runtime, accelerated with DuckDB, to send messages with a P99 query time of < 5ms.

  • Low-Latency Multi-Tenancy Controls: Spice is integrated into the message-sending runtime to manage multi-tenancy data controls. Before, data changes required manual triggers and took hours to propagate. Now, they update automatically and reach the messaging front door within five minutes via a resilient data-availability framework.

  • Mission-Critical Reliability: Reduced reliance on queries to databases by using Spice to accelerate data in-memory locally, with automatic failover to query data directly from S3, ensuring uninterrupted service even during database downtime.

"With a simple drop in container, we are able to double our data redundancy by using Spice."

David Blum's profile
David Blum
Principal Software Engineer at Twilio

By adopting Spice.ai OSS, Twilio strengthened its infrastructure, ensuring reliable services for customers and scalable data access across its growing platform.


Barracuda Logo

Datalake Accelerator — Barracuda​

Barracuda Delta Lake acceleration using Spice.ai Open Source

Barracuda uses Spice.ai OSS to modernize data access for their email archiving and audit log systems, solving two big problems: slow query performance and costly queries. Before Spice, customers experienced frustrating delays of up to two minutes when searching email archives, due to the data volume being queried.

"It's just a huge gain in responsiveness for the customer."

David Stancu's profile
David Stancu
Senior Principal Software Engineer at Barracuda

With Spice, Barracuda has achieved:

  • 100x Query Performance Improvement: Accelerated email archive queries from a P99 time of 2 minutes to 100-200 milliseconds.

  • Efficient Audit Logs: Offloaded audit logs to Parquet files in S3, queried directly by Spice.

  • Mission-Critical Reliability: Reduced load on Cassandra, improving overall infrastructure stability.

  • Significant Cost Reduction: Replaced expensive Databricks Spark queries, significantly cutting expenses while improving performance.

It just kinda spins up and it just works, which is really nice.

Darin Douglass's profile
Darin Douglass
Principal Software Engineer at Barracuda

NRC Health Logo

Data-Grounded AI apps and agents — NRC Health​

NRC Health data-grounded AI using Spice.ai Open Source

NRC Health uses Spice.ai OSS to simplify and accelerate the development of data-grounded AI features, unifying data from multiple platforms including MySQL, SharePoint, and Salesforce, into secure, AI-ready data. Before Spice, scaling AI expertise across the organization to build complex RAG-based scenarios was a challenge.

"What I like the most about Spice, it's very easy to collect data from different data sources and I am able to chat with this data and do everything in one place."

Dustin Warner's profile
Dustin Warner
Director of Software Engineering at NRC Health

With Spice OSS, NRC Health has achieved:

  • Developer Productivity: Partnered with Spice in three company-wide AI hackathons to build complete end-to-end data-grounded AI features in hours instead of weeks or months.

  • Accelerated Time-to-Market: Centralized data integration and AI model serving an enterprise-ready service, accelerating time to market.

"I explored AI, embeddings, search algorithms, and features with our own database. I read a lot about this, but it was so much easier to use Spice than doing it from scratch."

Taher Ahmed's profile
Taher Ahmed
Software Engineering Manager at NRC Health

Data-Grounded AI Software Development — Spice.ai GitHub Copilot Extension​

When using tools like GitHub Copilot, developers often face the hassle of switching between multiple environments to get the data they need.

The Spice.ai for GitHub Copilot Extension built on Spice.ai OSS, gives developers the ability to connect data from external sources to Copilot, grounding Copilot in relevant data not generally available in GitHub, like test data stored in a development database.

Developers can simply type @spiceai to interact with connected data, with relevant answers now surfaced directly in Copilot Chat, significantly improving productivity.

Why choose Spice.ai OSS?​

Adopting Spice.ai OSS addresses real challenges in modern AI development: it grounds models in accurate, domain-specific, real-time data. With Spice, engineering teams can focus on what matters—delivering innovative, accurate, AI-powered applications and agents that work. Additionally, Spice.ai OSS is open-source under Apache 2.0, ensuring transparency and extensibility so your organization remains free to innovate without vendor lock-in.

Get started in 30 seconds​

You can install Spice.ai OSS in less than a minute, on macOS, Linux, and Windows:

curl https://install.spiceai.org | /bin/bash

Or using brew:

brew install spiceai/spiceai/spice

Once installed, follow the Getting Started with Spice.ai guide to ground OpenAI chat with data from S3 in less than 2 minutes.

Looking ahead​

The 1.0-stable release of Spice.ai OSS marks a major step toward accurate AI for developers. By combining data, AI, and retrieval into a unified runtime, Spice anchors AI in relevant, real-time data—helping you build apps and agents that work.

A cloud-hosted, fully managed Spice.ai OSS service is available in the Spice Cloud Platform. It’s SOC 2 Type II compliant and makes it easy to operate Spice deployments.

Beyond apps and agents, the vision for Spice is to be the best digital labor platform for building autonomous AI employees and teams. These are exciting times! Stay tuned for some upcoming announcements later in 2025!

The Spice AI Team

Learn more​

  • Cookbook: 47+ samples and examples using Spice.ai OSS
  • Documentation: Learn about features, use cases, and advanced configurations
  • X: Follow @spice_ai on X for news and updates
  • Discord: Connect with the team and the community
  • GitHub: Star the repo, contribute, and raise issues

Spice v1.0-rc.3 (Dec 30, 2024)

· 7 min read
Luke Kim
Founder and CEO of Spice AI

Announcing the release of Spice v1.0-rc.3 🧊

Spice v1.0.0-rc.3 is the third release candidate for the first major version of Spice.ai OSS. This release continues the focus on production readiness and includes new Iceberg Catalog APIs, DuckDB improvements, and a new Iceberg Catalog Connector.

Highlights in v1.0-rc.3​

  • Iceberg Catalog APIs: Spice now functions as an Iceberg Catalog provider, implementing a core subset of the Iceberg Catalog APIs. This enables Iceberg Catalog clients native discovery of datasets and schemas through Spice APIs.

  • GET /v1/namespaces - List all catalogs registered in Spice.

  • GET /v1/namespaces?parent=catalog - List schemas registered under a given catalog.

  • GET /v1/namespaces/:catalog_schema/tables - List tables registered under a given schema.

  • GET /v1/namespaces/:catalog_schema/tables/:table - Get the schema of a given table.

  • Iceberg Catalog Connector: The Iceberg Catalog Connector is a new integration to discover and query datasets from a remote Iceberg Catalog.

Example connecting to a remote Iceberg Catalog with tables stored in S3:

catalogs:
- from: iceberg:https://my-iceberg-catalog.com/v1/namespaces
name: ice
params:
iceberg_s3_access_key_id: ${secrets:ICEBERG_S3_ACCESS_KEY_ID}
iceberg_s3_secret_access_key: ${secrets:ICEBERG_S3_SECRET_ACCESS_KEY}
iceberg_s3_region: us-east-1

View the Iceberg Catalog Connector documentation for more details.

  • DuckDB Improvements: Added cosine_distance support for DuckDB-backed vector search, improved unnest nested type handling for array_element and lists, and optimized query performance.

  • SQLite Data Accelerator: Graduated to Release Candidate (RC).

  • File Data Accelerator: Graduated to Release Candidate (RC).

Breaking changes​

  • API:v1/datasets/sample has been removed as it is not particularly useful, can be replicated via SQL, and via the tools endpoint POST v1/tools/:name.

Cookbook​

  • New Language Model Evals Recipe showing how to measure the performance of a language model using LLM-as-Judge, configured entirely in the spice runtime.

  • New Iceberg Catalog Recipe showing how to use Spice to query Iceberg tables from an Iceberg catalog.

Dependencies​

  • OpenTelemetry: Upgraded from 0.26.0 to 0.27.1
  • Go: Upgraded from 1.22 to 1.23 (CLI)

Contributors​

  • @sgrebnov
  • @phillipleblanc
  • @peasee
  • @Jeadie
  • @Sevenannn
  • @lukekim
  • @ewgenius

What's Changed​

- Add CI configuration for search benchmark dataset access by @sgrebnov in https://github.com/spiceai/spiceai/pull/3888
- Update acknowledgements by @github-actions in https://github.com/spiceai/spiceai/pull/3895
- Upgrade dependencies by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3896
- chore: Update helm chart for RC.2 by @peasee in https://github.com/spiceai/spiceai/pull/3899
- Update spicepod.schema.json by @github-actions in https://github.com/spiceai/spiceai/pull/3903
- chore: Update MacOS test release install to macos-13 by @peasee in https://github.com/spiceai/spiceai/pull/3901
- Add usage to `spice chat` and fix `v1/models?status=true`. by @Jeadie in https://github.com/spiceai/spiceai/pull/3898
- chore: Bump versions for rc3 by @peasee in https://github.com/spiceai/spiceai/pull/3902
- docs: Update endgame with a step to verify dependencies in release notes by @peasee in https://github.com/spiceai/spiceai/pull/3897
- Ensure eval dataset input and ouput of correct length by @Jeadie in https://github.com/spiceai/spiceai/pull/3900
- `spice add/connect/dataset configure` should update spicepod, not overwrite it & upgrade to Go 1.23 by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3905
- Bump opentelemetry from 0.26.0 to 0.27.1 by @dependabot in https://github.com/spiceai/spiceai/pull/3879
- Ensure trace_id is overridden for prior written spans by @Jeadie in https://github.com/spiceai/spiceai/pull/3906
- add 'role': 'assistant' for local models by @Jeadie in https://github.com/spiceai/spiceai/pull/3910
- Run tpcds benchmark for file connector by @Sevenannn in https://github.com/spiceai/spiceai/pull/3924
- Update to reference cookbook instead of quickstarts/samples by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3928
- Fix/remove flaky integration tests by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3930
- Implement `/v1/iceberg/namespaces` & `/v1/iceberg/config` APIs by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3923
- Add script for creating tpcds parquet files and spicepod for file connector by @Sevenannn in https://github.com/spiceai/spiceai/pull/3931
- Use `utoipa` to generate openapi.json and swagger for dev by @Jeadie in https://github.com/spiceai/spiceai/pull/3927
- `fuzzy_match`, `json_match`, `includes` scorer by @Jeadie in https://github.com/spiceai/spiceai/pull/3926
- Implement `/v1/iceberg/namespaces/:namespace` by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3933
- Implement `GET /v1/iceberg/namespaces/:namespace/tables` API by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3934
- Add custom Spice DuckDB dialect with cosine_distance support by @sgrebnov in https://github.com/spiceai/spiceai/pull/3938
- Fix NSQL error: `all columns in a record batch must have the same length` by @sgrebnov in https://github.com/spiceai/spiceai/pull/3947
- Don't include tools use in hf test model by @Jeadie in https://github.com/spiceai/spiceai/pull/3955
- Implement `GET /v1/namespaces/{namespace}/tables/{table}` API by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3940
- Update dependencies by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3967
- DuckDB: add support for nested types in Lists by @sgrebnov in https://github.com/spiceai/spiceai/pull/3961
- Add script to set up clickbench for file connector by @Sevenannn in https://github.com/spiceai/spiceai/pull/3945
- docs: Add connector stable criteria by @peasee in https://github.com/spiceai/spiceai/pull/3908
- Update Roadmp Dec 23, 2024 by @lukekim in https://github.com/spiceai/spiceai/pull/3978
- Improve CI testing for OpenAPI, new tool `spiceschema`, fix broken OpenAPI stuff. by @Jeadie in https://github.com/spiceai/spiceai/pull/3948
- remove `v1/datasets/sample` by @Jeadie in https://github.com/spiceai/spiceai/pull/3981
- feat: add SQLite ClickBench benchmark by @peasee in https://github.com/spiceai/spiceai/pull/3975
- Remove feature 'llms/mistralrs' by @Jeadie in https://github.com/spiceai/spiceai/pull/3984
- Add support for 'params.spice_tools: nsql' by @Jeadie in https://github.com/spiceai/spiceai/pull/3985
- Fix integration tests - add missing `format` query parameter in /v1/status requests by @ewgenius in https://github.com/spiceai/spiceai/pull/3989
- Enhance AI tools sampling logic for robust handling of large fields by @sgrebnov in https://github.com/spiceai/spiceai/pull/3959
- Fix subquery federation by @Sevenannn in https://github.com/spiceai/spiceai/pull/3991
- Fix unnest and add DuckDB support for `array_element` by @sgrebnov in https://github.com/spiceai/spiceai/pull/3995
- Add score value snapshotting to vector similarity search tests by @sgrebnov in https://github.com/spiceai/spiceai/pull/3996
- Use Llama-3.2-3B-Instruct for Hugging Face integration testing by @sgrebnov in https://github.com/spiceai/spiceai/pull/3992
- Simplify `construct_chunk_query_sql` for DuckDB compatibility by @sgrebnov in https://github.com/spiceai/spiceai/pull/3988
- Update TPCH and TPCDS benchmarks for spice.ai connector by @ewgenius in https://github.com/spiceai/spiceai/pull/3982
- Correctly pass Hugging Face token in models integration tests by @sgrebnov in https://github.com/spiceai/spiceai/pull/3997
- Fix: `on_zero_results` causes `TransactionContext Error: Catalog write-write conflict on create with "attachment_0"` by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3998
- Add DuckDB acceleration to search benchmarks by @sgrebnov in https://github.com/spiceai/spiceai/pull/4000
- Enable Postgres write via non-default `postgres-write` feature flag by @sgrebnov in https://github.com/spiceai/spiceai/pull/4004
- Allow search benchmark to write test results by @sgrebnov in https://github.com/spiceai/spiceai/pull/4008
- Make Flight DoPut atomic and commit write only on successful stream completion by @sgrebnov in https://github.com/spiceai/spiceai/pull/4002
- Create a `CatalogConnector` abstraction by @phillipleblanc in https://github.com/spiceai/spiceai/pull/4003
- Fix `generate-openapi.yml` and add `.schema/openapi.json`. by @Jeadie in https://github.com/spiceai/spiceai/pull/3983
- Enable spice.ai tpcds bench workflow. Comment failing tpch queries. by @ewgenius in https://github.com/spiceai/spiceai/pull/4001
- feat: Add SQLite ClickBench overrides by @peasee in https://github.com/spiceai/spiceai/pull/4016
- Implement Iceberg Catalog Connector by @phillipleblanc in https://github.com/spiceai/spiceai/pull/4053
- feat: Datafusion updates for SQLite fixes and release by @peasee in https://github.com/spiceai/spiceai/pull/4054
- docs: Add accelerator stable release criteria by @peasee in https://github.com/spiceai/spiceai/pull/4017
- Add dremio tpch / tpcds benchmark test by @Sevenannn in https://github.com/spiceai/spiceai/pull/4063
- Update docs, and make PR to `spiceai/docs` for new `openapi.json`. by @Jeadie in https://github.com/spiceai/spiceai/pull/4019
- Update openapi.json by @github-actions in https://github.com/spiceai/spiceai/pull/4065
- Fix dremio subquery rewrite by @Sevenannn in https://github.com/spiceai/spiceai/pull/4064
- Update generate-openapi.yml by @Jeadie in https://github.com/spiceai/spiceai/pull/4073
- docs: Add catalog criteria by @peasee in https://github.com/spiceai/spiceai/pull/4052
- fix `distinct_columns` in auto/nsql tool groups by @Jeadie in https://github.com/spiceai/spiceai/pull/4074
- Update openapi.json by @github-actions in https://github.com/spiceai/spiceai/pull/4075
- Update openapi.json by @github-actions in https://github.com/spiceai/spiceai/pull/4076
- Implement window_func_support_window_frame from DremioDialect by @Sevenannn in https://github.com/spiceai/spiceai/pull/4012
- Update acknowledgements by @github-actions in https://github.com/spiceai/spiceai/pull/4079
- Promote file connector to rc by @Sevenannn in https://github.com/spiceai/spiceai/pull/4080
- Add Iceberg to README by @phillipleblanc in https://github.com/spiceai/spiceai/pull/4085
- Fix '/v1/status' default format by @Jeadie in https://github.com/spiceai/spiceai/pull/4081

**Full Changelog**: https://github.com/spiceai/spiceai/compare/v1.0.0-rc.2...v1.0.0-rc.3

Resources​

Community​

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Spice v0.19.1-beta (Oct 14, 2024)

· 4 min read
Luke Kim
Founder and CEO of Spice AI

Announcing the release of Spice v0.19.1-beta 🔥

Spice v0.19.1 brings further performance and stability improvements to data connectors, including improved query push-down for file-based connectors (s3, abfs, file, ftp, sftp) that use Hive-style partitioning.

Highlights in v0.19.1​

TPC-H and TPC-DS Coverage: Expanded coverage for TPC-H and TPC-DS benchmarking suites across accelerators and connectors.

GitHub Connector Array Filter: The GitHub connector now supports filter push down for the array_contains function in SQL queries using search query mode.

NSQL CLI Command: A new spice nsql CLI command has been added to easily query datasets with natural language from the command line.

Breaking changes​

None

Contributors​

  • @peasee
  • @Sevenannn
  • @sgrebnov
  • @karifabri
  • @phillipleblanc
  • @lukekim
  • @Jeadie
  • @slyons

Dependencies​

What's Changed​

- release: Update helm chart for v0.19.0-beta by @peasee in https://github.com/spiceai/spiceai/pull/3024
- Set fail-fast = true for benchmark test by @Sevenannn in https://github.com/spiceai/spiceai/pull/2997
- release: Update next version and ROADMAP by @peasee in https://github.com/spiceai/spiceai/pull/3033
- Verify TPCH benchmark query results for Spark connector by @sgrebnov in https://github.com/spiceai/spiceai/pull/2993
- feat: Add x-spice-user-agent header to Spice REPL by @peasee in https://github.com/spiceai/spiceai/pull/2979
- Update to object store file formats documentation link by @karifabri in https://github.com/spiceai/spiceai/pull/3036
- Use teraswitch-runners for Linux x64 workflows + builds by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3042
- feat: Support array contains in GitHub pushdown by @peasee in https://github.com/spiceai/spiceai/pull/2983
- Bump text-splitter from 0.16.1 to 0.17.0 by @dependabot in https://github.com/spiceai/spiceai/pull/2987
- Revert integration tests back to hosted runner by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3046
- Tune Github runner resources to allow in memory TPCDS benchmark to run by @Sevenannn in https://github.com/spiceai/spiceai/pull/3025
- fix: add winver by @peasee in https://github.com/spiceai/spiceai/pull/3054
- refactor: Use is modifier for checking GitHub state filter by @peasee in https://github.com/spiceai/spiceai/pull/3056
- Enable `merge_group` checks for PR workflows by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3058
- Fix issues with merge group by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3059
- Validate in-memory arrow accelertion TPCDS result correctness by @Sevenannn in https://github.com/spiceai/spiceai/pull/3044
- Fix rev parsing for PR checks by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3060
- Use 'Accept' header for `/v1/sql/` and `/v1/nsql` by @Jeadie in https://github.com/spiceai/spiceai/pull/3032
- Verify Postgres acceleration TPCDS result correctness by @Sevenannn in https://github.com/spiceai/spiceai/pull/3043
- Add NSQL CLI REPL command by @lukekim in https://github.com/spiceai/spiceai/pull/2856
- Preserve query results order and add TPCH benchmark results verification for duckdb:file mode by @sgrebnov in https://github.com/spiceai/spiceai/pull/3034
- Refactor benchmark to include MySQL tpcds bench, tweaks to makefile target for generating mysql tpcds data by @Sevenannn in https://github.com/spiceai/spiceai/pull/2967
- Support runtime parameter for `sql_query_keep_partition_by_columns` & enable by default by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3065
- Document TPC-DS limitations: `EXCEPT`, `INTERSECT`, duplicate names by @sgrebnov in https://github.com/spiceai/spiceai/pull/3069
- Adding ABFS benchmark by @slyons in https://github.com/spiceai/spiceai/pull/3062
- Add support for GitHub app installation auth for GitHub connector by @ewgenius in https://github.com/spiceai/spiceai/pull/3063
- docs: Document stack overflow workaround, add helper script by @peasee in https://github.com/spiceai/spiceai/pull/3070
- Tune MySQL TPCDS image to allow for successful benchmark test run by @Sevenannn in https://github.com/spiceai/spiceai/pull/3067
- Automatically infer partitions for hive-style partitioned files for object store based connectors by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3073
- Support `hf_token` from params/secrets by @Jeadie in https://github.com/spiceai/spiceai/pull/3071
- Inherit embedding columns from source, when available. by @Jeadie in https://github.com/spiceai/spiceai/pull/3045
- Validate identifiers for component names by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3079
- docs: Add workaround for TPC-DS Q97 in MySQL by @peasee in https://github.com/spiceai/spiceai/pull/3080
- Document TPC-DS Postgres column alias in a CASE statement limitation by @sgrebnov in https://github.com/spiceai/spiceai/pull/3083
- Update plan snapshots for TPC-H bench queries by @sgrebnov in https://github.com/spiceai/spiceai/pull/3088
- Update Datafusion crate to include recent unparsing fixes by @sgrebnov in https://github.com/spiceai/spiceai/pull/3089
- Sample SQL table data tool and API by @Jeadie in https://github.com/spiceai/spiceai/pull/3081
- chore: Update datafusion-table-providers by @peasee in https://github.com/spiceai/spiceai/pull/3090
- Add `hive_infer_partitions` to remaining object store connectors by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3086
- deps: Update datafusion-table-providers by @peasee in https://github.com/spiceai/spiceai/pull/3093
- For local embedding models, return usage input tokens. by @Jeadie in https://github.com/spiceai/spiceai/pull/3095
- Update end_game.md with Accelerator/Connector criteria check by @slyons in https://github.com/spiceai/spiceai/pull/3092
- Update TPC-DS Q90 by @sgrebnov in https://github.com/spiceai/spiceai/pull/3094
- docs: Add RC connector criteria by @peasee in https://github.com/spiceai/spiceai/pull/3026
- Update version to 0.19.1-beta by @sgrebnov in https://github.com/spiceai/spiceai/pull/3101

**Full Changelog**: https://github.com/spiceai/spiceai/compare/v0.19.0-beta...v0.19.1-beta

Resources​

Community​

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Spice v0.17.4-beta (Sep 9, 2024)

· 4 min read
Luke Kim
Founder and CEO of Spice AI

Announcing the release of Spice v0.17.4-beta.

The v0.17.4-beta release adds compatibility, performance, and reliability improvements to the DuckDB and SQLite accelerators. The GitHub data connector adds a Stargazers table, Snowflake and Clickhouse data connectors have improved resiliency for empty tables, and core data processing and quality has been improved.

Highlights in v0.17.4-beta​

Improved benchmarking, testing, and robustness of data accelerators: Continued compatibility, performance, and reliability improvements for SQLite and DuckDB data accelerators and expanded performance and quality testing.

GitHub Stargazers: The GitHub Data Connector adds support for a /stargazers table making it easy to query GitHub Stargazers using SQL!

Breaking Changes​

None.

Contributors​

  • @phillipleblanc
  • @Jeadie
  • @lukekim
  • @sgrebnov
  • @peasee
  • @eltociear
  • @Sevenannn
  • @ewgenius

New Contributors​

What's Changed​

- Change to sql lang by @ewgenius in https://github.com/spiceai/spiceai/pull/2484
- Update acknowledgements by @github-actions in https://github.com/spiceai/spiceai/pull/2487
- Bump rustyline from 13.0.0 to 14.0.0 by @dependabot in https://github.com/spiceai/spiceai/pull/2473
- Update spicepod.schema.json by @github-actions in https://github.com/spiceai/spiceai/pull/2490
- Native schema inference for snowflake (and support timezone_tz, better numeric support) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2493
- Add checks for GitHub quickstart and docs banner to endgame template by @ewgenius in https://github.com/spiceai/spiceai/pull/2489
- Prepare for v0.18.0-beta by @Jeadie in https://github.com/spiceai/spiceai/pull/2488
- Add logo to README.md by @lukekim in https://github.com/spiceai/spiceai/pull/2497
- Add stargazers to GitHub data connector by @lukekim in https://github.com/spiceai/spiceai/pull/2502
- Enable federation for accelerated queries (sqlite and duckdb) by @sgrebnov in https://github.com/spiceai/spiceai/pull/2511
- Load SQLite decimal extension by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2498
- fix: Support INTERVAL in SQLite by @peasee in https://github.com/spiceai/spiceai/pull/2513
- Add refresh jitter to refreshing dataset acceleration by @Jeadie in https://github.com/spiceai/spiceai/pull/2510
- Update to use DuckDB streaming by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2514
- Add more MySQL types in E2E testing by @sgrebnov in https://github.com/spiceai/spiceai/pull/2512
- Update tpc loading script to support automatic loading into postgres by @Sevenannn in https://github.com/spiceai/spiceai/pull/2509
- docs: update README.md by @eltociear in https://github.com/spiceai/spiceai/pull/2516
- Bump quinn-proto from 0.11.6 to 0.11.8 in the cargo group by @dependabot in https://github.com/spiceai/spiceai/pull/2501
- Script for loading clickbench data into arrow / postgres and run clickbench queries by @Sevenannn in https://github.com/spiceai/spiceai/pull/2500
- Fix run query script to correctly record all errors by @Sevenannn in https://github.com/spiceai/spiceai/pull/2529
- Add support for DuckDB engine to setup-tpc-spicepod.bash by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2530
- Upgrade `datafusion` (fixes subquery alias table unparsing for SQLite) by @sgrebnov in https://github.com/spiceai/spiceai/pull/2532
- Make dataset acceleration delay `period +- jitter` by @Jeadie in https://github.com/spiceai/spiceai/pull/2534
- Add refresh options to `POST /v1/datasets/:name/acceleration/refresh` by @Jeadie in https://github.com/spiceai/spiceai/pull/2515
- Add E2E for GitHub Connector by @sgrebnov in https://github.com/spiceai/spiceai/pull/2505
- Add on-conflict integration test for file based and memory based sqlite by @Sevenannn in https://github.com/spiceai/spiceai/pull/2533
- Upgrade to Rust v.1.81.0 and fix resulting compile error by @Sevenannn in https://github.com/spiceai/spiceai/pull/2539
- Remove unneeded `RwLock` from `EmbeddingModelStore` by @Jeadie in https://github.com/spiceai/spiceai/pull/2541
- Remove unneeded RwLock from LlmModelStore by @Jeadie in https://github.com/spiceai/spiceai/pull/2537
- Add sqlite to the setup tpc benchmark script by @Sevenannn in https://github.com/spiceai/spiceai/pull/2540
- Add sqlite to setup clickbench script by @Sevenannn in https://github.com/spiceai/spiceai/pull/2548
- Update version for v0.17.4-beta release by @ewgenius in https://github.com/spiceai/spiceai/pull/2563
- Sharepoint data connector by @Jeadie in https://github.com/spiceai/spiceai/pull/2294
- Fix predicate/projection push-down for BytesProcessedNode by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2564
- fix out of order projections in sharepoint scans by @Jeadie in https://github.com/spiceai/spiceai/pull/2569
- Use Decimal instead of Float64 for SQLite Decimal columns by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2566
- Add snapshot tests for EXPLAIN plans in integration tests by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2570
- Set refresh.period from `refresh_data_window` by @ewgenius in https://github.com/spiceai/spiceai/pull/2578
- Add snapshot tests for EXPLAIN plans in benchmark tests by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2580
- Disable federation for accelerated queries by @sgrebnov in https://github.com/spiceai/spiceai/pull/2581
- Add manual refresh payload to 'spice refresh...' by @Jeadie in https://github.com/spiceai/spiceai/pull/2565
- Update acknowledgements by @github-actions in https://github.com/spiceai/spiceai/pull/2586

**Full Changelog**: https://github.com/spiceai/spiceai/compare/v0.17.3-beta...v0.17.4-beta

Resources​

Community​

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Spice v0.16-alpha (July 22, 2024)

· 7 min read
Luke Kim
Founder and CEO of Spice AI

The v0.16-alpha release is the first candidate release for the beta milestone on a path to finalizing the v1.0 developer and user experience. Upgraders should be aware of several breaking changes designed to improve the Secrets configuration experience and to make authoring spicepod.yml files more consistent. See the [Breaking Changes](#Breaking Changes) section below for details. Additionally, the Spice Java SDK was released, providing Java developers a simple but powerful native experience to query Spice.

Highlights in v0.16-alpha​

secrets:
- from: env
name: env
- from: aws_secrets_manager:my_secret_name
name: aws_secret

Secrets managed by configured Secret Stores can be referenced in component params using the syntax ${<store_name>:<key>}. E.g.

datasets:
- from: postgres:my_table
name: my_table
params:
pg_host: localhost
pg_port: 5432
pg_pass: ${ env:MY_PG_PASS }
  • Java Client SDK: The Spice Java SDK has been released for JDK 17 or greater.

  • Federated SQL Query: Significant stability and reliability improvements have been made to federated SQL query support in most data connectors.

  • ODBC Data Connector: Providing a specific SQL dialect to query ODBC data sources is now supported using the sql_dialect param. For example, when querying Databricks using ODBC, the databricks dialect can be specified to ensure compatibility. Read the ODBC Data Connector documentation for more details.

Breaking Changes​

  • Secret Stores: Secret Stores support has been overhauled including required changes to spicepod.yml schema. File based secrets stored in the ~/.spice/auth file are no longer supported. See Secret Stores Documentation for full reference.

To upgrade Secret Stores, rename any parameters ending in _key to remove the _key suffix and specify a secret inline via the secret replacement syntax (${<secret_store>:<key>}):

datasets:
- from: postgres:my_table
name: my_table
params:
pg_host: localhost
pg_port: 5432
pg_pass_key: my_pg_pass

to:

datasets:
- from: postgres:my_table
name: my_table
params:
pg_host: localhost
pg_port: 5432
pg_pass: ${secrets:my_pg_pass}

And ensure the MY_PG_PASS environment variable is set.

  • Datasets: The default value of time_format has changed from unix_seconds to timestamp.

To upgrade:

datasets:
- from:
name: my_dataset
# Explicitly define format when not specified.
time_format: unix_seconds
  • HTTP Port: The default HTTP port has changed from port 3000 to port 8090 to avoid conflicting with frontend apps which typically use the 3000 range. If an SDK is used, upgrade it at the same time as the runtime.

To upgrade and continue using port 3000, run spiced with the --http command line argument:

# Using Dockerfile or spiced directly
spiced --http 127.0.0.1:3000
  • HTTP Metrics Port: The default HTTP Metrics port has changed from port 9000 to 9090 to avoid conflicting with other metrics protocols which typically use port 9000.

To upgrade and continue using port 9000, run spiced with the metrics command line argument:

# Using Dockerfile or spiced directly
spiced --metrics 127.0.0.1:9000

To upgrade, change:

json_path: my.json.path

To:

json_pointer: /my/json/pointer
  • Data Connector Configuration: Consistent connector name prefixing has been applied to connector specific params parameters. Prefixed parameter names helps ensure parameters do not collide.

For example, the Databricks data connector specific params are now prefixed with databricks:

datasets:
- from: databricks:spiceai.datasets.my_awesome_table # A reference to a table in the Databricks unity catalog
name: my_delta_lake_table
params:
mode: spark_connect
endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com
token: MY_TOKEN

To upgrade:

datasets:
# Example for Spark Connect
- from: databricks:spiceai.datasets.my_awesome_table # A reference to a table in the Databricks unity catalog
name: my_delta_lake_table
params:
mode: spark_connect
databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com # Now prefixed with databricks
databricks_token: ${secrets:my_token} # Now prefixed with databricks

Refer to the Data Connector documentation for parameter naming changes in this release.

Clickhouse Data Connector: The clickhouse_connection_timeout parameter has been renamed to connection_timeout as it applies to the client and is not Clickhouse configuration itself.

To upgrade, change:

clickhouse_connection_timeout: time

To:

connection_timeout: time

Contributors​

  • @y-f-u
  • @phillipleblanc
  • @ewgenius
  • @github-actions
  • @sgrebnov
  • @lukekim
  • @digadeesh
  • @peasee
  • @Sevenannn

What's Changed​

Dependencies​

No major dependency updates.

Commits​

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.15.2-alpha...v0.16.0-alpha

Resources​

Community​

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.