Skip to main content

9 posts tagged with "data connector"

View All Tags

Spice v1.0-rc.2 (Dec 16, 2024)

ยท 8 min read
William Croxson
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.0-rc.2 ๐Ÿ”—

Spice v1.0.0-rc.2 is the second release candidate for the first major version of Spice.ai OSS. This release continues to build on the stability of Spice for production use, including key Data Connector graduations, bug fixes, and AI features.

Highlights in v1.0-rc.2โ€‹

  • MS SQL and File Data Connectors: Graduated from Alpha to Beta.

  • GraphQL and Databricks Delta Lake Data Connectors: Graduated from Beta to Release Candidate.

  • gospice SDK Release: The Spice Go SDK has updated to v7.0, adding support for refreshing datasets and upgrading dependencies.

  • Azure AI Support: Added support for both LLMs and embedding models. Example spicepod.yml configuration:

embeddings:
- name: azure
from: azure:text-embedding-3-small
params:
endpoint: https://your-resource-name.openai.azure.com
azure_api_version: 2024-08-01-preview
azure_deployment_name: text-embedding-3-small
azure_api_key: ${ secrets:SPICE_AZURE_API_KEY }
models:
- name: azure
from: azure:gpt-4o-mini
params:
endpoint: https://your-resource-name.openai.azure.com
azure_api_version: 2024-08-01-preview
azure_deployment_name: gpt-4o-mini
azure_api_key: ${ secrets:SPICE_AZURE_TOKEN }

Accelerate subsets of columns: Spice now supports acceleration for specific columns from a federated source. Specify the desired columns directly in the Refresh SQL for more selective and efficient data acceleration.

Example spicepod.yaml configuration:

datasets:
- from: s3://spiceai-demo-datasets/taxi_trips/2024/
name: taxi_trips
params:
file_format: parquet
acceleration:
refresh_sql: SELECT tpep_pickup_datetime, tpep_dropoff_datetime, trip_distance, total_amount FROM taxi_trips

Breaking changesโ€‹

Sharepoint Authentication Parameters: now use access tokens instead of authorization codes, using the sharepoint_bearer_token parameter. The sharepoint_auth_code parameter has been removed.

Data Connector Delimiters: now support / and ://, in addition to : in the from parameter of the dataset configuration. The following examples are equivalent:

  • from: postgres://my_postgres_table
  • from: postgres/my_postgres_table
  • from: postgres:my_postgres_table

Some data connectors, such as s3 which only accepts ://, place further restrictions on the allowed delimiter.

The file data connector has changed how it interprets the :// delimiter to reflect how most other URL parsers work, i.e. file://my_file_path. Previously, the file path was interpreted as /my_file_path. Now, it is interpreted as a relative path, i.e. my_file_path.

Spice Search limit: is now applied to the final search result, instead of previously being applied separately to each dataset involved in a search before aggregation.

Dependenciesโ€‹

  • Rust: Upgraded to 1.83

Contributorsโ€‹

  • @phillipleblanc
  • @ewgenius
  • @Jeadie
  • @sgrebnov
  • @peasee
  • @Sevenannn
  • @Advayp

New Contributorsโ€‹

What's Changedโ€‹

- Fix install scripts to handle the RC release by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3718>
- Update helm chart to v1.0.0-rc.1 by @ewgenius in <https://github.com/spiceai/spiceai/pull/3720>
- Update spicepod.schema.json by @github-actions in <https://github.com/spiceai/spiceai/pull/3719>
- Add logic to ignore task cancellations due to runtime shutdown by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3717>
- Update to next relese version v1.0.0-rc.2 by @ewgenius in <https://github.com/spiceai/spiceai/pull/3721>
- Handle parsing OTel KeyValues from the `baggage` header by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3722>
- Update `llms` dependencies: `mistralrs`, `async-openai` by @Jeadie in <https://github.com/spiceai/spiceai/pull/3725>
- Support `jsonl` for object store by @Jeadie in <https://github.com/spiceai/spiceai/pull/3726>
- Fix NSQL models integration tests for HF by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3727>
- standardise 'csv_schema_infer_max_records' -> 'schema_infer_max_records'; include deprecation messages for dataset params by @Jeadie in <https://github.com/spiceai/spiceai/pull/3732>
- feat: Add script to generate TPC-H data for file connector by @peasee in <https://github.com/spiceai/spiceai/pull/3737>
- feat: Add file connector integration test by @peasee in <https://github.com/spiceai/spiceai/pull/3735>
- fix: Add explicit message for ODBC connector when not installed by @peasee in <https://github.com/spiceai/spiceai/pull/3736>
- Remove Box::leak in `create_accelerated_table` by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3739>
- docs: Update enhancement and PR template by @peasee in <https://github.com/spiceai/spiceai/pull/3740>
- feat: add file connector benchmark by @peasee in <https://github.com/spiceai/spiceai/pull/3734>
- docs: Release file connector beta by @peasee in <https://github.com/spiceai/spiceai/pull/3738>
- For embeddings, use `sentence_*_config.json`, download HF async, use TEI functions by @Jeadie in <https://github.com/spiceai/spiceai/pull/3724>
- Optimize build & release workflow for trunk builds by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3741>
- Update benchmark snapshots by @github-actions in <https://github.com/spiceai/spiceai/pull/3752>
- Skip Spice cloud integration tests by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3755>
- Add `http_requests` metric and deprecate `http_requests_total` by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3748>
- Update benchmark snapshots by @github-actions in <https://github.com/spiceai/spiceai/pull/3759>
- fix: Parquet file generation script by @peasee in <https://github.com/spiceai/spiceai/pull/3762>
- fix: Use InvalidConfiguration error for GraphQL query errors by @peasee in <https://github.com/spiceai/spiceai/pull/3763>
- Extend Spice Search integration and E2E tests to cover chunking by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3750>
- test: Add GraphQL integration tests from external sources by @peasee in <https://github.com/spiceai/spiceai/pull/3756>
- docs: Release GraphQL release candidate by @peasee in <https://github.com/spiceai/spiceai/pull/3764>
- Accelerate a subset of columns from source dataset in Refresh SQL by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3765>
- Run TPCDS benchmark for databricks delta mode by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3751>
- Update dependencies by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3747>
- Implement vector search benchmark initialization by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3774>
- Implement InvalidTypeAction for PostgreSQL Data Connector by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3767>
- fix: Check ODBC parameters are positive integers by @peasee in <https://github.com/spiceai/spiceai/pull/3777>
- Fix Delta DataType `Map` type mapping to arrow type by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3776>
- Update Databricks & Delta Lake Connector RC criteria by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3778>
- Add a `/v1/packages/generate` API to generate a Spicepod package from a GitHub repo. by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3782>
- Set `Spice-Target-Source` header for `spice add` by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3783>
- Call v1 spicerack API by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3784>
- Run models integration tests on self-hosted macOS runners by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3785>
- Fix OpenAI models integration tests by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3786>
- Integration test for Databricks delta_lake mode by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3779>
- Add `spice connect` for connecting to existing Spice.ai instances by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3790>
- Add `eval` spicepod component; basic HTTP api to run eval. by @Jeadie in <https://github.com/spiceai/spiceai/pull/3766>
- Release RC for databricks delta_lake mode by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3792>
- Include Huggingface model to E2E models tests by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3788>
- Enable `trace_id` & `parent_span_id` overrides for `v1/chat/completion` by @Jeadie in <https://github.com/spiceai/spiceai/pull/3791>
- Search benchmark: run search workload and measure result by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3793>
- Search benchmark: measure search precision by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3804>
- Use MinIO instead of S3 for benchmark tests by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3794>
- Update benchmark snapshots by @github-actions in <https://github.com/spiceai/spiceai/pull/3814>
- Only verify TPCH / TPCDS official query results for DuckDB by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3816>
- Fixes for the Debezium connector by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3819>
- Fix insert statement when all columns are constraint columns by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3820>
- docs: Move ODBC to Beta for current state of roadmap by @peasee in <https://github.com/spiceai/spiceai/pull/3823>
- Accept `:`, `/` or `://` as the delimiter for the data connector by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3821>
- Update dependencies by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3826>
- Enable `read_write` mode support for Postgres Data Connector by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3813>
- feat: add Databricks ODBC TPCDS benchmark by @peasee in <https://github.com/spiceai/spiceai/pull/3825>
- Change `spice.ai` data connector dataset path format to `<org>/<app>/datasets/<table_reference>` by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3828>
- fix: enable tpcds explain snapshotting by @peasee in <https://github.com/spiceai/spiceai/pull/3830>
- Azure AI support for both LLMs & embedding models by @Jeadie in <https://github.com/spiceai/spiceai/pull/3824>
- Add Github Workflow to run Search Benchmark by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3834>
- Fetch access token with Microsoft OAuth, and use access token to initiate Sharepoint data connector graph client by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3836>
- Initialize accelerator for datasets dynamically included by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3714>
- Update cargo.lock by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3838>
- feat: add MS SQL TPCH benchmark by @peasee in <https://github.com/spiceai/spiceai/pull/3833>
- Improve Azure AI models support by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3835>
- Primary key support for Arrow's `Memtable` by @Jeadie in <https://github.com/spiceai/spiceai/pull/3829>
- Update Tokenizer to 0.21 and mistral.rs by @Jeadie in <https://github.com/spiceai/spiceai/pull/3839>
- Fix models integration tests by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3843>
- Enable `spice login abfs` by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3844>
- update `crates/llms` dependencies to 'spiceai' branch by @Jeadie in <https://github.com/spiceai/spiceai/pull/3846>
- Make eval runs non-blocking; `spice.eval.{results, runs}` tables. by @Jeadie in <https://github.com/spiceai/spiceai/pull/3780>
- fix: Update GraphQL snapshots by @peasee in <https://github.com/spiceai/spiceai/pull/3849>
- Update to Rust 1.83 by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3847>
- feat: add mssql integration test by @peasee in <https://github.com/spiceai/spiceai/pull/3848>
- Prepend user-specified user agent in flight repl by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3850>
- fix: trim CHAR in mssql by @peasee in <https://github.com/spiceai/spiceai/pull/3852>
- Fix column quoting for SpiceCloudPlatform dialect by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3857>
- Optimize builds by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3861>
- Endgame template: Add recently added AI/ML quickstarts and samples by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3859>
- docs: Release MS SQL Beta by @peasee in <https://github.com/spiceai/spiceai/pull/3853>
- Fix nsql sampling for tables with embeddings by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3860>
- Make GH workflows with spiceai-macos runners more stable by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3863>
- fix: Remove GraphQL swapi test by @peasee in <https://github.com/spiceai/spiceai/pull/3867>
- create 1 `tokio::test` per test/model by @Jeadie in <https://github.com/spiceai/spiceai/pull/3696>
- handle `max_completion_tokens` vs `max_tokens` for openai vs azure by @Jeadie in <https://github.com/spiceai/spiceai/pull/3869>
- Search benchmark: write results to dataset by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3871>
- Create `evalconverter` that creates spice eval components. by @Jeadie in <https://github.com/spiceai/spiceai/pull/3864>
- Update quickstart in README.md by @ewgenius in <https://github.com/spiceai/spiceai/pull/3876>
- Remove reference to spiceai-smart-demo from the repo home by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3885>
- Trace `evals` accelerated tables updates in debug mode by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3884>
- Clarify confusing log message by @Advayp in <https://github.com/spiceai/spiceai/pull/3862>
- Update spicepod.schema.json by @github-actions in <https://github.com/spiceai/spiceai/pull/3840>
- Azure OpenAI models: make `endpoint` parameter required by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3883>
- Use spiceai delta kernel fork, actionable message for delta checkpoint errors by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3856>
- Add support for GGUF files in HF by @Jeadie in <https://github.com/spiceai/spiceai/pull/3875>

**Full Changelog**: <https://github.com/spiceai/spiceai/compare/v1.0.0-rc.1...v1.0.0-rc.2>
```text

## Resources

- [Getting started with Spice.ai](https://docs.spiceai.org/getting-started/)
- [Documentation](https://docs.spiceai.org/)

## Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

- Twitter: [@spice_ai](https://twitter.com/spice_ai)
- Discord: [https://discord.gg/kZnTfneP5u](https://discord.gg/kZnTfneP5u)
- Telegram: [Spice AI Discussion](https://t.me/spiceaichat)
- Reddit: [https://www.reddit.com/r/spiceai](https://www.reddit.com/r/spiceai)
- Email: [[email protected]](mailto:[email protected])

Spice v0.19.3-beta (Oct 28, 2024)

ยท 4 min read
Luke Kim
Founder and CEO of Spice AI

Announcing the release of Spice v0.19.3-beta ๐Ÿ“ˆ

Spice v0.19.3-beta improves the performance and stability of data connectors and accelerators, including faster queries across multiple federated sources by optimizing how filters are applied. Anthropic has also been added as a LLM model provider.

Highlights in v0.19.3โ€‹

DataFusion Fixes: Resolved bugs in DataFusion and DataFusion Table Providers, expanding TPC-DS coverage and correctness.

GitHub Data Connector Beta Milestone: The GitHub Data Connector has graduated to Beta after extensive testing, stability, and performance improvements.

Anthropic Models Provider: Anthropic has been added as an LLM provider, including support for streaming.

Example spicepod.yml:

models:
- from: anthropic:claude-3-5-sonnet-20240620
name: claude_3_5_sonnet
params:
anthropic_api_key: ${ secrets:SPICE_ANTHROPIC_API_KEY }

Breaking changesโ€‹

None.

Contributorsโ€‹

  • @Jeadie
  • @Sevenannn
  • @phillipleblanc
  • @peasee
  • @sgrebnov
  • @nlamirault
  • @barracudarin
  • @lukekim
  • @slyons

New Contributorsโ€‹

What's Changedโ€‹

- Make Anthropic OpenAI compatible. by @Jeadie in https://github.com/spiceai/spiceai/pull/3087
- Update spicepod.schema.json by @github-actions in https://github.com/spiceai/spiceai/pull/3200
- Bump version to 1.0.0-rc.1 by @Sevenannn in https://github.com/spiceai/spiceai/pull/3202
- Fix clickhouse schema inference for non-default database by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3201
- Update endgame template by @Sevenannn in https://github.com/spiceai/spiceai/pull/3198
- Upgrade dependencies by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3197
- fix: dataset refresh defaults properties to None by @peasee in https://github.com/spiceai/spiceai/pull/3205
- Upgrade OTEL to v0.26 and make seconds based metrics reported precisely by @sgrebnov in https://github.com/spiceai/spiceai/pull/3203
- use `text_embedding_inference::Infer` for more complete embedding solution by @Jeadie in https://github.com/spiceai/spiceai/pull/3199
- Add S3 parquet file - arrow accelerator e2e test by @Sevenannn in https://github.com/spiceai/spiceai/pull/3154
- feat: Add script to setup clickbench on mysql by @peasee in https://github.com/spiceai/spiceai/pull/3176
- Update helm chart version to v0.19.2 by @Sevenannn in https://github.com/spiceai/spiceai/pull/3210
- Add sample dataset option in `v1/nsql`. by @Jeadie in https://github.com/spiceai/spiceai/pull/3105
- Split spiced_docker build across architectures by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3206
- feat(helm): do not install demo dataset by default by @nlamirault in https://github.com/spiceai/spiceai/pull/3207
- Split integration test across build/run steps by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3215
- feat(helm): Refactoring Kubernetes labels by @nlamirault in https://github.com/spiceai/spiceai/pull/3208
- Define 'tool_recursion_limit' for LLMs, and limit internal tool calling recursion. by @Jeadie in https://github.com/spiceai/spiceai/pull/3214
- Improve filters pushdown for federated queries by @sgrebnov in https://github.com/spiceai/spiceai/pull/3183
- Implement native schema inference for PostgreSQL by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3209
- docs: Update release criteria by @peasee in https://github.com/spiceai/spiceai/pull/3219
- Run SQLite acceleration TPC-DS tests using smaller scale by @sgrebnov in https://github.com/spiceai/spiceai/pull/3227
- bind the serviceAccount if a name is given or if we're creating one by @barracudarin in https://github.com/spiceai/spiceai/pull/3228
- Only emit channel send error log when its not a closed channel error by @Jeadie in https://github.com/spiceai/spiceai/pull/3230
- Enable Parquet Exec filter pushdown in Spice by @Sevenannn in https://github.com/spiceai/spiceai/pull/3216
- Add snapshots for SQLite TPC-DS benchmark (file mode) by @sgrebnov in https://github.com/spiceai/spiceai/pull/3234
- docs: Add SDK release checks to endgame by @peasee in https://github.com/spiceai/spiceai/pull/3256
- Implement `localpod` Data Connector by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3249
- Revert "Enable Parquet Exec filter pushdown in Spice (#3216)" by @Sevenannn in https://github.com/spiceai/spiceai/pull/3244
- refactor: Use existing action for detecting changes by @peasee in https://github.com/spiceai/spiceai/pull/3255
- feat: Add GitHub integration test by @peasee in https://github.com/spiceai/spiceai/pull/3226
- Add get_readiness tool to retrieve status of all registered components by @lukekim in https://github.com/spiceai/spiceai/pull/3035
- Improve CLI error output when REPL can't connect to the Flight endpoint by @slyons in https://github.com/spiceai/spiceai/pull/3188
- Fixing FTP link in Endgame by @slyons in https://github.com/spiceai/spiceai/pull/3267
- Update version to 0.19.3-beta by @sgrebnov in https://github.com/spiceai/spiceai/pull/3269
- add service type and annotation customizations in https://github.com/spiceai/spiceai/pull/3268

**Full Changelog**: https://github.com/spiceai/spiceai/compare/v0.19.2-beta...v0.19.3-beta

Resourcesโ€‹

Communityโ€‹

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Spice v0.19.2-beta (Oct 21, 2024)

ยท 4 min read
Luke Kim
Founder and CEO of Spice AI

Announcing the release of Spice v0.19.2-beta โšก

Spice v0.19.2-beta continues to improve performance and stability of data connectors and data accelerators, further expands TPC-DS coverage, and includes several bug fixes.

Highlights in v0.19.2โ€‹

DataFusion Fixes: Resolved bugs in DataFusion and DataFusion Table Providers, improving TPC-DS query support and correctness.

TPC-DS Snapshots: Extended support for TPC-DS benchmarks with added snapshot tests for validating query plans and result accuracy.

PostgreSQL Accelerator Beta: Postgres Data Accelerator has been promoted to Beta Quality

Breaking changesโ€‹

  • The hive_infer_partitions parameter been changed to hive_partitioning_enabled, now defaults to false and must be explicitly enabled.

Contributorsโ€‹

  • @ewgenius
  • @sgrebnov
  • @slyons
  • @Jeadie
  • @Sevenannn
  • @phillipleblanc
  • @dependabot
  • @peasee

Dependenciesโ€‹

What's Changedโ€‹

- Update Helm chart for v0.19.1-beta by @ewgenius in https://github.com/spiceai/spiceai/pull/3106
- Add more TPC-DS snapshots for Postgres acceleration by @sgrebnov in https://github.com/spiceai/spiceai/pull/3107
- Bumping version to 1.0.0-rc.1 by @slyons in https://github.com/spiceai/spiceai/pull/3109
- New table sampling methods: sample_distinct_columns, random_sample, top_n_sample by @Jeadie in https://github.com/spiceai/spiceai/pull/3108
- Add TPCDS snapshot tests for file-based and in-mem duckdb by @Sevenannn in https://github.com/spiceai/spiceai/pull/3115
- Add Postgres acceleration E2E test for MySQL by @sgrebnov in https://github.com/spiceai/spiceai/pull/3110
- Update datafusion logical plan to avoid wrong group_by columns in aggregation by @Sevenannn in https://github.com/spiceai/spiceai/pull/3111
- Warn if user tries to embed column that does not exist by @Jeadie in https://github.com/spiceai/spiceai/pull/3120
- Changes for Rust version upgrade by @Sevenannn in https://github.com/spiceai/spiceai/pull/3134
- Add `unnest` support for federated plans by @sgrebnov in https://github.com/spiceai/spiceai/pull/3133
- Don't `.clone()` unnecessarily by @Jeadie in https://github.com/spiceai/spiceai/pull/3128
- Fix Flight `get_schema` to construct logical plan and return that schema. by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3131
- Bump clap from 4.5.19 to 4.5.20 by @dependabot in https://github.com/spiceai/spiceai/pull/3099
- Add GitHub Workflow to build `spice-postgres-tpcds-bench` image by @sgrebnov in https://github.com/spiceai/spiceai/pull/3140
- test: Add basic MySQL integration test by @peasee in https://github.com/spiceai/spiceai/pull/3143
- Bump datafusion-federation and datafusion-table-providers crates by @sgrebnov in https://github.com/spiceai/spiceai/pull/3148
- docs: Add MySQL limitation for division by zero by @peasee in https://github.com/spiceai/spiceai/pull/3144
- fix: Dataset refresh by @peasee in https://github.com/spiceai/spiceai/pull/3147
- Update arrow, duckdb, postgres accelerator tpcds snapshots by @Sevenannn in https://github.com/spiceai/spiceai/pull/3145
- Add TPC-DS benchmarks for Postgres data connector by @sgrebnov in https://github.com/spiceai/spiceai/pull/3149
- Update E2E test ci to include tests for accelerating Postgres into accelerators by @Sevenannn in https://github.com/spiceai/spiceai/pull/3137
- Add TPCDS Benchmark test and snapshots for S3 by @Sevenannn in https://github.com/spiceai/spiceai/pull/3152
- [cli] Include 200 in acceptable response codes for `doRuntimeApiRequest` by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3157
- Use `-build.{GIT_SHA}` for unreleased versions by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3159
- Upgrade to Rust 1.82 by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3158
- Disable `hive_infer_partitions` by default by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3160
- Upgrade to DuckDB 1.1.1 by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3161
- feat: Add MySQL TPCDS results snapshots and exclude workarounds by @peasee in https://github.com/spiceai/spiceai/pull/3165
- Fix task_history output for sql, add output to table_schema & list_datasets tool by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3166
- feat: Add ClickBench queries as separate files by @peasee in https://github.com/spiceai/spiceai/pull/3169
- Calculate embeddings in a separate blocking thread by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3170
- docs: Update ROADMAP.md and release criterias by @peasee in https://github.com/spiceai/spiceai/pull/3124
- Handle OpenTelemetry errors by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3173
- Update version to 0.19.2-beta by @Sevenannn in https://github.com/spiceai/spiceai/pull/3182

**Full Changelog**: https://github.com/spiceai/spiceai/compare/v0.19.1-beta...v0.19.2-beta

Resourcesโ€‹

Communityโ€‹

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Spice v0.19.1-beta (Oct 14, 2024)

ยท 4 min read
Luke Kim
Founder and CEO of Spice AI

Announcing the release of Spice v0.19.1-beta ๐Ÿ”ฅ

Spice v0.19.1 brings further performance and stability improvements to data connectors, including improved query push-down for file-based connectors (s3, abfs, file, ftp, sftp) that use Hive-style partitioning.

Highlights in v0.19.1โ€‹

TPC-H and TPC-DS Coverage: Expanded coverage for TPC-H and TPC-DS benchmarking suites across accelerators and connectors.

GitHub Connector Array Filter: The GitHub connector now supports filter push down for the array_contains function in SQL queries using search query mode.

NSQL CLI Command: A new spice nsql CLI command has been added to easily query datasets with natural language from the command line.

Breaking changesโ€‹

None

Contributorsโ€‹

  • @peasee
  • @Sevenannn
  • @sgrebnov
  • @karifabri
  • @phillipleblanc
  • @lukekim
  • @Jeadie
  • @slyons

Dependenciesโ€‹

What's Changedโ€‹

- release: Update helm chart for v0.19.0-beta by @peasee in https://github.com/spiceai/spiceai/pull/3024
- Set fail-fast = true for benchmark test by @Sevenannn in https://github.com/spiceai/spiceai/pull/2997
- release: Update next version and ROADMAP by @peasee in https://github.com/spiceai/spiceai/pull/3033
- Verify TPCH benchmark query results for Spark connector by @sgrebnov in https://github.com/spiceai/spiceai/pull/2993
- feat: Add x-spice-user-agent header to Spice REPL by @peasee in https://github.com/spiceai/spiceai/pull/2979
- Update to object store file formats documentation link by @karifabri in https://github.com/spiceai/spiceai/pull/3036
- Use teraswitch-runners for Linux x64 workflows + builds by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3042
- feat: Support array contains in GitHub pushdown by @peasee in https://github.com/spiceai/spiceai/pull/2983
- Bump text-splitter from 0.16.1 to 0.17.0 by @dependabot in https://github.com/spiceai/spiceai/pull/2987
- Revert integration tests back to hosted runner by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3046
- Tune Github runner resources to allow in memory TPCDS benchmark to run by @Sevenannn in https://github.com/spiceai/spiceai/pull/3025
- fix: add winver by @peasee in https://github.com/spiceai/spiceai/pull/3054
- refactor: Use is modifier for checking GitHub state filter by @peasee in https://github.com/spiceai/spiceai/pull/3056
- Enable `merge_group` checks for PR workflows by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3058
- Fix issues with merge group by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3059
- Validate in-memory arrow accelertion TPCDS result correctness by @Sevenannn in https://github.com/spiceai/spiceai/pull/3044
- Fix rev parsing for PR checks by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3060
- Use 'Accept' header for `/v1/sql/` and `/v1/nsql` by @Jeadie in https://github.com/spiceai/spiceai/pull/3032
- Verify Postgres acceleration TPCDS result correctness by @Sevenannn in https://github.com/spiceai/spiceai/pull/3043
- Add NSQL CLI REPL command by @lukekim in https://github.com/spiceai/spiceai/pull/2856
- Preserve query results order and add TPCH benchmark results verification for duckdb:file mode by @sgrebnov in https://github.com/spiceai/spiceai/pull/3034
- Refactor benchmark to include MySQL tpcds bench, tweaks to makefile target for generating mysql tpcds data by @Sevenannn in https://github.com/spiceai/spiceai/pull/2967
- Support runtime parameter for `sql_query_keep_partition_by_columns` & enable by default by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3065
- Document TPC-DS limitations: `EXCEPT`, `INTERSECT`, duplicate names by @sgrebnov in https://github.com/spiceai/spiceai/pull/3069
- Adding ABFS benchmark by @slyons in https://github.com/spiceai/spiceai/pull/3062
- Add support for GitHub app installation auth for GitHub connector by @ewgenius in https://github.com/spiceai/spiceai/pull/3063
- docs: Document stack overflow workaround, add helper script by @peasee in https://github.com/spiceai/spiceai/pull/3070
- Tune MySQL TPCDS image to allow for successful benchmark test run by @Sevenannn in https://github.com/spiceai/spiceai/pull/3067
- Automatically infer partitions for hive-style partitioned files for object store based connectors by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3073
- Support `hf_token` from params/secrets by @Jeadie in https://github.com/spiceai/spiceai/pull/3071
- Inherit embedding columns from source, when available. by @Jeadie in https://github.com/spiceai/spiceai/pull/3045
- Validate identifiers for component names by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3079
- docs: Add workaround for TPC-DS Q97 in MySQL by @peasee in https://github.com/spiceai/spiceai/pull/3080
- Document TPC-DS Postgres column alias in a CASE statement limitation by @sgrebnov in https://github.com/spiceai/spiceai/pull/3083
- Update plan snapshots for TPC-H bench queries by @sgrebnov in https://github.com/spiceai/spiceai/pull/3088
- Update Datafusion crate to include recent unparsing fixes by @sgrebnov in https://github.com/spiceai/spiceai/pull/3089
- Sample SQL table data tool and API by @Jeadie in https://github.com/spiceai/spiceai/pull/3081
- chore: Update datafusion-table-providers by @peasee in https://github.com/spiceai/spiceai/pull/3090
- Add `hive_infer_partitions` to remaining object store connectors by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3086
- deps: Update datafusion-table-providers by @peasee in https://github.com/spiceai/spiceai/pull/3093
- For local embedding models, return usage input tokens. by @Jeadie in https://github.com/spiceai/spiceai/pull/3095
- Update end_game.md with Accelerator/Connector criteria check by @slyons in https://github.com/spiceai/spiceai/pull/3092
- Update TPC-DS Q90 by @sgrebnov in https://github.com/spiceai/spiceai/pull/3094
- docs: Add RC connector criteria by @peasee in https://github.com/spiceai/spiceai/pull/3026
- Update version to 0.19.1-beta by @sgrebnov in https://github.com/spiceai/spiceai/pull/3101

**Full Changelog**: https://github.com/spiceai/spiceai/compare/v0.19.0-beta...v0.19.1-beta

Resourcesโ€‹

Communityโ€‹

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Spice v0.18.3-beta (Sep 30, 2024)

ยท 4 min read
Jack Eadie
Token Plumber at Spice AI

Announcing the release of Spice v0.18.3-beta ๐Ÿ› ๏ธ

The Spice v0.18.3-beta release includes several quality-of-life improvements including verbosity flags for spiced and the Spice CLI, vector search over larger documents with support for chunking dataset embeddings, and multiple performance enhancements. Additionally, the release includes several bug fixes, dependency updates, and optimizations, including updated table providers and significantly improved GitHub data connector performance for issues and pull requests.

Highlights in v0.18.3-betaโ€‹

GitHub Query Mode: A new github_query_mode: search parameter has been added to the GitHub Data Connector, which uses the GitHub Search API to enable faster and more efficient query of issues and pull requests when using filters.

Example spicepod.yml:

- from: github:github.com/spiceai/spiceai/issues/trunk
name: spiceai.issues
params:
github_query_mode: search # Use GitHub Search API
github_token: ${secrets:GITHUB_TOKEN}

Output Verbosity: Higher verbosity output levels can be specified through flags for both spiced and the Spice CLI.

Example command line:

spice -v
spice --very-verbose

spiced -vv
spiced --verbose

Embedding Chunking: Chunking can be enabled and configured to preprocess input data before generating dataset embeddings. This improves the relevance and precision for larger pieces of content.

Example spicepod.yml:

- name: support_tickets
embeddings:
- column: conversation_history
use: openai_embeddings
chunking:
enabled: true
target_chunk_size: 128
overlap_size: 16
trim_whitespace: true

For details, see the Search Documentation.

Dependenciesโ€‹

Contributorsโ€‹

  • @Sevenannn
  • @peasee
  • @Jeadie
  • @sgrebnov
  • @phillipleblanc
  • @ewgenius
  • @slyons

What's Changedโ€‹

- Update datafusion table provider patch by @Sevenannn in https://github.com/spiceai/spiceai/pull/2817
- refactor: Set max_rows_per_batch for ODBC to 4000 by @peasee in https://github.com/spiceai/spiceai/pull/2822
- Use User message for health check by @Jeadie in https://github.com/spiceai/spiceai/pull/2823
- Upgrade Helm chart (Spice v0.18.2-beta) by @sgrebnov in https://github.com/spiceai/spiceai/pull/2820
- Add verbosity flags for spiced, spice: `-v`, `-vv`, `--verbose`, `--very-verbose`. by @Jeadie in https://github.com/spiceai/spiceai/pull/2831
- Rename `spiceai` data connector to `spice.ai` by @sgrebnov in https://github.com/spiceai/spiceai/pull/2680
- Prepare for v0.19.0-beta release (version bump) by @sgrebnov in https://github.com/spiceai/spiceai/pull/2821
- Bump clap from 4.5.17 to 4.5.18 (#2801) by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2848
- Enable "rc" feature for serde in spicepod crate by @ewgenius in https://github.com/spiceai/spiceai/pull/2851
- Update spicepod.schema.json by @github-actions in https://github.com/spiceai/spiceai/pull/2852
- chore: update table providers by @peasee in https://github.com/spiceai/spiceai/pull/2858
- fix: Use GitHub search for issues in GraphQL by @peasee in https://github.com/spiceai/spiceai/pull/2845
- fix: Use GitHub search for pull_requests by @peasee in https://github.com/spiceai/spiceai/pull/2847
- Support chunking dataset embeddings by @Jeadie in https://github.com/spiceai/spiceai/pull/2854
- refactor: Update GraphQL client to be more robust for filter push down by @peasee in https://github.com/spiceai/spiceai/pull/2864
- docs: Update accelerator beta criteria by @peasee in https://github.com/spiceai/spiceai/pull/2865
- Change `BytesProcessedRule` to be an optimizer rather than an analyzer rule by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2867
- Don't run E2E or PR tests on documentation by @Jeadie in https://github.com/spiceai/spiceai/pull/2869
- Verify benchmark query results using snapshot testing (spice.ai connector) by @sgrebnov in https://github.com/spiceai/spiceai/pull/2866
- feat: Add GraphQLOptimizer by @peasee in https://github.com/spiceai/spiceai/pull/2868
- Update quickstarts for Endgame by @Jeadie in https://github.com/spiceai/spiceai/pull/2863
- Update version to v0.18.3-beta by @sgrebnov in https://github.com/spiceai/spiceai/pull/2882
- Update DataFusion: fix coalesce, Aggregation with Window functions unparsing support by @sgrebnov in https://github.com/spiceai/spiceai/pull/2884
- Revert "Rename `spiceai` data connector to `spice.ai`" by @sgrebnov in https://github.com/spiceai/spiceai/pull/2881
- Adding integration test for DuckDB read functions by @slyons in https://github.com/spiceai/spiceai/pull/2857
- Show more informative mysql error message by @Sevenannn in https://github.com/spiceai/spiceai/pull/2883
- Fix `no process-level CryptoProvider available` when using REPL and TLS by @sgrebnov in https://github.com/spiceai/spiceai/pull/2887
- Change UX for chunking and enable overlap_size in chunking by @Jeadie in https://github.com/spiceai/spiceai/pull/2890
- Add `log/slog` to spice CLI tool by @Jeadie in https://github.com/spiceai/spiceai/pull/2859
- feat: Add GitHub GraphQLOptimizer by @peasee in https://github.com/spiceai/spiceai/pull/2870
- Fix mysql invalid tablename error message by @Sevenannn in https://github.com/spiceai/spiceai/pull/2896
- fix: Remove login column rename in pulls and update Optimizer by @peasee in https://github.com/spiceai/spiceai/pull/2897
- Fix require check checking. by @Jeadie in https://github.com/spiceai/spiceai/pull/2898

**Full Changelog**: https://github.com/spiceai/spiceai/compare/v0.18.2-beta...v0.18.3-beta

Resourcesโ€‹

Communityโ€‹

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.