Skip to main content

5 posts tagged with "databricks"

View All Tags

Spice v1.0-rc.2 (Dec 16, 2024)

ยท 8 min read
William Croxson
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.0-rc.2 ๐Ÿ”—

Spice v1.0.0-rc.2 is the second release candidate for the first major version of Spice.ai OSS. This release continues to build on the stability of Spice for production use, including key Data Connector graduations, bug fixes, and AI features.

Highlights in v1.0-rc.2โ€‹

  • MS SQL and File Data Connectors: Graduated from Alpha to Beta.

  • GraphQL and Databricks Delta Lake Data Connectors: Graduated from Beta to Release Candidate.

  • gospice SDK Release: The Spice Go SDK has updated to v7.0, adding support for refreshing datasets and upgrading dependencies.

  • Azure AI Support: Added support for both LLMs and embedding models. Example spicepod.yml configuration:

embeddings:
- name: azure
from: azure:text-embedding-3-small
params:
endpoint: https://your-resource-name.openai.azure.com
azure_api_version: 2024-08-01-preview
azure_deployment_name: text-embedding-3-small
azure_api_key: ${ secrets:SPICE_AZURE_API_KEY }
models:
- name: azure
from: azure:gpt-4o-mini
params:
endpoint: https://your-resource-name.openai.azure.com
azure_api_version: 2024-08-01-preview
azure_deployment_name: gpt-4o-mini
azure_api_key: ${ secrets:SPICE_AZURE_TOKEN }

Accelerate subsets of columns: Spice now supports acceleration for specific columns from a federated source. Specify the desired columns directly in the Refresh SQL for more selective and efficient data acceleration.

Example spicepod.yaml configuration:

datasets:
- from: s3://spiceai-demo-datasets/taxi_trips/2024/
name: taxi_trips
params:
file_format: parquet
acceleration:
refresh_sql: SELECT tpep_pickup_datetime, tpep_dropoff_datetime, trip_distance, total_amount FROM taxi_trips

Breaking changesโ€‹

Sharepoint Authentication Parameters: now use access tokens instead of authorization codes, using the sharepoint_bearer_token parameter. The sharepoint_auth_code parameter has been removed.

Data Connector Delimiters: now support / and ://, in addition to : in the from parameter of the dataset configuration. The following examples are equivalent:

  • from: postgres://my_postgres_table
  • from: postgres/my_postgres_table
  • from: postgres:my_postgres_table

Some data connectors, such as s3 which only accepts ://, place further restrictions on the allowed delimiter.

The file data connector has changed how it interprets the :// delimiter to reflect how most other URL parsers work, i.e. file://my_file_path. Previously, the file path was interpreted as /my_file_path. Now, it is interpreted as a relative path, i.e. my_file_path.

Spice Search limit: is now applied to the final search result, instead of previously being applied separately to each dataset involved in a search before aggregation.

Dependenciesโ€‹

  • Rust: Upgraded to 1.83

Contributorsโ€‹

  • @phillipleblanc
  • @ewgenius
  • @Jeadie
  • @sgrebnov
  • @peasee
  • @Sevenannn
  • @Advayp

New Contributorsโ€‹

What's Changedโ€‹

- Fix install scripts to handle the RC release by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3718>
- Update helm chart to v1.0.0-rc.1 by @ewgenius in <https://github.com/spiceai/spiceai/pull/3720>
- Update spicepod.schema.json by @github-actions in <https://github.com/spiceai/spiceai/pull/3719>
- Add logic to ignore task cancellations due to runtime shutdown by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3717>
- Update to next relese version v1.0.0-rc.2 by @ewgenius in <https://github.com/spiceai/spiceai/pull/3721>
- Handle parsing OTel KeyValues from the `baggage` header by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3722>
- Update `llms` dependencies: `mistralrs`, `async-openai` by @Jeadie in <https://github.com/spiceai/spiceai/pull/3725>
- Support `jsonl` for object store by @Jeadie in <https://github.com/spiceai/spiceai/pull/3726>
- Fix NSQL models integration tests for HF by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3727>
- standardise 'csv_schema_infer_max_records' -> 'schema_infer_max_records'; include deprecation messages for dataset params by @Jeadie in <https://github.com/spiceai/spiceai/pull/3732>
- feat: Add script to generate TPC-H data for file connector by @peasee in <https://github.com/spiceai/spiceai/pull/3737>
- feat: Add file connector integration test by @peasee in <https://github.com/spiceai/spiceai/pull/3735>
- fix: Add explicit message for ODBC connector when not installed by @peasee in <https://github.com/spiceai/spiceai/pull/3736>
- Remove Box::leak in `create_accelerated_table` by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3739>
- docs: Update enhancement and PR template by @peasee in <https://github.com/spiceai/spiceai/pull/3740>
- feat: add file connector benchmark by @peasee in <https://github.com/spiceai/spiceai/pull/3734>
- docs: Release file connector beta by @peasee in <https://github.com/spiceai/spiceai/pull/3738>
- For embeddings, use `sentence_*_config.json`, download HF async, use TEI functions by @Jeadie in <https://github.com/spiceai/spiceai/pull/3724>
- Optimize build & release workflow for trunk builds by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3741>
- Update benchmark snapshots by @github-actions in <https://github.com/spiceai/spiceai/pull/3752>
- Skip Spice cloud integration tests by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3755>
- Add `http_requests` metric and deprecate `http_requests_total` by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3748>
- Update benchmark snapshots by @github-actions in <https://github.com/spiceai/spiceai/pull/3759>
- fix: Parquet file generation script by @peasee in <https://github.com/spiceai/spiceai/pull/3762>
- fix: Use InvalidConfiguration error for GraphQL query errors by @peasee in <https://github.com/spiceai/spiceai/pull/3763>
- Extend Spice Search integration and E2E tests to cover chunking by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3750>
- test: Add GraphQL integration tests from external sources by @peasee in <https://github.com/spiceai/spiceai/pull/3756>
- docs: Release GraphQL release candidate by @peasee in <https://github.com/spiceai/spiceai/pull/3764>
- Accelerate a subset of columns from source dataset in Refresh SQL by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3765>
- Run TPCDS benchmark for databricks delta mode by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3751>
- Update dependencies by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3747>
- Implement vector search benchmark initialization by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3774>
- Implement InvalidTypeAction for PostgreSQL Data Connector by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3767>
- fix: Check ODBC parameters are positive integers by @peasee in <https://github.com/spiceai/spiceai/pull/3777>
- Fix Delta DataType `Map` type mapping to arrow type by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3776>
- Update Databricks & Delta Lake Connector RC criteria by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3778>
- Add a `/v1/packages/generate` API to generate a Spicepod package from a GitHub repo. by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3782>
- Set `Spice-Target-Source` header for `spice add` by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3783>
- Call v1 spicerack API by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3784>
- Run models integration tests on self-hosted macOS runners by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3785>
- Fix OpenAI models integration tests by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3786>
- Integration test for Databricks delta_lake mode by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3779>
- Add `spice connect` for connecting to existing Spice.ai instances by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3790>
- Add `eval` spicepod component; basic HTTP api to run eval. by @Jeadie in <https://github.com/spiceai/spiceai/pull/3766>
- Release RC for databricks delta_lake mode by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3792>
- Include Huggingface model to E2E models tests by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3788>
- Enable `trace_id` & `parent_span_id` overrides for `v1/chat/completion` by @Jeadie in <https://github.com/spiceai/spiceai/pull/3791>
- Search benchmark: run search workload and measure result by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3793>
- Search benchmark: measure search precision by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3804>
- Use MinIO instead of S3 for benchmark tests by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3794>
- Update benchmark snapshots by @github-actions in <https://github.com/spiceai/spiceai/pull/3814>
- Only verify TPCH / TPCDS official query results for DuckDB by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3816>
- Fixes for the Debezium connector by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3819>
- Fix insert statement when all columns are constraint columns by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3820>
- docs: Move ODBC to Beta for current state of roadmap by @peasee in <https://github.com/spiceai/spiceai/pull/3823>
- Accept `:`, `/` or `://` as the delimiter for the data connector by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3821>
- Update dependencies by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3826>
- Enable `read_write` mode support for Postgres Data Connector by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3813>
- feat: add Databricks ODBC TPCDS benchmark by @peasee in <https://github.com/spiceai/spiceai/pull/3825>
- Change `spice.ai` data connector dataset path format to `<org>/<app>/datasets/<table_reference>` by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3828>
- fix: enable tpcds explain snapshotting by @peasee in <https://github.com/spiceai/spiceai/pull/3830>
- Azure AI support for both LLMs & embedding models by @Jeadie in <https://github.com/spiceai/spiceai/pull/3824>
- Add Github Workflow to run Search Benchmark by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3834>
- Fetch access token with Microsoft OAuth, and use access token to initiate Sharepoint data connector graph client by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3836>
- Initialize accelerator for datasets dynamically included by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3714>
- Update cargo.lock by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3838>
- feat: add MS SQL TPCH benchmark by @peasee in <https://github.com/spiceai/spiceai/pull/3833>
- Improve Azure AI models support by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3835>
- Primary key support for Arrow's `Memtable` by @Jeadie in <https://github.com/spiceai/spiceai/pull/3829>
- Update Tokenizer to 0.21 and mistral.rs by @Jeadie in <https://github.com/spiceai/spiceai/pull/3839>
- Fix models integration tests by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3843>
- Enable `spice login abfs` by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3844>
- update `crates/llms` dependencies to 'spiceai' branch by @Jeadie in <https://github.com/spiceai/spiceai/pull/3846>
- Make eval runs non-blocking; `spice.eval.{results, runs}` tables. by @Jeadie in <https://github.com/spiceai/spiceai/pull/3780>
- fix: Update GraphQL snapshots by @peasee in <https://github.com/spiceai/spiceai/pull/3849>
- Update to Rust 1.83 by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3847>
- feat: add mssql integration test by @peasee in <https://github.com/spiceai/spiceai/pull/3848>
- Prepend user-specified user agent in flight repl by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3850>
- fix: trim CHAR in mssql by @peasee in <https://github.com/spiceai/spiceai/pull/3852>
- Fix column quoting for SpiceCloudPlatform dialect by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3857>
- Optimize builds by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/3861>
- Endgame template: Add recently added AI/ML quickstarts and samples by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3859>
- docs: Release MS SQL Beta by @peasee in <https://github.com/spiceai/spiceai/pull/3853>
- Fix nsql sampling for tables with embeddings by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3860>
- Make GH workflows with spiceai-macos runners more stable by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3863>
- fix: Remove GraphQL swapi test by @peasee in <https://github.com/spiceai/spiceai/pull/3867>
- create 1 `tokio::test` per test/model by @Jeadie in <https://github.com/spiceai/spiceai/pull/3696>
- handle `max_completion_tokens` vs `max_tokens` for openai vs azure by @Jeadie in <https://github.com/spiceai/spiceai/pull/3869>
- Search benchmark: write results to dataset by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3871>
- Create `evalconverter` that creates spice eval components. by @Jeadie in <https://github.com/spiceai/spiceai/pull/3864>
- Update quickstart in README.md by @ewgenius in <https://github.com/spiceai/spiceai/pull/3876>
- Remove reference to spiceai-smart-demo from the repo home by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3885>
- Trace `evals` accelerated tables updates in debug mode by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3884>
- Clarify confusing log message by @Advayp in <https://github.com/spiceai/spiceai/pull/3862>
- Update spicepod.schema.json by @github-actions in <https://github.com/spiceai/spiceai/pull/3840>
- Azure OpenAI models: make `endpoint` parameter required by @sgrebnov in <https://github.com/spiceai/spiceai/pull/3883>
- Use spiceai delta kernel fork, actionable message for delta checkpoint errors by @Sevenannn in <https://github.com/spiceai/spiceai/pull/3856>
- Add support for GGUF files in HF by @Jeadie in <https://github.com/spiceai/spiceai/pull/3875>

**Full Changelog**: <https://github.com/spiceai/spiceai/compare/v1.0.0-rc.1...v1.0.0-rc.2>
```text

## Resources

- [Getting started with Spice.ai](https://docs.spiceai.org/getting-started/)
- [Documentation](https://docs.spiceai.org/)

## Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

- Twitter: [@spice_ai](https://twitter.com/spice_ai)
- Discord: [https://discord.gg/kZnTfneP5u](https://discord.gg/kZnTfneP5u)
- Telegram: [Spice AI Discussion](https://t.me/spiceaichat)
- Reddit: [https://www.reddit.com/r/spiceai](https://www.reddit.com/r/spiceai)
- Email: [[email protected]](mailto:[email protected])

Spice v0.15.2-alpha (July 15, 2024)

ยท 4 min read
Luke Kim
Founder and CEO of Spice AI

The v0.15.2-alpha minor release focuses on enhancing stability, performance, and introduces Catalog Providers for streamlined access to Data Catalog tables. Unity Catalog, Databricks Unity Catalog, and the Spice.ai Cloud Platform Catalog are supported in v0.15.2-alpha. The reliability of federated query push-down has also been improved for the MySQL, PostgreSQL, ODBC, S3, Databricks, and Spice.ai Cloud Platform data connectors.

Highlights in v0.15.2-alphaโ€‹

Catalog Providers: Catalog Providers streamline access to Data Catalog tables. Initial catalog providers supported are Databricks Unity Catalog, Unity Catalog and Spice.ai Cloud Platform Catalog.

For example, to configure Spice to connect to tpch tables in the Spice.ai Cloud Platform Catalog use the new catalogs: section in the spicepod.yml:

catalogs:
- name: spiceai
from: spiceai
include:
- tpch.*
sql> show tables
+---------------+--------------+---------------+------------+
| table_catalog | table_schema | table_name | table_type |
+---------------+--------------+---------------+------------+
| spiceai | tpch | region | BASE TABLE |
| spiceai | tpch | part | BASE TABLE |
| spiceai | tpch | customer | BASE TABLE |
| spiceai | tpch | lineitem | BASE TABLE |
| spiceai | tpch | partsupp | BASE TABLE |
| spiceai | tpch | supplier | BASE TABLE |
| spiceai | tpch | nation | BASE TABLE |
| spiceai | tpch | orders | BASE TABLE |
| spice | runtime | query_history | BASE TABLE |
+---------------+--------------+---------------+------------+

Time: 0.001866958 seconds. 9 rows.

ODBC Data Connector Push-Down: The ODBC Data Connector now supports query push-down for joins, improving performance for joined datasets configured with the same odbc_connection_string.

Improved Spicepod Validation Improved spicepod.yml validation has been added, including warnings when loading resources with duplicate names (datasets, views, models, embeddings).

Breaking Changesโ€‹

None.

Contributorsโ€‹

  • @phillipleblanc
  • @peasee
  • @y-f-u
  • @ewgenius
  • @Sevenannn
  • @sgrebnov
  • @lukekim

What's Changedโ€‹

Dependenciesโ€‹

Commitsโ€‹

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.15.1-alpha...v0.15.2-alpha

Resourcesโ€‹

Communityโ€‹

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Spice v0.15.1-alpha (July 8, 2024)

ยท 4 min read
Luke Kim
Founder and CEO of Spice AI

The v0.15.1-alpha minor release focuses on enhancing stability, performance, and usability. Memory usage has been significantly improved for the postgres and duckdb acceleration engines which now use stream processing. A new Delta Lake Data Connector has been added, sharing a delta-kernel-rs based implementation with the Databricks Data Connector supporting deletion vectors.

Highlightsโ€‹

Improved memory usage for PostgreSQL and DuckDB acceleration engines: Large dataset acceleration with PostgreSQL and DuckDB engines has reduced memory consumption by streaming data directly to the accelerated table as it is read from the source.

Delta Lake Data Connector: A new Delta Lake Data Connector has been added for using Delta Lake outside of Databricks.

ODBC Data Connector Streaming: The ODBC Data Connector now streams results, reducing memory usage, and improving performance.

GraphQL Object Unnesting: The GraphQL Data Connector can automatically unnest objects from GraphQL queries using the unnest_depth parameter.

Breaking Changesโ€‹

None.

Contributorsโ€‹

What's Changedโ€‹

Dependenciesโ€‹

The MySQL, PostgreSQL, SQLite and DuckDB DataFusion TableProviders developed by Spice AI have been donated to the datafusion-contrib/datafusion-table-providers community repository.

From the v0.15.1-alpha release, a new dependency is taken on datafusion-contrib/datafusion-table-providers

Commitsโ€‹

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.15.0-alpha...v0.15.1-alpha

Resourcesโ€‹

Communityโ€‹

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Spice v0.11.1-alpha (April 22, 2024)

ยท 3 min read
Luke Kim
Founder and CEO of Spice AI

The v0.11.1-alpha release introduces retention policies for accelerated datasets, native Windows installation support, and integration of catalog and schema settings for the Databricks Spark connector. Several bugs have also been fixed for improved stability.

Highlightsโ€‹

  • Retention Policies for Accelerated Datasets: Automatic eviction of data from accelerated time-series datasets when a specified temporal column exceeds the retention period, optimizing resource utilization.

  • Windows Installation Support: Native Windows installation support, including upgrades.

  • Databricks Spark Connect Catalog and Schema Settings: Improved translation between DataFusion and Spark, providing better Spark Catalog support.

Contributorsโ€‹

  • @phillipleblanc
  • @Jeadie
  • @ewgenius
  • @sgrebnov
  • @y-f-u
  • @lukekim
  • @digadeesh
  • @Sevenannn
  • @gloomweaver

New in this releaseโ€‹

What's Changedโ€‹

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.11.0-alpha...v0.11.1-alpha

Resourcesโ€‹

Communityโ€‹

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Spice v0.11-alpha (April 15, 2024)

ยท 4 min read
Luke Kim
Founder and CEO of Spice AI

The Spice v0.11-alpha release significantly improves the Databricks data connector with Databricks Connect (Spark Connect) support, adds the DuckDB data connector, and adds the AWS Secrets Manager secret store. In addition, enhanced control over accelerated dataset refreshes, improved SSL security for MySQL and PostgreSQL connections, and overall stability improvements have been added.

Highlights in v0.11-alphaโ€‹

DuckDB data connector: Use DuckDB databases or connections as a data source.

AWS Secrets Manager Secret Store: Use AWS Secrets Managers as a secret store.

Custom Refresh SQL: Specify a custom SQL query for dataset refresh using refresh_sql.

Dataset Refresh API: Trigger a dataset refresh using the new CLI command spice refresh or via API.

Expanded SSL support for Postgres: SSL mode now supports disable, require, prefer, verify-ca, verify-full options with the default mode changed to require. Added pg_sslrootcert parameter for setting a custom root certificate and the pg_insecure parameter is no longer supported.

Databricks Connect: Choose between using Spark Connect or Delta Lake when using the Databricks data connector for improved performance.

Improved SSL support for Postgres: ssl mode now supports disable, require, prefer, verify-ca, verify-full options with default mode changed to require. Added pg_sslrootcert parameter to allow setting custom root cert for postgres connector, pg_insecure parameter is no longer supported as redundant.

Internal architecture refactor: The internal architecture of spiced was refactored to simplify the creation data components and to improve alignment with DataFusion concepts.

New Contributorsโ€‹

@edmondop's first contribution github.com/spiceai/spiceai/pull/1110!

Contributorsโ€‹

  • @phillipleblanc
  • @Jeadie
  • @ewgenius
  • @sgrebnov
  • @y-f-u
  • @lukekim
  • @digadeesh
  • @Sevenannn
  • @gloomweaver
  • @ahirner

New in this releaseโ€‹

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.10.2-alpha...v0.11.0-alpha

Resourcesโ€‹

Communityโ€‹

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.