Skip to main content

9 posts tagged with "duckdb"

DuckDB database topics and usage

View All Tags

Spice v0.17.2-beta (August 26, 2024)

ยท 6 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v0.17.2-beta ๐Ÿ„

The v0.17.2-beta release focuses on improving data accelerator compatibility, stability, and performance. Expanded data type support for DuckDB, SQLite, and PostgreSQL data accelerators (and data connectors) enables significantly more data types to be accelerated. Error handling and logging has also been improved along with several bugs.

Highlights in v0.17.2-betaโ€‹

Expanded Data Type Support for Data Accelerators: DuckDB, SQLite, and PostgreSQL Data Accelerators now support a wider range of data types, enabling acceleration of more diverse datasets.

Enhanced Error Handling and Logging: Improvements have been made to aid in troubleshooting and debugging.

Anonymous Usage Telemetry: Optional, anonymous, aggregated telemetry has been added to help improve Spice. This feature can be disabled. For details about collected data, see the telemetry documentation.

To opt out of telemetry:

  1. Using the CLI flag:

    spice run -- --telemetry-enabled false
  2. Add configuration to spicepod.yaml:

    runtime:
    telemetry:
    enabled: false

Improved Benchmarking: A suite of performance benchmarking tests have been added to the project, helping to maintain and improve runtime performance; a top priority for the project.

Breaking Changesโ€‹

None.

Contributorsโ€‹

  • @Jeadie
  • @y-f-u
  • @phillipleblanc
  • @sgrebnov
  • @Sevenannn
  • @peasee
  • @ewgenius

What's Changedโ€‹

Dependenciesโ€‹

Commitsโ€‹

- Pin actions/upload-artifact to v4.3.4 by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2200>
- Update spicepod.schema.json by @github-actions in <https://github.com/spiceai/spiceai/pull/2202>
- Update to next release version, `v0.17.2-beta` by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2203>
- add accelerator beta criteria by @y-f-u in <https://github.com/spiceai/spiceai/pull/2201>
- update helm chart to 0.17.1-beta by @Sevenannn in <https://github.com/spiceai/spiceai/pull/2205>
- add dockerignore to avoid copy target and test folder by @y-f-u in <https://github.com/spiceai/spiceai/pull/2206>
- add client timeout for deltalake connector by @y-f-u in <https://github.com/spiceai/spiceai/pull/2208>
- Upgrade tonic and opentelemetry-proto by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2223>
- Add index and resource tuning for postgres ghcr image to support postgres benchmark in sf1 by @Sevenannn in <https://github.com/spiceai/spiceai/pull/2196>
- Remove embedding columns from `retrieved_primary_keys` in v1/search by @Jeadie in <https://github.com/spiceai/spiceai/pull/2176>
- use file as db_path_param as the param prefix is trimmed by @y-f-u in <https://github.com/spiceai/spiceai/pull/2230>
- use file for sqlite db path param by @y-f-u in <https://github.com/spiceai/spiceai/pull/2231>
- docs: Clarify the global requirement for local_infile when loading TPCH by @peasee in <https://github.com/spiceai/spiceai/pull/2228>
- Revert pinning actions/upload-artifact@v4 by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2232>
- Runtime tools to chat models by @Jeadie in <https://github.com/spiceai/spiceai/pull/2207>
- Create `runtime.task_history` table for queries, and embeddings by @Jeadie in <https://github.com/spiceai/spiceai/pull/2191>
- chore: Update Databricks ODBC Bench to use TPCH SF1 by @peasee in <https://github.com/spiceai/spiceai/pull/2238>
- Replace `metrics-rs` with OpenTelemetry Metrics by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2240>
- fix: Remove dead code by @peasee in <https://github.com/spiceai/spiceai/pull/2249>
- Improve tool quality and add vector search tool by @Jeadie in <https://github.com/spiceai/spiceai/pull/2250>
- fix missing partition cols in delta lake by @y-f-u in <https://github.com/spiceai/spiceai/pull/2253>
- download file from remote for delta testing by @y-f-u in <https://github.com/spiceai/spiceai/pull/2254>
- feat: Set SQLite DB path to .spice/data by @peasee in <https://github.com/spiceai/spiceai/pull/2242>
- Support tools for chat completions in streaming mode by @ewgenius in <https://github.com/spiceai/spiceai/pull/2255>
- Load component `description` field from spicepod.yaml and include in LLM context by @ewgenius in <https://github.com/spiceai/spiceai/pull/2261>
- Add parameter for `connection_pool_size` in the Postgres Data Connector by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2251>
- Add primary keys to response of `DocumentSimilarityTool` by @Jeadie in <https://github.com/spiceai/spiceai/pull/2263>
- run queries bash script by @y-f-u in <https://github.com/spiceai/spiceai/pull/2262>
- Run benchmark test on schedule by @Sevenannn in <https://github.com/spiceai/spiceai/pull/2277>
- feat: Add a reference to originating App for a Dataset by @peasee in <https://github.com/spiceai/spiceai/pull/2283>
- Tool use & telemetry productionisation. by @Jeadie in <https://github.com/spiceai/spiceai/pull/2286>
- Fix cron in benchmarks.yml by @Sevenannn in <https://github.com/spiceai/spiceai/pull/2288>
- Upgrade to DataFusion v41 by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2290>
- Chat completions adjustments and fixes by @ewgenius in <https://github.com/spiceai/spiceai/pull/2292>
- Define the new metrics Arrow schema based on Open Telemetry by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2295>
- OpenTelemetry Metrics Arrow exporter to `runtime.metrics` table by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2296>
- Calculate summary metrics from histograms for Prometheus endpoint by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2302>
- Add back Spice DF runtime_env during SessionContext construction by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2304>
- Add integration test for S3 data connector by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2305>
- Fix `secrets.inject_secrets` when secret not found. by @Jeadie in <https://github.com/spiceai/spiceai/pull/2306>
- Intra-table federation query on duckdb accelerated table by @y-f-u in <https://github.com/spiceai/spiceai/pull/2299>
- Postgres federation on acceleration by @y-f-u in <https://github.com/spiceai/spiceai/pull/2309>
- sqlite intra table federation on acceleration by @y-f-u in <https://github.com/spiceai/spiceai/pull/2308>
- feat: Add `DataAccelerator::init()` for SQLite acceleration federation by @peasee in <https://github.com/spiceai/spiceai/pull/2293>
- Initial framework for collecting anonymous usage telemetry by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2310>
- Add gRPC action to trigger accelerated dataset refresh by @sgrebnov in <https://github.com/spiceai/spiceai/pull/2316>
- add `disable_query_push_down` option to acceleration settings by @y-f-u in <https://github.com/spiceai/spiceai/pull/2327>
- Remove `v1/assist` by @Jeadie in <https://github.com/spiceai/spiceai/pull/2312>
- bump table provider version to set the correct dialect for postgres writer by @y-f-u in <https://github.com/spiceai/spiceai/pull/2329>
- Send telemetry on startup by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2331>
- Calculate resource IDs for telemetry by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2332>
- Refactor `v1/search`: include WHERE condition, allow extra columns in projection. by @Jeadie in <https://github.com/spiceai/spiceai/pull/2328>
- Add integration test for gRPC dataset refresh action by @sgrebnov in <https://github.com/spiceai/spiceai/pull/2330>
- Propagate errors through all `task_history` nested spans by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2337>
- Improve tools by @Jeadie in <https://github.com/spiceai/spiceai/pull/2338>
- update duckdb rs version to support more types: interval/duration/etc by @y-f-u in <https://github.com/spiceai/spiceai/pull/2336>
- feat: Add DuckDB accelerator init, attach databases for federation by @peasee in <https://github.com/spiceai/spiceai/pull/2335>
- Add query telemetry metrics by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2333>
- Add system prompts for LLMs; system prompts for tool using models. by @Jeadie in <https://github.com/spiceai/spiceai/pull/2342>
- Fix benchmark test to keep running when there's failed queries by @Sevenannn in <https://github.com/spiceai/spiceai/pull/2347>
- Tools as a spicepod first class citizen. by @Jeadie in <https://github.com/spiceai/spiceai/pull/2344>
- Add `bytes_processed` telemetry metric by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2343>
- fix misaligned columns from delta lake by @y-f-u in <https://github.com/spiceai/spiceai/pull/2356>
- Emit telemetry metrics to `runtime.metrics`/Prometheus as well by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2352>
- Use UTC timezone for telemetry timestamps by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2354>
- Fix MetricType deserialization by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2358>
- Add dataset details to tool using LLMs; early check tables in vector search by @Jeadie in <https://github.com/spiceai/spiceai/pull/2353>
- Bump datafusion-federation/datafusion-table-providers dependencies by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/2360>
- Update spicepod.schema.json by @github-actions in <https://github.com/spiceai/spiceai/pull/2362>
- fix: Disable DuckDB and SQLite federation by @peasee in <https://github.com/spiceai/spiceai/pull/2371>
- Fix system prompt in ToolUsingChat, fix builtin registration by @Jeadie in <https://github.com/spiceai/spiceai/pull/2367>
- fix: Use --profile release for benchmarks by @peasee in <https://github.com/spiceai/spiceai/pull/2372>
- nql parameter 'use' -> 'model' by @Jeadie in <https://github.com/spiceai/spiceai/pull/2366>

**Full Changelog**: <https://github.com/spiceai/spiceai/compare/v0.17.1-beta...v0.17.2-beta>

Resourcesโ€‹

Communityโ€‹

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Spice v0.15.1-alpha (July 8, 2024)

ยท 4 min read
Luke Kim
Founder and CEO of Spice AI

The v0.15.1-alpha minor release focuses on enhancing stability, performance, and usability. Memory usage has been significantly improved for the postgres and duckdb acceleration engines which now use stream processing. A new Delta Lake Data Connector has been added, sharing a delta-kernel-rs based implementation with the Databricks Data Connector supporting deletion vectors.

Highlightsโ€‹

Improved memory usage for PostgreSQL and DuckDB acceleration engines: Large dataset acceleration with PostgreSQL and DuckDB engines has reduced memory consumption by streaming data directly to the accelerated table as it is read from the source.

Delta Lake Data Connector: A new Delta Lake Data Connector has been added for using Delta Lake outside of Databricks.

ODBC Data Connector Streaming: The ODBC Data Connector now streams results, reducing memory usage, and improving performance.

GraphQL Object Unnesting: The GraphQL Data Connector can automatically unnest objects from GraphQL queries using the unnest_depth parameter.

Breaking Changesโ€‹

None.

Contributorsโ€‹

What's Changedโ€‹

Dependenciesโ€‹

The MySQL, PostgreSQL, SQLite and DuckDB DataFusion TableProviders developed by Spice AI have been donated to the datafusion-contrib/datafusion-table-providers community repository.

From the v0.15.1-alpha release, a new dependency is taken on datafusion-contrib/datafusion-table-providers

Commitsโ€‹

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.15.0-alpha...v0.15.1-alpha

Resourcesโ€‹

Communityโ€‹

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Spice v0.14.1-alpha (June 24, 2024)

ยท 4 min read
Luke Kim
Founder and CEO of Spice AI

The v0.14.1-alpha release is focused on quality, stability, and type support with improvements in PostgreSQL, DuckDB, and GraphQL data connectors.

Highlightsโ€‹

  • PostgreSQL acceleration and data connector: Support for Composite Types and UUID data types.
  • DuckDB acceleration and data connector: Support for LargeUTF8 and DuckDB functions.
  • GraphQL data connector: Improved error handling on invalid query syntax.
  • Refresh SQL: Improved stability when overwriting STRUCT data types.

Breaking Changesโ€‹

None.

New Contributorsโ€‹

Contributorsโ€‹

  • @lukekim
  • @y-f-u
  • @ewgenius
  • @phillipleblanc
  • @Jeadie
  • @sgrebnov
  • @gloomweaver
  • @phungleson
  • @peasee
  • @digadeesh

What's Changedโ€‹

Dependenciesโ€‹

No major dependency updates.

Commitsโ€‹

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.14.0-alpha...v0.14.1-alpha

Resourcesโ€‹

Communityโ€‹

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Spice.ai v0.7-alpha

ยท 2 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v0.7-alpha! ๐Ÿน

Spice v0.7-alpha is an all new implementation of Spice written in Rust. The Spice v0.7 runtime provides developers with a unified SQL query interface to locally accelerate and query data tables sourced from any database, data warehouse, or data lake.

Learn more and get started in minutes with the updated Quickstart in the repository README!

Highlights in v0.7-alphaโ€‹

DataFusion SQL Query Engine: Spice v0.7 leverages the Apache DataFusion query engine to provide very fast, high quality SQL query across one or more local or remote data sources.

Data tables can be locally accelerated using Apache Arrow in-memory or by DuckDB.

New in this releaseโ€‹

  • Adds runtime rewritten in Rust for high-performance.
  • Adds Apache DataFusion SQL query engine.
  • Adds The Spice.ai platform as a data source.
  • Adds Dremio as a data source.
  • Adds OpenTelemetry (OTEL) collector.
  • Adds local data table acceleration.
  • Adds DuckDB file or in-memory as a data table acceleration engine.
  • Adds In-memory Apache Arrow as a data table acceleration engine.
  • Removes the built-in AI training engine; now cloud-based and provided by the Spice.ai platform.
  • Removes the built-in dashboard and web-interface; now cloud-based and provided by the Spice.ai platform.

Resourcesโ€‹

Communityโ€‹

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.