Skip to main content

7 posts tagged with "data-connector"

Data connector tools and integrations

View All Tags

Spice v1.5.1 (July 28, 2025)

ยท 5 min read
Jack Eadie
Token Plumber at Spice AI

Announcing the release of Spice v1.5.1! ๐Ÿ”‘

Spice v1.5.1 expands the GitHub data connector to include pull-request comments, adds a configurable rate limiting for AWS Bedrock embedding models, expands partition pruning with inequality operators, and adds client-supplied cache keys for granular caching control in the HTTP and Arrow Flight SQL APIs.

What's New in v1.5.1โ€‹

GitHub Data Connector Pull Request Comments: Configure GitHub pulls datasets to include comments.

Example Spicepod.yaml:

datasets:
- from: github:github.com/spiceai/spiceai/pulls
name: spiceai.pulls
params:
github_include_comments: all # 'review', 'discussion', or 'none'. Defaults to 'none'.
github_max_comments_fetched: '25' # Defaults to 100
# ...

For details, see the GitHub Data Connector documentation.

AWS Bedrock Embedding Models Invocation Control: Improved rate limiting control for AWS Bedrock embedding models with max_concurrent_invocations configuration.

embeddings:
- from: bedrock:cohere.embed-english-v3
name: cohere-embeddings
params:
max_concurrent_invocations: '41'
# ...

For details, see the AWS Bedrock Embeddings Model Provider documentation.

Improved Query Partitioning: Expanded partition pruning support with additional inequality operators (e.g. >, >=, <, <=).

For details, see the Query Partitioning documentation.

Client-Supplied Cache Keys: Support for a new Spice-Cache-Key header/metadata-key in the HTTP and Arrow Flight SQL query APIs to for fine-grained client-side caching control.

Example HTTP API usage:

$ curl -vvS -XPOST http://localhost:8090/v1/sql \
-H"spice-cache-key: 1851400_20170216_north_america" \
-d "select * from scihub_journals_accessed
where user_id = '1851400'
and date_trunc('DAY', timestamp) = '2017-02-16'
and city = 'New York';"

Example Response:

< HTTP/1.1 200 OK
< content-type: application/json
< x-cache: Hit from spiceai
< results-cache-status: HIT
< vary: Spice-Cache-Key
< vary: origin, access-control-request-method, access-control-request-headers
< content-length: 604
< date: Wed, 23 Jul 2025 20:26:12 GMT
<
[{
"timestamp": "2017-02-16 09:55:06",
"doi": "10.1155/2012/650929",
"ip_identifier": 1000856,
"user_id": 1851400,
"country": "United States",
"city": "New York",
"longitude": 40.7830603,
"latitude": -73.9712488
},
...
]

For details, see the Cache Control documentation.

Contributorsโ€‹

New Contributorsโ€‹

Breaking Changesโ€‹

  • N/A

Cookbook Updatesโ€‹

No new recipes added in this release.

The Spice Cookbook includes 74 recipes to help you get started with Spice quickly and easily.

Upgradingโ€‹

To upgrade to v1.5.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.5.1 image:

docker pull spiceai/spiceai:1.5.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changedโ€‹

Dependenciesโ€‹

No major dependency updates.

Changelogโ€‹

  • Fix refresh via Api when dataset is already accelerated and no refresh interval is set by @sgrebnov in #6549
  • Add support for custom GraphQL unnesting behavior by @Advayp in #6540
  • Regex Update to disallow hyphens dataset names by @varunguleriaCodes in #6383
  • Enforce max limit on comments fetched per PR by @Advayp in #6580
  • Fix accelerated refresh issue by @Advayp in #6590
  • Enable configurations of max invocations for Bedrock models by @Advayp in #6592
  • Client-supplied cache keys (Spice-Cache-Key) by @mach-kernel in #6579
  • Improved partition pruning by @kczimm in #6582
  • Fix retention filter when both retention_sql and period are set by @sgrebnov in #6595
  • Initial support for PR comments by @Advayp in #6569
  • chore: Update croner by @peasee in #6547
  • fix databricks streaming for Claude model by @peasee in #6601
  • Remove FullTextUDTFAnalyzerRule and move FTS code into search crate by @jeadie in #6596
  • Remove download of legacy sentence transformers config by @jeadie in #6605
  • re-add snapshot tests by @jeadie
  • Embedding column config to support client-specified vector sizes by @mach-kernel in #6610
  • Fix mismatch in columns for the GitHub PR table type by @Advayp in #6616
  • bump version to 1.5.1 by @phillipleblanc
  • fix issues with cherry-picking by @jeadie
  • Add integration tests for GitHub PRs with comments by @Advayp in #6581
  • Add view name to view creation errors by @lukekim in #6611
  • CDC: Compute embeddings on ingest by @mach-kernel in #6612

Spice v1.2.1 (May 6, 2025)

ยท 7 min read
Sergei Grebnov
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.2.1! ๐Ÿ”ฅ

Spice v1.2.1 includes several data connector fixes and improves query performance for accelerated views. This release also introduces Databricks Service Principal (M2M OAuth) authentication and expands parameterized queries.

Highlights in v1.2.1โ€‹

  • Databricks Service Principal Support: Databricks datasets and catalogs now support Machine-to-Machine (M2M) OAuth authentication via Service Principals, enabling secure machine connections to Databricks.

    Example spicepod.yaml:

    datasets:
    - from: databricks:spiceai.datasets.my_awesome_table # A reference to a table in the Databricks unity catalog
    name: my_delta_lake_table
    params:
    mode: delta_lake
    databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com
    databricks_client_id: ${secrets:DATABRICKS_CLIENT_ID}
    databricks_client_secret: ${secrets:DATABRICKS_CLIENT_SECRET}

    For details, see documentation for:

  • Iceberg Data Connector: Now supports cross-account table access via the AWS Glue Catalog Connector and fixes an issue when querying data from append mode datasets.

  • Iceberg Catalog API: Full compatibility with the Iceberg HTTP REST Catalog API to consume Spice datasets from Iceberg Catalog clients.

    For details, see documentation for:

  • Improved Parameterized Query Support: Expanded type inference for placeholders in:

    • IN list expressions
    • LIKE patterns
    • SIMILAR TO patterns
    • LIMIT clauses
    • Subqueries

New Contributors ๐ŸŽ‰โ€‹

Contributorsโ€‹

Breaking Changesโ€‹

No breaking changes.

Cookbook Updatesโ€‹

New recipes for:

The Spice Cookbook now includes 68 recipes to help you get started with Spice quickly and easily.

Upgradingโ€‹

To upgrade to v1.2.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.2.1 image:

docker pull spiceai/spiceai:1.2.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changedโ€‹

Dependenciesโ€‹

  • No major dependency changes.

Changelogโ€‹

  • Fix: Specify metric type as a dimension for testoperator by @peasee in #5630
  • Fix: Add option to run dispatch schedule by @peasee in #5631
  • Infer placeholder datatype for InList, Like, and SimilarTo by @kczimm in #5626
  • Add QA analytics for 1.2.0 by @phillipleblanc in #5640
  • Fix: Use SPICED_COMMIT for spiced_commit_sha by @peasee in #5632
  • New crates/tools by @Jeadie in #5121
  • Update openapi.json by @github-actions in #5643
  • Enable metrics reporting for models benchmarks (evals) by @sgrebnov in #5639
  • Implement CatalogBuilder, add app and runtime references to catalog component, add runtime reference to connector params by @ewgenius in #5641
  • Fix eventing bug in LLM progress; Add tool and worker progress by @Jeadie in #5619
  • Handle small precision differences in TPCH answer validation by @phillipleblanc in #5642
  • Add TokenProviderRegistry to the runtime by @ewgenius in #5651
  • Provide ModelContextLayer for evals by @Jeadie in #5648
  • Databricks data_components refactor. Databricks Spark connect - add set_token method and writable spark session by @ewgenius in #5654
  • Extract AWS Glue warehouse for cross-account Iceberg tables by @phillipleblanc in #5656
  • Refactor Dataset component by @phillipleblanc in #5660
  • Fix Iceberg API returning 404 when schema contains a Dictionary by @phillipleblanc in #5665
  • Fix dependencies: downgrade swagger-ui to v8; force zip to 2.3.0 by @kczimm in #5664
  • Add DuckDB indexes spicepod, additional dispatches by @peasee in #5633
  • Update readme: update data federation link by @nuvic in #5673
  • Support metadata columns for object-store based data connectors by @phillipleblanc in #5661
  • Add model name to LLM judges, and add model_graded_scoring task by @Jeadie in #5655
  • Add SF1000 TPCH test spicepods for delta lake by @Sevenannn in #5606
  • Validate Github Connector resource existence before building the github connector graphql table by @Sevenannn in #5674
  • Remove hard-coded embedding performance tests in CI by @Sevenannn in #5675
  • Databricks M2M auth for spark connect data connector by @ewgenius in #5659
  • Enable federated data refresh support for accelerated views by @sgrebnov in #5677
  • Add pods watcher integration test by @Sevenannn in #5681
  • Add m2m support for databricks delta connector by @ewgenius in #5680
  • Update end_game.md by @sgrebnov in #5684
  • Update StaticTokenProvider to use SecretString instead of raw str value by @ewgenius in #5686
  • Add M2M Auth support for Databricks catalog connector by @ewgenius in #5687
  • Update UX to disable acceleration federation by @sgrebnov in #5682
  • Improve placeholder inference (LIMIT & Expr::InSubquery) by @phillipleblanc in #5692
  • Tweak default log to ignore aws_config::imds::region by @phillipleblanc in #5693
  • Make Spice properly Iceberg Catalog API compatible for load table API by @phillipleblanc in #5695
  • Use deterministic queries for Databricks m2m catalog tests by @ewgenius in #5696
  • Support retrieving the latest Iceberg table on table scan by @phillipleblanc in #5704

Full Changelog: v1.2.0...v1.2.1

Spice v0.17.1-beta (August 5, 2024)

ยท 5 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

The v0.17.1-beta minor release focuses on enhancing stability, performance, and usability. The Flight interface now supports the GetSchema API and s3, ftp, sftp, http, https, and databricks data connectors have added support for a client_timeout parameter.

Highlights in v0.17.1-betaโ€‹

Flight API GetSchema: The GetSchema API is now supported by the Flight interface. The schema of a dataset can be retrieved using GetSchema with the PATH or CMD FlightDescriptor types. The CMD FlightDescriptor type is used to get the schema of an arbitrary SQL query as the CMD bytes. The PATH FlightDescriptor type is used to retrieve the schema of a dataset.

Client Timeout: A client_timeout parameter has been added for Data Connectors: ftp, sftp, http, https, and databricks. When defined, the client timeout configures Spice to stop waiting for a response from the data source after the specified duration. The default timeout is 30 seconds.

datasets:
- from: ftp://remote-ftp-server.com/path/to/folder/
name: my_dataset
params:
file_format: csv
# Example client timeout
client_timeout: 30s
ftp_user: my-ftp-user
ftp_pass: ${secrets:my_ftp_password}

Breaking Changesโ€‹

TLS is now required to be explicitly enabled. Enable TLS on the command line using --tls-enabled true:

spice run -- --tls-enabled true --tls-certificate-file /path/to/cert.pem --tls-key-file /path/to/key.pem

Or in the spicepod.yml with enabled: true:

runtime:
tls:
# TLS explicitly enabled
enabled: true
certificate_file: /path/to/cert.pem
key_file: /path/to/key.pem

Contributorsโ€‹

  • @Jeadie
  • @y-f-u
  • @phillipleblanc
  • @sgrebnov
  • @peasee
  • @Sevenannn

What's Changedโ€‹

Dependenciesโ€‹

  • Rust: Upgraded from v1.79.0 to v1.80.0

Commitsโ€‹

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.17.0-beta...v0.17.1-beta

Resourcesโ€‹

Communityโ€‹

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Spice v0.17-beta (July 29, 2024)

ยท 8 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the first beta release of Spice.ai OSS! ๐ŸŽ‰

The core Spice runtime has graduated from alpha to beta! Components, such as Data Connectors and Models, follow independent release milestones. Data Connectors graduating from alpha to beta include databricks, spiceai, postgres, s3, odbc, and mysql. From beta to 1.0, project will be to on improving performance and scaling to larger datasets.

This release also includes enhanced security with Transport Layer Security (TLS) secured APIs, a new spice install CLI command, and several performance and stability improvements.

Highlights in v0.17-betaโ€‹

  • Encryption in transit with TLS: The HTTP, gRPC, Metrics, and OpenTelemetry (OTEL) API endpoints can be secured with TLS by specifying a certificate and private key in PEM format.

Enable TLS using the --tls-certificate-file and --tls-key-file command-line flags:

spice run -- --tls-certificate-file /path/to/cert.pem --tls-key-file /path/to/key.pem

Or configure in the spicepod.yml:

runtime:
tls:
certificate_file: /path/to/cert.pem
key_file: /path/to/key.pem

Get started with TLS by following the TLS Sample. For more details see the TLS Documentation.

  • spice install: Running the spice install CLI command will download and install the latest version of the runtime.
spice install
  • Improved SQLite and DuckDB compatibility: The SQLite and DuckDB accelerators support more complex queries and additional data types.

  • Pass through arguments from spice run to runtime: Arguments passed to spice run are now passed through to the runtime.

  • Secrets replacement within connection strings: Secrets are now replaced within connection strings:

datasets:
- from: mysql:my_table
name: my_table
params:
mysql_connection_string: mysql://user:${secrets:mysql_pw}@localhost:3306/db

Breaking Changesโ€‹

The odbc data connector is now optional and has been removed from the released binaries. To use the odbc data connector, use the official Spice Docker image or build the Spice runtime from source.

To build Spice from source with the odbc feature:

cargo build --release --features odbc

To use the official Spice Docker image from DockerHub:

# Pull the latest official Spice image
docker pull spiceai/spiceai:latest

# Pull the official v0.17-beta Spice image
docker pull spiceai/spiceai:0.17.0-beta

Contributorsโ€‹

  • @y-f-u
  • @peasee
  • @digadeesh
  • @phillipleblanc
  • @ewgenius
  • @sgrebnov
  • @Sevenannn
  • @lukekim

What's Changedโ€‹

Dependenciesโ€‹

Commitsโ€‹

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.16.0-alpha...v0.17-beta

Resourcesโ€‹

Communityโ€‹

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Spice v0.15-alpha (July 1, 2024)

ยท 5 min read
Luke Kim
Founder and CEO of Spice AI

The v0.15-alpha release introduces support for streaming databases changes with Change Data Capture (CDC) into accelerated tables via a new Debezium connector, configurable retry logic for data refresh, and the release of a new C# SDK to build with Spice in Dotnet.

Highlights in v0.15-alphaโ€‹

  • Debezium data connector with Change Data Capture (CDC): Sync accelerated datasets with Debezium data sources over Kafka in real-time.

  • Data Refresh Retries: By default, accelerated datasets attempt to retry data refreshes on transient errors. This behavior can be configured using refresh_retry_enabled and refresh_retry_max_attempts.

  • C# Client SDK: A new C# Client SDK has been released for developing applications in Dotnet.

Debezium data connector with Change Data Capture (CDC)โ€‹

Integrating Debezium CDC is straightforward. Get started with the Debezium CDC Sample, read more about CDC in Spice, and read the Debezium data connector documentation.

Example Spicepod using Debezium CDC:

datasets:
- from: debezium:cdc.public.customer_addresses
name: customer_addresses_cdc
params:
debezium_transport: kafka
debezium_message_format: json
kafka_bootstrap_servers: localhost:19092
acceleration:
enabled: true
engine: duckdb
mode: file
refresh_mode: changes

Data Refresh Retriesโ€‹

Example Spicepod configuration limiting refresh retries to a maximum of 10 attempts:

datasets:
- from: eth.blocks
name: blocks
acceleration:
refresh_retry_enabled: true
refresh_retry_max_attempts: 10
refresh_check_interval: 30s

Breaking Changesโ€‹

None.

New Contributorsโ€‹

Contributorsโ€‹

What's Changedโ€‹

Dependenciesโ€‹

No major dependency updates.

Commitsโ€‹

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.14.1-alpha...v0.15.0-alpha

Resourcesโ€‹

Communityโ€‹

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.