Data Accelerators

Data sourced by Data Connectors can be locally materialized and accelerated using a Data Accelerator.

A Data Accelerator queries and fetches data from a connected data source, then stores and updates it locally in an embedded acceleration engine such as Spice Cayenne, DuckDB, or SQLite. To configure data refresh behavior, such as refreshing data on an interval, see Data Refresh.
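
For example, interval-based refresh is configured with refresh fields on the acceleration block. A minimal sketch (the dataset name is illustrative; see Data Refresh for the full set of options and defaults):

datasets:
  - name: refreshed_dataset
    acceleration:
      enabled: true
      refresh_check_interval: 10s   # re-query the source every 10 seconds
      refresh_mode: full            # replace the local copy on each refresh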

Dataset acceleration is enabled by setting the acceleration configuration:

datasets:
  - name: accelerated_dataset
    acceleration:
      enabled: true

For the complete reference specification, see datasets.

By default, datasets are locally materialized using in-memory Arrow records.
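
To materialize with a different engine, set the engine field (and mode, where the engine supports more than one). A sketch selecting DuckDB instead of the default in-memory Arrow records:

datasets:
  - name: accelerated_dataset
    acceleration:
      enabled: true
      engine: duckdb   # default is arrow (in-memory Arrow records)
      mode: memory     # duckdb also supports mode: file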

Supported Data Accelerators

Name      Description               Status                 Engine Modes
arrow     In-Memory Arrow Records   Stable                 memory
cayenne   Spice Cayenne             Alpha (v1.9.0-rc.1+)   file
duckdb    Embedded DuckDB           Stable                 memory, file
postgres  Attached PostgreSQL       Release Candidate      N/A
sqlite    Embedded SQLite           Release Candidate      memory, file
turso     Embedded Turso            Beta                   memory, file

Choosing an Accelerator

Select the appropriate accelerator based on dataset size, query patterns, and resource constraints:

Use Case                                               Recommended Accelerator   Rationale
Small datasets (under 1 GB), maximum speed             arrow                     In-memory storage provides lowest latency
Medium datasets (1-100 GB), complex SQL                duckdb                    Mature SQL support with memory management
Large datasets (100 GB to 1+ TB), scalable analytics   cayenne                   Vortex columnar format scales beyond single-file limits
Point lookups on large datasets                        cayenne                   Vortex provides 100x faster random access vs Parquet
Simple queries, low resource usage                     sqlite                    Lightweight, minimal overhead
Async operations, concurrent workloads                 turso                     Native async support, modern connection pooling
External database integration                          postgres                  Leverage existing PostgreSQL infrastructure

Spice Cayenne vs DuckDB

Both Spice Cayenne and DuckDB support file-based acceleration, but differ in architecture and performance characteristics:

Choose Spice Cayenne when:

  • Datasets exceed ~1 TB
  • Multi-file data ingestion is required (e.g., partitioned S3 data)
  • Lower memory overhead is preferred
  • Workloads benefit from Vortex's 10-20x faster scans
  • Point lookups and random access patterns are common (100x faster than Parquet)

Choose DuckDB when:

  • Datasets are under ~1 TB
  • Complex SQL features are required (window functions, CTEs)
  • Existing DuckDB tooling integration is beneficial
  • Explicit index control is required
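
As a sketch, accelerating a multi-file, partitioned S3 dataset with Spice Cayenne might look like the following (the bucket path and dataset name are illustrative):

datasets:
  - name: events
    from: s3://my-bucket/events/   # illustrative partitioned, multi-file source
    acceleration:
      enabled: true
      engine: cayenne
      mode: file   # cayenne is file-based (see the engine modes table above)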

Data Types

Data Accelerators may not support all Apache Arrow data types. For the complete compatibility matrix, see the specifications.

Memory Considerations

When accelerating a dataset using mode: memory (the default), some or all of the dataset is loaded into memory. Ensure sufficient memory is available, including overhead for queries and the runtime, especially with concurrent queries.

In-memory limitations can be mitigated by storing acceleration data on disk, which the cayenne, duckdb, sqlite, and turso accelerators support by specifying mode: file.
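
For example, a disk-backed SQLite acceleration (the dataset name is illustrative):

datasets:
  - name: app_logs
    acceleration:
      enabled: true
      engine: sqlite
      mode: file   # persist acceleration data to disk instead of memory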
