In-Memory Arrow Data Accelerator
The In-Memory Arrow Data Accelerator is the default data accelerator in Spice. It uses Apache Arrow to store data in-memory for fast access and query performance.
Configuration​
To use the In-Memory Arrow Data Accelerator, no additional configuration is required beyond enabling acceleration.
Example:
datasets:
- from: spice.ai:path.to.my_dataset
name: my_dataset
acceleration:
enabled: true
However Arrow can be specified explicitly using arrow
as the engine
for acceleration.
datasets:
- from: spice.ai:path.to.my_dataset
name: my_dataset
acceleration:
enabled: true
engine: arrow
Limitations​
- The In-Memory Arrow Data Accelerator does not support persistent storage. Data is stored in-memory and will be lost when the Spice runtime is stopped.
- The In-Memory Arrow Data Accelerator does not support
Decimal256
(76 digits), as it exceeds Arrow's maximum Decimal width of 38 digits. - The In-Memory Arrow Data Accelerator does not support indexes.
- The In-Memory Arrow Data Accelerator only supports primary-key constraints, not
unique
constraints. - With Arrow acceleration, mathematical operations like
value1 / value2
are treated as integer division if the values are integers. For example,1 / 2
will result in 0 instead of the expected 0.5. Use casting to FLOAT to ensure conversion to a floating-point value:CAST(1 AS FLOAT) / CAST(2 AS FLOAT)
(orCAST(1 AS FLOAT) / 2
).
When accelerating a dataset using the In-Memory Arrow Data Accelerator, some or all of the dataset is loaded into memory. Ensure sufficient memory is available, including overhead for queries and the runtime, especially with concurrent queries.
In-memory limitations can be mitigated by storing acceleration data on disk, which is supported by duckdb
and sqlite
accelerators by specifying mode: file
.
Cookbook​
- A cookbook recipe to configure In-Memory Arrow as data accelerator in Spice. In-Memory Arrow Data Accelerator