Skip to main content
Version: Next

Functions (User-Defined Functions)

Functions extend Spice's SQL engine with custom scalar and table logic. Each entry in the top-level functions: block is registered as a callable SQL function and (for scalar functions, by default) as an LLM tool.

For an overview, examples, and execution-tier details, see Functions.

Registration is off by default

The functions: section is only honored when runtime.functions.enabled is set to true.

functions​

The functions: section in your configuration declares one or more scalar or table functions.

Example:

runtime:
functions:
enabled: true

functions:
- name: double_it
from: sql
description: Double a 64-bit integer.
volatility: immutable
signature:
args:
- { name: x, type: int64 }
returns: int64
body: 'x * 2'

name​

A unique identifier for the function. The name is used to register the UDF in the SQL session and as the tool name when surfaced to LLMs.

from​

Source URI selecting the execution tier:

  • sql — Inline SQL body executed in-process.
  • http://... / https://... — Remote endpoint invoked over HTTP + JSON.

Other schemes are rejected at startup. See Execution Tiers.

enabled​

Optional. Defaults to true. Set to false to keep the declaration in the spicepod without registering the function — useful for staged rollouts. Disabled functions are also hidden from list_udfs() and GET /v1/functions.

description​

Optional. Free-form description surfaced in list_udfs(), GET /v1/functions, and the LLM tool registry.

kind​

Optional. Defaults to scalar.

ValueDescription
scalar(default) Returns a single value per row. Called as SELECT my_fn(x) FROM ....
tableReturns multiple rows and columns. Called as SELECT ... FROM my_fn(x). Always SQL-only (as_tool is ignored).

volatility​

Optional. Defaults to volatile. Controls how the optimizer treats the function across calls.

ValueDescription
immutableSame inputs always yield the same output. May be constant-folded and cached.
stableStable within a single query but may change across queries. Cached per query.
volatile(default) Unpredictable on every call. Never cached.

Choose the strongest level that's actually true. See Volatility.

signature​

Required. Typed argument list and return type.

signature:
args:
- { name: lat1, type: float64 }
- { name: lon1, type: float64 }
returns: float64

signature.args​

Optional. Positional argument list. Empty for niladic functions.

Each entry has:

  • name — Argument name. Used inside SQL body expressions and as the JSON key for remote tier requests.
  • type — Arrow type. Accepts Spicepod aliases (int64, utf8, list<int64>, decimal(38,10), timestamp(us, utc)) and Arrow display forms (Int64, List(Int64), etc.). See Types.

signature.returns​

Required. For scalar functions, a single Arrow type string (e.g., int64, utf8). For table functions, a list of output column definitions:

# Scalar function
returns: int64

# Table function
returns:
- { name: value, type: int64 }
- { name: label, type: utf8 }

body​

Inline SQL body. Required when from: sql (unless body_ref is set instead). For scalar functions, must be a single SQL expression referencing the function's arguments by name. For table functions, must be a single SELECT query; scalar arguments are available via a virtual args table.

body: |
6371 * acos(
cos(radians(lat1)) * cos(radians(lat2)) *
cos(radians(lon2) - radians(lon1)) +
sin(radians(lat1)) * sin(radians(lat2))
)

The body can call any DataFusion built-in scalar function (math, string, datetime, JSON, regex, etc.).

Mutually exclusive with body_ref. Must not be set for non-SQL from: schemes.

body_ref​

Path to a local filesystem file whose contents are the SQL body. Resolved relative to the runtime's current working directory at registration time.

body_ref: ./functions/shipping_class.sql

body_ref is not read from object stores when a spicepod is loaded remotely — use inline body: for portable remote spicepods.

Mutually exclusive with body.

params​

Optional. Tier-specific parameters. Supports ${secrets:KEY} and ${env:KEY} interpolation at registration time.

For the Remote tier (from: http://... / https://...):

ParameterDefaultDescription
timeout30sPer-call timeout. Plain integer seconds or Ns / Nms strings.
batch_size1024Maximum rows per HTTP request. Capped at 100 000.
batch_concurrency4Maximum in-flight HTTP batches per invocation. Capped at 64.
auth_bearerunsetWhen set, adds Authorization: Bearer <value> to each request. Use ${secrets:...}.

The SQL tier does not currently use params.

metadata​

Optional. Free-form key/value pairs surfaced alongside the function definition. Useful for tagging, ownership, or custom metadata consumers.

metadata:
owner: data-platform
jira: DATA-1234

as_tool​

Optional. Defaults to true for scalar functions. When true, the function is registered as an LLM tool with the same name and description and becomes callable from chat completions, POST /v1/tools/<name>, and the /v1/tools listing. Table functions (kind: table) are always SQL-only regardless of this setting.

Set to false to keep the function SQL-only:

functions:
- name: internal_hash
from: sql
as_tool: false
signature:
args: [{ name: x, type: int64 }]
returns: int64
body: 'x * 2654435761'

dependsOn​

Optional. Names of other Spicepod components that must be loaded before this function (e.g. a dataset the function queries internally).

metrics​

Optional. Per-function metrics configuration. See Component Metrics.

Discovering registered functions​

After startup, registered functions can be inspected:

  • From SQL — SELECT * FROM list_udfs() WHERE source = 'user';
  • From the HTTP API — GET /v1/functions returns a JSON array of user functions.

See Discovering registered functions.