Elasticsearch Data Connector

The Elasticsearch Data Connector exposes Elasticsearch indexes as SQL tables in Spice. Index mappings are translated to Arrow schemas so that documents can be queried with federated SQL alongside data from other connectors. The connector also bridges Elasticsearch's native kNN and full-text search into Spice, enabling hybrid search through the standard vector_search, text_search, and rrf UDTFs.

datasets:
  - from: elasticsearch:products
    name: products
    params:
      elasticsearch_endpoint: https://localhost:9200
      elasticsearch_user: ${secrets:es_user}
      elasticsearch_pass: ${secrets:es_pass}

Enterprise edition

The Elasticsearch connector is available in the Spice Enterprise edition.

Configuration

from

The from field takes the form elasticsearch:{index_name} where index_name is the Elasticsearch index to query.

datasets:
  - from: elasticsearch:products
    name: products

Dot-separated paths may be used to refer to nested fields in query results (e.g. address.city); the connector flattens object mappings into Arrow columns using that convention.
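
To make the dot-path convention concrete, here is a minimal Python sketch of how an index mapping's object fields could be walked to produce dotted column names. It is an illustration only, not the connector's implementation; the function name flatten_mapping and the sample mapping are hypothetical.

```python
from typing import Dict, Iterator, Tuple


def flatten_mapping(properties: Dict[str, dict], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_path, es_type) pairs for the leaf fields of a mapping."""
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        sub = spec.get("properties")
        # Object mappings carry sub-properties and either no "type" or type "object".
        if sub is not None and spec.get("type", "object") == "object":
            yield from flatten_mapping(sub, prefix=f"{path}.")
        else:
            # Leaf field, or a `nested` field that is kept whole rather than flattened.
            yield path, spec.get("type", "object")


# Hypothetical mapping, shaped like the "properties" section of GET /<index>/_mapping.
mapping = {
    "name": {"type": "text"},
    "address": {
        "properties": {
            "city": {"type": "keyword"},
            "geo": {"properties": {"zip": {"type": "keyword"}}},
        }
    },
    "reviews": {"type": "nested", "properties": {"stars": {"type": "integer"}}},
}

columns = dict(flatten_mapping(mapping))
for column, es_type in columns.items():
    print(column, es_type)
```

In this sketch, address.city and address.geo.zip become separate columns, while reviews stays a single column because its type is nested.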

name

The dataset name used as the table name within Spice. The dataset name cannot be a reserved keyword.

params

The Elasticsearch connector accepts the following params. Use the secret replacement syntax to load credentials from a secret store.

| Parameter Name | Description | Required | Default |
| --- | --- | --- | --- |
| elasticsearch_endpoint | Cluster URL (e.g., https://localhost:9200). | Yes | - |
| elasticsearch_user | Username for HTTP basic authentication. | No | - |
| elasticsearch_pass | Password for HTTP basic authentication. | No | - |

Types

The connector derives an Arrow schema from each index's mapping via GET /<index>/_mapping. Elasticsearch field types map to Arrow as follows:

| Elasticsearch Field Type | Arrow Type | Notes |
| --- | --- | --- |
| text, keyword, wildcard, constant_keyword, match_only_text | Utf8 | |
| long | Int64 | |
| integer | Int32 | |
| short | Int16 | |
| byte | Int8 | |
| double | Float64 | |
| float, half_float, scaled_float | Float32 | |
| boolean | Boolean | |
| date, date_nanos | Utf8 | ES dates are flexibly formatted; preserved as strings. |
| binary | Utf8 | Base64-encoded in the JSON response. |
| ip | Utf8 | |
| dense_vector (with dims) | FixedSizeList<Float32, dims> | Required dims field must fit in i32. |
| dense_vector (missing dims) | Utf8 | Falls back to raw JSON when dims cannot be resolved. |
| object, nested | Utf8 | Serialized JSON. |
| Any other mapping type | Utf8 | Fallback: the raw JSON value is preserved as a string. |

Sub-fields of object mappings are flattened by concatenating field names with dots (e.g. address.city). Fields of type nested are preserved as JSON strings because per-document ordering must be retained.
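
The type table above can be read as a lookup with two special cases: dense_vector depends on dims, and everything unrecognized falls back to Utf8. The Python sketch below restates those rules as a function that returns Arrow type names as strings. It is not the connector's code; arrow_type_for and the helper names are hypothetical, and it assumes dims must be a positive integer no larger than the i32 maximum.

```python
# Direct one-to-one mappings from the table (Elasticsearch type -> Arrow type name).
_SIMPLE_TYPES = {
    "long": "Int64",
    "integer": "Int32",
    "short": "Int16",
    "byte": "Int8",
    "double": "Float64",
    "float": "Float32",
    "half_float": "Float32",
    "scaled_float": "Float32",
    "boolean": "Boolean",
}

I32_MAX = 2**31 - 1


def arrow_type_for(field: dict) -> str:
    """Return the Arrow type name for one field of an Elasticsearch mapping."""
    es_type = field.get("type", "object")
    if es_type in _SIMPLE_TYPES:
        return _SIMPLE_TYPES[es_type]
    if es_type == "dense_vector":
        dims = field.get("dims")
        # Assumption: a usable dims value is a positive integer that fits in i32.
        if isinstance(dims, int) and not isinstance(dims, bool) and 0 < dims <= I32_MAX:
            return f"FixedSizeList<Float32, {dims}>"
        return "Utf8"  # dims missing or unusable: raw JSON string
    # text, keyword, date, binary, ip, object, nested, and unknown types.
    return "Utf8"


print(arrow_type_for({"type": "dense_vector", "dims": 384}))
print(arrow_type_for({"type": "date"}))
```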

Querying

After registering a dataset, query it like any other Spice table:

SELECT name, price
FROM products
WHERE price > 100
ORDER BY price DESC
LIMIT 10;

When an index contains a dense_vector field, the Elasticsearch connector wires it into Spice's search pipeline. This enables:

  • Vector similarity search via vector_search — executed natively as an Elasticsearch kNN query.
  • Full-text search via text_search — executed using Elasticsearch's native BM25 ranking.
  • Hybrid search via rrf — combining both with Reciprocal Rank Fusion.

These operations run against the Elasticsearch cluster directly rather than ingesting vectors into an accelerator, keeping indexing and search colocated in Elasticsearch.

Example:

-- kNN vector search against Elasticsearch
SELECT product_id, name, score
FROM vector_search(products, 'wireless noise cancelling headphones')
ORDER BY score DESC
LIMIT 10;

-- BM25 full-text search
SELECT product_id, name, score
FROM text_search(products, 'headphones waterproof', description)
ORDER BY score DESC
LIMIT 10;

-- Hybrid search via RRF
SELECT product_id, name, fused_score
FROM rrf(
    vector_search(products, 'wireless noise cancelling headphones'),
    text_search(products, 'headphones waterproof', description),
    join_key => 'product_id'
)
ORDER BY fused_score DESC
LIMIT 10;
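
For readers unfamiliar with Reciprocal Rank Fusion, the following Python sketch shows the general technique: each result contributes 1 / (k + rank) from every ranking it appears in, and the contributions are summed per join key. This illustrates the algorithm only, not how Spice implements rrf. The constant k = 60 is a commonly used value and is an assumption here, and the product IDs are made up.

```python
from collections import defaultdict
from typing import Dict, List


def rrf(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Fuse ranked lists of join keys; a higher fused score means a better result."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the best hit in each list contributes 1 / (k + 1).
        for rank, key in enumerate(ranking, start=1):
            scores[key] += 1.0 / (k + rank)
    return dict(scores)


# Hypothetical ranked product_id lists from a vector search and a text search.
vector_hits = ["p1", "p2", "p3"]
text_hits = ["p3", "p1", "p4"]

fused = rrf([vector_hits, text_hits])
top = sorted(fused, key=fused.get, reverse=True)
print(top)
```

Keys that appear in both lists (p1 and p3 here) rank above keys found by only one search, which is the property that makes RRF useful for hybrid search.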

See Search Functionality for the full search feature guide.

Authentication

The connector uses HTTP basic authentication when elasticsearch_user and elasticsearch_pass are provided. For production deployments, store credentials in a secret store and reference them with ${secrets:...} rather than hard-coding them in spicepod.yaml.

TLS is enabled automatically for https:// endpoints.
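
As a rough illustration of what HTTP basic authentication looks like on the wire, the Python sketch below builds (but does not send) a GET /<index>/_mapping request carrying an Authorization: Basic header. It uses only the Python standard library and is not the connector's code; build_mapping_request and the credentials are placeholders.

```python
import base64
import urllib.request


def build_mapping_request(endpoint: str, index: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET /<index>/_mapping request with an HTTP basic auth header."""
    # Basic auth is "user:password", base64-encoded, prefixed with "Basic ".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    url = f"{endpoint.rstrip('/')}/{index}/_mapping"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"}, method="GET")


# Placeholder credentials; real deployments should load them from a secret store.
req = build_mapping_request("https://localhost:9200", "products", "es_user", "es_pass")
print(req.get_method(), req.full_url)
```

Because base64 is an encoding and not encryption, basic authentication should only be used over https:// endpoints.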

Limitations

  • Nested object fields are exposed as JSON strings rather than structured columns.
  • date and date_nanos fields are preserved as strings because Elasticsearch accepts heterogeneous date formats; cast to a timestamp in SQL when chronological comparison or date arithmetic is required.
  • dense_vector fields without a declared dims value fall back to Utf8 and are not usable as a vector column.
  • Pushdown of SQL predicates to Elasticsearch query DSL is limited; complex filter expressions are evaluated locally by DataFusion after fetching results.
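
To illustrate why date values arrive in mixed formats, note that Elasticsearch's default date format accepts both ISO 8601 strings and epoch milliseconds. The Python sketch below normalizes those two shapes on the client side after a query returns string dates. It is a hypothetical helper, not part of Spice, and it assumes each value is either a full ISO 8601 date or date-time, or an integer count of milliseconds since the epoch. Custom mapping formats such as yyyyMMdd, and year-only values like 2024, are not handled.

```python
from datetime import datetime, timezone


def parse_es_date(value: str) -> datetime:
    """Parse an ISO 8601 string or an epoch-milliseconds string into a UTC-aware datetime."""
    text = value.strip()
    # Assumption: an all-digit value (optionally negative) is epoch milliseconds.
    if text.lstrip("-").isdigit():
        return datetime.fromtimestamp(int(text) / 1000, tz=timezone.utc)
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    # Assumption: values without an offset are treated as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


print(parse_es_date("2024-01-15T10:30:00Z"))
print(parse_es_date("1705314600000"))
```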

Elasticsearch can also be configured as a Vector Engine for datasets sourced from other connectors (storing Spice-managed embeddings in Elasticsearch rather than querying an existing index).