
Embeddings

Embeddings convert text or other data into vector representations for machine learning and natural language processing tasks.

embeddings

The embeddings section in your configuration specifies one or more embedding models for your datasets.

Example:

embeddings:
  - from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2:latest
    name: text_embedder
    params:
      max_length: '128'
    datasets:
      - my_text_dataset

from

The from field specifies the source of the embedding model. It supports the following prefixes:

  • huggingface:huggingface.co - Models from Hugging Face
  • file: - Local file paths
  • openai - OpenAI models

The from field follows the same convention as models.from.
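
As a rough illustration of the prefixes listed above, the sketch below defines one embedding model sourced from Hugging Face (reusing the model from the example at the top of this page) and one sourced from a local file. The name local_embedder and the path /path/to/model.onnx are hypothetical placeholders, and the exact path format accepted after file: is an assumption rather than something this page confirms.

```yaml
embeddings:
  # Hugging Face source, as in the example at the top of this page
  - from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2:latest
    name: text_embedder

  # Local file source; the path and name are hypothetical placeholders
  - from: file:/path/to/model.onnx
    name: local_embedder
```

Each entry needs its own name so the two components can be told apart.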

name

A unique identifier for this embedding component.

files

Optional. A list of files associated with this model. Each file has:

  • path: The path to the file
  • name: Optional. A name for the file
  • type: Optional. The type of the file (automatically determined if not specified)

The files field follows the same convention as models.files.
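
A minimal sketch of a files list, assuming a locally stored model that ships with supporting files. All paths and names here are hypothetical placeholders; which supporting files a given model actually needs depends on the model and is not specified on this page.

```yaml
embeddings:
  - from: file:/path/to/model.onnx        # hypothetical local model path
    name: local_embedder                  # hypothetical component name
    files:
      # path is the only field not marked optional
      - path: /path/to/tokenizer.json
      # name is optional; type is omitted here so it is determined automatically
      - path: /path/to/config.json
        name: config
```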

params

Optional. A map of key-value pairs for additional parameters specific to the embedding model.

dependsOn

Optional. A list of dependencies that must be loaded and available before this embedding model is loaded.
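
The following sketch shows how dependsOn might be declared. It assumes that each entry refers to another component by its name (here the dataset my_text_dataset from the example at the top of this page); this page does not state what form the entries take, so treat that as an assumption.

```yaml
embeddings:
  - from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2:latest
    name: text_embedder
    datasets:
      - my_text_dataset
    dependsOn:
      # Assumed: entries name other components that must be ready first
      - my_text_dataset
```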