
Vector search performs semantic retrieval using embeddings, so it finds conceptually similar content even when no keywords match.


Core concepts:

| Term | What it means |
|---|---|
| Embedding | Dense numeric vector representing the meaning of a text |
| Cosine similarity | Score in [-1, 1] measuring how close two vectors are in meaning; higher is closer |
| Chunk | A segment of a long field, embedded separately |
| Threshold | Minimum similarity for a result to be included (start at 0.55–0.65) |
| Top-k | Number of nearest neighbours to return (8 for RAG, 5–10 for "similar items") |
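To make the concepts above concrete, here is a minimal brute-force sketch of how cosine similarity, a threshold, and top-k combine at query time. It assumes embeddings are plain lists of floats and that chunks are held in a dict keyed by a hypothetical chunk ID; `cosine_similarity` and `search` are illustrative names, not part of the product's API, and a real vector index would typically use an approximate nearest-neighbour structure rather than scoring every chunk.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|), in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def search(
    query_vec: list[float],
    chunks: dict[str, list[float]],
    threshold: float = 0.6,
    top_k: int = 8,
) -> list[tuple[str, float]]:
    """Return up to top_k (chunk_id, score) pairs scoring at least threshold, best first."""
    # Score every chunk against the query (brute force, for illustration only).
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunks.items()]
    # Drop anything below the similarity threshold.
    hits = [(chunk_id, score) for chunk_id, score in scored if score >= threshold]
    # Highest similarity first, then keep the top k.
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:top_k]
```

With toy vectors `{"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0]}` and query `[1.0, 0.0]`, `search(query, chunks, threshold=0.6, top_k=2)` keeps `a` (score 1.0) and `b` (about 0.8) and drops `c` (score 0.0), which falls below the threshold.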

To enable vector search, open Settings → Search → Indexes, mark a field as vector-indexed, and set its chunk size and overlap. A background job then backfills embeddings for all existing nodes.
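Chunk size and overlap control how a long field is split before embedding. The sketch below assumes both values are counted in whitespace-separated words, which is an assumption for illustration only (the setting may be measured in tokens or characters); `chunk_text` is a hypothetical helper. Each window starts `chunk_size - overlap` words after the previous one, so neighbouring chunks share `overlap` words and text near a boundary keeps some of its context in both chunks.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of chunk_size words, each sharing `overlap` words with the previous one.

    Assumption: size and overlap are counted in whitespace-separated words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

For a 10-word field with `chunk_size=4` and `overlap=1`, this yields three chunks covering words 0–3, 3–6 and 6–9; an empty field yields no chunks.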

Picking an embedding model:

| Goal | Model |
|---|---|
| Best quality | text-embedding-3-large (OpenAI) |
| Predictable cost | text-embedding-3-small |
| No data egress | Built-in MiniLM or ArcticXS |
| Multilingual | Any modern hosted model |

If you switch embedding models, you must re-embed the entire corpus. Vectors from different models are not comparable.
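One way to avoid accidentally mixing vectors from different models is to record which model produced each stored vector and check it before comparing. This is a hypothetical sketch, not a description of how the product stores embeddings: `StoredVector`, `check_comparable` and `needs_reembedding` are illustrative names, and the model strings are just labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredVector:
    model: str                # hypothetical tag: which embedding model produced this vector
    values: tuple[float, ...]


def check_comparable(a: StoredVector, b: StoredVector) -> None:
    """Raise ValueError if two vectors cannot be meaningfully compared."""
    if a.model != b.model:
        raise ValueError(f"vectors come from different models ({a.model!r} vs {b.model!r}); re-embed the corpus")
    if len(a.values) != len(b.values):
        raise ValueError("vectors have different dimensions")


def needs_reembedding(corpus: list[StoredVector], current_model: str) -> list[int]:
    """Return the indices of stored vectors produced by a model other than current_model."""
    return [i for i, vec in enumerate(corpus) if vec.model != current_model]
```

After a model switch, `needs_reembedding` lists the vectors still tagged with the old model, which is the set a re-embedding job would have to process before those vectors can be compared with new queries again.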

Vector search