Similarity & Vector Search

This section is the reference for everything vector in Curiosity Workspace: what each embedding index does, which built-in models are available, how to call them from IQuery, how to compose multiple signals into a similarity scenario, and how to cluster and visualize the result.

If you only need to add semantic retrieval to the search UI, start with AI Search — that page is the operator's view. The pages below are the builder's view: what you call from C# in endpoints, code indexes, and the shell.


The three embedding indexes

Curiosity ships three embedding indexes. Each produces vectors per node, stores them in an HNSW (Hierarchical Navigable Small World) approximate nearest-neighbor index, and exposes the same ISimilarityIndex / ITextSimilarityIndex surface to the rest of the system. They differ in where the vector comes from:

| Index | Vector source | Use when |
| --- | --- | --- |
| Sentence Embeddings | A transformer model encodes a text field on the node. | You want semantic search or recommendations over free-text fields (names, summaries, bodies). |
| Graph Embeddings (PageSpace) | A self-supervised model trains over the graph topology: nodes near each other in the graph end up near each other in vector space. | You want "structurally similar" results, e.g. users with similar behavior or products with similar buying patterns, regardless of text. |
| Raw Embeddings | You supply your own vectors (from an external provider, a domain-specific model, or any code you can run). | You already have embeddings, or you need a model the workspace doesn't ship with. |

All three implement ISimilarityIndex. From the query layer, a similar-products lookup looks identical regardless of which index is configured — the difference is purely in how the vector was produced.
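Because all three indexes expose the same surface, calling code can stay index-agnostic. A minimal sketch of such a lookup follows — FindSimilar and ISimilarityIndex are named on this page, but the exact parameter list, return shape, and the UID128 type are assumptions for illustration, not the shipped Curiosity signatures:

```csharp
// Sketch only: FindSimilar / ISimilarityIndex appear on this page, but this
// signature and the UID128 type are illustrative assumptions.
public static IEnumerable<(UID128 Uid, float Score)> SimilarTo(
    ISimilarityIndex index,  // sentence, PageSpace, or raw: the caller doesn't care
    UID128 nodeUid,
    int topN = 10)
{
    // The concrete index decides how the vector was produced;
    // the nearest-neighbor lookup looks the same either way.
    return index.FindSimilar(nodeUid, topN);
}
```

Swapping a SentenceEmbeddingsIndex for a PageSpaceEmbeddingsIndex changes the meaning of "similar" without changing this call site.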


Where similarity surfaces in code

```mermaid
flowchart LR
    Node[(Node committed)] --> Idx[Embedding index<br/>HNSW]
    Idx -->|FindSimilar| Q1[IQuery.Similar]
    Idx -->|FindSimilarAsync| Q2[IQuery.StartAtSimilarText]
    Q1 --> Sim[ToSimilarity scenario]
    Q2 --> Sim
    Sim --> Result[SimilarityResult<br/>scores per UID]
    Result --> Cluster[WeightedGraph.Cluster]
    Cluster --> View[ForceGraphView]
```
| Layer | What it does |
| --- | --- |
| Index (SentenceEmbeddingsIndex, PageSpaceEmbeddingsIndex, RawEmbeddingsIndex) | Computes vectors and serves nearest-neighbor lookups. |
| IQuery.Similar(...) | Inside a query chain, replaces the current set with each node's neighbors. Cheap, single-index. See IQuery Similarity Search. |
| IQuery.StartAtSimilarTextAsync(...) | Starts a query from text: encodes the text and pulls neighbors from an ITextSimilarityIndex. |
| IQuery.ToSimilarity(...) | Builds a multi-signal scenario: combines vector neighbors with graph traversals or external lookups, fuses the signals, and applies rules. See Similarity Engine. |
| WeightedGraph<T>.Cluster(...) | Takes a list of weighted similarity edges and groups nodes into clusters. See Clustering & Visualization. |
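Read end to end, these layers compose into a short pipeline. The sketch below shows the shape of that composition; the method names come from this page, but the parameter lists and scenario configuration are illustrative assumptions, not the shipped API:

```csharp
// Hypothetical endpoint body: method names from this page, signatures assumed.
public static async Task<SimilarityResult> RelatedByText(IQuery query, string text)
{
    // 1. Encode the free text and seed the query with its nearest
    //    neighbors from the configured ITextSimilarityIndex.
    var seeded = await query.StartAtSimilarTextAsync(text);

    // 2. Fuse the vector neighbors with other signals (graph traversals,
    //    external lookups) into one scored result per UID.
    return seeded.ToSimilarity();
}
```

From here, many such results can be folded into a WeightedGraph<T> and clustered for a ForceGraphView, as covered in Clustering & Visualization below.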

In this section

Sentence Embeddings

Embed text fields with built-in transformer models (MiniLM, ArcticXS) or an external provider. The default choice for semantic search.

Graph Embeddings (PageSpace)

Self-supervised embeddings derived from the graph's topology. Find structurally similar nodes regardless of their text content.

Raw Embeddings

Bring your own vectors. Index pre-computed embeddings from any source via Curiosity.Library.

IQuery Similarity Search

Consume vectors from inside IQuery: Similar(), StartAtSimilarTextAsync(), narrowing to a specific index, and building endpoints that return scored UIDs.

Similarity Engine

Combine multiple signals (text + graph + external) into a single ranked similarity result with IQuery.ToSimilarity(...).

Clustering & Visualization

Turn many similarity results into a WeightedGraph, extract clusters, and render them with ForceGraphView.



© 2026 Curiosity. All rights reserved.