Graph Embeddings (PageSpace)

The PageSpaceEmbeddingsIndex produces vectors from the graph's topology, not from text. Two nodes connected by similar edge patterns — same neighbors, same paths — end up close in vector space. It's how the workspace answers questions like "which customers behave like this one?" or "which products are functionally substitutable?" without depending on field text.

The implementation lives in Mosaik.GraphDB.Indexes.PageSpaceEmbeddingsIndex, backed by the Mosaik.GraphDB.Models.PageSpace self-supervised model. Both implement the same ISimilarityIndex surface as the other embedding indexes, so consumers don't care which kind of vector they're querying.

How it works

PageSpace is a StarSpace-style embedding model trained over random walks through the graph:

flowchart LR
    G[(Graph)] --> Walks[Random walks<br/>EdgesToFollow]
    Walks --> Train[StarSpace-style training<br/>negative sampling]
    Train --> Vecs[Vector per node]
    Vecs --> HNSW[(HNSW searcher)]
    New[(New node committed)] --> Predict[pageSpace.TryGetVector]
    Predict --> HNSW
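To make the "Random walks" stage of the diagram concrete, here is a minimal sketch of a walk restricted to a set of edge types and node types. It does not use the Mosaik.GraphDB API: the `Edge` record, the in-memory adjacency dictionary, and `RandomWalkSketch.Walk` are hypothetical names introduced only for illustration, and PageSpace's real walk strategy may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory edge, not a Mosaik.GraphDB type.
public sealed record Edge(string From, string To, string EdgeType, string ToNodeType);

public static class RandomWalkSketch
{
    // Walks up to `length` steps from `start`, crossing only edges whose type is in
    // `edgesToFollow` and whose target node type is in `nodesToFollow`.
    public static List<string> Walk(
        IReadOnlyDictionary<string, List<Edge>> adjacency,
        string start,
        ISet<string> edgesToFollow,
        ISet<string> nodesToFollow,
        int length,
        Random rng)
    {
        var walk = new List<string> { start };
        var current = start;
        for (var step = 0; step < length; step++)
        {
            if (!adjacency.TryGetValue(current, out var outgoing)) break;

            var allowed = outgoing
                .Where(e => edgesToFollow.Contains(e.EdgeType)
                         && nodesToFollow.Contains(e.ToNodeType))
                .ToList();
            if (allowed.Count == 0) break; // dead end under the configured topology

            current = allowed[rng.Next(allowed.Count)].To;
            walk.Add(current);
        }
        return walk;
    }
}
```

Nodes that appear together in one returned walk are the candidates for positive pairs during training.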

The model learns one vector per node. For each walk it samples positive pairs (nodes that co-occur on the walk) and negative samples (random nodes), then nudges the two vectors of each positive pair closer together and pushes the negatives away. You configure which edges the walks follow and which node types are eligible, and those choices define what "similar" means for your domain.
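The update described above can be sketched as a single SGD step. This page does not state which loss PageSpace uses, so the sketch swaps in a logistic loss in the style of skip-gram negative sampling; `NegativeSamplingSketch` and its methods are hypothetical names, not part of Mosaik.GraphDB.

```csharp
using System;

public static class NegativeSamplingSketch
{
    public static float Dot(float[] a, float[] b)
    {
        var sum = 0f;
        for (var i = 0; i < a.Length; i++) sum += a[i] * b[i];
        return sum;
    }

    static float Sigmoid(float x) => 1f / (1f + MathF.Exp(-x));

    // One SGD step: pulls `anchor` and `positive` together and pushes
    // `anchor` and every vector in `negatives` apart.
    public static void Step(float[] anchor, float[] positive, float[][] negatives, float learningRate)
    {
        // The anchor's update is accumulated and applied once at the end,
        // so every sample sees the same anchor vector.
        var anchorDelta = new float[anchor.Length];

        Update(anchor, positive, 1f, learningRate, anchorDelta);      // label 1: attract
        foreach (var negative in negatives)
            Update(anchor, negative, 0f, learningRate, anchorDelta);  // label 0: repel

        for (var i = 0; i < anchor.Length; i++) anchor[i] += anchorDelta[i];
    }

    static void Update(float[] anchor, float[] other, float label, float learningRate, float[] anchorDelta)
    {
        // Gradient scale of the logistic loss with respect to the dot product.
        var g = learningRate * (label - Sigmoid(Dot(anchor, other)));
        for (var i = 0; i < anchor.Length; i++)
        {
            anchorDelta[i] += g * other[i];
            other[i] += g * anchor[i]; // the paired vector moves too
        }
    }
}
```

In this sketch `learningRate` plays the role of the LearningRate option and the length of `negatives` the role of NegativeSamplingCount.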

After training, new nodes get vectors predicted in real time as they're committed (PageSpace.TryGetVector), without retraining the whole model.

When to use it

| Question | Right tool |
|---|---|
| "Find products with similar names / descriptions" | Sentence Embeddings. |
| "Find customers who behave like this one (same purchase patterns, same support cases)" | Graph Embeddings. |
| "Find substitute products based on who buys them" | Graph Embeddings. |
| "Find connected components / explicit graph paths" | Regular IQuery traversal (StartAt/Out/In), not embeddings. |

The vectors are sensitive to graph density. Sparse subgraphs (nodes with very few edges) produce poor embeddings; you want every node to participate in at least a handful of meaningful relationships.
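Before training, it can help to check how many nodes fall below a minimum degree. The sketch below does this over a plain edge list. It assumes you have already exported node ids and edges from the graph; `DensityCheckSketch.SparseNodes` and the `minDegree` threshold are hypothetical and not part of the Mosaik.GraphDB API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DensityCheckSketch
{
    // Returns the ids of nodes that take part in fewer than `minDegree` edges,
    // counting both incoming and outgoing edges. Node ids must be distinct.
    public static List<string> SparseNodes(
        IEnumerable<string> nodes,
        IEnumerable<(string Source, string Target)> edges,
        int minDegree)
    {
        var degree = nodes.ToDictionary(n => n, _ => 0);
        foreach (var (source, target) in edges)
        {
            if (degree.ContainsKey(source)) degree[source]++;
            if (degree.ContainsKey(target)) degree[target]++;
        }

        return degree
            .Where(kv => kv.Value < minDegree)
            .Select(kv => kv.Key)
            .OrderBy(id => id, StringComparer.Ordinal)
            .ToList();
    }
}
```

A large share of sparse nodes suggests adding more edge types to the walk topology or excluding those node types from the index.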

Registering and configuring

// Training hyperparameters are passed to the index as string values.
var settings = new SettingsHolder();
settings.ManuallySet(nameof(PageSpaceEmbeddingsIndexData.Dimensions),            "128");
settings.ManuallySet(nameof(PageSpaceEmbeddingsIndexData.Epoch),                 "50");
settings.ManuallySet(nameof(PageSpaceEmbeddingsIndexData.LearningRate),          "0.05");
settings.ManuallySet(nameof(PageSpaceEmbeddingsIndexData.NegativeSamplingCount), "10");

var index = await Graph.Indexes.AddPageSpaceEmbeddingsIndexAsync(
    nodeType: N.Customer.Type,
    setting:  settings);

// Define the walk topology — which edges to traverse and which node types to land on.
index.SetNodesAndEdges(
    edgesToFollow: new[] { E.Placed, E.Contains, E.OpenedCase, E.About },
    nodesToFollow: new[] { N.Customer.Type, N.Order.Type, N.Product.Type, N.SupportCase.Type });

await index.TrainAsync(progressLogger: msg => Logger.LogInformation(msg));

Options reference

PageSpaceEmbeddingsIndexData (the option bag) and PageSpaceData (training-time settings on the underlying model) expose:

| Option | Type | Default | Effect |
|---|---|---|---|
| Dimensions | int | 128 | Vector dimensionality. Higher = more capacity, more memory. |
| Epoch | int | 50 | Training passes over the walks. |
| LearningRate | float | 0.05 | SGD step size. |
| NegativeSamplingCount | int | 10 | Negative samples per positive pair. |
| Threads | int | Environment.ProcessorCount | Worker threads during training. |
| EdgesToFollow | string[] | none | Edge types the random walks may traverse. Required. |
| NodesToFollow | string[] | none | Node types eligible for walks. Required. |
| UseSimilarityForTokens | bool | true | When trained with _Token nodes, biases negative sampling toward dissimilar tokens. |
| Buckets (PageSpaceData.Buckets) | uint | 2,000,000 | Subword/token bucket size. |

The index is not auto-trained on commit — call TrainAsync once a meaningful amount of data is in the graph, then again periodically (e.g. as a scheduled task) as the graph shifts.
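One way to handle the periodic retraining is a small background loop. The sketch below is independent of the workspace's own scheduled-task facilities, which this page does not describe. It assumes .NET 6 or later for `PeriodicTimer`; `RetrainSchedulerSketch` is a hypothetical helper, and the `retrain` delegate is where a call such as `index.TrainAsync(...)` would go.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RetrainSchedulerSketch
{
    // Invokes `retrain` once per `interval` until the token is cancelled.
    public static async Task RunAsync(Func<Task> retrain, TimeSpan interval, CancellationToken token)
    {
        using var timer = new PeriodicTimer(interval);
        try
        {
            while (await timer.WaitForNextTickAsync(token))
            {
                // Runs are sequential: the next tick is awaited only after this one completes.
                await retrain();
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the normal shutdown path.
        }
    }
}
```

Because each run is awaited before the next tick, a training pass that takes longer than the interval does not overlap with the following one.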

Reading the vectors

PageSpace embeddings plug into the same IQuery surface as the other indexes:

// Find 20 customers structurally similar to a seed customer.
var index = Graph.Indexes
    .OfType<PageSpaceEmbeddingsIndex>(N.Customer.Type)
    .First();

var similar = Q().StartAt(customerUID)
                 .Similar(IndexTypes.PageSpaceEmbeddingsIndex, index.UID, count: 20)
                 .EmitWithScores();

For mixed scenarios — "structurally similar and in the same region", or "similar plus a soft boost from a text signal" — use the Similarity Engine with one signal that calls Similar(IndexTypes.PageSpaceEmbeddingsIndex, …) and a second one that does whatever else you need.
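The Similarity Engine's API is not shown on this page, so the following is only a conceptual sketch of what "similar plus a soft boost from a text signal" amounts to: a weighted sum of two per-candidate scores. `SignalBlendSketch.Blend`, the score dictionaries, and `textWeight` are hypothetical and do not correspond to any Mosaik.GraphDB type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SignalBlendSketch
{
    // Combines scores from a structural signal and a text signal with a weighted sum.
    // A candidate missing from one signal contributes 0 for that signal.
    public static List<(string Uid, double Score)> Blend(
        IReadOnlyDictionary<string, double> structural,
        IReadOnlyDictionary<string, double> text,
        double textWeight,
        int count)
    {
        return structural.Keys.Union(text.Keys)
            .Select(uid => (
                Uid: uid,
                Score: structural.GetValueOrDefault(uid) + textWeight * text.GetValueOrDefault(uid)))
            .OrderByDescending(x => x.Score)
            .ThenBy(x => x.Uid, StringComparer.Ordinal) // stable order for ties
            .Take(count)
            .ToList();
    }
}
```

A small `textWeight` keeps the structural signal dominant while still letting the text signal reorder close candidates.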

© 2026 Curiosity. All rights reserved.