Graph Embeddings (PageSpace)
The PageSpaceEmbeddingsIndex produces vectors from the graph's topology, not from text. Two nodes connected by similar edge patterns — same neighbors, same paths — end up close in vector space. It's how the workspace answers questions like "which customers behave like this one?" or "which products are functionally substitutable?" without depending on field text.
The implementation lives in Mosaik.GraphDB.Indexes.PageSpaceEmbeddingsIndex, backed by the Mosaik.GraphDB.Models.PageSpace self-supervised model. Both implement the same ISimilarityIndex surface as the other embedding indexes, so consumers don't care which kind of vector they're querying.
How it works
PageSpace is a StarSpace-style embedding model trained over random walks through the graph.
The model learns one vector per node by sampling positive examples (nodes that co-occur on a walk) and negative samples (random nodes), and adjusting both vectors of each positive pair to be more similar while pushing negatives apart. You configure which edges to follow and which node types are eligible — those choices define what "similar" means for your domain.
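The training loop described above can be sketched in a few lines of plain C#. This is an illustrative toy, not the PageSpace implementation: the class `WalkEmbeddingSketch`, its method names, the fixed walk length and window, and the choice of a margin ranking loss with dot-product similarity are all assumptions made for the example. Node ids are assumed to be the integers `0..nodeCount-1`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: walk-based embeddings with negative sampling.
public static class WalkEmbeddingSketch
{
    // One uniform random walk of at most `length` nodes, starting at `start`.
    public static List<int> RandomWalk(
        IReadOnlyDictionary<int, int[]> adjacency, int start, int length, Random rng)
    {
        var walk = new List<int> { start };
        var current = start;
        while (walk.Count < length
               && adjacency.TryGetValue(current, out var next)
               && next.Length > 0)
        {
            current = next[rng.Next(next.Length)];
            walk.Add(current);
        }
        return walk;
    }

    public static float Dot(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
        return sum;
    }

    // One SGD step on the hinge loss max(0, margin - a.pos + a.neg).
    // Pulls `a` and `pos` together and pushes `a` and `neg` apart.
    public static void Update(float[] a, float[] pos, float[] neg, float lr, float margin)
    {
        if (margin - Dot(a, pos) + Dot(a, neg) <= 0f) return; // already separated
        for (int i = 0; i < a.Length; i++)
        {
            float ai = a[i], pi = pos[i], ni = neg[i];
            a[i] += lr * (pi - ni);
            pos[i] += lr * ai;
            neg[i] -= lr * ai;
        }
    }

    public static float[][] Train(
        IReadOnlyDictionary<int, int[]> adjacency, int nodeCount, int dimensions,
        int epochs, float learningRate, int negatives, int seed = 42)
    {
        const int walkLength = 8;   // assumed value, for illustration only
        const int window = 2;       // how far apart two nodes on a walk may be
        const float margin = 1f;

        var rng = new Random(seed);
        var vectors = new float[nodeCount][];
        for (int n = 0; n < nodeCount; n++)
        {
            vectors[n] = new float[dimensions];
            for (int d = 0; d < dimensions; d++)
                vectors[n][d] = (float)(rng.NextDouble() - 0.5) * 0.1f;
        }

        for (int epoch = 0; epoch < epochs; epoch++)
        {
            foreach (var start in adjacency.Keys)
            {
                var walk = RandomWalk(adjacency, start, walkLength, rng);
                for (int i = 0; i < walk.Count; i++)
                {
                    for (int j = i + 1; j < walk.Count && j <= i + window; j++)
                    {
                        if (walk[i] == walk[j]) continue; // skip self-pairs
                        for (int k = 0; k < negatives; k++)
                        {
                            int neg = rng.Next(nodeCount); // random negative sample
                            if (neg == walk[i] || neg == walk[j]) continue;
                            Update(vectors[walk[i]], vectors[walk[j]], vectors[neg],
                                   learningRate, margin);
                        }
                    }
                }
            }
        }
        return vectors;
    }
}
```

The `dimensions`, `epochs`, `learningRate`, and `negatives` parameters play the same conceptual roles as the Dimensions, Epoch, LearningRate, and NegativeSamplingCount options; restricting `adjacency` to certain edges and node types is the toy equivalent of choosing which edges and node types the walks may use.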
After training, new nodes get vectors predicted in real time as they're committed (PageSpace.TryGetVector), without retraining the whole model.
When to use it
| Question | Right tool |
|---|---|
| "Find products with similar names / descriptions" | Sentence Embeddings. |
| "Find customers who behave like this one (same purchase patterns, same support cases)" | Graph Embeddings. |
| "Find substitute products based on who buys them" | Graph Embeddings. |
| "Find connected components / explicit graph paths" | Regular IQuery traversal (StartAt/Out/In) — not embeddings. |
The vectors are sensitive to graph density. Sparse subgraphs (nodes with very few edges) produce poor embeddings; you want every node to participate in at least a handful of meaningful relationships.
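Before training, it can help to estimate how many nodes fall below a minimum number of relationships. The sketch below works on a plain edge list exported from the graph; `DensityCheck`, `SparseNodes`, and the `minDegree` threshold are hypothetical names for illustration and are not part of the Mosaik API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: flags nodes with too few edges to embed well.
public static class DensityCheck
{
    // Returns, in ordinal order, the nodes whose undirected degree is below `minDegree`.
    public static List<string> SparseNodes(
        IEnumerable<string> nodes,
        IEnumerable<(string Source, string Target)> edges,
        int minDegree)
    {
        var degree = new Dictionary<string, int>();
        foreach (var node in nodes) degree[node] = 0;

        foreach (var (source, target) in edges)
        {
            // Count the edge on both endpoints, adding endpoints not listed in `nodes`.
            degree[source] = (degree.TryGetValue(source, out var s) ? s : 0) + 1;
            degree[target] = (degree.TryGetValue(target, out var t) ? t : 0) + 1;
        }

        return degree.Where(kv => kv.Value < minDegree)
                     .Select(kv => kv.Key)
                     .OrderBy(k => k, StringComparer.Ordinal)
                     .ToList();
    }
}
```

If a large share of the nodes you care about shows up in the result, adding more edge types to the walk topology is likely a better first step than tuning training options.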
Registering and configuring
```csharp
var settings = new SettingsHolder();
settings.ManuallySet(nameof(PageSpaceEmbeddingsIndexData.Dimensions), "128");
settings.ManuallySet(nameof(PageSpaceEmbeddingsIndexData.Epoch), "50");
settings.ManuallySet(nameof(PageSpaceEmbeddingsIndexData.LearningRate), "0.05");
settings.ManuallySet(nameof(PageSpaceEmbeddingsIndexData.NegativeSamplingCount), "10");

var index = await Graph.Indexes.AddPageSpaceEmbeddingsIndexAsync(
    nodeType: N.Customer.Type,
    setting: settings);

// Define the walk topology — which edges to traverse and which node types to land on.
index.SetNodesAndEdges(
    edgesToFollow: new[] { E.Placed, E.Contains, E.OpenedCase, E.About },
    nodesToFollow: new[] { N.Customer.Type, N.Order.Type, N.Product.Type, N.SupportCase.Type });

await index.TrainAsync(progressLogger: msg => Logger.LogInformation(msg));
```
Options reference
PageSpaceEmbeddingsIndexData (the option bag) and PageSpaceData (training-time settings on the underlying model) expose:
| Option | Type | Default | Effect |
|---|---|---|---|
| Dimensions | int | 128 | Vector dimensionality. Higher = more capacity, more memory. |
| Epoch | int | 50 | Training passes over the walks. |
| LearningRate | float | 0.05 | SGD step size. |
| NegativeSamplingCount | int | 10 | Negative samples per positive pair. |
| Threads | int | Environment.ProcessorCount | Worker threads during training. |
| EdgesToFollow | string[] | none | Edge types the random walks may traverse. Required. |
| NodesToFollow | string[] | none | Node types eligible for walks. Required. |
| UseSimilarityForTokens | bool | true | When trained with _Token nodes, biases negative sampling toward dissimilar tokens. |
| Buckets (PageSpaceData.Buckets) | uint | 2,000,000 | Subword/token bucket size. |
The index is not auto-trained on commit — call TrainAsync once a meaningful amount of data is in the graph, then again periodically (e.g. as a scheduled task) as the graph shifts.
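One way to run the periodic retraining is a small timer loop in the host process. The sketch below is an assumption about how you might wire it, not something Mosaik provides: `RetrainScheduler` and `RunAsync` are hypothetical names, and the loop relies on `System.Threading.PeriodicTimer`, which requires .NET 6 or later. The training call is passed in as a delegate so the loop makes no assumptions about the index type.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: invokes a training delegate on a fixed interval.
public static class RetrainScheduler
{
    // Calls `train` once per `interval` until `token` is cancelled.
    // Returns the number of completed training runs.
    public static async Task<int> RunAsync(
        Func<Task> train, TimeSpan interval, CancellationToken token)
    {
        int runs = 0;
        using var timer = new PeriodicTimer(interval);
        try
        {
            // The first tick fires after one full interval, not immediately.
            while (await timer.WaitForNextTickAsync(token))
            {
                await train();
                runs++;
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the normal shutdown path for this loop.
        }
        return runs;
    }
}
```

Assuming `index.TrainAsync` returns a `Task`, a call could look like `RetrainScheduler.RunAsync(() => index.TrainAsync(progressLogger: msg => Logger.LogInformation(msg)), TimeSpan.FromHours(24), stoppingToken)`, where the 24-hour interval and `stoppingToken` are placeholders. Because each run is awaited before the next tick is consumed, a slow training pass delays the next run instead of overlapping with it.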
Reading the vectors
PageSpace embeddings plug into the same IQuery surface as the other indexes:
```csharp
// Find 20 customers structurally similar to a seed customer.
var index = Graph.Indexes
    .OfType<PageSpaceEmbeddingsIndex>(N.Customer.Type)
    .First();

var similar = Q().StartAt(customerUID)
    .Similar(IndexTypes.PageSpaceEmbeddingsIndex, index.UID, count: 20)
    .EmitWithScores();
```
For mixed scenarios — "structurally similar and in the same region", or "similar plus a soft boost from a text signal" — use the Similarity Engine with one signal that calls Similar(IndexTypes.PageSpaceEmbeddingsIndex, …) and a second one that does whatever else you need.
See also
- Sentence Embeddings — text-driven embeddings.
- Raw Embeddings — bring your own vectors.
- IQuery Similarity Search — Similar() and friends.
- Similarity Engine — combining graph embeddings with other signals.
- Indexes overview — how indexing is queued and run.