Graph Embeddings (PageSpace)

The PageSpaceEmbeddingsIndex produces vectors from the graph's topology, not from text. Two nodes connected by similar edge patterns — same neighbors, same paths — end up close in vector space. It's how the workspace answers questions like "which customers behave like this one?" or "which products are functionally substitutable?" without depending on field text.

The implementation lives in Mosaik.GraphDB.Indexes.PageSpaceEmbeddingsIndex, backed by the Mosaik.GraphDB.Models.PageSpace self-supervised model. Both implement the same ISimilarityIndex surface as the other embedding indexes, so consumers don't care which kind of vector they're querying.

How it works

PageSpace is a StarSpace-style embedding model trained over random walks through the graph:

flowchart LR
    G[(Graph)] --> Walks[Random walks<br/>EdgesToFollow]
    Walks --> Train[StarSpace-style training<br/>negative sampling]
    Train --> Vecs[Vector per node]
    Vecs --> HNSW[(HNSW searcher)]
    New[(New node committed)] --> Predict[pageSpace.TryGetVector]
    Predict --> HNSW
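To make the walk stage concrete, here is a minimal sketch of a random walk restricted to a set of edge types and node types. It assumes a plain in-memory adjacency list rather than the Mosaik graph store; `Edge`, `WalkSampler`, and `Walk` are hypothetical names for illustration and are not part of the Mosaik API. The two filter sets play the role that EdgesToFollow and NodesToFollow play in the index configuration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory edge used only for this sketch; not a Mosaik type.
public record Edge(string Type, string Target, string TargetType);

public static class WalkSampler
{
    // Returns a walk of at most `length` nodes starting at `start`,
    // stepping only across allowed edge types onto allowed node types.
    public static List<string> Walk(
        IReadOnlyDictionary<string, List<Edge>> adjacency,
        string start,
        int length,
        ISet<string> edgesToFollow,
        ISet<string> nodesToFollow,
        Random rng)
    {
        var walk = new List<string> { start };
        var current = start;

        while (walk.Count < length)
        {
            // A node with no outgoing edges ends the walk.
            if (!adjacency.TryGetValue(current, out var edges)) break;

            var candidates = edges
                .Where(e => edgesToFollow.Contains(e.Type)
                         && nodesToFollow.Contains(e.TargetType))
                .ToList();

            // No eligible edge: dead end under the current filters.
            if (candidates.Count == 0) break;

            // Pick the next hop uniformly at random.
            current = candidates[rng.Next(candidates.Count)].Target;
            walk.Add(current);
        }

        return walk;
    }
}
```

Nodes that appear on the same walk are the candidates for positive pairs, so changing either filter set changes which nodes can ever co-occur.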

The model learns one vector per node by sampling positive examples (nodes that co-occur on a walk) and negative samples (random nodes), and adjusting both vectors of each positive pair to be more similar while pushing negatives apart. You configure which edges to follow and which node types are eligible — those choices define what "similar" means for your domain.
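The update described above can be sketched as a single SGD step. This is an illustration under stated assumptions, not the PageSpace implementation: it assumes dot-product similarity and a margin ranking (hinge) loss, which is one of the losses StarSpace-style models commonly use. `PairUpdate`, `Dot`, and `Step` are hypothetical names.

```csharp
using System;

public static class PairUpdate
{
    public static float Dot(float[] x, float[] y)
    {
        float sum = 0f;
        for (int i = 0; i < x.Length; i++) sum += x[i] * y[i];
        return sum;
    }

    // One SGD step for a positive pair (a, b) against a set of negatives.
    // Assumed loss per negative n: max(0, margin - dot(a, b) + dot(a, n)).
    public static void Step(
        float[] a, float[] b, float[][] negatives,
        float learningRate, float margin)
    {
        foreach (var n in negatives)
        {
            float loss = margin - Dot(a, b) + Dot(a, n);
            if (loss <= 0f) continue; // already separated by the margin

            for (int i = 0; i < a.Length; i++)
            {
                // Read the old components first so all three updates
                // use the same pre-step values.
                float ai = a[i], bi = b[i], ni = n[i];
                a[i] += learningRate * (bi - ni); // pull a toward b, away from n
                b[i] += learningRate * ai;        // pull b toward a
                n[i] -= learningRate * ai;        // push n away from a
            }
        }
    }
}
```

In this sketch the `negatives` array would hold as many vectors as NegativeSamplingCount specifies, and `learningRate` corresponds to LearningRate.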

After training, new nodes get vectors predicted in real time as they're committed (PageSpace.TryGetVector), without retraining the whole model.

When to use it

| Question | Right tool |
| --- | --- |
| "Find products with similar names / descriptions" | Sentence Embeddings. |
| "Find customers who behave like this one (same purchase patterns, same support cases)" | Graph Embeddings. |
| "Find substitute products based on who buys them" | Graph Embeddings. |
| "Find connected components / explicit graph paths" | Regular `IQuery` traversal (`StartAt`/`Out`/`In`), not embeddings. |

The vectors are sensitive to graph density. Sparse subgraphs (nodes with very few edges) produce poor embeddings, because those nodes appear on few walks and so receive few training updates; you want every node to participate in at least a handful of meaningful relationships.
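One way to spot sparse nodes before training is to count each node's degree and list those below a threshold. The sketch below assumes the edges have already been exported as plain (source, target) pairs; `DensityCheck` and `SparseNodes` are hypothetical helpers, and the threshold you pass is a judgment call for your data, not a Mosaik recommendation.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DensityCheck
{
    // Returns the nodes whose undirected degree is below minDegree.
    public static List<string> SparseNodes(
        IEnumerable<string> nodes,
        IEnumerable<(string Source, string Target)> edges,
        int minDegree)
    {
        // Start every known node at degree zero so isolated nodes are reported too.
        var degree = nodes.Distinct().ToDictionary(n => n, _ => 0);

        foreach (var (source, target) in edges)
        {
            if (degree.ContainsKey(source)) degree[source]++;
            if (degree.ContainsKey(target)) degree[target]++;
        }

        return degree
            .Where(kv => kv.Value < minDegree)
            .Select(kv => kv.Key)
            .ToList();
    }
}
```

Only count the edge types you intend to walk; an edge the walks never follow does not help the embedding.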

Registering and configuring

// Training hyperparameters; ManuallySet takes each value as a string.
var settings = new SettingsHolder();
settings.ManuallySet(nameof(PageSpaceEmbeddingsIndexData.Dimensions),            "128");
settings.ManuallySet(nameof(PageSpaceEmbeddingsIndexData.Epoch),                 "50");
settings.ManuallySet(nameof(PageSpaceEmbeddingsIndexData.LearningRate),          "0.05");
settings.ManuallySet(nameof(PageSpaceEmbeddingsIndexData.NegativeSamplingCount), "10");

var index = await Graph.Indexes.AddPageSpaceEmbeddingsIndexAsync(
    nodeType: N.Customer.Type,
    setting:  settings);

// Define the walk topology — which edges to traverse and which node types to land on.
index.SetNodesAndEdges(
    edgesToFollow: new[] { E.Placed, E.Contains, E.OpenedCase, E.About },
    nodesToFollow: new[] { N.Customer.Type, N.Order.Type, N.Product.Type, N.SupportCase.Type });

await index.TrainAsync(progressLogger: msg => Logger.LogInformation(msg));

Options reference

PageSpaceEmbeddingsIndexData (the option bag) and PageSpaceData (training-time settings on the underlying model) expose:

| Option | Type | Default | Effect |
| --- | --- | --- | --- |
| `Dimensions` | int | 128 | Vector dimensionality. Higher = more capacity, more memory. |
| `Epoch` | int | 50 | Training passes over the walks. |
| `LearningRate` | float | 0.05 | SGD step size. |
| `NegativeSamplingCount` | int | 10 | Negative samples per positive pair. |
| `Threads` | int | `Environment.ProcessorCount` | Worker threads during training. |
| `EdgesToFollow` | string[] | none | Edge types the random walks may traverse. Required. |
| `NodesToFollow` | string[] | none | Node types eligible for walks. Required. |
| `UseSimilarityForTokens` | bool | true | When trained with `_Token` nodes, biases negative sampling toward dissimilar tokens. |
| `Buckets` (`PageSpaceData.Buckets`) | uint | 2,000,000 | Subword/token bucket size. |

The index is not auto-trained on commit — call TrainAsync once a meaningful amount of data is in the graph, then again periodically (e.g. as a scheduled task) as the graph shifts.
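A periodic retrain can be driven by any scheduler. As a minimal sketch, the loop below uses the .NET `PeriodicTimer` (available from .NET 6) to invoke a retraining delegate at a fixed interval until cancelled. `RetrainLoop` and `RunAsync` are hypothetical names, and the loop is deliberately agnostic about what the delegate does.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RetrainLoop
{
    // Invokes `retrain` once per interval until cancellation is requested.
    // Each run is awaited before the next tick is awaited, so runs never overlap.
    public static async Task RunAsync(
        Func<Task> retrain, TimeSpan interval, CancellationToken ct)
    {
        using var timer = new PeriodicTimer(interval);
        try
        {
            while (await timer.WaitForNextTickAsync(ct))
            {
                await retrain();
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the normal shutdown path; nothing to clean up.
        }
    }
}
```

Assuming `index` is the trained index from the registration example and `stoppingToken` is a cancellation token supplied by your host, a call might look like `await RetrainLoop.RunAsync(() => index.TrainAsync(progressLogger: msg => Logger.LogInformation(msg)), TimeSpan.FromHours(24), stoppingToken);`. The 24-hour interval is only an illustration; pick one that matches how quickly your graph changes.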

Reading the vectors

PageSpace embeddings plug into the same IQuery surface as the other indexes:

// Find 20 customers structurally similar to a seed customer.
var index = Graph.Indexes
    .OfType<PageSpaceEmbeddingsIndex>(N.Customer.Type)
    .First();

var similar = Q().StartAt(customerUID)
                 .Similar(IndexTypes.PageSpaceEmbeddingsIndex, index.UID, count: 20)
                 .EmitWithScores();

For mixed scenarios — "structurally similar and in the same region", or "similar plus a soft boost from a text signal" — use the Similarity Engine with one signal that calls Similar(IndexTypes.PageSpaceEmbeddingsIndex, …) and a second one that does whatever else you need.