
Clustering Similarity Results & Visualizing as a Graph

A single similarity lookup gives you the neighbors of one seed. The more interesting product features start when you run that lookup for many seeds and treat the resulting scored pairs as a weighted graph: nodes joined by strong similarity edges end up in the same cluster. A force-directed view then makes that structure visible to users.

This page walks through:

  1. Collecting (source, target, score) triples from many similarity searches.
  2. Feeding them into WeightedGraph<T>.Cluster(...) to extract groups.
  3. Rendering the result with ForceGraphView.

The algorithm lives in Mosaik.Core.Algorithms (WeightedGraph<T>, Node<T>, WeightedEdge<T>, Cluster<T>). The UI component is Mosaik.Components.ForceGraphView.


1. Compute similarity edges for many seeds

Building on the /similar-products endpoint from IQuery Similarity Search, expand the input from one UID to a set of seed UIDs and collect the neighbors of each:

public record SimilarityGraphRequest(
    string[] ProductIds,
    int      PerSeedK    = 10,
    float    MinScore    = 0.5f);

public record GraphResponse(
    IReadOnlyList<GraphNodeDto> Nodes,
    IReadOnlyList<GraphEdgeDto> Edges);

public record GraphNodeDto(string Id, string Label, int Cluster);
public record GraphEdgeDto(string Source, string Target, double Weight);

var input = Body.FromJson<SimilarityGraphRequest>();

if (input?.ProductIds is null || input.ProductIds.Length == 0)
    return BadRequest("`productIds` is required.");

// Resolve seed UIDs.
var seedUIDs = input.ProductIds
    .Select(id => Graph.TryGetReadOnlyContent<Product>(N.Product.Type, id, out var n) ? n.UID : default)
    .Where(uid => uid != default)
    .Distinct()
    .ToArray();

if (seedUIDs.Length == 0)
    return NotFound("No matching products found.");

// Pick the index we want to drive similarity from.
var nameIndex = Graph.Indexes
    .OfType<SentenceEmbeddingsIndex>(N.Product.Type)
    .First(i => i.FieldName == N.Product.Name);

// Run a similarity query per seed and collect scored pairs.
var pairs = new List<(UID128 src, UID128 tgt, float score)>();

foreach (var seedUID in seedUIDs)
{
    // Ask for one extra hit: the seed can come back as its own nearest neighbour (skipped below).
    var neighbors = Q().StartAt(seedUID)
                       .Similar(indexUID: nameIndex.UID, count: input.PerSeedK + 1)
                       .AsScoredUIDEnumerable();

    foreach (var s in neighbors)
    {
        if (s.UID.UID == seedUID)        continue;          // skip self
        if (s.Score   < input.MinScore)  continue;          // weak edges hurt clustering
        pairs.Add((seedUID, s.UID.UID, s.Score));
    }
}

AsScoredUIDEnumerable() is the form of IQuery materialization that preserves similarity scores — see Querying the Graph.
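When two seeds are each other's neighbours, the loop above records the same relationship twice, once as (A, B, s1) and once as (B, A, s2). That double-counts the edge in clustering and draws two links between the same bubbles. If you would rather keep one undirected edge per pair, you can canonicalize the pairs before building the graph. Below is a minimal sketch. It uses string ids as a hypothetical stand-in for UID128, and the choice to keep the higher of the two scores is an assumption (averaging would be equally defensible).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Collapses (a, b) and (b, a) into one undirected edge, keeping the stronger score.
// String ids stand in for UID128 here; any id type with a total ordering works the same way.
static List<(string src, string tgt, float score)> Undirected(
    IEnumerable<(string src, string tgt, float score)> pairs)
{
    var best = new Dictionary<(string A, string B), float>();
    foreach (var (src, tgt, score) in pairs)
    {
        // Canonical key: the ordinally smaller id always goes first.
        var key = string.CompareOrdinal(src, tgt) <= 0 ? (A: src, B: tgt) : (A: tgt, B: src);
        if (!best.TryGetValue(key, out var existing) || score > existing)
            best[key] = score;
    }
    return best.Select(kv => (src: kv.Key.A, tgt: kv.Key.B, score: kv.Value)).ToList();
}

var raw = new List<(string src, string tgt, float score)>
{
    ("A", "B", 0.82f),
    ("B", "A", 0.79f),   // same relationship, found from the other seed
    ("A", "C", 0.61f),
};

var deduped = Undirected(raw);
foreach (var (src, tgt, score) in deduped)
    Console.WriteLine($"{src} - {tgt}: {score}");
```

Because the output keeps the same (src, tgt, score) shape, it can replace pairs wherever the later steps read it.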


2. Build a WeightedGraph and extract clusters

WeightedGraph<T>.Cluster takes a node list and an edge list and returns a list of Cluster<T> instances. Each cluster enumerates the nodes that belong to it; every node also carries its assigned cluster id (Node<T>.Cluster):

// Distinct UID set — includes seeds plus everything they pulled in.
var allUIDs = pairs
    .SelectMany(p => new[] { p.src, p.tgt })
    .Distinct()
    .ToList();

// Wrap each UID in a graph node.
var nodes = allUIDs.Select(u => new Mosaik.Core.Algorithms.Node<UID128>(u)).ToList();

// Convert pairs to weighted edges.
var lookup = nodes.ToDictionary(n => n.Value);
var edges  = pairs.Select(p =>
    new Mosaik.Core.Algorithms.WeightedEdge<UID128>(
        source: lookup[p.src],
        target: lookup[p.tgt],
        weight: p.score))
    .ToList();

// Run hierarchical clustering — returns one Cluster<UID128> per top-level group.
// Side-effect: each node's `.Cluster` field is set to its cluster index.
var clusters = Mosaik.Core.Algorithms.WeightedGraph<UID128>.Cluster(nodes, edges);

WeightedGraph<T>.Cluster does a hierarchical edge-cut: it iteratively splits the graph by removing the weakest edges and falls back to a per-component flat clustering. The result is robust to disconnected sub-graphs (each one becomes its own top-level cluster).
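This page does not reproduce the internals of WeightedGraph<T>.Cluster. The sketch below is a deliberately simplified stand-in that makes the edge-cut idea concrete. It raises a single global cut threshold through the distinct edge weights, weakest first, and recomputes connected components with union-find until no component exceeds a size cap. It is not the Mosaik algorithm and it omits the per-component flat-clustering fallback. The integer node indices and the maxClusterSize stopping rule are assumptions of the sketch, not library parameters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Union-find "find" with path halving.
static int Find(int[] parent, int x)
{
    while (parent[x] != x)
    {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

// Connected components over the edges that survive the cut (weight > cut).
// Returns a dense cluster id (0, 1, 2, ...) per node, in order of first appearance.
static int[] Components(int nodeCount, IReadOnlyList<(int a, int b, double w)> edges, double cut)
{
    var parent = Enumerable.Range(0, nodeCount).ToArray();
    foreach (var (a, b, w) in edges)
    {
        if (w <= cut) continue;                    // this edge has been cut
        parent[Find(parent, a)] = Find(parent, b); // merge the two components
    }

    var ids = new Dictionary<int, int>();
    var result = new int[nodeCount];
    for (int i = 0; i < nodeCount; i++)
    {
        int root = Find(parent, i);
        if (!ids.TryGetValue(root, out int id))
        {
            id = ids.Count;
            ids[root] = id;
        }
        result[i] = id;
    }
    return result;
}

static int LargestCluster(int[] clusters) =>
    clusters.Length == 0 ? 0 : clusters.GroupBy(c => c).Max(g => g.Count());

// Cut the weakest remaining edges, one distinct weight at a time, until no
// component has more than maxClusterSize nodes.
static int[] EdgeCutCluster(int nodeCount, IReadOnlyList<(int a, int b, double w)> edges, int maxClusterSize)
{
    var clusters = Components(nodeCount, edges, double.NegativeInfinity);
    foreach (var cut in edges.Select(e => e.w).Distinct().OrderBy(w => w))
    {
        if (LargestCluster(clusters) <= maxClusterSize) break;
        clusters = Components(nodeCount, edges, cut);
    }
    return clusters;
}

// Two tight triangles joined by one weak bridge (2 - 3).
var demoEdges = new List<(int a, int b, double w)>
{
    (0, 1, 0.90), (1, 2, 0.85), (0, 2, 0.80),
    (3, 4, 0.90), (4, 5, 0.88), (3, 5, 0.82),
    (2, 3, 0.55),
};

var demoClusters = EdgeCutCluster(6, demoEdges, maxClusterSize: 3);
Console.WriteLine(string.Join(",", demoClusters));
```

The disconnected-sub-graph property falls out of the connected-components step. Nodes with no path between them can never share an id, whatever the threshold.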

After the call:

  • clusters[i] enumerates the nodes in cluster i.
  • node.Cluster is the cluster index assigned to each node — convenient when you've stored the Node<UID128> list alongside other per-node data.
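The endpoint below ships only the per-node cluster id. A consumer that wants the clusters[i] view (for a legend, or a per-cluster list beside the graph) therefore has to regroup. Here is a minimal sketch with hypothetical ids and cluster numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Per-node (id, cluster) pairs: the shape GraphNodeDto carries in the response.
var nodeClusters = new List<(string id, int cluster)>
{
    ("p1", 0), ("p2", 1), ("p3", 0), ("p4", 2), ("p5", 1), ("p6", 0),
};

// Rebuild the cluster -> members view, largest cluster first (ties broken by cluster id).
var groups = nodeClusters
    .GroupBy(n => n.cluster)
    .Select(g => (cluster: g.Key, ids: g.Select(n => n.id).ToList()))
    .OrderByDescending(g => g.ids.Count)
    .ThenBy(g => g.cluster)
    .ToList();

foreach (var (cluster, ids) in groups)
    Console.WriteLine($"cluster {cluster}: {string.Join(", ", ids)} ({ids.Count} nodes)");
```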

3. Shape the response for a force-directed view

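ForceGraphView consumes GraphExplorerNode[] (drawn nodes) and GraphExplorerEdge[] (drawn links). The endpoint doesn't build those types itself. Instead, it returns lightweight DTOs that carry everything the front end needs to build them, with the cluster id as the colour key: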

var nodeDtos = nodes.Select(n =>
{
    Graph.TryGetReadOnlyContent<Product>(n.Value, out var p);
    return new GraphNodeDto(
        Id:       n.Value.ToString(),
        Label:    p?.Name ?? n.Value.ToString(),
        Cluster:  n.Cluster);
}).ToList();

var edgeDtos = pairs.Select(p =>
    new GraphEdgeDto(p.src.ToString(), p.tgt.ToString(), p.score)
).ToList();

return Ok(new GraphResponse(nodeDtos, edgeDtos).ToJson(), "application/json");

The endpoint's JSON output is everything a front-end view needs: each node has an id, a label, and a cluster (used to colour the bubble); each edge has source/target/weight.


4. Render with ForceGraphView

ForceGraphView lives in the workspace front-end project (Mosaik.Components) and renders a D3 force-directed canvas. The component already powers the workspace's built-in graph explorer; you can drop it into a custom front-end view without re-implementing the layout.

A minimal page that fetches the JSON above and renders it:

public class ProductSimilarityView : IComponent
{
    private readonly ForceGraphView _view = new ForceGraphView(
        enableInteraction: true,
        isEmbeddedView:    false);

    public ProductSimilarityView(string[] seedIds)
    {
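        // Fire-and-forget: exceptions thrown inside LoadAsync are not observed here,
        // so handle errors within LoadAsync itself.
        _ = LoadAsync(seedIds);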
    }

    private async Task LoadAsync(string[] seedIds)
    {
        var response = await Endpoints.SimilarityGraph(new
        {
            productIds = seedIds,
            perSeedK   = 10,
            minScore   = 0.5f
        });

        // Map server JSON → ForceGraphView inputs.
        var nodes = response.Nodes.Select(n => new GraphExplorerNode
        {
            id         = n.Id,
            Label      = n.Label,
            ShortLabel = n.Label,
            NodeType   = N.Product.Type,
            Color      = ColorFor(n.Cluster),
            radius     = 12
        }).ToArray();

        var edges = response.Edges.Select(e => new GraphExplorerEdge
        {
            UID          = $"{e.Source}->{e.Target}",
            SourceUID    = e.Source,
            TargetUID    = e.Target,
            EdgeTypeName = "Similar"
        }).ToArray();

        _view.SetData(nodes, edges);
    }

    public HTMLElement Render() => _view.Render();

    private static readonly string[] Palette =
        { "#4F8BF9", "#F97171", "#7FD17F", "#F9C74F", "#9B59B6", "#26C6DA" };

    // Double modulo wraps any int (including negatives) into a valid palette index.
    private static string ColorFor(int c) =>
        Palette[((c % Palette.Length) + Palette.Length) % Palette.Length];
}

Key points when wiring data into ForceGraphView:

  • GraphExplorerNode.id must be unique per node — using the UID string from the endpoint response is the natural choice.
  • GraphExplorerEdge.SourceUID / TargetUID must match the id of a node already passed to SetData — orphan edges are dropped.
  • Color is what produces the per-cluster visual grouping. Cluster index → palette colour is the simplest mapping; for many clusters use an HSL ramp.
  • radius drives node size; combine with EdgeCount if you want hubs to read bigger.
  • SetData replaces the current data. Call AddData to layer new nodes onto an existing render (e.g. when the user expands a cluster).
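The colour, size and orphan-edge points above can be sketched as three small helpers. The golden-angle hue step (about 137.5°) is one common way to build an HSL ramp that keeps consecutive cluster ids visually distinct without a fixed palette size.

Several choices here are illustrative assumptions:

  • The helper names HslColor, RadiusByDegree and DropOrphans are hypothetical.
  • The 65 %/55 % saturation and lightness values are arbitrary.
  • The 8 to 24 radius range is arbitrary.
  • The sketch assumes GraphExplorerNode.Color accepts any CSS colour string. The palette above uses hex, so check this before relying on hsl() values.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hue ramp for any number of clusters: step the hue by the golden angle so that
// neighbouring cluster ids land far apart on the colour wheel.
static string HslColor(int cluster)
{
    double hue = ((cluster * 137.508) % 360 + 360) % 360;   // also wraps negative ids
    return string.Format(CultureInfo.InvariantCulture, "hsl({0:0.#}, 65%, 55%)", hue);
}

// Radius that grows with degree (incident edges), sqrt-scaled so hubs read
// bigger without dwarfing everything else.
static Dictionary<string, double> RadiusByDegree(
    IEnumerable<string> nodeIds, IEnumerable<(string src, string tgt)> edges,
    double minRadius = 8, double maxRadius = 24)
{
    var degree = nodeIds.ToDictionary(id => id, _ => 0);
    foreach (var (src, tgt) in edges)
    {
        if (degree.ContainsKey(src)) degree[src]++;
        if (degree.ContainsKey(tgt)) degree[tgt]++;
    }
    int maxDegree = degree.Count == 0 ? 0 : degree.Values.Max();
    return degree.ToDictionary(
        kv => kv.Key,
        kv => maxDegree == 0
            ? minRadius
            : minRadius + (maxRadius - minRadius) * Math.Sqrt((double)kv.Value / maxDegree));
}

// Keep only edges whose endpoints are both known node ids, so nothing is
// silently dropped by the view.
static List<(string src, string tgt)> DropOrphans(
    IEnumerable<string> nodeIds, IEnumerable<(string src, string tgt)> edges)
{
    var known = new HashSet<string>(nodeIds);
    return edges.Where(e => known.Contains(e.src) && known.Contains(e.tgt)).ToList();
}

var ids = new[] { "a", "b", "c", "d" };
var links = new List<(string src, string tgt)> { ("a", "b"), ("a", "c"), ("a", "d"), ("b", "x") };

var kept = DropOrphans(ids, links);       // ("b", "x") goes: "x" is not a node
var radius = RadiusByDegree(ids, kept);   // "a" is the hub, so it gets maxRadius

foreach (var id in ids)
    Console.WriteLine($"{id}: radius {radius[id]:0.0}");
Console.WriteLine(string.Join(" ", Enumerable.Range(0, 4).Select(c => HslColor(c))));
```

Degree is computed from the same edge list you pass to the view. The radius therefore reflects what is actually drawn rather than the full graph.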

The component also exposes OnNodeClick, OnNodeSelect, OnNodeContextClick, etc. — wire those to push details into a side panel when the user explores the graph:

_view.OnNodeClick((sender, evt) =>
{
    var clicked = evt.Data;
    OpenDetailsFor(clicked.id);   // OpenDetailsFor is your own code, e.g. populate a side panel
});

Built-in physics applies "same-colour attracts" via the forceCluster JS hook that the component installs at construction. As a result, nodes of the same cluster settle into separate bubbles without extra work.
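The forceCluster hook is JavaScript inside the component and isn't shown on this page. One common way to implement a "same-colour attracts" force is to nudge each node's velocity toward its cluster's centroid on every simulation tick. The C# sketch below illustrates that idea on plain arrays. It is not the component's code. The strength and alpha parameters are assumptions; alpha plays the role of the simulation's cooling factor, as in D3.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One tick of a "pull toward your cluster's centroid" force.
// Node state lives in parallel arrays for brevity: x, y, vx, vy and a cluster id.
static void ApplyClusterForce(double[] x, double[] y, double[] vx, double[] vy,
                              int[] cluster, double strength, double alpha)
{
    // Centroid of each cluster from the current positions.
    var centroids = Enumerable.Range(0, x.Length)
        .GroupBy(i => cluster[i])
        .ToDictionary(g => g.Key, g => (cx: g.Average(i => x[i]), cy: g.Average(i => y[i])));

    // Nudge each node's velocity toward its own centroid; scaling by alpha makes
    // the pull fade as the simulation cools and the layout settles.
    for (int i = 0; i < x.Length; i++)
    {
        var (cx, cy) = centroids[cluster[i]];
        vx[i] += (cx - x[i]) * strength * alpha;
        vy[i] += (cy - y[i]) * strength * alpha;
    }
}

double[] px = { 0, 10, 100, 110 };
double[] py = { 0, 0, 0, 0 };
double[] pvx = new double[4], pvy = new double[4];
int[] groups = { 0, 0, 1, 1 };

ApplyClusterForce(px, py, pvx, pvy, groups, strength: 0.2, alpha: 1.0);
Console.WriteLine(string.Join(", ", pvx));
```

Each pair of same-cluster nodes is pulled toward its shared midpoint, while the two clusters are left where the other forces put them.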


End-to-end shape

flowchart LR
    Seeds[Seed UIDs] --> Sim[Q().Similar per seed]
    Sim --> Pairs[Scored pairs<br/>src, tgt, score]
    Pairs --> WG[WeightedGraph.Cluster]
    WG --> Clusters[Clusters + per-node Cluster id]
    Clusters --> Json[Endpoint JSON<br/>nodes + edges + cluster]
    Json --> FGV[ForceGraphView<br/>colour by cluster]

Each stage is independent: you can drive the clustering from any embedding index (just swap nameIndex.UID for a PageSpaceEmbeddingsIndex or RawEmbeddingsIndex UID), and you can render the result anywhere ForceGraphView is available without changing the endpoint.



© 2026 Curiosity. All rights reserved.