Similarity Engine — Finding Similar Objects

The similarity engine builds a recommendation-style ranking in three steps: it collects candidate UIDs from multiple signals (each drawing on a different source, such as text embeddings, graph traversals, or external lookups), fuses them into a single ranked list, and applies optional rules to filter or boost the result.

Use it from a custom endpoint when you want answers like "find products similar to X" or "show cases like this one" that can't be expressed as a single search query.

The engine lives in the Mosaik.GraphDB.Similarity namespace and is reached via IQuery.ToSimilarity(...).

Anatomy of a scenario

Graph.Query()
   .StartAt(seedUID)        ← the "subject(s)" the signals see in ctx.Subjects
   .ToSimilarity(opts)
   .AddSignal("name1", s => ...)   ← 1..N candidate sources (text, graph, external)
   .AddSignal("name2", s => ...)
   .Fuse(f => f.UsingMeanReciprocalRank(...))   ← how to combine signals into one ranking
   .AddRule("name", r => r.Filter(...) | .BoostByRank(...) | .TransformFusedScore(...))
   .Filter(uid => ...)                          ← optional final UID-level filter
   .ExecuteAsync(ct);

The result is a SimilarityResult carrying the final scores plus per-signal breakdowns (useful for explanations and debugging).

A text + graph similarity endpoint

The example below answers "find products similar to this one" using:

A text-similarity signal over indexed product names.
A same-manufacturer signal that boosts products from the same manufacturer.
A shared-tag signal that boosts products sharing tags with the seed.
An MRR fusion to combine the three rankings.
A filter rule that drops candidates outside the requested category.
A boost-by-rank rule that lifts products the current user has previously purchased.

//ImportEndpoint("_lib/product-helpers")   // ToDto, GetSimilarityIndex, …

public record SimilarRequest(
    string ProductId,
    int    TopK = 10,
    string Category = null,
    bool   Explain = false);

public record ScoredProduct(
    string Id, string Name, double Score,
    IReadOnlyDictionary<string, double> SignalScores);

var input = Body.FromJson<SimilarRequest>();
if (string.IsNullOrWhiteSpace(input.ProductId))
    return BadRequest("`productId` is required.");

if (!Graph.TryGetReadOnlyContent<Product>(N.Product.Type, input.ProductId, out var seedNode))
    return NotFound($"Product '{input.ProductId}' not found.");

var seedUID            = seedNode.UID;
var seedName           = seedNode.Get<string>(N.Product.Name) ?? "";
var seedManufacturer   = Q().StartAt(seedUID).Out(N.Manufacturer.Type, E.ManufacturedBy)
                            .AsUIDEnumerable().FirstOrDefault();
var productTextIndex   = ProductHelpers.GetSimilarityIndex(Graph);

var result = await Graph.Query()
    .StartAt(seedUID)
    .ToSimilarity(o => o
        .MaxCandidates(200)
        .MaxCandidatesPerSignal(100)
        .EnableExplanations(input.Explain)
        .TrackProgress(async p => await RelayStatusAsync(p.Message)))

    // 1. Text similarity over product names (embedding index).
    .AddSignal("SimilarName", s => s
        .Describe("Name embedding similarity")
        .Weight(1.0f)
        .Limit(100)
        .FromAsync(async ctx =>
            (await ctx.Graph.Query()
                .StartAtSimilarTextAsync(
                    seedName, count: 100,
                    nodeTypes: new[] { N.Product.Type },
                    indexUID:  productTextIndex,
                    reverseSorting: false))
            .Except(ctx.Subjects)         // never recommend the seed itself
            .AsUIDEnumerable()))

    // 2. Same manufacturer: other products two hops away (product → manufacturer → product).
    .AddSignal("SameManufacturer", s => s
        .Describe("Other products from the same manufacturer")
        .Weight(0.7f)
        .From(ctx => ctx.Graph.Query()
            .StartAt(ctx.Subjects)
            .Out(N.Manufacturer.Type, E.ManufacturedBy)
            .Out(N.Product.Type,      E.Manufactures)
            .Except(ctx.Graph.Query().StartAt(ctx.Subjects))
            .AsUIDEnumerable()))

    // 3. Tag overlap — shared category/tag nodes.
    .AddSignal("SharedTags", s => s
        .Describe("Products that share tags with the seed")
        .Weight(0.5f)
        .From(ctx => ctx.Graph.Query()
            .StartAt(ctx.Subjects)
            .Out(N.Tag.Type, E.HasTag)
            .In(N.Product.Type, E.HasTag)
            .Except(ctx.Graph.Query().StartAt(ctx.Subjects))
            .AsUIDEnumerable()))

    // Fuse the three rankings with Reciprocal Rank Fusion.
    .Fuse(f => f.UsingMeanReciprocalRank(o => o
        .RankOffset(60)
        .AverageOverSignalsWherePresent()))

    // Hard filter — keep only the requested category.
    .AddRule("CategoryFilter", r =>
    {
        r.Enabled(!string.IsNullOrEmpty(input.Category));
        r.Filter((ctx, candidates) => ctx.Graph.Query()
            .StartAt(candidates)
            .IsRelatedTo(N.Category.Type, input.Category)
            .AsUIDEnumerable());
    })

    // Soft boost — products the current user has bought before.
    .AddRule("BoostPurchases", r => r
        .BoostByRank(
            ctx => ctx.Graph.Query()
                .StartAt(CurrentUser)
                .Out(N.Order.Type,   E.Placed)
                .Out(N.Product.Type, E.Contains)
                .AsUIDEnumerable(),
            weight: 0.3f,
            topK:   50,
            contributesAs: "PreviouslyPurchased"))

    .ExecuteAsync(CancellationToken);

The result is mapped to the response JSON:

var hits = result.Scores
    .OrderByDescending(kv => kv.Value)
    .Take(input.TopK)
    .Select(kv =>
    {
        Graph.TryGetReadOnlyContent<Product>(kv.Key, out var node);

        var perSignal = result.Signals
            .Where(s => s.Value.ContainsKey(kv.Key))
            .ToDictionary(s => s.Key, s => (double)s.Value[kv.Key]);

        return new ScoredProduct(
            Id:    node?.GetKey() ?? kv.Key.ToString(),
            Name:  node?.Get<string>(N.Product.Name) ?? "",
            Score: kv.Value,
            SignalScores: perSignal);
    })
    .ToList();

return Ok(new { source = input.ProductId, hits }.ToJson(), "application/json");

A response with explain = true looks like:

{
  "source": "P-2199",
  "hits": [
    {
      "id":    "P-2207",
      "name":  "Wireless Mouse Pro",
      "score": 0.0721,
      "signalScores": {
        "SimilarName":       0.0244,
        "SameManufacturer":  0.0167,
        "SharedTags":        0.0083
      }
    }
  ]
}

Signal sources

A signal's From(...) / FromAsync(...) callback receives a SignalContext and returns either IEnumerable<UID128> (rank only, where the order of the sequence is the ranking) or IEnumerable<ScoredUID> (carrying a score the signal has pre-computed).

Source	How
Text embeddings	`ctx.Graph.Query().StartAtSimilarTextAsync(text, count, nodeTypes, indexUID, reverseSorting)`
Graph traversal	Standard `IQuery` chain — `StartAt(ctx.Subjects).Out(...).In(...).Where(...)`
External lookup	Any async source — call out, then return UIDs in the workspace
Pre-scored hits	Return `IEnumerable<ScoredUID>` to feed your own scores in

SignalContext exposes:

UID128[] Subjects — the UIDs from the scenario's StartAt(...).
Graph Graph — the live graph (admin-level). Use ctx.Graph.Query(userUID) if you need ACL filtering inside the signal.

Fusion engines

Engine	Use when
`UsingMeanReciprocalRank(o => ...)`	The usual choice for combining multiple rankings. Robust when the signals' raw scores are on different scales. Set `RankOffset(k)` (RRF's `k`, default 60).
`UsingMaxScore()`	Take the max score per candidate across signals. Good when one signal is the source of truth and the others are tiebreakers.
`UsingMergedScores((a, b) => f(a, b))`	Custom pairwise merge. Default is √(a² + b²) — a soft "OR" of scores.
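
To make the difference between max-score and merged-score fusion concrete, here is a minimal, self-contained sketch that does not use the engine at all. It assumes plain string keys in place of UID128 and a hypothetical helper named FuseScores; the engine's own implementation may differ in details such as the order in which signals are merged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in: string keys play the role of UID128 candidates.
// Folds per-signal score maps together with a pairwise merge function.
Dictionary<string, float> FuseScores(
    IEnumerable<Dictionary<string, float>> signals,
    Func<float, float, float> merge)
{
    var fused = new Dictionary<string, float>();
    foreach (var signal in signals)
    {
        foreach (var (uid, score) in signal)
        {
            // First sighting keeps the raw score; later sightings are merged in.
            fused[uid] = fused.TryGetValue(uid, out var current)
                ? merge(current, score)
                : score;
        }
    }
    return fused;
}

var text  = new Dictionary<string, float> { ["A"] = 0.6f, ["B"] = 0.3f };
var graph = new Dictionary<string, float> { ["A"] = 0.8f, ["C"] = 0.5f };

// Max-score fusion: the strongest signal wins for each candidate.
var byMax = FuseScores(new[] { text, graph }, Math.Max);

// Merged-score fusion using the documented default shape, sqrt(a^2 + b^2).
var byNorm = FuseScores(new[] { text, graph },
    (a, b) => MathF.Sqrt(a * a + b * b));

foreach (var (uid, score) in byMax.OrderBy(kv => kv.Key))
    Console.WriteLine($"max  {uid}: {score:F2}");
foreach (var (uid, score) in byNorm.OrderBy(kv => kv.Key))
    Console.WriteLine($"norm {uid}: {score:F2}");
```

Candidate A appears in both signals, so max-score keeps 0.8 while the merged score rises to about 1.0. Candidates seen by only one signal keep their raw score in both cases, which is why the merge behaves like a soft "OR".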

With a single signal, the scenario uses the signal's scores directly and no Fuse(...) is needed. With two or more signals, call Fuse(...) explicitly; if you omit it, the scenario falls back to MaxScore, which still works but is rarely what you want.
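
For intuition about rank-based fusion, the sketch below implements textbook weighted reciprocal-rank fusion, where each signal adds weight / (k + rank) to a candidate. It is an illustration under assumptions, not the engine's code: the helper FuseByReciprocalRank and the string keys are hypothetical, ranks are taken as 1-based, and the averaging behaviour of AverageOverSignalsWherePresent() is not modelled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Weighted reciprocal-rank fusion over ranked candidate lists.
// Each signal contributes weight / (k + rank), with rank starting at 1.
Dictionary<string, double> FuseByReciprocalRank(
    IEnumerable<(double Weight, IReadOnlyList<string> Ranked)> signals,
    int k = 60)
{
    var fused = new Dictionary<string, double>();
    foreach (var (weight, ranked) in signals)
    {
        for (var i = 0; i < ranked.Count; i++)
        {
            var contribution = weight / (k + i + 1);
            fused[ranked[i]] = fused.GetValueOrDefault(ranked[i]) + contribution;
        }
    }
    return fused;
}

// Three ranked lists, best candidate first, with illustrative weights.
var rankings = new (double, IReadOnlyList<string>)[]
{
    (1.0, new[] { "P-2207", "P-3001", "P-1188" }),   // e.g. name similarity
    (0.7, new[] { "P-3001", "P-2207" }),             // e.g. same manufacturer
    (0.5, new[] { "P-1188", "P-2207" }),             // e.g. shared tags
};

foreach (var (uid, score) in FuseByReciprocalRank(rankings).OrderByDescending(kv => kv.Value))
    Console.WriteLine($"{uid}: {score:F4}");
```

Only positions matter here, so a signal that reports cosine similarities and one that reports hop counts can be combined without normalizing either. A larger k flattens the gap between rank 1 and rank 10; a smaller k rewards top positions more strongly.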

Rules

Rules run after fusion. Use them to clip or reweight the result.

Rule	Purpose
`r.Filter((ctx, cands) => keepEnumerable)`	Hard filter — only the returned UIDs survive.
`r.BoostByRank(ctx => rankedUIDs, weight, topK, contributesAs)`	Soft boost — adds `weight / rank` (RRF-style) to candidates that appear in the ranked list.
`r.TransformFusedScore((ctx, uid, score) => newScore)`	Arbitrary post-processing — useful for time-decay penalties, normalization, hard score floors.
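
The arithmetic behind a rank-based boost can be sketched without the engine. The example assumes the `weight / rank` formula from the table with 1-based ranks, and it assumes that only candidates already in the fused pool are lifted; both the helper name BoostByRankSketch and the string keys are hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Adds weight / rank to every fused candidate that also appears in the
// top-K of a ranked boost list. Rank is 1-based in this sketch.
Dictionary<string, float> BoostByRankSketch(
    Dictionary<string, float> fused,
    IEnumerable<string> rankedBoostList,
    float weight,
    int topK)
{
    var boosted = new Dictionary<string, float>(fused);
    var rank = 0;
    foreach (var uid in rankedBoostList.Take(topK))
    {
        rank++;
        // Assumption: the boost list lifts existing candidates and adds no new ones.
        if (boosted.ContainsKey(uid))
            boosted[uid] += weight / rank;
    }
    return boosted;
}

var fusedScores = new Dictionary<string, float> { ["A"] = 0.05f, ["B"] = 0.04f, ["C"] = 0.03f };
var purchased   = new[] { "C", "X", "A" };   // most relevant purchase first

var result = BoostByRankSketch(fusedScores, purchased, weight: 0.3f, topK: 50);
foreach (var (uid, score) in result.OrderByDescending(kv => kv.Value))
    Console.WriteLine($"{uid}: {score:F3}");
```

Under this reading, C gains 0.3 (rank 1) and A gains 0.1 (rank 3), while B is untouched and X never enters the pool. Note that a weight of 0.3 is large next to fused scores of a few hundredths, so a boost of this size would reorder the list rather than merely break ties; pick the weight relative to the score scale your fusion produces.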

A rule can be conditionally disabled with r.Enabled(false) — useful when a request parameter decides whether to apply it.
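
The time-decay and score-floor uses of TransformFusedScore mentioned in the rules table come down to small pure functions. The sketch below shows one possible shape, assuming an exponential half-life decay; DecayByAge and ApplyFloor are hypothetical helpers, and how you obtain a candidate's age (for example from a timestamp on the node) is left out.

```csharp
using System;

// Exponential time decay: the score halves every halfLifeDays.
double DecayByAge(double score, double ageDays, double halfLifeDays) =>
    score * Math.Pow(0.5, ageDays / halfLifeDays);

// Hard floor: scores below the threshold are zeroed out.
double ApplyFloor(double score, double minScore) =>
    score >= minScore ? score : 0.0;

var fresh    = DecayByAge(0.08, ageDays: 0,  halfLifeDays: 30);
var monthOld = DecayByAge(0.08, ageDays: 30, halfLifeDays: 30);
var stale    = ApplyFloor(DecayByAge(0.08, ageDays: 120, halfLifeDays: 30), minScore: 0.01);

Console.WriteLine($"fresh={fresh}, monthOld={monthOld}, stale={stale}");
```

Inside a rule, functions like these would be called from the `(ctx, uid, score) => newScore` lambda, with the age looked up per uid. Keeping them free of graph access makes them easy to unit test separately from the scenario.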

Options

ToSimilarity(o => ...) accepts a SimilarityOptionsBuilder:

Option	Effect
`MaxCandidates(n)`	Hard cap on the candidate pool size across all signals.
`MaxCandidatesPerSignal(n)`	Trim each signal to its top-N before fusion.
`EnableExplanations(true)`	Populate `result.Rules` with per-rule contributions for debugging.
`TrackTimings(true)`	Per-signal and per-rule timings on the result.
`TrackProgress(async p => …)`	Stream progress events. Wire to `RelayStatusAsync` to surface them to the caller.

`SimilarityResult`

public class SimilarityResult
{
    public UID128 Source;                                                       // First subject
    public Dictionary<UID128, float> Scores;                                    // Final fused + ruled
    public Dictionary<string, Dictionary<UID128, float>> Signals;               // Per-signal raw
    public Dictionary<string, Dictionary<UID128, float>> Rules;                 // When Explanations on
}

Scores is what you turn into the first page of results. Signals and Rules are diagnostic: surface them when explain = true so consumers can see why an item ranked where it did.
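
One way to build a per-candidate explanation from the Signals and Rules dictionaries is sketched below. It mirrors only the nested dictionary shape shown in the class above, with string keys standing in for UID128; the helper ExplainCandidate, the key prefixes, the sample values, and the null check on the rules dictionary are all assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Collects every signal and rule contribution recorded for one candidate.
// Keys are prefixed so a signal and a rule with the same name cannot collide.
Dictionary<string, float> ExplainCandidate(
    string uid,
    Dictionary<string, Dictionary<string, float>> signals,
    Dictionary<string, Dictionary<string, float>> rules)
{
    var parts = new Dictionary<string, float>();
    foreach (var (name, scores) in signals)
        if (scores.TryGetValue(uid, out var s)) parts[$"signal:{name}"] = s;

    // Defensive assumption: rule contributions may be missing when explanations are off.
    if (rules != null)
        foreach (var (name, scores) in rules)
            if (scores.TryGetValue(uid, out var r)) parts[$"rule:{name}"] = r;

    return parts;
}

// Made-up sample data in the same shape as SimilarityResult.Signals / Rules.
var signalScores = new Dictionary<string, Dictionary<string, float>>
{
    ["SimilarName"]      = new() { ["P-2207"] = 0.0244f, ["P-3001"] = 0.0161f },
    ["SameManufacturer"] = new() { ["P-2207"] = 0.0167f },
};
var ruleScores = new Dictionary<string, Dictionary<string, float>>
{
    ["PreviouslyPurchased"] = new() { ["P-3001"] = 0.3f },
};

foreach (var (part, value) in ExplainCandidate("P-3001", signalScores, ruleScores))
    Console.WriteLine($"{part}: {value}");
```

For P-3001 this yields one signal entry and one rule entry, which is enough for a UI to say "ranked here because of name similarity and a previous purchase". Passing null for the rules argument simply returns the signal contributions.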

When to reach for the similarity engine

Need	Use
"Find products like this one" with mixed text + graph signals	Similarity engine.
Free-text search with facets	`CreateSearchAndFacetsAsUserAsync` — see Searching from Endpoints.
Pure semantic retrieval against an embedding index	`Q().StartAtSimilarText(...)` — see IQuery Similarity Search.
Recommendation from a saved candidate list	Skip the signals and use rules only.

Cross-links

IQuery Similarity Search — the single-index lookups (Similar, StartAtSimilarTextAsync) that feed signals.
Searching from Endpoints — when a single search request is enough.
Sentence Embeddings — how the embedding index that powers the text-similarity signal is built.
AI Search — operator-level configuration of the same indexes.
Auto-generated Helpers — N.*, E.*, and Endpoints.* constants used in signal queries.
Graph Query Language — IQuery chains used inside signals and rules.
Clustering & Visualization — turn the SimilarityResult into a graph the user can explore.