Similarity Engine — Finding Similar Objects

The similarity engine builds a recommendation-style ranking by combining multiple signals (each producing candidate UIDs from a different source — text embeddings, graph traversals, external lookups), fusing them into a single ranked list, and applying optional rules to filter or boost the result.

Use it from a custom endpoint when you want answers like "find products similar to X" or "show cases like this one" that can't be expressed as a single search query.

The engine lives in the Mosaik.GraphDB.Similarity namespace and is reached via IQuery.ToSimilarity(...).

Anatomy of a scenario

Graph.Query()
   .StartAt(seedUID)                               // the "subject(s)" the signals see in ctx.Subjects
   .ToSimilarity(opts)
   .AddSignal("name1", s => ...)                   // 1..N candidate sources (text, graph, external)
   .AddSignal("name2", s => ...)
   .Fuse(f => f.UsingReciprocalRankFusion(...))    // how to combine signals into one ranking
   .AddRule("name", r => ...)                      // r.Filter(...), r.BoostByRank(...) or r.TransformFusedScore(...)
   .Filter(uid => ...)                             // optional final UID-level filter
   .ExecuteAsync(ct);

The result is a SimilarityResult carrying the final scores plus per-signal breakdowns (useful for explanations and debugging).

A text + graph similarity endpoint

The example below answers "find products similar to this one" using:

A text-similarity signal over indexed product names.
A same-manufacturer signal that proposes other products from the seed's manufacturer.
A shared-tag signal that proposes products sharing tags with the seed.
Reciprocal Rank Fusion to combine the three rankings.
A filter rule that drops candidates outside the requested category.
A boost-by-rank rule that lifts products the current user has previously purchased.

//ImportEndpoint("_lib/product-helpers")   // ToDto, GetSimilarityIndex, …

public record SimilarRequest(
    string ProductId,
    int    TopK = 10,
    string Category = null,
    bool   Explain = false);

public record ScoredProduct(
    string Id, string Name, double Score,
    IReadOnlyDictionary<string, double> SignalScores);

var input = Body.FromJson<SimilarRequest>();
if (string.IsNullOrWhiteSpace(input.ProductId))
    return BadRequest("`productId` is required.");

if (!Graph.TryGetReadOnlyContent<Product>(N.Product.Type, input.ProductId, out var seedNode))
    return NotFound($"Product '{input.ProductId}' not found.");

var seedUID          = seedNode.UID;
var seedName         = seedNode.Get<string>(N.Product.Name) ?? "";
var productTextIndex = ProductHelpers.GetSimilarityIndex(Graph);

var result = await Graph.Query()
    .StartAt(seedUID)
    .ToSimilarity(o => o
        .MaxCandidates(200)
        .MaxCandidatesPerSignal(100)
        .EnableExplanations(input.Explain)
        .TrackProgress(async p => await RelayStatusAsync(p.Message)))

    // 1. Text similarity over product names (embedding index).
    .AddSignal("SimilarName", s => s
        .Describe("Name embedding similarity")
        .Weight(1.0f)
        .Limit(100)
        .FromAsync(async ctx =>
            (await ctx.Graph.Query()
                .StartAtSimilarTextAsync(
                    seedName, count: 100,
                    nodeTypes: new[] { N.Product.Type },
                    indexUID:  productTextIndex,
                    applyCutoff: false))
            .Except(ctx.Subjects)         // never recommend the seed itself
            .AsUIDEnumerable()))

    // 2. Same manufacturer: other products two hops away (product → manufacturer → product).
    .AddSignal("SameManufacturer", s => s
        .Describe("Other products from the same manufacturer")
        .Weight(0.7f)
        .From(ctx => ctx.Graph.Query()
            .StartAt(ctx.Subjects)
            .Out(N.Manufacturer.Type, E.ManufacturedBy)
            .Out(N.Product.Type,      E.Manufactures)
            .Except(ctx.Graph.Query().StartAt(ctx.Subjects))
            .AsUIDEnumerable()))

    // 3. Tag overlap: other products that carry at least one of the seed's tags.
    .AddSignal("SharedTags", s => s
        .Describe("Products that share tags with the seed")
        .Weight(0.5f)
        .From(ctx => ctx.Graph.Query()
            .StartAt(ctx.Subjects)
            .Out(N.Tag.Type, E.HasTag)
            .Out(N.Product.Type, E.HasTag)
            .Except(ctx.Graph.Query().StartAt(ctx.Subjects))
            .AsUIDEnumerable()))

    // Fuse the three rankings with Reciprocal Rank Fusion.
    // Each signal's Weight (above) scales its contribution to the fused rank.
    .Fuse(f => f.UsingReciprocalRankFusion(o => o
        .RankOffset(60)))

    // Hard filter — keep only the requested category.
    .AddRule("CategoryFilter", r =>
    {
        r.Enabled(!string.IsNullOrEmpty(input.Category));
        r.Filter((ctx, candidates) => ctx.Graph.Query()
            .StartAt(candidates)
            .IsRelatedTo(Node.GetUID(N.Category.Type, input.Category))
            .AsUIDEnumerable());
    })

    // Soft boost — products the current user has bought before.
    .AddRule("BoostPurchases", r => r
        .BoostByRank(
            ctx => ctx.Graph.Query()
                .StartAt(CurrentUser)
                .Out(N.Order.Type,   E.Placed)
                .Out(N.Product.Type, E.Contains)
                .AsUIDEnumerable(),
            weight: 0.3f,
            topK:   50,
            contributesAs: "PreviouslyPurchased"))

    .ExecuteAsync(CancellationToken);

The result is mapped to the response JSON:

var hits = result.Scores
    .OrderByDescending(kv => kv.Value)
    .Take(input.TopK)
    .Select(kv =>
    {
        Graph.TryGetReadOnlyContent<Product>(kv.Key, out var node);

        var perSignal = result.Signals
            .Where(s => s.Value.ContainsKey(kv.Key))
            .ToDictionary(s => s.Key, s => (double)s.Value[kv.Key]);

        return new ScoredProduct(
            Id:    node?.GetKey() ?? kv.Key.ToString(),
            Name:  node?.Get<string>(N.Product.Name) ?? "",
            Score: kv.Value,
            SignalScores: perSignal);
    })
    .ToList();

return Ok(new { source = input.ProductId, hits }.ToJson(), "application/json");

A response with explain = true looks like:

{
  "source": "P-2199",
  "hits": [
    {
      "id":    "P-2207",
      "name":  "Wireless Mouse Pro",
      "score": 0.0721,
      "signalScores": {
        "SimilarName":       0.0244,
        "SameManufacturer":  0.0167,
        "SharedTags":        0.0083
      }
    }
  ]
}

Signal sources

A signal's From(...) / FromAsync(...) callback receives a SignalContext and returns either IEnumerable<UID128> (rank only) or IEnumerable<ScoredUID> (carrying a score the signal pre-computed).

Source	How
Text embeddings	`ctx.Graph.Query().StartAtSimilarTextAsync(text, count, nodeTypes, indexUID, applyCutoff)`
Graph traversal	Standard `IQuery` chain — `StartAt(ctx.Subjects).Out(...).Where(...)`
External lookup	Any async source — call out, then return UIDs in the workspace
Pre-scored hits	Return `IEnumerable<ScoredUID>` to feed your own scores in

SignalContext exposes:

UID128[] Subjects — the UIDs from the scenario's StartAt(...).
Graph Graph — the live graph (admin-level). Use ctx.Graph.Query(userUID) if you need ACL filtering inside the signal.

Fusion engines

Engine	Use when
`UsingReciprocalRankFusion(o => ...)`	The usual choice for combining multiple rankings. Robust against signals whose raw score scales differ, because only each signal's rank is used. Sums reciprocal ranks across signals, so candidates that several signals agree on score higher. Set `RankOffset(k)` (RRF's `k`, default 60); optionally `Normalize(...)` and `MaxRankConsidered(n)`.
`UsingMaxScore()`	Take the max score per candidate across signals. Good when one signal is the source of truth and the others are tiebreakers.
`UsingMergedScores((a, b) => f(a, b))`	Custom pairwise merge. Default is √(a² + b²) — a soft "OR" of scores.
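
To make the two score-based engines concrete, the sketch below folds one candidate's per-signal scores with each merge function. It is a standalone illustration rather than the engine's code: the Merge helper, the left-to-right fold over more than two signals, and the input scores are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: fold one candidate's per-signal scores into a single value.
// Assumes at least one score (Aggregate throws on an empty sequence).
static double Merge(IEnumerable<double> perSignalScores, Func<double, double, double> merge) =>
    perSignalScores.Aggregate(merge);

Func<double, double, double> maxScore = Math.Max;                           // max-score style
Func<double, double, double> softOr   = (a, b) => Math.Sqrt(a * a + b * b); // √(a² + b²)

var foundByTwoSignals = new[] { 0.6, 0.8 };
var foundByOneSignal  = new[] { 0.8 };

Console.WriteLine(Merge(foundByTwoSignals, maxScore)); // max ignores the weaker signal
Console.WriteLine(Merge(foundByTwoSignals, softOr));   // soft OR rewards agreement between signals
Console.WriteLine(Merge(foundByOneSignal,  softOr));   // a single score passes through unchanged
```

With these inputs the max merge keeps 0.8, while the √(a² + b²) merge lifts the candidate found by both signals to roughly 1.0, above the candidate that only one signal returned.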

How a signal's Weight(...) is applied depends on the engine:

Rank fusion (UsingReciprocalRankFusion) ignores raw score magnitudes, so the weight scales the signal's reciprocal-rank contribution directly (weight / (k + rank)).
Score fusion (UsingMaxScore, UsingMergedScores) compares the raw per-candidate scores, which already carry the weight (each signal multiplies its scores by its weight).
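
The rank-fusion weighting above can be worked through by hand. The following sketch applies weight / (k + rank) with k = 60 to three made-up rankings that reuse the signal names and weights from the example endpoint. FuseRrf is a hypothetical helper written for this illustration; it assumes 1-based ranks and leaves out the optional Normalize(...) and MaxRankConsidered(n) settings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Weighted RRF: each signal adds weight / (k + rank) for every candidate it returns,
// where rank is the candidate's 1-based position in that signal's list.
static Dictionary<string, double> FuseRrf(
    IEnumerable<(double Weight, string[] Ranked)> rankings, int k = 60)
{
    var totals = new Dictionary<string, double>();
    foreach (var (weight, ranked) in rankings)
    {
        for (int i = 0; i < ranked.Length; i++)
        {
            totals.TryGetValue(ranked[i], out var soFar);     // 0 if not seen yet
            totals[ranked[i]] = soFar + weight / (k + i + 1); // i + 1 is the 1-based rank
        }
    }
    return totals;
}

// Made-up rankings, best candidate first.
var signals = new[]
{
    (Weight: 1.0, Ranked: new[] { "P-2207", "P-1001", "P-3050" }), // SimilarName
    (Weight: 0.7, Ranked: new[] { "P-3050", "P-2207" }),           // SameManufacturer
    (Weight: 0.5, Ranked: new[] { "P-2207" }),                     // SharedTags
};

var fused = FuseRrf(signals);
foreach (var entry in fused.OrderByDescending(e => e.Value))
    Console.WriteLine($"{entry.Key}: {entry.Value:F4}");
```

P-2207 comes out on top (about 0.036) because all three signals return it, P-3050 follows (about 0.027) with two signals, and P-1001 trails (about 0.016) with a single second-place hit.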

With a single signal, the scenario uses the signal's scores directly and no Fuse(...) is needed. With two or more signals, call Fuse(...) explicitly; if you omit it, the scenario falls back to MaxScore, which still works but is rarely what you want.

Rules

Rules run after fusion. Use them to clip or reweight the result.

Rule	Purpose
`r.Filter((ctx, cands) => keepEnumerable)`	Hard filter — only the returned UIDs survive.
`r.BoostByRank(ctx => rankedUIDs, weight, topK, contributesAs)`	Soft boost — adds `weight / rank` (RRF-style) to candidates that appear in the ranked list.
`r.TransformFusedScore((ctx, uid, score) => newScore)`	Arbitrary post-processing — useful for time-decay penalties, normalization, hard score floors.
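
As a rough illustration of the weight / rank arithmetic behind BoostByRank, the sketch below applies it to a small fused score map. The helper is a standalone approximation, not the engine's implementation. It assumes that rank is the 1-based position in the boost list, that only the first topK entries count, and that UIDs which are not already candidates are skipped rather than added.

```csharp
using System;
using System.Collections.Generic;

// Adds weight / rank to each candidate found in the boost list.
// Assumptions: rank is the 1-based position in boostList, only the first topK
// entries count, and UIDs that are not already candidates are skipped.
static Dictionary<string, double> BoostByRank(
    Dictionary<string, double> scores, IReadOnlyList<string> boostList,
    double weight, int topK)
{
    var boosted = new Dictionary<string, double>(scores); // leave the input untouched
    for (int i = 0; i < boostList.Count && i < topK; i++)
    {
        if (boosted.ContainsKey(boostList[i]))
            boosted[boostList[i]] += weight / (i + 1);
    }
    return boosted;
}

var fusedScores = new Dictionary<string, double> { ["A"] = 0.90, ["B"] = 0.75, ["C"] = 0.50 };
var purchased   = new[] { "C", "X", "B" };   // "X" is not among the candidates

var result = BoostByRank(fusedScores, purchased, weight: 0.3, topK: 50);
foreach (var entry in result)
    Console.WriteLine($"{entry.Key}: {entry.Value:F2}");
```

Here C gains 0.3 (rank 1) and B gains 0.1 (rank 3), while A is untouched. Because the boost is additive, its weight is best chosen relative to the scale of the fused scores: with Reciprocal Rank Fusion at k = 60, a single signal contributes at most weight / 61 per candidate.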

A rule can be conditionally disabled with r.Enabled(false) — useful when a request parameter decides whether to apply it.
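
The time-decay penalty mentioned for TransformFusedScore could look like the sketch below, which halves a score for every half-life that has passed. DecayByAge and the 30-day half-life are illustrative choices only. How the age of a candidate is looked up from its UID depends on your schema and is not shown.

```csharp
using System;

// Exponential time decay: the score halves every halfLifeDays.
static double DecayByAge(double score, double ageDays, double halfLifeDays) =>
    score * Math.Pow(0.5, ageDays / halfLifeDays);

Console.WriteLine(DecayByAge(0.8, ageDays: 0,  halfLifeDays: 30)); // unchanged for a brand-new item
Console.WriteLine(DecayByAge(0.8, ageDays: 30, halfLifeDays: 30)); // half the score after one half-life
Console.WriteLine(DecayByAge(0.8, ageDays: 90, halfLifeDays: 30)); // one eighth after three half-lives
```

Inside a rule, a function of this shape would be called from the (ctx, uid, score) => newScore callback once the candidate's age is known.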

Options

ToSimilarity(o => ...) accepts a SimilarityOptionsBuilder:

Option	Effect
`MaxCandidates(n)`	Cap the fused pool to its top-N highest-scoring candidates after fusion, before rules run. `0` (default) means no cap.
`MaxCandidatesPerSignal(n)`	Trim each signal to its top-N highest-scoring candidates before fusion. `0` (default) means no cap; the per-signal `Limit(...)` still applies.
`EnableExplanations(true)`	Populate `result.Rules` with the score map after each rule, for debugging.
`TrackTimings(true)`	Populate `result.Timings` with per-signal, per-rule, fusion and total wall-clock timings.
`TrackProgress(async p => …)`	Stream `SimilarityEngineProgress` events (`p.Stage`, `p.Name`, `p.Message`, `p.Step`/`p.TotalSteps`) as the scenario advances. Wire to `RelayStatusAsync` to surface them to the caller.
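
The order in which the two caps apply matters: MaxCandidatesPerSignal trims before fusion, MaxCandidates trims after fusion and before rules. The standalone sketch below mimics that order on two tiny score maps. TopN is a hypothetical helper, and the max-score fusion step is used only to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Keep the n highest-scoring entries; n <= 0 means "no cap", like the documented default of 0.
static Dictionary<string, double> TopN(Dictionary<string, double> scores, int n) =>
    n <= 0
        ? scores
        : scores.OrderByDescending(e => e.Value).Take(n)
                .ToDictionary(e => e.Key, e => e.Value);

var signalA = new Dictionary<string, double> { ["a"] = 0.9, ["b"] = 0.8, ["c"] = 0.1 };
var signalB = new Dictionary<string, double> { ["c"] = 0.7, ["d"] = 0.6, ["e"] = 0.5 };

// 1. MaxCandidatesPerSignal(2): trim each signal before fusion.
var trimmed = new[] { TopN(signalA, 2), TopN(signalB, 2) };

// 2. Fuse: max score per candidate (stand-in for whichever engine is configured).
var pool = new Dictionary<string, double>();
foreach (var signal in trimmed)
    foreach (var entry in signal)
        pool[entry.Key] = pool.TryGetValue(entry.Key, out var seen)
            ? Math.Max(seen, entry.Value)
            : entry.Value;

// 3. MaxCandidates(3): cap the fused pool before any rules run.
var capped = TopN(pool, 3);

Console.WriteLine(string.Join(", ", capped.Keys.OrderBy(key => key))); // prints "a, b, c"
```

Candidate e never reaches fusion because it falls outside signal B's top two, and d is fused but then dropped by the pool cap. Candidate c survives through signal B even though its low score in signal A was trimmed.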

`SimilarityResult`

public class SimilarityResult
{
    public UID128 Source;                                                       // First subject
    public Dictionary<UID128, float> Scores;                                    // Final fused + ruled
    public Dictionary<string, Dictionary<UID128, float>> Signals;               // Per-signal raw
    public Dictionary<string, Dictionary<UID128, float>> Rules;                 // When Explanations on
    public Dictionary<string, TimeSpan> Timings;                                // When TrackTimings on
}

Scores is what you turn into the result list shown to the caller. Signals, Rules and Timings are diagnostic: surface them when explain = true so consumers can see why an item ranked where it did and where time was spent.

Access control

Signals run against the admin-level graph (ctx.Graph), so result.Scores can contain UIDs the calling user is not allowed to see. The engine does not enforce per-user access on results — that is left to the consumer. You have two options:

Scope inside a signal: ctx.Graph.Query(userUID) instead of ctx.Graph.Query().
Filter the final result: add .FilterAsUser(userUID) to the scenario, which drops any result the given user cannot access before ExecuteAsync returns.

var result = await Graph.Query()
    .StartAt(seedUID)
    .ToSimilarity()
    .AddSignal("SimilarName", s => s.FromAsync(/* … */))
    .FilterAsUser(CurrentUser)     // results limited to what CurrentUser may see
    .ExecuteAsync(CancellationToken);

When to reach for the similarity engine

Need	Use
"Find products like this one" with mixed text + graph signals	Similarity engine.
Free-text search with facets	`CreateSearchAndFacetsAsUserAsync` — see Searching from Endpoints.
Pure semantic retrieval against an embedding index	`await Q().StartAtSimilarTextAsync(...)` — see IQuery Similarity Search.
Recommendation from a saved candidate list	Skip the signals and use a rules-only scenario.

Cross-links

IQuery Similarity Search — the single-index lookups (Similar, StartAtSimilarTextAsync) that feed signals.
Searching from Endpoints — when a single search request is enough.
Sentence Embeddings — how the embedding index that powers the text-similarity signal is built.
AI Search — operator-level configuration of the same indexes.
Auto-generated Helpers — N.*, E.*, and Endpoints.* constants used in signal queries.
Graph Query Language — IQuery chains used inside signals and rules.
Clustering & Visualization — turn the SimilarityResult into a graph the user can explore.