Similarity Engine — Finding Similar Objects
The similarity engine builds a recommendation-style ranking in three steps: it collects candidate UIDs from multiple signals (each drawing on a different source, such as text embeddings, graph traversals, or external lookups), fuses them into a single ranked list, and applies optional rules to filter or boost the result.
Use it from a custom endpoint when you want answers like "find products similar to X" or "show cases like this one" that can't be expressed as a single search query.
The engine lives in the Mosaik.GraphDB.Similarity namespace and is reached via IQuery.ToSimilarity(...).
Anatomy of a scenario
Graph.Query()
.StartAt(seedUID) ← the "subject(s)" the signals see in ctx.Subjects
.ToSimilarity(opts)
.AddSignal("name1", s => ...) ← 1..N candidate sources (text, graph, external)
.AddSignal("name2", s => ...)
.Fuse(f => f.UsingMeanReciprocalRank(...)) ← how to combine signals into one ranking
.AddRule("name", r => r.Filter(...) | .BoostByRank(...) | .TransformFusedScore(...))
.Filter(uid => ...) ← optional final UID-level filter
.ExecuteAsync(ct);
The result is a SimilarityResult carrying the final scores plus per-signal breakdowns (useful for explanations and debugging).
A text + graph similarity endpoint
The example below answers "find products similar to this one" using:
- A text-similarity signal over indexed product names.
- A same-manufacturer signal that boosts products from the same manufacturer.
- A shared-tag signal that boosts products sharing tags with the seed.
- An MRR fusion to combine the three rankings.
- A filter rule that drops candidates outside the requested category.
- A boost-by-rank rule that lifts products the current user has previously purchased.
//ImportEndpoint("_lib/product-helpers") // ToDto, GetSimilarityIndex, …
public record SimilarRequest(
string ProductId,
int TopK = 10,
string Category = null,
bool Explain = false);
public record ScoredProduct(
string Id, string Name, double Score,
IReadOnlyDictionary<string, double> SignalScores);
var input = Body.FromJson<SimilarRequest>();
if (string.IsNullOrWhiteSpace(input.ProductId))
return BadRequest("`productId` is required.");
if (!Graph.TryGetReadOnlyContent<Product>(N.Product.Type, input.ProductId, out var seedNode))
return NotFound($"Product '{input.ProductId}' not found.");
var seedUID = seedNode.UID;
var seedName = seedNode.Get<string>(N.Product.Name) ?? "";
var seedManufacturer = Q().StartAt(seedUID).Out(N.Manufacturer.Type, E.ManufacturedBy)
.AsUIDEnumerable().FirstOrDefault();
var productTextIndex = ProductHelpers.GetSimilarityIndex(Graph);
var result = await Graph.Query()
.StartAt(seedUID)
.ToSimilarity(o => o
.MaxCandidates(200)
.MaxCandidatesPerSignal(100)
.EnableExplanations(input.Explain)
.TrackProgress(async p => await RelayStatusAsync(p.Message)))
// 1. Text similarity over product names (embedding index).
.AddSignal("SimilarName", s => s
.Describe("Name embedding similarity")
.Weight(1.0f)
.Limit(100)
.FromAsync(async ctx =>
(await ctx.Graph.Query()
.StartAtSimilarTextAsync(
seedName, count: 100,
nodeTypes: new[] { N.Product.Type },
indexUID: productTextIndex,
reverseSorting: false))
.Except(ctx.Subjects) // never recommend the seed itself
.AsUIDEnumerable()))
// 2. Same manufacturer: products two hops away on the graph (product -> manufacturer -> product).
.AddSignal("SameManufacturer", s => s
.Describe("Other products from the same manufacturer")
.Weight(0.7f)
.From(ctx => ctx.Graph.Query()
.StartAt(ctx.Subjects)
.Out(N.Manufacturer.Type, E.ManufacturedBy)
.Out(N.Product.Type, E.Manufactures)
.Except(ctx.Graph.Query().StartAt(ctx.Subjects))
.AsUIDEnumerable()))
// 3. Tag overlap — shared category/tag nodes.
.AddSignal("SharedTags", s => s
.Describe("Products that share tags with the seed")
.Weight(0.5f)
.From(ctx => ctx.Graph.Query()
.StartAt(ctx.Subjects)
.Out(N.Tag.Type, E.HasTag)
.In(N.Product.Type, E.HasTag)
.Except(ctx.Graph.Query().StartAt(ctx.Subjects))
.AsUIDEnumerable()))
// Fuse the three rankings with Reciprocal Rank Fusion.
.Fuse(f => f.UsingMeanReciprocalRank(o => o
.RankOffset(60)
.AverageOverSignalsWherePresent()))
// Hard filter — keep only the requested category.
.AddRule("CategoryFilter", r =>
{
r.Enabled(!string.IsNullOrEmpty(input.Category));
r.Filter((ctx, candidates) => ctx.Graph.Query()
.StartAt(candidates)
.IsRelatedTo(N.Category.Type, input.Category)
.AsUIDEnumerable());
})
// Soft boost — products the current user has bought before.
.AddRule("BoostPurchases", r => r
.BoostByRank(
ctx => ctx.Graph.Query()
.StartAt(CurrentUser)
.Out(N.Order.Type, E.Placed)
.Out(N.Product.Type, E.Contains)
.AsUIDEnumerable(),
weight: 0.3f,
topK: 50,
contributesAs: "PreviouslyPurchased"))
.ExecuteAsync(CancellationToken);
The result is mapped to the response JSON:
var hits = result.Scores
.OrderByDescending(kv => kv.Value)
.Take(input.TopK)
.Select(kv =>
{
Graph.TryGetReadOnlyContent<Product>(kv.Key, out var node);
var perSignal = result.Signals
.Where(s => s.Value.ContainsKey(kv.Key))
.ToDictionary(s => s.Key, s => (double)s.Value[kv.Key]);
return new ScoredProduct(
Id: node?.GetKey() ?? kv.Key.ToString(),
Name: node?.Get<string>(N.Product.Name) ?? "",
Score: kv.Value,
SignalScores: perSignal);
})
.ToList();
return Ok(new { source = input.ProductId, hits }.ToJson(), "application/json");
A response with explain = true looks like:
{
"source": "P-2199",
"hits": [
{
"id": "P-2207",
"name": "Wireless Mouse Pro",
"score": 0.0721,
"signalScores": {
"SimilarName": 0.0244,
"SameManufacturer": 0.0167,
"SharedTags": 0.0083
}
}
]
}
Signal sources
A signal's From(...) / FromAsync(...) callback receives a SignalContext and returns either IEnumerable<UID128> (rank only) or IEnumerable<ScoredUID> (each UID carries a score the signal computed itself).
| Source | How |
|---|---|
| Text embeddings | ctx.Graph.Query().StartAtSimilarTextAsync(text, count, nodeTypes, indexUID, reverseSorting) |
| Graph traversal | Standard IQuery chain — StartAt(ctx.Subjects).Out(...).In(...).Where(...) |
| External lookup | Any async source — call out, then return UIDs in the workspace |
| Pre-scored hits | Return IEnumerable<ScoredUID> to feed your own scores in |
SignalContext exposes:
- UID128[] Subjects: the UIDs from the scenario's StartAt(...).
- Graph Graph: the live graph (admin-level). Use ctx.Graph.Query(userUID) if you need ACL filtering inside the signal.
Fusion engines
| Engine | Use when |
|---|---|
| UsingMeanReciprocalRank(o => ...) | The usual choice for combining multiple rankings. Robust when the signals' raw scores are on different scales. Tune with RankOffset(k) (RRF's k, default 60). |
| UsingMaxScore() | Take the max score per candidate across signals. Good when one signal is the source of truth and the others are tiebreakers. |
| UsingMergedScores((a, b) => f(a, b)) | Custom pairwise merge. Default is √(a² + b²), a soft "OR" of scores. |
With a single signal, the scenario uses the signal's scores directly and no Fuse(...) is needed. With two or more signals, call Fuse(...) explicitly; otherwise the scenario falls back to MaxScore, which still works but is rarely what you want.
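To make the differences between the three engines concrete, here is a minimal sketch in plain C# over string IDs. It is not the engine's implementation: FusionSketch and its method names are hypothetical, the reciprocal-rank variant assumes the common weight / (k + rank) form with 1-based ranks, and the AverageOverSignalsWherePresent option is not modeled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: hypothetical helper, not part of Mosaik.GraphDB.Similarity.
public static class FusionSketch
{
    // Weighted reciprocal rank fusion (assumed form): every signal adds
    // weight / (k + rank) for each candidate, where rank is the 1-based
    // position of the candidate in that signal's ranked list.
    public static Dictionary<string, double> ReciprocalRank(
        IEnumerable<(double Weight, IReadOnlyList<string> Ranked)> signals,
        int k = 60)
    {
        var fused = new Dictionary<string, double>();
        foreach (var signal in signals)
        {
            for (var i = 0; i < signal.Ranked.Count; i++)
            {
                var id = signal.Ranked[i];
                fused.TryGetValue(id, out var current); // current is 0 when id is new
                fused[id] = current + signal.Weight / (k + i + 1);
            }
        }
        return fused;
    }

    // Generic pairwise merge: the first score seen for a candidate is taken
    // as-is, later scores are folded in with the supplied merge function.
    public static Dictionary<string, double> Merge(
        IEnumerable<IReadOnlyDictionary<string, double>> signals,
        Func<double, double, double> merge)
    {
        var fused = new Dictionary<string, double>();
        foreach (var signal in signals)
        {
            foreach (var kv in signal)
            {
                fused[kv.Key] = fused.TryGetValue(kv.Key, out var current)
                    ? merge(current, kv.Value)
                    : kv.Value;
            }
        }
        return fused;
    }

    // Max-score fusion is the pairwise merge with Math.Max.
    public static Dictionary<string, double> MaxScore(
        IEnumerable<IReadOnlyDictionary<string, double>> signals)
        => Merge(signals, Math.Max);

    // The merge the text describes as the default: sqrt(a^2 + b^2).
    public static double RootSumOfSquares(double a, double b)
        => Math.Sqrt(a * a + b * b);
}
```

For example, with a text ranking [A, B, C] and a graph ranking [B, D], equal weights of 1 and k = 60, ReciprocalRank gives B a score of 1/62 + 1/61 ≈ 0.0325, ahead of A at 1/61 ≈ 0.0164, because B appears in both lists. Merging the scores 3 and 4 for the same candidate yields 5 with RootSumOfSquares and 4 with MaxScore.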
Rules
Rules run after fusion. Use them to clip or reweight the result.
| Rule | Purpose |
|---|---|
| r.Filter((ctx, cands) => keepEnumerable) | Hard filter: only the returned UIDs survive. |
| r.BoostByRank(ctx => rankedUIDs, weight, topK, contributesAs) | Soft boost: adds weight / rank (RRF-style) to candidates that appear in the ranked list. |
| r.TransformFusedScore((ctx, uid, score) => newScore) | Arbitrary post-processing: useful for time-decay penalties, normalization, hard score floors. |
A rule can be conditionally disabled with r.Enabled(false) — useful when a request parameter decides whether to apply it.
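The effect of the three rule kinds on a fused score table can be sketched in plain C# as follows. This is an illustration under assumptions, not the engine's code: RuleSketch and its methods are hypothetical, the boost follows the literal weight / rank wording above with 1-based ranks (whether the engine applies an additional offset is not stated), and UIDs in the boost list that are not already candidates are assumed to be ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: hypothetical helper, not part of Mosaik.GraphDB.Similarity.
public static class RuleSketch
{
    // Hard filter: only candidates that also appear in `keep` survive.
    public static Dictionary<string, double> Filter(
        IReadOnlyDictionary<string, double> scores,
        IEnumerable<string> keep)
    {
        var allowed = new HashSet<string>(keep);
        return scores
            .Where(kv => allowed.Contains(kv.Key))
            .ToDictionary(kv => kv.Key, kv => kv.Value);
    }

    // Soft boost: add weight / rank for the first topK entries of `ranked`.
    // Assumption: entries that are not already candidates are skipped,
    // but they still consume a rank position.
    public static Dictionary<string, double> BoostByRank(
        IReadOnlyDictionary<string, double> scores,
        IEnumerable<string> ranked,
        double weight,
        int topK)
    {
        var boosted = scores.ToDictionary(kv => kv.Key, kv => kv.Value);
        var rank = 0;
        foreach (var id in ranked.Take(topK))
        {
            rank++;
            if (boosted.ContainsKey(id))
                boosted[id] += weight / rank;
        }
        return boosted;
    }

    // Per-candidate transform of the fused score, e.g. a decay or a floor.
    public static Dictionary<string, double> Transform(
        IReadOnlyDictionary<string, double> scores,
        Func<string, double, double> transform)
        => scores.ToDictionary(kv => kv.Key, kv => transform(kv.Key, kv.Value));
}
```

With fused scores A = 0.03 and B = 0.02, a boost list [B, X], weight 0.3 and topK 50, B rises to 0.32 and overtakes A, while X is ignored because it was never a candidate. The size of that jump relative to typical fused scores is one reason to keep boost weights small.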
Options
ToSimilarity(o => ...) accepts a SimilarityOptionsBuilder:
| Option | Effect |
|---|---|
| MaxCandidates(n) | Hard cap on the candidate pool size across all signals. |
| MaxCandidatesPerSignal(n) | Trim each signal to its top-N before fusion. |
| EnableExplanations(true) | Populate result.Rules with per-rule contributions for debugging. |
| TrackTimings(true) | Per-signal and per-rule timings on the result. |
| TrackProgress(async p => …) | Stream progress events. Wire to RelayStatusAsync to surface them to the caller. |
SimilarityResult
public class SimilarityResult
{
public UID128 Source; // First subject
public Dictionary<UID128, float> Scores; // Final scores after fusion and rules
public Dictionary<string, Dictionary<UID128, float>> Signals; // Raw per-signal scores
public Dictionary<string, Dictionary<UID128, float>> Rules; // Per-rule contributions, populated when explanations are enabled
}
Scores is what you turn into the result list shown to the caller. Signals and Rules are diagnostic: surface them when explain = true so consumers can see why an item ranked where it did.
When to reach for the similarity engine
| Need | Use |
|---|---|
| "Find products like this one" with mixed text + graph signals | Similarity engine. |
| Free-text search with facets | CreateSearchAndFacetsAsUserAsync — see Searching from Endpoints. |
| Pure semantic retrieval against an embedding index | Q().StartAtSimilarText(...) — see IQuery Similarity Search. |
| Recommendation from a saved candidate list | Skip the signals and use rules only. |
Cross-links
- IQuery Similarity Search: the single-index lookups (Similar, StartAtSimilarTextAsync) that feed signals.
- Searching from Endpoints: when a single search request is enough.
- Sentence Embeddings: how the embedding index that powers the text-similarity signal is built.
- AI Search: operator-level configuration of the same indexes.
- Auto-generated Helpers: N.*, E.*, and Endpoints.* constants used in signal queries.
- Graph Query Language: IQuery chains used inside signals and rules.
- Clustering & Visualization: turn the SimilarityResult into a graph the user can explore.