AI Search (Semantic Search)

AI search adds an embedding-based retrieval lane alongside BM25. Long-form content stays retrievable when the wording differs from the query, and hybrid mode combines both lanes for the best of each.
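
How the two lanes are combined is not spelled out here, so treat the following as illustration only: reciprocal rank fusion is one common way to merge a keyword ranking with an embedding ranking. The FuseRankings name and the constant k = 60 are assumptions for this sketch, not the workspace's actual fusion logic.

using System.Collections.Generic;
using System.Linq;

// Illustration only: reciprocal rank fusion (RRF) merges two rankings by rank position.
// The workspace's real hybrid scoring may differ.
static List<string> FuseRankings(IList<string> bm25Ids, IList<string> embeddingIds, int k = 60)
{
    var scores = new Dictionary<string, double>();
    void Accumulate(IList<string> ranking)
    {
        for (int rank = 0; rank < ranking.Count; rank++)
            scores[ranking[rank]] = scores.GetValueOrDefault(ranking[rank]) + 1.0 / (k + rank + 1);
    }
    Accumulate(bm25Ids);       // lane 1: BM25 (keyword) ranking, best hit first
    Accumulate(embeddingIds);  // lane 2: embedding (semantic) ranking, best hit first
    return scores.OrderByDescending(kv => kv.Value).Select(kv => kv.Key).ToList();
}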

1. Pick the fields to embed

  1. Settings → Search → AI Search → + Add more.
  2. Pick the (node type, field) pairs to embed.
  3. For each field, decide on chunking (below).

Good defaults:

Field shape                        | Embed? | Chunk?
Titles, summaries (< 200 tokens)   | Yes    | No
Bodies, content (200–4000 tokens)  | Yes    | Yes
Long-form articles (> 4000 tokens) | Yes    | Yes, with overlap
Identifiers, SKUs, codes           | No     | n/a

Don't embed identifiers — they have no semantic content and you'll burn embedding budget for nothing.
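
The defaults above reduce to a simple decision rule. This is only a sketch of the table, using made-up names and thresholds, not anything in the product:

// Sketch of the defaults table above; names and thresholds are illustrative only.
static (bool Embed, bool Chunk, bool Overlap) PlanField(bool isIdentifier, int approxTokens)
{
    if (isIdentifier)         return (false, false, false); // SKUs, codes: nothing semantic to embed
    if (approxTokens < 200)   return (true,  false, false); // titles, summaries: embed whole
    if (approxTokens <= 4000) return (true,  true,  false); // bodies, content: chunk
    return (true, true, true);                              // long-form: chunk with overlap
}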

2. Configure chunking

For anything longer than the embedding model's context window, enable Chunk Text. The workspace splits the field into overlapping windows and embeds each one independently. Hits return the parent node, deduplicated.

Knobs:

Knob           | Effect
Chunk size     | Tokens per chunk. Default ≈ 512; shorter for tighter retrieval, longer for context.
Overlap        | Tokens shared between adjacent chunks. 20–50 tokens prevents cutoff at sentence boundaries.
Min chunk size | Skip chunks shorter than this; trailing fragments often hurt precision.
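
To make the three knobs concrete, here is a minimal sliding-window chunker sketch. It is not the workspace's implementation, and it counts whitespace-separated words rather than model tokens; only the interaction of chunk size, overlap, and min chunk size is the point.

using System;
using System.Collections.Generic;

// Sketch only: a real chunker uses the embedding model's tokenizer, not Split(' ').
static IEnumerable<string> ChunkText(string text, int chunkSize = 512, int overlap = 32, int minChunkSize = 64)
{
    if (overlap >= chunkSize) throw new ArgumentException("Overlap must be smaller than chunk size.");
    var tokens = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    int stride = chunkSize - overlap;                         // how far each window advances

    for (int start = 0; start < tokens.Length; start += stride)
    {
        int length = Math.Min(chunkSize, tokens.Length - start);
        if (start > 0 && length < minChunkSize) yield break;  // drop short trailing fragments
        yield return string.Join(' ', tokens, start, length);
        if (start + length >= tokens.Length) yield break;     // reached the end of the field
    }
}

With these defaults, adjacent chunks share 32 tokens, so most sentences cut at a window boundary still appear whole in one of the two chunks.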

3. Similarity cutoffs

Two cutoffs control how strict matching is:

  • Added cutoff — minimum cosine similarity for an embedding hit to enter the result set.
  • Rerank cutoff — minimum similarity for an embedding match to re-rank an item BM25 already found.

Defaults are usually fine. Tune up if the LLM hallucinates over weak retrievals (raise both); tune down if relevant items are missing (lower both).
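
Both cutoffs are thresholds on cosine similarity between the query embedding and a field (or chunk) embedding. The sketch below spells out the gating; the 0.75 / 0.60 values and helper names are made up for illustration, not the product defaults.

using System;

// Illustration only: the threshold values are hypothetical.
static double Cosine(float[] a, float[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
    return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

static bool EntersResultSet(double similarity)  => similarity >= 0.75;  // added cutoff: embedding-only hits below this never appear
static bool MayRerankBm25Hit(double similarity) => similarity >= 0.60;  // rerank cutoff: weaker matches may still re-rank existing BM25 hits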

4. Semantic reranker

The reranker is a second pass that uses a cross-encoder model to re-score the top-N hits from hybrid retrieval. Enable when:

  • Precision@10 matters more than throughput.
  • Your corpus is large enough that BM25 + embeddings still return some near-misses.

It's an extra call per query; budget accordingly.
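
A conceptual sketch of that second pass follows. The crossEncoderScore delegate stands in for the cross-encoder model call and is not a real API; in practice the top-N pairs are usually scored in one batched call rather than one call per hit.

using System;
using System.Collections.Generic;
using System.Linq;

// Conceptual sketch of the rerank pass, not the product implementation.
static List<(string NodeId, double Score)> Rerank(
    string query,
    IReadOnlyList<(string NodeId, string Text)> topN,
    Func<string, string, double> crossEncoderScore)  // hypothetical stand-in for the model
{
    return topN
        .Select(hit => (hit.NodeId, Score: crossEncoderScore(query, hit.Text))) // score each (query, text) pair
        .OrderByDescending(hit => hit.Score)                                     // re-order by the new score
        .ToList();
}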

From code

var req = SearchRequest.For("laptop overheating");
req.BeforeTypesFacet = new HashSet<string> { "SupportCase" };   // Restrict results to SupportCase nodes
req.HybridSearch     = true;          // Use BM25 + embeddings together
req.SemanticRerank   = true;          // Apply the cross-encoder reranker

var query = await Graph.CreateSearchAsUserAsync(req, CurrentUser, CancellationToken);
return query.Take(10).EmitWithScores();

For pure semantic retrieval (no BM25), use:

return Q().StartAtSimilarText("laptop overheating",
                              nodeTypes: new[] { "SupportCase" })
          .Take(10)
          .EmitWithScores();

Budget and performance

  • Embedding cost scales with the field corpus size on first index, then with the change rate (a rough chunk-count estimate follows this list).
  • Pause embedding generation under Settings → Search Index before a large backfill; resume after.
  • Embedding latency on query is fixed and small (~10 ms). Hybrid mode is fast; reranking adds 50–200 ms.
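
For a ballpark of first-index volume, the number of chunks (and so embedding calls) per field follows from the chunking knobs. The 4,800-token article in the comment is a made-up example.

using System;

// Rough estimate only; ignores the min-chunk-size trim. Defaults match the knobs above.
static int EstimateChunks(int tokens, int chunkSize = 512, int overlap = 32)
{
    if (tokens <= chunkSize) return 1;
    int stride = chunkSize - overlap;                               // 480 with the defaults
    return (int)Math.Ceiling((tokens - overlap) / (double)stride);  // e.g. 4,800 tokens -> 10 chunks
}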