Implementing Graph RAG

Graph RAG retrieves a connected subgraph as the grounding context for an LLM, instead of a flat list of text chunks. In a Curiosity Workspace this means using the knowledge graph's edges to decide what to retrieve, then fusing keyword and semantic retrieval signals over that scoped set with the similarity engine, and finally letting the LLM synthesize a cited answer.

This page is the end-to-end implementation guide. For the surrounding architecture (the orchestrator, the tool registry, the permission model) see RAG and agent architecture. For the retrieval mechanics it builds on, see Hybrid Search and Graph Reasoning (concepts).

Why the graph matters for retrieval

Flat vector RAG answers single-hop questions well: the answer lives in one chunk, and that chunk is semantically close to the question. It struggles when the answer requires joining facts that live in different documents.

Take a support scenario:

"Which customers are affected by the firmware fault first reported in TICKET-4821?"

The ticket text mentions a fault and a device model. The list of affected customers is nowhere in that text — it's reachable only by following edges: Ticket → ForDevice → Device → InstalledAt → Customer. A vector search over the ticket returns the ticket; it cannot return the customers, because no chunk contains them. Graph RAG answers it by traversing those edges first, then retrieving over the resulting node set.

The rule of thumb: use the graph to choose the candidate set, fuse keyword + semantic signals to rank within it, use the LLM to phrase the answer.

The pipeline

flowchart LR
    Q[User question] --> S[1 Spot entities<br/>in the question]
    S --> X[2 Expand<br/>graph neighbourhood]
    X --> H[3 Fuse signals<br/>keyword + semantic]
    H --> R[4 Register snippets<br/>+ synthesize]
    R --> A[Cited answer]
    Build[Ingestion + NLP linking] -.builds.-> KG[(Knowledge graph)]
    KG -.traversed by.-> X
    KG -.indexed for.-> H

Stages 1–4 happen at query time inside an AI tool. The knowledge graph itself is built once, during ingestion, and kept current by enrichment tasks. The next sections cover both halves.

Build the graph (ingestion time)

Graph RAG is only as good as the edges it can traverse. Two things have to be true before retrieval works: entities have to exist as nodes, and documents have to be linked to them.

1. Model entities and relationships as first-class nodes

Decide which entities users navigate by — Customer, Device, Ticket, Document — and model the relationships between them as typed edges rather than as string properties. A Mentions or ForDevice edge is traversable; a "deviceModel": "MBP-2024" property is not.

See Schema Design for the modelling patterns and Graph Design Patterns for the "relationship as a first-class signal" rule that graph RAG depends on.

2. Link documents to entities during ingestion

When a document lands in the graph, Curiosity's NLP pipeline runs entity extraction over its text and writes a Mentions edge from the document to each entity node it recognizes. This is what turns a pile of documents into a connected graph.

Configure spotters (dictionary, pattern, ML/NER, or LLM) per (nodeType, field) so extraction links rather than just annotates:

spotters:
  - kind: dictionary
    name: devices
    entries:
      - value: MacBook Pro
        aliases: [MBP, "MacBook Pro 2024"]
        link_to: Device:MBP-2024
    min_confidence: 0.85          # auto-link only above 0.85
  - kind: pattern
    name: ticket-id
    regex: "TICKET-\\d{4,6}"
    link_to_type: Ticket
    link_strategy: by-key

Each high-confidence match creates the Mentions edge that retrieval will later traverse. For the full configuration surface, model comparison, and the precision thresholds that govern auto-linking, see Entity Extraction.
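To make the pattern spotter concrete, here is a minimal sketch of what the ticket-id rule does to a piece of text, written with the standard .NET Regex class. It is not the Curiosity spotter implementation: the class and method names (PatternSpotterSketch, FindTicketKeys) are hypothetical, and it only illustrates how the regex from the configuration above yields the keys that a by-key link strategy would then look up.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for a pattern spotter; not the Curiosity API.
public static class PatternSpotterSketch
{
    // Same pattern as the "ticket-id" spotter in the YAML above.
    private static readonly Regex TicketId =
        new Regex(@"TICKET-\d{4,6}", RegexOptions.Compiled);

    // Returns the distinct ticket keys found in the text,
    // in order of first appearance.
    public static List<string> FindTicketKeys(string text) =>
        TicketId.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct()
            .ToList();
}
```

Calling FindTicketKeys("See TICKET-4821 and TICKET-4821, also TICKET-12.") returns a single key, TICKET-4821: the repeat is deduplicated and TICKET-12 has too few digits to match. The pattern has no trailing word boundary, so a longer digit run would match on its first six digits; tighten the regex if that matters for your key format.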

For programmatic linking inside a connector or scheduled task:

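// Walk up to 1000 Ticket nodes and link each one to the entities its body mentions.
await foreach (var ticket in Q().StartAt("Ticket").Take(1000).AsEnumerableAsync())
{
    // Run entity extraction over the ticket's text field.
    var entities = await NLP.ExtractAsync(ticket.GetString("Body"));

    foreach (var e in entities)
    {
        // Reuse the entity node if it already exists, otherwise create it.
        var entityNode = graph.TryAdd(new Entity { Name = e.Canonical });
        // Ticket -> Entity: the edge that retrieval traverses later.
        graph.Link(ticket, entityNode, "Mentions");
    }
}
// Flush all queued nodes and edges in one commit.
await graph.CommitPendingAsync();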

3. Embed the text so chunks are retrievable

Register a sentence-embedding index on the text fields you want to retrieve over. Curiosity chunks the text (with overlap) and stores each chunk's vector in an HNSW index; the default Harrier model caps chunks at 2048 tokens. Keyword (Lucene) and vector indexes are registered the same way per type+field. See Embeddings and Built-in embedding models.
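Chunking with overlap is easiest to see on a small example. The sketch below is an illustration only: Curiosity's own chunker is not shown on this page, and the names here (ChunkingSketch, Chunk, maxWords, overlap) are hypothetical. It also counts whitespace-separated words as a rough stand-in for model tokens, which a real tokenizer would count differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sliding-window chunker; words approximate tokens.
public static class ChunkingSketch
{
    // Splits text into windows of at most maxWords words.
    // Consecutive windows share `overlap` words.
    public static List<string> Chunk(string text, int maxWords, int overlap)
    {
        if (maxWords <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxWords));
        if (overlap < 0 || overlap >= maxWords)
            throw new ArgumentOutOfRangeException(nameof(overlap));

        // A null separator array splits on any whitespace.
        var words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();
        var step = maxWords - overlap;

        for (var start = 0; start < words.Length; start += step)
        {
            var length = Math.Min(maxWords, words.Length - start);
            chunks.Add(string.Join(" ", words, start, length));
            if (start + length >= words.Length) break; // this window reached the end
        }
        return chunks;
    }
}
```

With Chunk("a b c d e f g", maxWords: 4, overlap: 2) the windows are "a b c d", "c d e f", and "e f g". The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of indexing some text twice.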

4. (Optional) Precompute community summaries for global questions

Graph-expansion retrieval is local: it answers "tell me about X and its neighbourhood". Some questions are global: "what are the recurring failure themes this quarter?" No single neighbourhood contains that answer.

The Curiosity-native way to support global questions is to precompute topic summaries as their own nodes, in a scheduled enrichment task:

Cluster related entities — group by shared Mentions, or surface hubs with SortByConnectivity() (see Graph Reasoning and Analytics).
For each cluster, gather its members' text and ask the LLM for a short summary.
Write the summary back as a Topic node, embed it, and link members with a BelongsToTopic edge.

// Enrichment task: one summary node per cluster of co-mentioned devices
foreach (var cluster in clusters)
{
    var memberText = string.Join("\n\n",
        cluster.Select(uid => ChatAI.GetTextFromNode(uid, limit: 2_000)));

    var summary = await ChatAI.SummarizeAsync(memberText, maxWords: 200);
    var topic   = graph.TryAdd(new Topic { Title = cluster.Label, Summary = summary });

    foreach (var uid in cluster)
    {
        graph.Link(uid, topic.UID, "BelongsToTopic");   // member -> topic
    }
}
await graph.CommitPendingAsync();

Because Topic nodes are embedded like any other text, a query that doesn't match any single document can still match the right summary. This is the "macro view" that complements per-chunk retrieval.

Build it as a backfill, then keep it fresh

Run the summary task once over the whole corpus, then schedule it to re-summarize only clusters whose members changed. See the backfill and enrichment templates.
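One way to decide which clusters need a fresh summary is to store a fingerprint per cluster and compare it on each run. The sketch below assumes each member exposes some version number (for example a modification counter); that assumption, and every name in it (ClusterFingerprintSketch, Fingerprint, ClustersToRefresh), is hypothetical and not part of the Curiosity API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical change detection for cluster summaries.
public static class ClusterFingerprintSketch
{
    // SHA-256 over the cluster's (memberId, version) pairs, sorted by id
    // so that member order does not affect the result.
    public static string Fingerprint(IEnumerable<KeyValuePair<string, long>> members)
    {
        var canonical = string.Join("\n", members
            .OrderBy(m => m.Key, StringComparer.Ordinal)
            .Select(m => m.Key + "@" + m.Value));

        using (var sha = SHA256.Create())
        {
            return BitConverter.ToString(
                sha.ComputeHash(Encoding.UTF8.GetBytes(canonical)));
        }
    }

    // Returns the labels of clusters that are new or whose fingerprint
    // differs from the one stored after the previous run.
    public static List<string> ClustersToRefresh(
        IDictionary<string, Dictionary<string, long>> clusters,
        IDictionary<string, string> storedFingerprints)
    {
        var stale = new List<string>();
        foreach (var cluster in clusters)
        {
            var current = Fingerprint(cluster.Value);
            if (!storedFingerprints.TryGetValue(cluster.Key, out var previous)
                || previous != current)
            {
                stale.Add(cluster.Key);
            }
        }
        return stale;
    }
}
```

A cluster is re-summarized when a member is added, removed, or edited, and skipped otherwise. The fingerprint could be kept as a property on the Topic node so the scheduled task needs no separate store.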

Retrieve (query time)

Retrieval lives inside an AI tool — an annotated C# method the chat orchestrator exposes to the LLM. The tool does the graph work the LLM can't: it traverses edges, scopes the search, and registers citations.

Stages 1–2: spot entities and expand the neighbourhood

Resolve the entities named in the question (the same spotters from ingestion work on the query string), then traverse from them to build the candidate set. Stage 3 assigns that set to SearchRequest.TargetUIDs so the keyword search runs only over the relevant neighbourhood. The traversal itself looks like this:

// "Tickets and docs connected to the device(s) in this question"
var seedUIDs = await NLP.ResolveEntitiesAsync(query, scope.CancellationToken);

var neighbourhood = Q()
    .StartAt(seedUIDs)
    .Out("ForDevice", "Mentions")     // edges that define "related to"
    .Out("InstalledAt")               // one more hop: reach customers
    .AsUIDEnumerable()
    .ToArray();

Each .Out(...) is one hop. The number of hops is the lever for the multi-hop question above — stop at the depth the question needs, and no further, or the candidate set balloons.
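The effect of the hop count can be reproduced on a toy in-memory graph. The sketch below is not the Curiosity query engine: HopExpansionSketch and Expand are hypothetical names, edges are plain tuples, and the function accumulates every node reached within the hop budget, which may differ from what a chained .Out(...) query returns.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical hop-limited expansion over typed, directed edges.
public static class HopExpansionSketch
{
    // Follows outgoing edges of the allowed types for at most `hops` steps
    // and returns every node reached, excluding the seeds themselves.
    public static HashSet<string> Expand(
        IEnumerable<(string From, string Type, string To)> edges,
        IEnumerable<string> seeds,
        ISet<string> edgeTypes,
        int hops)
    {
        // Index the allowed edges by source node.
        var outgoing = edges
            .Where(e => edgeTypes.Contains(e.Type))
            .ToLookup(e => e.From, e => e.To);

        var visited = new HashSet<string>(seeds);
        var frontier = new HashSet<string>(visited);
        var reached = new HashSet<string>();

        for (var h = 0; h < hops && frontier.Count > 0; h++)
        {
            var next = new HashSet<string>();
            foreach (var node in frontier)
                foreach (var target in outgoing[node])
                    if (visited.Add(target)) next.Add(target); // first visit only

            reached.UnionWith(next);
            frontier = next;
        }
        return reached;
    }
}
```

Given the edges TICKET-4821 → ForDevice → MBP-2024 and MBP-2024 → InstalledAt → Acme and Globex, one hop from the ticket reaches only the device, while two hops also reach both customers. Restricting the edge types is the other lever: an edge of a type outside the allowed set is never followed, however many hops remain.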

Stage 3: fuse keyword + semantic signals over the subgraph

Instead of one search call, build a similarity scenario and feed it two independent signals — a semantic "similar text" signal and a keyword signal — then let the engine fuse their rankings. A signal is just a ranked list of scored UIDs; the engine trims, weights, and merges them into one ranking.

var neighbourhoodSet = neighbourhood.ToHashSet();

var fused = await Q()
    .StartAt(neighbourhood)                         // subjects: the expanded subgraph
    .ToSimilarity(o => o.MaxCandidatesPerSignal(50).MaxCandidates(50))
    .AddSignal("semantic", s => s
       .Weight(1.0f)
       .FromAsync(async ctx =>
          (await ctx.Graph.Q().StartAtSimilarTextAsync(
                 query, count: 50, nodeTypes: new[] { "Ticket", "Document" }))
              .AsScoredUIDEnumerable()))
    .AddSignal("keyword", s => s
       .Weight(1.0f)
       .FromAsync(async ctx =>
       {
           var kw = SearchRequest.For(query).WithTypesFacet("Ticket", "Document");
           kw.TargetUIDs = neighbourhood;           // scope the keyword search to the subgraph

           var hits = await ctx.Graph.CreateSearchAsUserAsync(
                          kw, scope.CurrentUser, scope.CancellationToken);
           return hits.AsScoredUIDEnumerable();
       }))
    .Fuse(f => f.UsingReciprocalRankFusion())
    .Filter(uids => uids.Where(neighbourhoodSet.Contains))  // keep results inside the subgraph
    .FilterAsUser(scope.CurrentUser)                        // enforce the caller's ACL
    .ExecuteAsync(scope.CancellationToken);

Why two signals instead of one search mode:

The semantic signal (StartAtSimilarTextAsync) matches paraphrases — text that means the same thing in different words. See Semantic Similarity.
The keyword signal (CreateSearchAsUserAsync) matches exact identifiers — ticket numbers, SKUs, error codes — that embeddings blur. See Hybrid Search for the keyword-vs-semantic intuition.
Reciprocal-rank fusion merges the two ranked lists without requiring their scores to share a scale; swap in UsingMaxScore() or UsingMergedScores(...) if you prefer. Per-signal Weight(...) biases the blend.
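For intuition about why reciprocal-rank fusion needs no shared score scale, here is a minimal standalone sketch of the technique. It is not the engine's implementation: RrfSketch and Fuse are hypothetical names, the constant k = 60 is the value commonly used in the literature rather than a documented Curiosity default, and each input list is assumed to contain every ID at most once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical weighted reciprocal-rank fusion over ranked ID lists.
public static class RrfSketch
{
    // Each signal is a list of IDs ordered best-first. Raw scores are
    // ignored on purpose: only the position in each list matters.
    public static List<KeyValuePair<string, double>> Fuse(
        IEnumerable<(IReadOnlyList<string> Ranked, double Weight)> signals,
        int k = 60)
    {
        var fused = new Dictionary<string, double>();
        foreach (var (ranked, weight) in signals)
        {
            for (var i = 0; i < ranked.Count; i++)
            {
                var contribution = weight / (k + i + 1); // rank is 1-based
                fused[ranked[i]] = fused.TryGetValue(ranked[i], out var soFar)
                    ? soFar + contribution
                    : contribution;
            }
        }

        return fused
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal) // deterministic tie-break
            .ToList();
    }
}
```

Fusing a semantic list A, B, C with a keyword list B, D at equal weight ranks B first, because it is the only ID both signals returned, followed by A, D, and C. Raising one signal's weight scales all of its contributions, which is the same bias that per-signal Weight(...) is described as applying.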

Two filters carry the safety and precision properties. Signals run against the full graph, so FilterAsUser(scope.CurrentUser) is the boundary that drops anything the caller can't see. The keyword signal is already permission-checked by CreateSearchAsUserAsync, but the semantic signal has no per-call equivalent and would otherwise return results regardless of the caller's ACL. The .Filter(...) step keeps fused results inside the traversed subgraph, which is what lifts precision over flat RAG. To answer global questions in the same call, add Topic to the signals' node types so summaries compete in the fused ranking. Because .Filter(...) as written drops every UID outside neighbourhoodSet, the relevant Topic nodes also have to be part of the candidate set.

Stage 4: register snippets and let the LLM synthesize

Walk the fused scores top-down, pull grounding text for each hit, register it as a snippet, and return the snippet IDs. The orchestrator instructs the LLM to cite only registered IDs as [1], [2], which the UI turns into clickable source cards.

var grounded = fused.Scores
    .OrderByDescending(kv => kv.Value)
    .Take(8)
    .Select(kv =>
    {
        var text = scope.ChatAI.GetTextFromNode(kv.Key, limit: 4_000);
        var id   = scope.AddSnippet(uid: kv.Key, text: text);
        return new { snippetId = id, score = kv.Value, text };
    }).ToArray();

scope.SetToolCallDisplayName($"Searched the neighbourhood of '{query}'");
return grounded.ToJson();
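If you want to verify in a test that an answer cites only registered snippets, a small check over the [n] markers is enough. The sketch below is an assumption-laden illustration, not something the orchestrator is documented to expose: CitationCheckSketch and UnregisteredCitations are hypothetical names, and it assumes snippet IDs are small positive integers rendered as [1], [2] in the answer text.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical post-hoc check of citation markers in an answer.
public static class CitationCheckSketch
{
    // Matches markers such as [1] or [12]; at most nine ASCII digits,
    // so int.Parse cannot overflow.
    private static readonly Regex Marker = new Regex(@"\[([0-9]{1,9})\]");

    // Returns the distinct cited IDs that were never registered as snippets.
    public static List<int> UnregisteredCitations(string answer, ISet<int> registeredIds) =>
        Marker.Matches(answer)
            .Cast<Match>()
            .Select(m => int.Parse(m.Groups[1].Value))
            .Distinct()
            .Where(id => !registeredIds.Contains(id))
            .ToList();
}
```

For the answer "Fault affects Acme [1] and Globex [3]. See [1]." with snippets 1 and 2 registered, the check returns only 3. An empty result means every marker in the answer maps to a source card.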

The complete tool

Putting the four stages together, a graph RAG retrieval tool looks like this:

public class GraphRagTools
{
    [Tool("Answer a question by retrieving the connected subgraph around the entities it mentions, then searching within it.")]
    public static async Task<string> RetrieveConnected(ToolScope scope,
        [Parameter("The user's question", required: true)] string query,
        [Parameter("How many graph hops to expand (1-3)", required: false)] int hops = 2)
    {
        // 1-2. Spot entities in the question and expand the neighbourhood.
        var seedUIDs = await NLP.ResolveEntitiesAsync(query, scope.CancellationToken);

        var traversal = scope.Graph.Q().StartAt(seedUIDs);
        for (var h = 0; h < System.Math.Clamp(hops, 1, 3); h++)
        {
            traversal = traversal.Out("ForDevice", "Mentions", "InstalledAt");
        }
        var neighbourhood = traversal.AsUIDEnumerable().ToArray();

        if (neighbourhood.Length == 0) return "{\"hits\":[],\"note\":\"No connected entities found.\"}";

        var neighbourhoodSet = neighbourhood.ToHashSet();

        // 3. Fuse a semantic and a keyword signal, scoped to the subgraph and the user's ACL.
        var fused = await scope.Graph.Q()
            .StartAt(neighbourhood)
            .ToSimilarity(o => o.MaxCandidatesPerSignal(50).MaxCandidates(50))
            .AddSignal("semantic", s => s
               .FromAsync(async ctx =>
                  (await ctx.Graph.Q().StartAtSimilarTextAsync(
                         query, count: 50, nodeTypes: new[] { "Ticket", "Document" }))
                      .AsScoredUIDEnumerable()))
            .AddSignal("keyword", s => s
               .FromAsync(async ctx =>
               {
                   var kw = SearchRequest.For(query).WithTypesFacet("Ticket", "Document");
                   kw.TargetUIDs = neighbourhood;

                   var hits = await ctx.Graph.CreateSearchAsUserAsync(
                                  kw, scope.CurrentUser, scope.CancellationToken);
                   return hits.AsScoredUIDEnumerable();
               }))
            .Fuse(f => f.UsingReciprocalRankFusion())
            .Filter(uids => uids.Where(neighbourhoodSet.Contains))
            .FilterAsUser(scope.CurrentUser)
            .ExecuteAsync(scope.CancellationToken);

        // 4. Register snippets for citation and hand back to the LLM.
        var grounded = fused.Scores
            .OrderByDescending(kv => kv.Value)
            .Take(8)
            .Select(kv =>
            {
                var text = scope.ChatAI.GetTextFromNode(kv.Key, limit: 4_000);
                var id   = scope.AddSnippet(uid: kv.Key, text: text);
                return new { snippetId = id, score = kv.Value, text };
            }).ToArray();

        scope.SetToolCallDisplayName($"Retrieved {grounded.Length} sources near '{query}'");
        return grounded.ToJson();
    }
}
return new GraphRagTools();

Expose the tool to an agent by adding it to the agent's tool allowlist. Pair it with a system prompt that tells the LLM to call RetrieveConnected before answering and to cite every claim with a snippet ID. The orchestrator handles streaming, the audit log, and per-call timeouts — you write the retrieval, not the loop.

When graph RAG is worth it

Question shape	Best approach
Answer is in one document; question paraphrases it	Flat hybrid search — no traversal needed
Answer joins facts across linked records (multi-hop)	Graph RAG (graph expansion → hybrid)
"What are the broad themes / patterns?" (global)	Search over precomputed `Topic` summaries
Mix of the above	Expose both tools; let the agent choose

Graph RAG adds a traversal and a scoping step. If your corpus is a flat set of unrelated documents, those steps cost latency for no recall gain — start with flat hybrid search and add the graph layer when questions start spanning records.

Pitfalls

Expanding too many hops. Each hop multiplies the candidate set. Two hops is usually enough; three is the ceiling for interactive chat. Stop at the depth the question needs.
Dropping the ACL filter. Similarity signals run against the full graph, so a scenario without FilterAsUser(scope.CurrentUser) can surface nodes the caller can't see. Keep it as the last step before ExecuteAsync.
No edges to traverse. If entity linking didn't run (or fired below the auto-link threshold), the graph is just disconnected documents and expansion returns nothing. Verify Mentions edges exist before blaming retrieval — see the entity extraction review loop.
Passing whole documents to the LLM. Register snippets, not full bodies; hybrid search already selected the relevant chunks.
Stale Topic summaries. A summary computed last quarter describes last quarter's clusters. Re-run the enrichment task when members change.