Creating a code index

This page walks through setting up a Custom Code Index end-to-end: where to create it in the admin UI, the fields you have to fill in, the rules the body has to follow before it will compile, and how to keep the source under version control. The conceptual overview lives in Code Indexes — Introduction; the full identifier reference is in Code Index Scope.

Prerequisites

Custom API feature flag — code indexes are gated behind it. If the menu item is missing or shows a license-check modal when you click + New, the feature isn't enabled for the workspace.
Target node type — a code index always runs against one node type. The type must already exist in the schema before you create the index.

Step 1 — Open the Code Indexes view

In the admin sidebar, go to Manage → Indexes → Code Indexes. The view lists every code index and federated search index defined in the workspace, grouped by target node type.

Code Indexes view

Three actions live at the top of the list:

+ New code index — creates a normal code index that derives data from existing nodes.
+ New federated search index — creates a search code index that materializes virtual nodes at query time. Different scope, different page — not covered here.
Import source-code — re-creates indexes from a .json or .zip export. See Export and import below.

Step 2 — Pick the target node type and a name

Clicking + New code index opens the editor as a near-full-screen modal. Before any code runs, set the fields below. The two batch sizes come with defaults:

| Field | What it controls |
| --- | --- |
| Node Type | The schema type this index runs against. Every commit to a node of this type queues it for the index. |
| Index Name | A short label that disambiguates multiple indexes on the same node type. Shown in the list as `Index for {Type} ({Name})`. |
| Maximum Batch Size | Upper bound on how many UIDs the worker hands to one invocation of your body via `ToIndex`. Default `100000`. |
| Minimum Batch Size | Lower bound: the worker waits until at least this many items are queued before invoking the body. Default `0` (fire as soon as anything is queued). |

Pick the batch sizes deliberately. If your body calls an external service per UID (an LLM, an HTTP API), set Maximum Batch Size to something the service can handle inside one invocation — a 100 000-item batch that talks to OpenAI is going to spend most of its time blocked, and you lose the ability to cancel cleanly mid-batch. A few hundred is usually the right order of magnitude for AI-bound work.
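When the body does call an external service per UID, passing the body's `CancellationToken` into each request keeps a long batch cancellable even while a call is in flight. The following is a minimal sketch, not a Curiosity-provided pattern. It assumes `CreateHttpClient()` returns a standard `System.Net.Http.HttpClient`, as the template's comment suggests. The endpoint URL and the 30-second per-request timeout are hypothetical placeholders.

```csharp
// Sketch: one external call per UID, each bounded by the worker's token plus a timeout.
// The endpoint URL and the 30-second timeout are hypothetical placeholders.
var failed = new List<UID128>();
var httpClient = CreateHttpClient();

foreach (var uid in ToIndex)
{
    if (CancellationToken.IsCancellationRequested) return ToIndex;

    // System.Threading is not in the pre-imported list, hence the fully qualified name.
    using var perCall = System.Threading.CancellationTokenSource.CreateLinkedTokenSource(CancellationToken);
    perCall.CancelAfter(TimeSpan.FromSeconds(30));

    try
    {
        using var response = await httpClient.GetAsync(
            $"https://enrichment.example.com/items/{uid}", perCall.Token);
        if (!response.IsSuccessStatusCode) failed.Add(uid);
    }
    catch (OperationCanceledException) when (CancellationToken.IsCancellationRequested)
    {
        // The worker cancelled mid-request: hand the whole batch back.
        return ToIndex;
    }
    catch (OperationCanceledException)
    {
        // Only this request timed out: retry just this UID.
        failed.Add(uid);
    }
    catch (HttpRequestException ex)
    {
        Logger.LogWarning(ex, "Request for {Uid} failed", uid);
        failed.Add(uid);
    }
}

return failed;
```

This does not replace the advice above. A smaller Maximum Batch Size still bounds how much work one cancelled invocation throws away.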

Step 3 — Write the body

The editor opens with a working skeleton. Replace the body, but keep the shape — the index won't compile without it:

Default body shipped by Curiosity

```csharp
// You can use sync or async code in this block.
// External systems can be called via an HttpClient instance.

var failed = new List<UID128>();

var httpClient = CreateHttpClient();

foreach (var uid in ToIndex)
{
    if (CancellationToken.IsCancellationRequested)
    {
        return ToIndex;
    }

    if (!TryIndex(uid))
    {
        failed.Add(uid);
    }
}

return failed;

bool TryIndex(UID128 uid)
{
    Logger.LogInformation($"Indexing {uid}");
    return true;
}
```

Two parts of this template are not stylistic: the compiler enforces the cancellation check, and the worker acts on the return value.

Cancellation handling is mandatory

The body must mention either CancellationToken.IsCancellationRequested or CancellationToken.ThrowIfCancellationRequested(). The compiler rejects the index outright if neither string appears in the source — this is a textual check, so calling a helper that handles cancellation internally is not enough; the keyword has to appear somewhere in your code.

In practice the simplest pattern is to check at the top of the loop and return ToIndex when cancellation fires, exactly like the template does. That hands the whole batch back to the worker for requeueing.
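Returning all of ToIndex on cancellation is the simplest option, but any UIDs already finished in this invocation then run again on the next pass. When each UID is expensive (an LLM call, for example), a variant is to return only the failures so far plus the UIDs not yet reached. Any list you return is requeued, per the return-value rules below. This is a sketch under that reading of the rules. `TryIndex` is a hypothetical placeholder for your per-UID work, and the approach assumes that work is safe to stop between UIDs.

```csharp
// Sketch: on cancellation, requeue earlier failures plus everything not yet attempted,
// instead of the whole batch. TryIndex is a hypothetical placeholder.
var failed = new List<UID128>();
var batch = ToIndex.ToList(); // snapshot so the loop can track its position

for (var i = 0; i < batch.Count; i++)
{
    // The literal check below is what satisfies the compiler's textual rule.
    if (CancellationToken.IsCancellationRequested)
    {
        return failed.Concat(batch.Skip(i)).ToList();
    }

    if (!TryIndex(batch[i]))
    {
        failed.Add(batch[i]);
    }
}

return failed;

bool TryIndex(UID128 uid)
{
    Logger.LogInformation($"Indexing {uid}");
    return true;
}
```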

The return value is the failed list

Whatever you return is treated as the UIDs the worker should requeue with backoff:

Return null (or no return) — every UID is considered processed.
Return a List<UID128> of just the failures — those get retried.
Return ToIndex on cancellation — the whole batch is requeued.

If your body throws, the exception is recorded as the index's last error, the entire ToIndex batch is requeued, and the index keeps running. So returning null on success is fine; you only need an explicit list when you want to retry some UIDs and not others.
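The "retry some UIDs and not others" case usually means catching per UID, so that one bad node does not requeue the whole batch. The following is a minimal sketch. `Process` is a hypothetical placeholder, and `N.Article.Body` assumes the Article schema from the worked example further down. The sketch also assumes `Logger` is a standard `Microsoft.Extensions.Logging.ILogger`.

```csharp
// Sketch: contain failures per UID; only the failed ones are requeued.
var failed = new List<UID128>();

foreach (var uid in ToIndex)
{
    if (CancellationToken.IsCancellationRequested) return ToIndex;

    try
    {
        Process(uid);
    }
    catch (Exception ex)
    {
        Logger.LogWarning(ex, "Indexing {Uid} failed; it will be retried", uid);
        failed.Add(uid);
    }
}

// null when everything succeeded; otherwise only the failures are requeued.
return failed.Count == 0 ? null : failed;

// Hypothetical per-UID work: anything it throws is contained to that UID.
void Process(UID128 uid)
{
    var article = Graph.Get(uid);
    var text = article.GetString(N.Article.Body) ?? "";
    Logger.LogInformation("Processing {Uid} ({Length} chars)", uid, text.Length);
}
```

There is a trade-off. Because the body itself no longer throws, these per-UID failures will likely not surface as the index's last error. Logging them is what keeps them visible.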

Available identifiers

Inside the body you have direct access to Graph (G), Query() (Q()), ToIndex, IndexUID, ChatAI, AgentAI, Tracker, CreateHttpClient(), Logger, CancellationToken, RunEndpointAsync<T>(...), and RunToolAsync(...). The full reference, including overloads, is in Code Index Scope.

Imports for System, System.Linq, System.Collections.Generic, System.Threading.Tasks, Mosaik.Core, Mosaik.Schema, Mosaik.GraphDB, Mosaik.GraphDB.Indexes, Catalyst, Newtonsoft.Json, System.Net.Http, Microsoft.Extensions.Logging, and UID are already in scope. The editor's IntelliSense knows about all of them.
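Because these namespaces are pre-imported, a body can use LINQ, `HttpClient` and `JsonConvert` without any `using` directives. The following minimal sketch sends the whole batch to an external service in one request instead of one call per UID. It assumes an Article type with a Body field, as in the worked example below, and a hypothetical endpoint that accepts a JSON array. Neither the URL nor the payload shape is part of Curiosity.

```csharp
// Sketch: one POST per invocation instead of one call per UID.
// The endpoint URL and payload shape are hypothetical.
if (CancellationToken.IsCancellationRequested) return ToIndex;

// System.Linq: project the batch into a JSON-friendly payload.
var payload = ToIndex
    .Select(uid => new { id = uid.ToString(), text = Graph.Get(uid).GetString(N.Article.Body) })
    .ToList();

var httpClient = CreateHttpClient();

// Newtonsoft.Json and System.Net.Http are pre-imported; System.Text is not, so Encoding is qualified.
using (var content = new StringContent(
    JsonConvert.SerializeObject(payload), System.Text.Encoding.UTF8, "application/json"))
using (var response = await httpClient.PostAsync(
    "https://enrichment.example.com/batch", content, CancellationToken))
{
    if (!response.IsSuccessStatusCode)
    {
        Logger.LogWarning("Batch of {Count} rejected with HTTP {Status}", payload.Count, (int)response.StatusCode);
        return ToIndex; // requeue the whole batch
    }
}

return null;
```

If cancellation fires during the POST, `PostAsync` throws. Under the rules above, that records a last error and requeues the whole batch.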

Step 4 — Validate and save

The editor revalidates your code about a second after you stop typing and surfaces compilation errors inline. The Save button stays disabled until both the settings validator and the code validator agree the index is valid.

A typical first-save flow:

1. Pick the target node type. The settings validator stays red until you do.
2. Write the body. As soon as you stop typing, the editor calls the diagnostics endpoint and inlines any Roslyn errors next to the offending line.
3. Wait for the green Save button. Compile errors and a missing cancellation check both keep it disabled.
4. Hit Save. On a new index this calls the create endpoint; on an existing one it persists the updated settings. The modal closes when the index is committed.

On first save, the worker enqueues every existing node of the target type and starts feeding them through the body — no separate "build" step. For workspaces with more than ~10 000 nodes of that type the initial pass runs in the background; below that it's done synchronously before the save returns.

On subsequent edits the new body is compiled and stored, but already-indexed nodes are not re-queued. From that point on, the new code only runs against nodes that get committed (or are still sitting in the queue from before the edit). To force a full re-pass against the existing corpus, use the Recreate button in the general Indexes view — see Forcing a re-run below.

Step 5 — Monitoring

Each row in the Code Indexes list shows a live status badge: how many UIDs are queued, whether the worker is currently running, and how long it's been running for. The status refreshes every five seconds.

Three fields in the status payload are the ones you actually read when debugging:

| Field | What it tells you |
| --- | --- |
| `HasValidCode` | `True` if the body compiled and the cancellation check passed. `False` means the index is saved but inert. |
| `LastException` | The exception string from the most recent failed batch. Cleared on the next successful run. |
| `QueueSize` | Number of UIDs the worker has lined up for this index. A queue that doesn't drain usually means `HasValidCode = False` or every batch is throwing. |

You can also pull the same data programmatically via API.Indexes.GetStatusAsync(IndexTypes.CustomCodeIndex) if you're hooking the workspace's health into an external monitor.

Forcing a re-run

Saving an edited body doesn't re-queue existing nodes, so when you change the logic you usually want to force a re-pass. Two UI surfaces give you that, and one programmatic option is also worth knowing about:

Manage → Indexes → Indexes (the general list). Each row, including every Custom Code Index, has a refresh icon — Recreate this index — that re-enqueues every node of the target type through the current body. A neighbouring broom icon — Clear — empties the index queue and indexed content without re-enqueueing, so you can pair the two when you want a clean slate before re-running.
Manage → Indexes → Indexes → Recreate all (the toolbar button on the right of the search box). Walks every index currently visible in the list — respecting the search filter — and recreates them. Useful when you've changed several code indexes at once.
Programmatic. await API.Indexes.RecreateIndex(nodeType, indexUID) from the admin client, or a Graph.CommitAsync on every affected node from a migration / scheduled task. Re-committing a node re-queues it for every index targeting its type.

Step 6 — Delete

The trash-can button on each row deletes the index. There's a confirmation dialog but no soft-delete: the index, its compiled script, and its queue go away immediately. Derived data the body wrote into the graph (e.g. nodes added via Graph.TryAdd, edges via Graph.Link) stays where it is — only the index itself is removed.

Export and import

For source control or moving indexes between workspaces, use the toolbar buttons:

Export source-code (top of the Code Indexes view) downloads a .zip containing one .cs file per index. Each file starts with two attributes Curiosity uses to re-create the index on import:
```
[indexes: Curiosity.Indexes.CodeIndex("SupportCase")]
[indexes: Curiosity.Indexes.Name("LLM enrichment")]

// ...your code-index body...
```
Both attributes are mandatory. The first line maps to Node Type, the second to Index Name.
Import source-code accepts a .json (single index) or a .zip (batch). Existing indexes with the same node type and name are overwritten; new ones are created from scratch. Settings other than the body — batch sizes, etc. — fall back to defaults on import.

This format also makes the .cs files diff-friendly, so review of code-index changes can go through the same pull-request flow as the rest of the workspace.

A worked minimum example

Here's the shortest viable index — runs over Article nodes, copies the article body into a sibling Source node so the workspace's text indexes can pick it up:

Article → Source mirror

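```csharp
foreach (var uid in ToIndex)
{
    // Mandatory cancellation check: hand the whole batch back to the worker.
    if (CancellationToken.IsCancellationRequested) return ToIndex;

    var article = Graph.Get(uid);
    var body    = article.GetString(N.Article.Body);

    // Nothing to mirror yet; this UID counts as processed.
    if (string.IsNullOrWhiteSpace(body)) continue;

    // Deterministic key, so a re-run reuses the same Source node.
    var sourceKey  = "source-for-" + uid;
    var sourceNode = await Graph.GetOrAddLockedAsync(N.Source.Type, sourceKey);

    sourceNode.SetString(N.Source.ExtractedText, body);
    // A unique edge avoids duplicates when the same article is processed twice.
    article.AddUniqueEdge(E.HasDocument, sourceNode);

    await Graph.CommitAsync(article, sourceNode);
}

// null: every UID in the batch counts as processed.
return null;
```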

That covers every requirement: cancellation check, batched processing of ToIndex, side effects via the safe Graph, and a null return on success. Build out from there.

See also

Code Indexes — Introduction — when to use a code index versus a connector or a search scope.
Code Index Scope — the full identifier reference (Graph, Q(), ChatAI, AgentAI, RunEndpointAsync, …).
Extract files to Markdown — the canonical end-to-end example, with locking, idempotency, and re-queue semantics.
Indexes overview — every index type, not just code.
