Calling an Agent from a Custom Endpoint

The most common deployment shape for an agent is behind a custom endpoint: a thin HTTP wrapper that validates input, binds ${variables}, kicks off AgentAI.RunAgentAsync, waits for the run to finish, and returns the typed result. The endpoint enforces the caller's identity; the agent does the LLM work.

This page covers the server-side invocation. For the built-in REST surface (no custom endpoint needed), see REST surface at the bottom.

The server-side entry point

Mosaik.AI.AgentAI.RunAgentAsync is the canonical way to run an agent from inside any endpoint execution scope, AI tool, or scheduled task. It has two overloads:

// (1) Convenience: load the saved agent + a single user message.
Task<UID128> AgentAI.RunAgentAsync(
    Graph graph,
    UID128 agentUID,
    string userMessage,
    UID128 userUID,
    UID128 chatUID = default,
    IDictionary<string, string> variables = null,
    CancellationToken cancellationToken = default);

// (2) Raw: caller supplies prompts, tools, provider, output schema.
Task<UID128> AgentAI.RunAgentAsync(
    Graph graph,
    List<IChatAIMessage> prompts,
    List<UID128> enabledTools,
    UID128? chatAIProviderTaskUID,
    UID128 userUID,
    UID128 chatUID = default,
    UID128 agentUID = default,
    string outputSchema = null,
    CancellationToken cancellationToken = default);

Both return the _AgentRun UID. The run is persisted before the call returns, so audit and replay work for free. The default timeout is AgentAI.Timeout = 5 min; cancel the supplied CancellationToken to abort earlier.

Unknown component: alert The convenience overload pulls the agent's SystemPrompt, attached tools, ChatTaskUID, and OutputSchema from the graph. The raw overload is for cases where the prompt or tool list is computed at call time (multi-turn loops, ad-hoc tool subsets) — prefer the convenience overload otherwise. [!/alert]

A complete endpoint — `/triage-ticket`

Given the Ticket Triage agent defined in Creating Agents, the endpoint that fronts it is small:

public record TriageRequest(string TicketBody, string Product);

public record TriageDecision(
    string Category,
    string Severity,
    string ProposedAction,
    string[] CitedArticleIds);

var req = Body.FromJson<TriageRequest>();
if (req is null || string.IsNullOrWhiteSpace(req.TicketBody))
    return BadRequest("`ticketBody` is required.");

var runUID = await AgentAI.RunAgentAsync(
    graph:       Graph.Underlying,
    agentUID:    AI_Agents.Ticket_Triage,
    userMessage: req.TicketBody,
    userUID:     CurrentUser,
    variables:   new Dictionary<string, string> { ["PRODUCT"] = req.Product ?? "" },
    cancellationToken: CancellationToken);

if (!Graph.TryGetReadOnlyContent<_AgentRun>(runUID, out var run))
    return Problem("Run was not persisted.");

if (run.Status != AgentRunStatus.Completed)
    return Problem($"Triage failed: {run.ErrorMessage ?? run.Status.ToString()}");

var decision = run.Result.FromJson<TriageDecision>();
return Ok(decision.ToJson(), "application/json");

A few things worth noting:

AI_Agents.Ticket_Triage is an auto-generated constant (UID128), the same pattern as AI_Tools.*. See Auto-generated Helpers.
CurrentUser is passed through. Every tool call inside the run is ACL-filtered as that user — the agent never sees data the caller can't see.
CancellationToken is the endpoint's token. If the HTTP client disconnects (or the endpoint is in Pooling mode and times out), the agent's tool calls abort with it.

Pooling mode for long-running agents

Multi-step agents can easily run for 10–60 seconds. Set the endpoint's Mode to Pooling so the call returns 202 Accepted immediately and the client polls. Inside the endpoint you can stream progress with RelayStatusAsync:

await RelayStatusAsync("Searching knowledge base…");
var runUID = await AgentAI.RunAgentAsync(...);
await RelayStatusAsync("Composing answer…");

See Creating Endpoints → Pooling mode for the polling protocol; clients using the built-in EndpointsClient or Mosaik.API.Endpoints.CallAsync handle it transparently.

Variables in practice

Variables are the right place to inject per-call configuration into a stable prompt. Three patterns cover most use cases.

Locale-aware response

var locale = Headers.TryGetValue("Accept-Language", out var v)
    ? v.ToString().Split(',').First()
    : "en";

var runUID = await AgentAI.RunAgentAsync(
    graph:       Graph.Underlying,
    agentUID:    AI_Agents.Customer_FAQ,
    userMessage: req.Question,
    userUID:     CurrentUser,
    variables:   new Dictionary<string, string> { ["LOCALE"] = locale });

Tenant scoping

var tenant = Graph.Q().StartAt(CurrentUser)
    .Out(N.User.Type, E.MemberOf)
    .AsEnumerable().FirstOrDefault()
    ?.GetString(N.Tenant.Name) ?? "";

variables["TENANT"] = tenant;

Feature flags

variables["ENABLE_SUGGESTIONS"] = await IsFeatureOnAsync("ai.suggestions")
    ? "true" : "false";

The prompt can then branch on ${ENABLE_SUGGESTIONS} instead of you maintaining two agents.

Inspecting the run

The _AgentRun node carries the full execution trace:

Field	What's in it
`Status`	`Running`, `Completed`, `Canceled`, or `Failed`.
`Result`	The model's final output. JSON when `OutputSchema` is set, prose otherwise.
`ErrorMessage`	Populated on `Failed`. Free text from the orchestrator.
`Started`	Timestamp of the call.
`Completed`	Timestamp of the final reply. Use the delta for latency dashboards.
`Messages`	Full prompt + assistant + tool-call transcript. Use it for replay/debug.

Return just Result to the caller; keep the rest in the audit log:

if (run.Status != AgentRunStatus.Completed)
    return Problem($"Agent did not complete: {run.ErrorMessage}");

Logger.LogInformation("agent {Agent} run {Run} took {Ms}ms",
    AI_Agents.Ticket_Triage, runUID,
    (run.Completed - run.Started).TotalMilliseconds);

return Ok(run.Result, "application/json");

Running an agent ad-hoc (raw overload)

The second RunAgentAsync overload skips the _Agent node entirely — useful for experimentation, A/B tests, or when the tool subset is computed at call time:

var prompts = new List<IChatAIMessage>
{
    new ChatAIMessage(ChatAuthorRole.System,
        "You are a categoriser. Reply with exactly one word: Hardware, Software, or Billing."),
    new ChatAIMessage(ChatAuthorRole.User, req.TicketBody),
};

var runUID = await AgentAI.RunAgentAsync(
    graph:                Graph.Underlying,
    prompts:              prompts,
    enabledTools:         new List<UID128>(),         // no tools — pure classification
    chatAIProviderTaskUID: AI_ChatTasks.Haiku_Fast,
    userUID:              CurrentUser,
    outputSchema:         null,
    cancellationToken:    CancellationToken);

The raw overload is also what powers the InvokeAgent sub-agent pattern — see Sub-agent Workflows.

REST surface

For managed access from a front-end or external service — no custom endpoint needed — the workspace exposes:

Method	Path	Body / response
`POST`	`/api/chatai/agents/run`	`{ AgentUID, UserMessage, ChatUID?, Variables }` → `AgentRun`
`POST`	`/api/chatai/agents/run-raw`	`{ Prompts, ToolUIDs, ChatAITaskUID?, AgentUID?, OutputSchema, ChatUID? }` → `AgentRun`
`GET`	`/api/chatai/agents/runs/get/{uid}`	`AgentRun`
`GET`	`/api/chatai/agents/runs/running`	In-memory list of currently executing runs
`GET`	`/api/chatai/agents/list`	`Agent[]`

Authorisation: /run requires a signed-in user (AuthorizeUser(allowGuests:false)); CRUD on /agents requires the SystemAdmin or DesktopAppUser role.

From the Tesserae front-end:

var run = await Mosaik.API.ChatAI.Agents.RunAsync(new RunAgentRequest {
    AgentUID    = AI_Agents.Ticket_Triage,
    UserMessage = ticketBody,
    Variables   = new () { ["PRODUCT"] = "MacBook Air" }
});
var decision = run.Result.FromJson<TriageDecision>();

Prefer the custom-endpoint shape when you need to pre/post-process the result, enforce extra business rules, or expose a public URL with its own auth scheme. Use the built-in REST surface for in-app calls where the caller is already authenticated as a workspace user.

Security checklist

Always pass CurrentUser to RunAgentAsync. Never substitute default(UID128) to "see everything".
Validate ${variables} you pass through. Treat them as untrusted strings interpolated into a prompt.
When the endpoint runs in Pooling mode, propagate CancellationToken so disconnects abort the run.
Check run.Status before reading run.Result — a Failed run can still have a non-empty (but unreliable) result.
Don't return the full Messages transcript to end users — it can include tool input parameters that reveal internal IDs.

Calling an Agent from a Custom Endpoint

The server-side entry point

A complete endpoint — `/triage-ticket`

Pooling mode for long-running agents

Variables in practice

Locale-aware response

Tenant scoping

Feature flags

Inspecting the run

Running an agent ad-hoc (raw overload)

REST surface

Security checklist

See also

Referenced by

Calling an Agent from a Custom Endpoint

The server-side entry point

A complete endpoint — /triage-ticket

Pooling mode for long-running agents

Variables in practice

Locale-aware response

Tenant scoping

Feature flags

Inspecting the run

Running an agent ad-hoc (raw overload)

REST surface

Security checklist

See also

Referenced by

A complete endpoint — `/triage-ticket`