Calling an Agent from a Custom Endpoint
The most common deployment shape for an agent is behind a custom endpoint: a thin HTTP wrapper that validates input, binds ${variables}, kicks off AgentAI.RunAgentAsync, waits for the run to finish, and returns the typed result. The endpoint enforces the caller's identity; the agent does the LLM work.
This page covers the server-side invocation. For the built-in REST surface (no custom endpoint needed), see REST surface at the bottom.
The server-side entry point
Mosaik.AI.AgentAI.RunAgentAsync is the canonical way to run an agent from inside any endpoint execution scope, AI tool, or scheduled task. It has two overloads:
// (1) Convenience: load the saved agent + a single user message.
Task<UID128> AgentAI.RunAgentAsync(
Graph graph,
UID128 agentUID,
string userMessage,
UID128 userUID,
UID128 chatUID = default,
IDictionary<string, string> variables = null,
CancellationToken cancellationToken = default);
// (2) Raw: caller supplies prompts, tools, provider, output schema.
Task<UID128> AgentAI.RunAgentAsync(
Graph graph,
List<IChatAIMessage> prompts,
List<UID128> enabledTools,
UID128? chatAIProviderTaskUID,
UID128 userUID,
UID128 chatUID = default,
UID128 agentUID = default,
string outputSchema = null,
CancellationToken cancellationToken = default);
Both return the _AgentRun UID. The run is persisted before the call returns, so audit and replay work for free. The default timeout is AgentAI.Timeout = 5 min; cancel the supplied CancellationToken to abort earlier.
Unknown component: alert
The convenience overload pulls the agent's SystemPrompt, attached tools, ChatTaskUID, and OutputSchema from the graph. The raw overload is for cases where the prompt or tool list is computed at call time (multi-turn loops, ad-hoc tool subsets) — prefer the convenience overload otherwise.
[!/alert]
A complete endpoint — /triage-ticket
Given the Ticket Triage agent defined in Creating Agents, the endpoint that fronts it is small:
public record TriageRequest(string TicketBody, string Product);
public record TriageDecision(
string Category,
string Severity,
string ProposedAction,
string[] CitedArticleIds);
var req = Body.FromJson<TriageRequest>();
if (req is null || string.IsNullOrWhiteSpace(req.TicketBody))
return BadRequest("`ticketBody` is required.");
var runUID = await AgentAI.RunAgentAsync(
graph: Graph.Underlying,
agentUID: AI_Agents.Ticket_Triage,
userMessage: req.TicketBody,
userUID: CurrentUser,
variables: new Dictionary<string, string> { ["PRODUCT"] = req.Product ?? "" },
cancellationToken: CancellationToken);
if (!Graph.TryGetReadOnlyContent<_AgentRun>(runUID, out var run))
return Problem("Run was not persisted.");
if (run.Status != AgentRunStatus.Completed)
return Problem($"Triage failed: {run.ErrorMessage ?? run.Status.ToString()}");
var decision = run.Result.FromJson<TriageDecision>();
return Ok(decision.ToJson(), "application/json");
A few things worth noting:
AI_Agents.Ticket_Triageis an auto-generated constant (UID128), the same pattern asAI_Tools.*. See Auto-generated Helpers.CurrentUseris passed through. Every tool call inside the run is ACL-filtered as that user — the agent never sees data the caller can't see.CancellationTokenis the endpoint's token. If the HTTP client disconnects (or the endpoint is inPoolingmode and times out), the agent's tool calls abort with it.
Pooling mode for long-running agents
Multi-step agents can easily run for 10–60 seconds. Set the endpoint's Mode to Pooling so the call returns 202 Accepted immediately and the client polls. Inside the endpoint you can stream progress with RelayStatusAsync:
await RelayStatusAsync("Searching knowledge base…");
var runUID = await AgentAI.RunAgentAsync(...);
await RelayStatusAsync("Composing answer…");
See Creating Endpoints → Pooling mode for the polling protocol; clients using the built-in EndpointsClient or Mosaik.API.Endpoints.CallAsync handle it transparently.
Variables in practice
Variables are the right place to inject per-call configuration into a stable prompt. Three patterns cover most use cases.
Locale-aware response
var locale = Headers.TryGetValue("Accept-Language", out var v)
? v.ToString().Split(',').First()
: "en";
var runUID = await AgentAI.RunAgentAsync(
graph: Graph.Underlying,
agentUID: AI_Agents.Customer_FAQ,
userMessage: req.Question,
userUID: CurrentUser,
variables: new Dictionary<string, string> { ["LOCALE"] = locale });
Tenant scoping
var tenant = Graph.Q().StartAt(CurrentUser)
.Out(N.User.Type, E.MemberOf)
.AsEnumerable().FirstOrDefault()
?.GetString(N.Tenant.Name) ?? "";
variables["TENANT"] = tenant;
Feature flags
variables["ENABLE_SUGGESTIONS"] = await IsFeatureOnAsync("ai.suggestions")
? "true" : "false";
The prompt can then branch on ${ENABLE_SUGGESTIONS} instead of you maintaining two agents.
Inspecting the run
The _AgentRun node carries the full execution trace:
| Field | What's in it |
|---|---|
Status |
Running, Completed, Canceled, or Failed. |
Result |
The model's final output. JSON when OutputSchema is set, prose otherwise. |
ErrorMessage |
Populated on Failed. Free text from the orchestrator. |
Started |
Timestamp of the call. |
Completed |
Timestamp of the final reply. Use the delta for latency dashboards. |
Messages |
Full prompt + assistant + tool-call transcript. Use it for replay/debug. |
Return just Result to the caller; keep the rest in the audit log:
if (run.Status != AgentRunStatus.Completed)
return Problem($"Agent did not complete: {run.ErrorMessage}");
Logger.LogInformation("agent {Agent} run {Run} took {Ms}ms",
AI_Agents.Ticket_Triage, runUID,
(run.Completed - run.Started).TotalMilliseconds);
return Ok(run.Result, "application/json");
Running an agent ad-hoc (raw overload)
The second RunAgentAsync overload skips the _Agent node entirely — useful for experimentation, A/B tests, or when the tool subset is computed at call time:
var prompts = new List<IChatAIMessage>
{
new ChatAIMessage(ChatAuthorRole.System,
"You are a categoriser. Reply with exactly one word: Hardware, Software, or Billing."),
new ChatAIMessage(ChatAuthorRole.User, req.TicketBody),
};
var runUID = await AgentAI.RunAgentAsync(
graph: Graph.Underlying,
prompts: prompts,
enabledTools: new List<UID128>(), // no tools — pure classification
chatAIProviderTaskUID: AI_ChatTasks.Haiku_Fast,
userUID: CurrentUser,
outputSchema: null,
cancellationToken: CancellationToken);
The raw overload is also what powers the InvokeAgent sub-agent pattern — see Sub-agent Workflows.
REST surface
For managed access from a front-end or external service — no custom endpoint needed — the workspace exposes:
| Method | Path | Body / response |
|---|---|---|
POST |
/api/chatai/agents/run |
{ AgentUID, UserMessage, ChatUID?, Variables } → AgentRun |
POST |
/api/chatai/agents/run-raw |
{ Prompts, ToolUIDs, ChatAITaskUID?, AgentUID?, OutputSchema, ChatUID? } → AgentRun |
GET |
/api/chatai/agents/runs/get/{uid} |
AgentRun |
GET |
/api/chatai/agents/runs/running |
In-memory list of currently executing runs |
GET |
/api/chatai/agents/list |
Agent[] |
Authorisation: /run requires a signed-in user (AuthorizeUser(allowGuests:false)); CRUD on /agents requires the SystemAdmin or DesktopAppUser role.
From the Tesserae front-end:
var run = await Mosaik.API.ChatAI.Agents.RunAsync(new RunAgentRequest {
AgentUID = AI_Agents.Ticket_Triage,
UserMessage = ticketBody,
Variables = new () { ["PRODUCT"] = "MacBook Air" }
});
var decision = run.Result.FromJson<TriageDecision>();
Prefer the custom-endpoint shape when you need to pre/post-process the result, enforce extra business rules, or expose a public URL with its own auth scheme. Use the built-in REST surface for in-app calls where the caller is already authenticated as a workspace user.
Security checklist
- Always pass
CurrentUsertoRunAgentAsync. Never substitutedefault(UID128)to "see everything". - Validate
${variables}you pass through. Treat them as untrusted strings interpolated into a prompt. - When the endpoint runs in
Poolingmode, propagateCancellationTokenso disconnects abort the run. - Check
run.Statusbefore readingrun.Result— aFailedrun can still have a non-empty (but unreliable) result. - Don't return the full
Messagestranscript to end users — it can include tool input parameters that reveal internal IDs.
See also
- Creating Agents
- Calling from an AI Tool
- Sub-agent Workflows
- Endpoint Execution Scopes
- Auto-generated Helpers —
AI_Agents.*andAI_Tools.*constants.