Curiosity

Cost, ops & shipping

A 5-step agent makes 5 model calls. Model choice and tool design are your main cost levers.


Model selection by role:

Role                            Model
Tool routing / classification   gpt-4o-mini, claude-haiku-4-5
Final answer synthesis          gpt-4o, claude-sonnet-4-6
Air-gapped / no egress          Local 70B on GPU

Consider a smaller model for tool selection and a larger one only for the final answer.
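
A rough sketch of that split for a 5-step agent, in Python. The model names come from the table above; the prices are placeholders rather than real rates, and `MODEL_FOR_ROLE`, `pick_model`, and `estimate_cost` are hypothetical names used only for illustration.

```python
# Model names are taken from the table above; the mapping itself is illustrative.
MODEL_FOR_ROLE = {
    "routing": "gpt-4o-mini",
    "synthesis": "gpt-4o",
}

# PLACEHOLDER prices in USD per 1,000 tokens. Not real rates; use your provider's.
PRICE_PER_1K = {
    "gpt-4o-mini": 0.001,
    "gpt-4o": 0.010,
}


def pick_model(step: int, total_steps: int) -> str:
    """Use the small model for every step except the last one (the final answer)."""
    role = "synthesis" if step == total_steps - 1 else "routing"
    return MODEL_FOR_ROLE[role]


def estimate_cost(total_steps: int, tokens_per_call: int) -> float:
    """Sum the cost of one model call per step."""
    return sum(
        PRICE_PER_1K[pick_model(step, total_steps)] * tokens_per_call / 1000
        for step in range(total_steps)
    )


mixed = estimate_cost(total_steps=5, tokens_per_call=1000)
all_large = 5 * PRICE_PER_1K["gpt-4o"]
print(f"mixed: {mixed:.3f}  all-large: {all_large:.3f}")
```

With these placeholder numbers the mixed setup costs a fraction of running the large model on every step; the actual ratio depends on real prices and on how many tokens each step consumes.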

Cost guardrails:

  • Daily token ceiling: Settings → AI Settings → Quotas
  • max_tokens per call (start at 1024); cap tool/sub-agent calls per turn
  • Cache aggressively — identical queries in a session re-use previous results
  • Monitor per-tool metrics: GET /api/chatai/tools/metrics (counts, latency, p95, error rate)
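
The first three guardrails can also be enforced in application code. The sketch below is a minimal illustration, not the platform's own quota mechanism: `SessionGuard`, `BudgetExceeded`, and `fake_search` are hypothetical names, and the cache assumes "identical" means the same tool name and the same query string.

```python
from typing import Callable


class BudgetExceeded(Exception):
    """Raised when a guardrail would be crossed."""


class SessionGuard:
    def __init__(self, daily_token_ceiling: int, max_tool_calls_per_turn: int,
                 max_tokens: int = 1024) -> None:
        self.daily_token_ceiling = daily_token_ceiling
        self.max_tool_calls_per_turn = max_tool_calls_per_turn
        self.max_tokens = max_tokens  # per-call cap to pass to the model client
        self.tokens_used = 0
        self.tool_calls_this_turn = 0
        self._cache: dict[tuple[str, str], str] = {}

    def start_turn(self) -> None:
        """Reset the per-turn tool-call counter."""
        self.tool_calls_this_turn = 0

    def record_tokens(self, count: int) -> None:
        """Add to the daily total, refusing anything that would pass the ceiling."""
        if self.tokens_used + count > self.daily_token_ceiling:
            raise BudgetExceeded("daily token ceiling reached")
        self.tokens_used += count

    def call_tool(self, name: str, query: str, run: Callable[[str], str]) -> str:
        """Run a tool once per identical (name, query) pair; cap calls per turn."""
        key = (name, query)
        if key in self._cache:
            return self._cache[key]  # cache hits do not count against the cap
        if self.tool_calls_this_turn >= self.max_tool_calls_per_turn:
            raise BudgetExceeded("tool-call cap reached for this turn")
        self.tool_calls_this_turn += 1
        result = run(query)
        self._cache[key] = result
        return result


invocations = []


def fake_search(query: str) -> str:
    """Stand-in for a real tool; records each real invocation."""
    invocations.append(query)
    return f"results for {query}"


guard = SessionGuard(daily_token_ceiling=2000, max_tool_calls_per_turn=2)
guard.start_turn()
first = guard.call_tool("search", "refund policy", fake_search)
second = guard.call_tool("search", "refund policy", fake_search)  # served from cache
print(len(invocations))  # prints 1
```

Raising an exception instead of silently truncating keeps budget overruns visible to the caller, which can then decide whether to answer with what it has or to stop.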

Most agents fall into two shapes — start from the nearest and change the schema:

Agent               Tools             Model   Shape
Ticket triage       none              small   Single-shot classify → enum + action
KB Q&A              search + fetch    mid     RAG → grounded answer with [1] citations
Lead qualifier      graph snapshots   mid     RAG → numeric score + reasons
Document enricher   none              small   Per-node enrich → summary, tags, sentiment

  • Single-shot extractors — no tools, one model call, one structured output. Small model, big batch (a code index can run one against every node of a type).
  • RAG agents — 2–4 focused tools, a larger model, a schema with explicit citation fields.
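
To make the two shapes concrete, here is a sketch of what their structured outputs might look like, using plain Python dataclasses. Every field name (`category`, `action`, `index`, `source_id`) and the category values are assumptions for illustration; the actual schema format used by the product is not shown here.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Category(str, Enum):
    # Hypothetical triage categories.
    BILLING = "billing"
    BUG = "bug"
    OTHER = "other"


@dataclass
class TriageResult:
    """Single-shot extractor output: an enum plus an action."""
    category: Category
    action: str


@dataclass
class Citation:
    index: int       # the [n] marker used in the answer text
    source_id: str   # identifier of the retrieved document


@dataclass
class GroundedAnswer:
    """RAG agent output: the answer plus explicit citation fields."""
    answer: str
    citations: list[Citation]


def parse_triage(raw: str) -> TriageResult:
    data = json.loads(raw)
    # Category(...) raises ValueError for anything outside the enum.
    return TriageResult(category=Category(data["category"]), action=str(data["action"]))


def parse_grounded(raw: str) -> GroundedAnswer:
    data = json.loads(raw)
    answer = str(data["answer"])
    citations = [Citation(int(c["index"]), str(c["source_id"])) for c in data["citations"]]
    if not citations:
        raise ValueError("a grounded answer needs at least one citation")
    # Every citation entry must be referenced by its [n] marker in the answer text.
    for c in citations:
        if f"[{c.index}]" not in answer:
            raise ValueError(f"citation [{c.index}] is never referenced")
    return GroundedAnswer(answer=answer, citations=citations)


triage = parse_triage('{"category": "bug", "action": "assign_to_engineering"}')
print(triage.category.value, triage.action)
```

Parsing into a closed enum means an unexpected label fails loudly at the boundary instead of flowing into downstream systems.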

Shipping checklist:

  • Pin a model (ChatTaskUID) — don't leave it to the caller's default in production.
  • Smallest tool set that does the job — overlapping tools degrade routing.
  • OutputSchema whenever a downstream system consumes the result.
  • CurrentUser on every run and every tool — never the system identity for user-facing work; treat LLM-supplied parameters as untrusted.
  • Destructive actions: propose → confirm, never auto-execute.
  • Export the agent as code and promote dev → staging → production like an endpoint.
