
Cost and operations
A 5-step agent makes 5 model calls. Model choice and tool design are your main cost levers.
Model selection by role:
| Role | Model |
|---|---|
| Tool routing / classification | gpt-4o-mini, claude-haiku-4-5 |
| Final answer synthesis | gpt-4o, claude-sonnet-4-6 |
| Air-gapped / no egress | Local 70B on GPU |
Consider using a smaller model for tool selection and a larger one only for the final answer.
Cost guardrails:
- Set a daily token ceiling: Settings → AI Settings → Quotas
- Set
max_tokensper call (start at 1024) - Cap tool calls per turn (5 is a reasonable default)
- Cache aggressively — identical queries in a session re-use previous results
Monitor per-tool metrics:
GET /api/chatai/tools/metrics
Returns per-tool call counts, latency, p95, and error rate. Wire to your monitoring stack.
Security checklist for tools:
- Always use
scope.CurrentUserfor retrieval — never system context - Validate all inputs (treat LLM-supplied parameters as untrusted)
- Log caller and target IDs for any mutation
- Destructive actions: propose → confirm, never auto-execute