# Metrics reference
The workspace exposes per-endpoint and per-tool metrics over HTTP so you can wire them into your monitoring stack (Prometheus, Datadog, Grafana, the workspace's built-in admin views, etc.).
This page lists the routes and the exact response shapes. For configuring dashboards, see Administration → Monitoring.
## Endpoint metrics

Every custom endpoint produces request-rate, latency, error, and query-tracker counters. Both routes return the same response shape (`EndpointMetricsResult`); the difference is scope.
### `GET /api/endpoints/metrics?uid={uid}`

Metrics for a single endpoint, identified by its UID.

### `GET /api/endpoints/metrics/all`
Aggregated metrics across all endpoints. Use this for top-level dashboards; use the per-endpoint route to drill in.
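For a quick look at the numbers, something like the following works; the base URL, token env var, and endpoint UID below are placeholders, not documented values:

```python
import os

import requests

BASE = "https://workspace"             # placeholder; your workspace URL
TOKEN = os.environ["WORKSPACE_TOKEN"]  # admin-scoped token (see Token scopes); env var name is arbitrary
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Top-level view: aggregated metrics across all endpoints.
overall = requests.get(f"{BASE}/api/endpoints/metrics/all", headers=HEADERS).json()

# Drill-down: one endpoint, identified by its UID (placeholder value).
one = requests.get(f"{BASE}/api/endpoints/metrics",
                   params={"uid": "my-endpoint-uid"}, headers=HEADERS).json()

print(overall["TotalCallsLastHour"], one["AverageLatencyLastHour"])
```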
### Response shape
```json
{
  "RPS": [ 1.2, 0.9, 1.4, … ],
  "LatencyP95": [ 142, 138, 151, … ],
  "ErrorRates": [ 0.0, 0.0, 0.01, … ],
  "UniqueUsers": [ 12, 11, 13, … ],
  "TotalCallsLastHour": 4321,
  "TotalErrorsLastHour": 7,
  "AverageLatencyLastHour": 124.5,
  "AggregatedQueryTracker": {
    "TouchedNodes": 192384,
    "TouchedEdges": 478211,
    "SimilarNodes": 12044,
    "Queries": 4321
  }
}
```
| Field | Type | Units / meaning |
|---|---|---|
| `RPS` | `float[]` | Requests per second per bucket. Default bucket width is 1 minute, window 1 hour. |
| `LatencyP95` | `float[]` | 95th-percentile latency per bucket, in milliseconds. |
| `ErrorRates` | `float[]` | Fraction [0, 1] of requests that errored per bucket. |
| `UniqueUsers` | `int[]` | Distinct authenticated users per bucket. |
| `TotalCallsLastHour` | `float` | Total call count over the window. |
| `TotalErrorsLastHour` | `float` | Total errored calls over the window. |
| `AverageLatencyLastHour` | `float` | Mean latency (ms) over the window. |
| `AggregatedQueryTracker` | `object` | Graph-query workload summary (see below). |
The four arrays are aligned — index i of RPS, LatencyP95, ErrorRates, and UniqueUsers describes the same bucket.
### `EndpointQueryTrackerStats`
Workload counters for graph and similarity queries that the endpoint executed during the window.
| Field | Type | Meaning |
|---|---|---|
| `TouchedNodes` | `long` | Nodes visited by graph traversals. |
| `TouchedEdges` | `long` | Edges visited. |
| `SimilarNodes` | `long` | Nodes returned from vector retrieval. |
| `Queries` | `long` | Total graph queries executed (one endpoint call can run many). |
These four counters are the early-warning signal for endpoints that scan too much of the graph. A high `TouchedNodes`-to-`Queries` ratio usually means a missing `Take(...)` or a `StartAt(type)` that should be `StartAt(type, key)`.
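As a sketch of that heuristic, assuming `metrics` is a parsed response from either route and using an illustrative threshold rather than a documented default:

```python
def nodes_per_query(tracker: dict) -> float:
    """TouchedNodes-to-Queries ratio; high values suggest unbounded scans."""
    queries = tracker["Queries"]
    return tracker["TouchedNodes"] / queries if queries else 0.0

ratio = nodes_per_query(metrics["AggregatedQueryTracker"])
if ratio > 10_000:  # illustrative threshold; tune to your graph's size
    print(f"~{ratio:.0f} nodes touched per query; "
          "look for a missing Take(...) or an unkeyed StartAt")
```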
## Chat AI tool metrics
Tools registered with the chat AI surface get the same metric shape, exposed under their own routes.
### `GET /api/chatai/tools/metrics?toolUID={toolUID}`

Metrics for a single chat-AI tool.

### `GET /api/chatai/tools/metrics/all`
Aggregated metrics across every tool the chat AI can call.
Response shape is identical to `EndpointMetricsResult` above; slow or flaky tools degrade the entire chat experience, so plot these alongside endpoint metrics.
## Authentication and scope
All metrics routes require an admin-scoped token. See Token scopes. External callers should rotate the token through a secret manager and not hard-code it in dashboards.
## Sampling and retention
- Metrics buckets are produced live in memory; the workspace retains the last rolling window (default 1 hour) at full resolution.
- For longer retention, scrape the routes on your own interval and store the snapshots in your monitoring backend (a minimal sketch follows this list).
- The aggregated `/all` routes are computed on demand by summing the underlying per-endpoint counters; they're safe to poll, but not free.
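A minimal sketch of that scrape-and-store loop, writing newline-delimited JSON locally; the file path, interval, and token env var are arbitrary choices:

```python
import json
import os
import time

import requests

TOKEN = os.environ["WORKSPACE_TOKEN"]  # arbitrary env var name; see Token scopes

while True:
    snapshot = requests.get("https://workspace/api/endpoints/metrics/all",
                            headers={"Authorization": f"Bearer {TOKEN}"}).json()
    snapshot["scraped_at"] = time.time()  # stamp each snapshot so history stays ordered
    with open("metrics.ndjson", "a") as f:  # stand-in for your monitoring backend
        f.write(json.dumps(snapshot) + "\n")
    time.sleep(300)  # 5 minutes; comfortably inside the 1-hour window
```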
## Wiring into monitoring
### Prometheus
There is no built-in Prometheus exporter; write a small adapter that polls `/api/endpoints/metrics/all` once per minute and translates the response into gauge/counter metrics. The sketch below reads the admin token from an environment variable, an assumption rather than a documented requirement.
```python
import os
import time

import requests
from prometheus_client import Gauge, start_http_server

TOKEN = os.environ["WORKSPACE_TOKEN"]  # admin-scoped token (see Token scopes); don't hard-code it

calls = Gauge("curiosity_endpoint_calls_last_hour", "Total endpoint calls in the last hour")
errors = Gauge("curiosity_endpoint_errors_last_hour", "Total endpoint errors in the last hour")
latency = Gauge("curiosity_endpoint_avg_latency_ms_last_hour", "Mean endpoint latency (ms) in the last hour")

start_http_server(9100)  # expose /metrics for Prometheus to scrape
while True:
    r = requests.get("https://workspace/api/endpoints/metrics/all",
                     headers={"Authorization": f"Bearer {TOKEN}"}).json()
    calls.set(r["TotalCallsLastHour"])
    errors.set(r["TotalErrorsLastHour"])
    latency.set(r["AverageLatencyLastHour"])
    time.sleep(60)
```
### Datadog / Grafana / OpenTelemetry
Any HTTP-pull integration works the same way. Unroll the bucket arrays (`RPS`, `LatencyP95`, …) into time-series points by emitting `(now - i * 60s, value)` pairs, as in the sketch below.
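A sketch of that unrolling, assuming 1-minute buckets with index 0 the most recent (per the `(now - i * 60s)` convention) and a hypothetical `send_to_backend` shipper:

```python
import time

def to_points(series: list[float], bucket_seconds: int = 60) -> list[tuple[float, float]]:
    """Turn a bucket array into (unix_timestamp, value) points; index 0 = now."""
    now = time.time()
    return [(now - i * bucket_seconds, value) for i, value in enumerate(series)]

for name in ("RPS", "LatencyP95", "ErrorRates", "UniqueUsers"):
    for ts, value in to_points(metrics[name]):  # `metrics` = parsed /metrics/all response
        send_to_backend(name, ts, value)  # hypothetical; replace with your integration's client
```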
## Suggested alerts

- Error budget burn. `ErrorRates` mean over the last 15 minutes > 1% on any endpoint (see the sketch after this list).
- Latency regression. `LatencyP95` mean over the last 15 minutes exceeds a per-endpoint SLO (set during rollout).
- Runaway query. `AggregatedQueryTracker.TouchedNodes` per-call ratio increases by >2× week-over-week, usually a sign someone removed a `Take(...)` or widened a `StartAt`.
- Tool failure rate. `TotalErrorsLastHour / TotalCallsLastHour` on `chatai/tools/metrics/all` > 5%.
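A sketch of the first two checks, again assuming index 0 is the most recent 1-minute bucket; the SLO table is illustrative:

```python
def mean_recent(series: list[float], minutes: int = 15) -> float:
    """Mean over the most recent `minutes` buckets (index 0 = most recent)."""
    window = series[:minutes]
    return sum(window) / len(window) if window else 0.0

SLO_P95_MS = {"my-endpoint-uid": 200.0}  # illustrative per-endpoint SLOs

def check(uid: str, metrics: dict) -> list[str]:
    alerts = []
    if mean_recent(metrics["ErrorRates"]) > 0.01:
        alerts.append(f"{uid}: error budget burn (>1% errors over 15 min)")
    if mean_recent(metrics["LatencyP95"]) > SLO_P95_MS.get(uid, float("inf")):
        alerts.append(f"{uid}: p95 latency above SLO")
    return alerts
```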
## Related pages
- Administration → Monitoring — built-in dashboards.
- Custom endpoints — what produces the metrics.
- AI tools — what the chat AI calls.
- Performance tuning — what to do when alerts fire.