# Metrics reference
The workspace exposes per-endpoint and per-tool metrics over HTTP so you can wire them into your monitoring stack (Prometheus, Datadog, Grafana, the workspace's built-in admin views, etc.).
This page lists the routes and the exact response shapes. For configuring dashboards, see Administration → Monitoring.
## Endpoint metrics

Every custom endpoint produces request-rate, latency, error, and query-tracker counters. Both routes return the same response shape (`EndpointMetricsResult`); the difference is scope.
### `GET /api/endpoints/metrics?uid={uid}`

Metrics for a single endpoint, identified by its UID.

### `GET /api/endpoints/metrics/all`
Aggregated metrics across all endpoints. Use this for top-level dashboards; use the per-endpoint route to drill in.
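For a quick look at the numbers, something like the following works; the base URL, token env var, and endpoint UID below are placeholders, not documented values:

```python
import os

import requests

BASE = "https://workspace"             # placeholder; your workspace URL
TOKEN = os.environ["WORKSPACE_TOKEN"]  # admin-scoped token (see Token scopes); env var name is arbitrary
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Top-level view: aggregated metrics across all endpoints.
overall = requests.get(f"{BASE}/api/endpoints/metrics/all", headers=HEADERS).json()

# Drill-down: one endpoint, identified by its UID (placeholder value).
one = requests.get(f"{BASE}/api/endpoints/metrics",
                   params={"uid": "my-endpoint-uid"}, headers=HEADERS).json()

print(overall["TotalCallsLastHour"], one["AverageLatencyLastHour"])
```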
### Response shape
```json
{
  "RPS": [ 1.2, 0.9, 1.4, … ],
  "LatencyP95": [ 142, 138, 151, … ],
  "ErrorRates": [ 0.0, 0.0, 0.01, … ],
  "UniqueUsers": [ 12, 11, 13, … ],
  "TotalCallsLastHour": 4321,
  "TotalErrorsLastHour": 7,
  "AverageLatencyLastHour": 124.5,
  "AggregatedQueryTracker": {
    "TouchedNodes": 192384,
    "TouchedEdges": 478211,
    "SimilarNodes": 12044,
    "Queries": 4321
  }
}
```
| Field | Type | Units / meaning |
|---|---|---|
| `RPS` | `float[]` | Requests per second per bucket. Default bucket width is 1 minute, window 1 hour. |
| `LatencyP95` | `float[]` | 95th-percentile latency per bucket, in milliseconds. |
| `ErrorRates` | `float[]` | Fraction [0, 1] of requests that errored per bucket. |
| `UniqueUsers` | `int[]` | Distinct authenticated users per bucket. |
| `TotalCallsLastHour` | `float` | Total call count over the window. |
| `TotalErrorsLastHour` | `float` | Total errored calls over the window. |
| `AverageLatencyLastHour` | `float` | Mean latency (ms) over the window. |
| `AggregatedQueryTracker` | `object` | Graph-query workload summary (see below). |
The four arrays are aligned — index i of RPS, LatencyP95, ErrorRates, and UniqueUsers describes the same bucket.
### `EndpointQueryTrackerStats`
Workload counters for graph and similarity queries that the endpoint executed during the window.
| Field | Type | Meaning |
|---|---|---|
| `TouchedNodes` | `long` | Nodes visited by graph traversals. |
| `TouchedEdges` | `long` | Edges visited. |
| `SimilarNodes` | `long` | Nodes returned from vector retrieval. |
| `Queries` | `long` | Total graph queries executed (one endpoint call can run many). |
These four counters are the early-warning signal for endpoints that scan too much of the graph. A high `TouchedNodes`-to-`Queries` ratio usually means a missing `Take(...)` or a `StartAt(type)` that should be `StartAt(type, key)`.
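As a sketch of that heuristic, assuming `metrics` is a parsed response from either route and using an illustrative threshold rather than a documented default:

```python
def nodes_per_query(tracker: dict) -> float:
    """TouchedNodes-to-Queries ratio; high values suggest unbounded scans."""
    queries = tracker["Queries"]
    return tracker["TouchedNodes"] / queries if queries else 0.0

ratio = nodes_per_query(metrics["AggregatedQueryTracker"])
if ratio > 10_000:  # illustrative threshold; tune to your graph's size
    print(f"~{ratio:.0f} nodes touched per query; "
          "look for a missing Take(...) or an unkeyed StartAt")
```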
## Chat AI tool metrics
Tools registered with the chat AI surface get the same metric shape, exposed under their own routes.
### `GET /api/chatai/tools/metrics?toolUID={toolUID}`

Metrics for a single chat-AI tool.

### `GET /api/chatai/tools/metrics/all`
Aggregated metrics across every tool the chat AI can call.
Response shape is identical to `EndpointMetricsResult` above; slow or flaky tools degrade the entire chat experience, so plot these alongside endpoint metrics.
## Authentication and scope
All metrics routes require an admin-scoped token. See Token scopes. External callers should rotate the token through a secret manager and not hard-code it in dashboards.
## Sampling and retention
- Metrics buckets are produced live in memory; the workspace retains the last rolling window (default 1 hour) at full resolution.
- For longer retention, scrape the routes on your own interval and store the snapshots in your monitoring backend (a minimal sketch follows this list).
- The aggregated `/all` routes are computed on demand by summing the underlying per-endpoint counters; they're safe to poll, but not free.
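A minimal sketch of that scrape-and-store loop, writing newline-delimited JSON locally; the file path, interval, and token env var are arbitrary choices:

```python
import json
import os
import time

import requests

TOKEN = os.environ["WORKSPACE_TOKEN"]  # arbitrary env var name; see Token scopes

while True:
    snapshot = requests.get("https://workspace/api/endpoints/metrics/all",
                            headers={"Authorization": f"Bearer {TOKEN}"}).json()
    snapshot["scraped_at"] = time.time()  # stamp each snapshot so history stays ordered
    with open("metrics.ndjson", "a") as f:  # stand-in for your monitoring backend
        f.write(json.dumps(snapshot) + "\n")
    time.sleep(300)  # 5 minutes; comfortably inside the 1-hour window
```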
## Wiring into monitoring
### Prometheus
There is no built-in Prometheus exporter; write a small adapter that polls `/api/endpoints/metrics/all` once per minute and translates the response into gauge/counter metrics. The sketch below reads the admin token from an environment variable, an assumption rather than a documented requirement.
```python
import os
import time

import requests
from prometheus_client import Gauge, start_http_server

TOKEN = os.environ["WORKSPACE_TOKEN"]  # admin-scoped token (see Token scopes); don't hard-code it

calls = Gauge("curiosity_endpoint_calls_last_hour", "Total endpoint calls in the last hour")
errors = Gauge("curiosity_endpoint_errors_last_hour", "Total endpoint errors in the last hour")
latency = Gauge("curiosity_endpoint_avg_latency_ms_last_hour", "Mean endpoint latency (ms) in the last hour")

start_http_server(9100)  # expose /metrics for Prometheus to scrape
while True:
    r = requests.get("https://workspace/api/endpoints/metrics/all",
                     headers={"Authorization": f"Bearer {TOKEN}"}).json()
    calls.set(r["TotalCallsLastHour"])
    errors.set(r["TotalErrorsLastHour"])
    latency.set(r["AverageLatencyLastHour"])
    time.sleep(60)
```
### Datadog / Grafana / OpenTelemetry
Any HTTP-pull integration works the same way. Unroll the bucket arrays (`RPS`, `LatencyP95`, …) into time-series points by emitting `(now - i * 60s, value)` pairs, as in the sketch below.
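A sketch of that unrolling, assuming 1-minute buckets with index 0 the most recent (per the `(now - i * 60s)` convention) and a hypothetical `send_to_backend` shipper:

```python
import time

def to_points(series: list[float], bucket_seconds: int = 60) -> list[tuple[float, float]]:
    """Turn a bucket array into (unix_timestamp, value) points; index 0 = now."""
    now = time.time()
    return [(now - i * bucket_seconds, value) for i, value in enumerate(series)]

for name in ("RPS", "LatencyP95", "ErrorRates", "UniqueUsers"):
    for ts, value in to_points(metrics[name]):  # `metrics` = parsed /metrics/all response
        send_to_backend(name, ts, value)  # hypothetical; replace with your integration's client
```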
## Suggested alerts

- Error budget burn. `ErrorRates` mean over the last 15 minutes > 1% on any endpoint (see the sketch after this list).
- Latency regression. `LatencyP95` mean over the last 15 minutes exceeds a per-endpoint SLO (set during rollout).
- Runaway query. `AggregatedQueryTracker.TouchedNodes` per-call ratio increases by >2× week-over-week, usually a sign someone removed a `Take(...)` or widened a `StartAt`.
- Tool failure rate. `TotalErrorsLastHour / TotalCallsLastHour` on `chatai/tools/metrics/all` > 5%.
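A sketch of the first two checks, again assuming index 0 is the most recent 1-minute bucket; the SLO table is illustrative:

```python
def mean_recent(series: list[float], minutes: int = 15) -> float:
    """Mean over the most recent `minutes` buckets (index 0 = most recent)."""
    window = series[:minutes]
    return sum(window) / len(window) if window else 0.0

SLO_P95_MS = {"my-endpoint-uid": 200.0}  # illustrative per-endpoint SLOs

def check(uid: str, metrics: dict) -> list[str]:
    alerts = []
    if mean_recent(metrics["ErrorRates"]) > 0.01:
        alerts.append(f"{uid}: error budget burn (>1% errors over 15 min)")
    if mean_recent(metrics["LatencyP95"]) > SLO_P95_MS.get(uid, float("inf")):
        alerts.append(f"{uid}: p95 latency above SLO")
    return alerts
```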
## Related pages
- Administration → Monitoring — built-in dashboards.
- Custom endpoints — what produces the metrics.
- AI tools — what the chat AI calls.
- Performance tuning — what to do when alerts fire.