Security

This page sets the operational security baseline for a Curiosity Workspace deployment. Treat it as the minimum to satisfy; align with your organization's specific policies on top.

Security domains

  • Identity & auth: how users sign in, session lifetime, MFA. Configure under SSO providers.
  • Authorization: what users can see and do once signed in. Configure under Permissions and the Access Control Model.
  • Data protection: encryption in transit and at rest, secrets handling. Covered on this page and in the Configuration reference.
  • Operational security: patching, image hygiene, change control. Covered under Deployment and Upgrades.
  • AI data handling: what leaves the workspace and reaches a model provider. Covered on this page and in LLM Configuration.
  • Audit & monitoring: who did what, when. Covered under Monitoring.

Baseline practices

  • TLS everywhere. Terminate TLS at a proxy or in-container; enable HSTS (MSK_USE_HSTS=true) and HTTP→HTTPS redirect (MSK_REDIRECT_TO_HTTPS=true).
  • No default credentials. Always set MSK_ADMIN_PASSWORD on first boot. After SSO is wired up, disable the local admin account.
  • Stable JWT key. Set MSK_JWT_KEY explicitly so tokens survive container restarts; rotate it deliberately, not accidentally.
  • At-rest encryption. Set MSK_GRAPH_MASTER_KEY and back it up. Without it, encrypted properties cannot be decrypted after a restore.
  • Secret manager everywhere. Store every MSK_* secret and every model-provider key in a secret manager (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager, Vault).
  • Least privilege on tokens. Use scoped API tokens for connectors and endpoint tokens for external systems. Never share admin tokens.
  • Separate environments. Dev / staging / prod with distinct secrets and distinct JWT keys.
  • Stronger auth for admins. Require SSO with MFA for any account that has admin permissions.
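Key generation for a first boot can be sketched in a few shell lines. The MSK_* variable names come from this page; the commented launch command and image name are illustrative assumptions, not the documented deployment procedure:

```shell
# Generate strong key material once and store it in your secret manager.
# Never hard-code these in compose files or leave them in shell history.
MSK_JWT_KEY="$(openssl rand -hex 32)"           # stable JWT signing key (256-bit)
MSK_GRAPH_MASTER_KEY="$(openssl rand -hex 32)"  # at-rest encryption key; back it up
MSK_ADMIN_PASSWORD="$(openssl rand -base64 24)" # first-boot admin password

# Illustrative launch (adjust the image name to your deployment):
# docker run -p 127.0.0.1:8080:8080 \
#   -e MSK_USE_HSTS=true -e MSK_REDIRECT_TO_HTTPS=true \
#   -e MSK_JWT_KEY -e MSK_GRAPH_MASTER_KEY -e MSK_ADMIN_PASSWORD \
#   <workspace-image>
echo "${#MSK_JWT_KEY}"  # prints 64
```

Generating keys with openssl and passing them via `-e VAR` (value taken from the environment) keeps the secrets out of the command line and shell history.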

Network posture

  • The workspace listens on a single HTTP port (MSK_PORT, default 8080). Terminate TLS in front of it.
  • Bind to 127.0.0.1 in local development and to the proxy network in production. Never bind directly to a public interface without an authenticating proxy.
  • Set MSK_PUBLIC_ADDRESS to the user-facing URL so generated links (SSO callbacks, email links) are correct.
  • Egress reaches your LLM/embedding provider. If your network policy requires it, route through MSK_HTTP_PROXY and document the allowlist.
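One possible shape for the fronting proxy is a minimal nginx sketch that terminates TLS and forwards to the workspace on 127.0.0.1:8080. The hostname and certificate paths are placeholders; MSK_USE_HSTS and MSK_REDIRECT_TO_HTTPS can handle the HSTS header and HTTPS redirect in-container instead:

```nginx
server {
    listen 443 ssl;
    server_name workspace.example.com;   # set MSK_PUBLIC_ADDRESS to this URL

    ssl_certificate     /etc/nginx/tls/workspace.crt;  # placeholder paths
    ssl_certificate_key /etc/nginx/tls/workspace.key;

    location / {
        proxy_pass http://127.0.0.1:8080;   # MSK_PORT default
        proxy_set_header Host              $host;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
    }
}
```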

Data protection

  • In transit: TLS to the workspace and to model providers (the workspace uses HTTPS when calling external APIs).
  • At rest: graph storage is encrypted with MSK_GRAPH_MASTER_KEY. Disk-level encryption (LUKS, EBS encryption, Azure Disk encryption, GCP CMEK) is recommended on top.
  • Backups: encrypt at the storage layer; replicate to a separate failure domain. See Backup and restore.
  • Sensitive properties: mark properties that must be encrypted on disk with the SDK's encryption attribute (where supported by your schema).

Token discipline

The three token types and their use cases are documented in Token scopes. Two fail-closed rules:

  • Connectors get ingestion tokens. Never grant admin to a connector token.
  • External systems get endpoint tokens. Path-scoped tokens have a much smaller blast radius than API tokens.

Rotate tokens on a schedule. Rotating MSK_JWT_KEY invalidates every outstanding token at once — use this only when responding to a key-material compromise.
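An emergency rotation of the JWT key can be sketched as follows; the secret name, container name, and the commented aws / docker commands are assumptions about one possible deployment shape:

```shell
# Emergency rotation of the JWT signing key.
# Every outstanding token (user sessions, API tokens) is invalidated at once.
NEW_JWT_KEY="$(openssl rand -hex 32)"

# Update the stored secret, then restart the workspace with the new value
# (secret and container names are placeholders):
# aws secretsmanager put-secret-value \
#   --secret-id curiosity/msk-jwt-key --secret-string "$NEW_JWT_KEY"
# docker restart curiosity-workspace
echo "${#NEW_JWT_KEY}"  # prints 64
```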

AI / model-provider data handling

Every call to an embedding or chat provider sends a payload to that provider. The choices that matter:

  • Which fields are embedded. Configured under Settings → AI Settings → Embeddings. Don't embed fields that shouldn't leave the workspace.
  • Which provider. Hosted (OpenAI, Anthropic), regional (Azure OpenAI), or self-hosted (a local OpenAI-compatible server). Self-hosted gives you the strongest data-residency guarantees.
  • Provider data policies. Confirm in writing whether your provider trains on the data you send. Most enterprise tiers commit to no training.

The LLM Configuration page documents per-provider setup. For deeper context on what gets sent, see Multimodal Search (for OCR/STT payloads).

Endpoint security

Custom endpoints accept user input and run inside the workspace process. Defensive checklist:

  • Validate every input at the top of the endpoint (Body.FromJson<T>() only gives you typing; you still need to enforce limits, allowlists, and required fields).
  • Use CreateSearchAsUserAsync(..., CurrentUser, ...) instead of the system-scoped variant unless you're knowingly exposing data the caller shouldn't normally see.
  • Avoid SSRF: if an endpoint makes outbound HTTP calls, allowlist destinations.
  • Set request and response size limits. Don't return whole documents when a summary will do.
  • Return safe errors. Surface a traceId and a generic message to the caller; log the full exception on the server side.
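The checklist can be illustrated with a pseudocode sketch in the SDK's C# style. Body.FromJson<T>() and CreateSearchAsUserAsync(..., CurrentUser, ...) appear on this page; the endpoint signature, context object, request type, and logging helper around them are hypothetical:

```csharp
// Pseudocode sketch: scaffolding and helper names are hypothetical.
public async Task<EndpointResult> SearchNotes(EndpointContext ctx)
{
    // 1. Parse, then validate: FromJson only gives you typing.
    var req = ctx.Body.FromJson<SearchRequest>();   // hypothetical request type
    if (string.IsNullOrWhiteSpace(req.Query)) return ctx.BadRequest("query is required");
    if (req.Query.Length > 512)               return ctx.BadRequest("query too long");
    if (req.Limit is < 1 or > 50)             return ctx.BadRequest("limit must be 1-50");

    try
    {
        // 2. Permission-aware retrieval: results are limited to what the caller may see.
        var hits = await ctx.CreateSearchAsUserAsync(req.Query, ctx.CurrentUser, take: req.Limit);

        // 3. Return summaries, not whole documents.
        return ctx.Json(hits.Select(h => new { h.Title, h.Snippet }));
    }
    catch (Exception ex)
    {
        // 4. Safe errors: generic message plus a traceId; full detail stays in server logs.
        var traceId = Guid.NewGuid().ToString("N");
        ctx.LogError(traceId, ex);   // hypothetical logging helper
        return ctx.Error($"request failed (traceId {traceId})");
    }
}
```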

See Custom Endpoints.

AI tool security

Tools run within the user's security context and can call out to the graph and search engines. Two specific rules:

  • Scope the tool to a single, clear intent. Vague tools are picked at the wrong times and end up exposing more than intended.
  • Use scope.CurrentUser for retrieval. Permission-aware retrieval is the only thing standing between a tool call and a data leak.

See AI Tools and LLM Agents.

Audit and incident response

  • Audit log forwarded to your SIEM (see Monitoring).
  • Documented response plan for: token compromise, key compromise (MSK_JWT_KEY, MSK_GRAPH_MASTER_KEY), unauthorized admin access, model-provider data leak.
  • Quarterly restore drill (see Backup and restore) — a recovery you've never tested is not a recovery.

