Curiosity - Installation

Installation

Curiosity Workspace ships as a single container image (curiosityai/curiosity) plus a Windows installer. This page helps you pick the right delivery model for your environment and links to the platform-specific guide that takes you the rest of the way.

For a runnable local install in under five minutes, jump to Quickstart. For a complete end-to-end developer build, see Build your first enterprise AI app.

Decision tree

If you need…	Use…
Local development on a laptop	Docker — a single `docker run` or `docker compose up`.
A demo or evaluation on a Windows VM	Windows installer. Easy to install/uninstall as a service.
A staging environment on a single VM	Docker with Compose, behind your existing reverse proxy.
A production deployment on Kubernetes	Kubernetes, and consult the cloud-specific notes below.
A production deployment on AWS	AWS — EC2/EKS, EBS, ALB.
A production deployment on Azure	Azure — VM/AKS, Azure Disk, Entra ID.
A production deployment on Google Cloud	GCP — Compute Engine/GKE, Persistent Disk.
A production deployment on OpenShift	OpenShift.
An air-gapped or on-prem deployment	Docker or Kubernetes with a private registry mirror.
To rebuild or audit Curiosity Workspace from source	Build from Source — replicate the production pipelines from the `curiosity-ai/mosaik` tree (password-protected).

Decisions you should make before installing

Environment tier

Local, staging, and production each impose different defaults.

Local: bind to loopback, set a generated admin password via MSK_ADMIN_PASSWORD, and persist data on a real volume so you don't lose your work between restarts.
Staging: prod-shaped manifest, smaller capacity, isolated secrets, restore drills allowed.
Production: TLS, secrets manager, monitoring, backups, anti-affinity, and isolation from noisy neighbors.

Storage

The graph and its indexes are I/O sensitive. Always provision SSD or better.

Capacity = (sum of indexed text fields, in bytes) × ~1.5 + (embedded fields, in bytes) × (embedding dimensions × 4 / chunk size) + journal headroom.
A starter PVC of 200 GB is sufficient for hundreds of thousands of documents.
Point MSK_GRAPH_BACKUP_FOLDER at a different volume so a corrupted graph volume doesn't take backups with it.
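
The sizing rule above can be sketched as a small Python function. This is a rough estimator, not an official sizing tool: the function and parameter names are hypothetical, chunk size is assumed to be expressed in bytes so the units cancel, the factor of 4 is read as four bytes per embedding dimension, and journal headroom is left as a caller-supplied value because this page does not quantify it.

```python
import math


def estimate_capacity_bytes(
    indexed_text_bytes: int,
    embedded_text_bytes: int,
    embedding_dimensions: int,
    chunk_size_bytes: int,
    journal_headroom_bytes: int = 0,
    index_overhead: float = 1.5,
) -> int:
    """Apply the rule of thumb: text x ~1.5 + embedded text x (dims x 4 / chunk) + journal."""
    if chunk_size_bytes <= 0:
        raise ValueError("chunk_size_bytes must be positive")
    # Assumption: one vector per chunk, 4 bytes per dimension.
    vector_bytes_per_text_byte = embedding_dimensions * 4 / chunk_size_bytes
    total = (
        indexed_text_bytes * index_overhead
        + embedded_text_bytes * vector_bytes_per_text_byte
        + journal_headroom_bytes
    )
    return math.ceil(total)


# Illustrative inputs only: 10 GB of indexed text, 4 GB of embedded text,
# 1024-dimensional vectors, 2048-byte chunks, 5 GB of journal headroom.
total = estimate_capacity_bytes(
    indexed_text_bytes=10 * 10**9,
    embedded_text_bytes=4 * 10**9,
    embedding_dimensions=1024,
    chunk_size_bytes=2048,
    journal_headroom_bytes=5 * 10**9,
)
print(f"{total / 10**9:.1f} GB")  # prints "28.0 GB"
```

Backups are not included in this figure; size the backup volume separately.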

Access model

Decide before opening the workspace to anyone.

Internal only: workspace reachable through a VPN or private network.
Public, but authenticated: TLS-terminated reverse proxy, SSO via your IdP, no admin/admin default.
See Security.

Identity

Plan how users will sign in before ingesting production data, so ACLs can be ingested against the right teams.

Local accounts for evaluation only.
One or more SSO providers (Microsoft Entra ID, Google, Okta, Auth0, SAML).
Map IdP groups to Workspace teams.

Observability and backups

Centralized logs (stdout collector, or MSK_LOG_PATH on a mounted volume).
Backups to off-host storage, with a documented restore drill. See Backup and restore.
Alerts on liveness, latency regressions, and ingestion failures. See Monitoring.

Prerequisites

CPU: 4 cores minimum (8+ recommended for production with embeddings).
RAM: 8 GB minimum (16 GB+ recommended; embedding indexes are memory-resident).
Storage: SSD with enough space for graph + indexes + 1.5× headroom for backups.
Network: TCP 8080 (or your chosen MSK_PORT) reachable by clients. TLS terminated by a proxy or in-container.
License: an MSK_LICENSE token if you have a commercial license.
For AI features: an LLM/embedding provider key (OpenAI, Azure OpenAI, Anthropic, or a local OpenAI-compatible server).
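
As a quick sanity check, the CPU and RAM figures above can be compared against a host before installing. The sketch below is illustrative: `check_prerequisites` is a hypothetical helper, and treating the "recommended" figures as the threshold for production is an assumption of this example rather than a stated requirement.

```python
import os

# Figures taken from the prerequisites list above.
MIN_CORES, RECOMMENDED_CORES = 4, 8
MIN_RAM_GB, RECOMMENDED_RAM_GB = 8, 16


def check_prerequisites(cpu_cores: int, ram_gb: float, production: bool = False) -> list[str]:
    """Return a list of findings; an empty list means the host meets the thresholds."""
    # Assumption: production hosts are held to the recommended figures.
    need_cores = RECOMMENDED_CORES if production else MIN_CORES
    need_ram = RECOMMENDED_RAM_GB if production else MIN_RAM_GB
    findings = []
    if cpu_cores < need_cores:
        findings.append(f"CPU: {cpu_cores} cores found, {need_cores} needed")
    if ram_gb < need_ram:
        findings.append(f"RAM: {ram_gb} GB found, {need_ram} GB needed")
    return findings


# os.cpu_count() reports logical cores; RAM is passed in by hand because the
# standard library has no portable way to read it.
for finding in check_prerequisites(os.cpu_count() or 0, ram_gb=16):
    print(finding)
```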

First-boot checklist

After the service is running for the first time:

Open the UI and complete the setup wizard.
Rotate admin credentials if you didn't already set MSK_ADMIN_PASSWORD (never leave defaults in any environment beyond your laptop).
Set MSK_JWT_KEY explicitly so tokens survive restarts.
Set MSK_GRAPH_MASTER_KEY and back it up; you cannot decrypt content without it.
Configure SSO before inviting real users.
Create an API token for ingestion connectors and store it in a secret manager.
Confirm persistence by restarting the service and verifying the workspace state remains.
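
The credentials and keys named in this checklist can be generated ahead of time and kept in a secret manager. The following is a minimal sketch using Python's standard `secrets` module. The variable names come from this page, but the 32-byte length and URL-safe encoding are assumptions; check the configuration reference for any required key format before relying on these values.

```python
import secrets

# Variable names from the checklist above. The value format is an assumption.
SECRET_VARS = ("MSK_ADMIN_PASSWORD", "MSK_JWT_KEY", "MSK_GRAPH_MASTER_KEY")


def generate_secrets(num_bytes: int = 32) -> dict[str, str]:
    """Create one independent random value per variable."""
    return {name: secrets.token_urlsafe(num_bytes) for name in SECRET_VARS}


def to_env_lines(values: dict[str, str]) -> str:
    """Render KEY=value lines, e.g. for an env file or a secret-manager import."""
    return "\n".join(f"{name}={value}" for name, value in values.items())


if __name__ == "__main__":
    # Redirect this output straight into your secret store; avoid leaving it
    # in shell history or in an unencrypted file.
    print(to_env_lines(generate_secrets()))
```

Generate the values once and reuse them across restarts: regenerating MSK_GRAPH_MASTER_KEY would leave previously encrypted content unreadable.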

Post-install validation checklist

Web UI loads at your workspace URL (your MSK_PUBLIC_ADDRESS) when opened from a client machine.
You can log in with an admin account and, if you configured one, through your IdP.
TLS is correct end to end (browser shows the expected certificate; HSTS header present if enabled).
Storage is persistent across restarts.
Background tasks (indexing/parsing) can run.
Backup runs successfully and the resulting snapshot restores in a separate environment.
Logs reach your aggregator.
Monitoring shows the workspace as healthy.

Common installation pitfalls

Ephemeral storage: running without a persistent volume loses all data on restart.
Reverse proxies and origins: when behind a proxy, set MSK_PUBLIC_ADDRESS consistently; otherwise generated links (SSO callbacks, email links) will be wrong.
Ports and binding: confirm the service binds to the expected interface (127.0.0.1 for local, 0.0.0.0 for a proxy-fronted deployment).
:latest in production: pin to a versioned image tag (curiosityai/curiosity:vX.Y.Z) so upgrades are explicit.
Missing master key: encrypted properties can't be read after a restart if MSK_GRAPH_MASTER_KEY was autogenerated and then lost.
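
Several of these pitfalls can be caught before deployment with a simple pre-flight check over the image reference and environment you intend to use. The sketch below is one way to do that; `find_pitfalls` is a hypothetical helper, not part of Curiosity Workspace, and the version tag and address in the usage example are placeholders.

```python
def find_pitfalls(image: str, env: dict[str, str], behind_proxy: bool) -> list[str]:
    """Return warnings for pitfalls that are detectable from configuration alone."""
    problems = []

    # The tag follows the first ":" in the last path segment; with no tag,
    # container runtimes default to "latest".
    name = image.rsplit("/", 1)[-1]
    tag = name.split(":", 1)[1] if ":" in name else "latest"
    if tag == "latest":
        problems.append("image tag is not pinned to a version")

    if behind_proxy and not env.get("MSK_PUBLIC_ADDRESS"):
        problems.append("MSK_PUBLIC_ADDRESS is not set behind a reverse proxy")

    if not env.get("MSK_GRAPH_MASTER_KEY"):
        problems.append("MSK_GRAPH_MASTER_KEY is not set explicitly")

    return problems


# Placeholder values for illustration only.
warnings = find_pitfalls(
    image="curiosityai/curiosity:latest",
    env={"MSK_PUBLIC_ADDRESS": "https://workspace.example.com"},
    behind_proxy=True,
)
for warning in warnings:
    print(warning)
```

Storage persistence and interface binding depend on the runtime rather than on configuration values, so verify those with a restart test.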

Next steps

Configure the workspace basics: Workspace Configuration
Walk an end-to-end build: Build your first enterprise AI app
Promote to production: Deployment checklist
Rebuild from source (advanced): Build from Source
