Deployment

This page is the production-readiness checklist for a Curiosity Workspace deployment. It assumes you've already chosen a platform — for the platform-specific manifests, see Installation, Docker, Kubernetes, and the cloud guides.

Deployment goals

  • Reliability: predictable uptime, fast recovery.
  • Security: TLS, secrets discipline, scoped tokens, ReBAC enforced everywhere.
  • Scalability: handle data growth and query load.
  • Reproducibility: dev → staging → prod promotion is mechanical, not heroic.

Environments

Maintain three named environments with parity in shape but different scale and access:

Environment   Purpose                              Notes
Dev           Engineer-owned, may be on a laptop   Local Docker run, default ports, generated admin password.
Staging       Pre-production validation            Prod-shaped manifest, smaller capacity, isolated secrets, restore drills allowed.
Production    Real users                           Restricted access, change control, no shell access by default.

Promotion path: code/config changes land in source → tested in dev → deployed to staging → validated → promoted to prod.

What to version and promote

Treat these as deployable artifacts and version-control them:

  • Connector code (your data ingestion programs).
  • Custom endpoint and AI tool code (export from the workspace UI, store in git, redeploy on promotion).
  • Custom interface bundles (Tesserae / H5).
  • Schema migrations and ingestion pipeline definitions.
  • Search index configuration (indexed fields, boosts, facets).
  • NLP pipeline configuration (entity capture, embeddings field selection).
  • The deployment manifest itself (Docker Compose / Kubernetes / Helm / Terraform).

The Workspace stores all UI-managed configuration inside the graph, so a configuration export + import lets you snapshot and promote workspaces. See Backup and restore.
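One possible repository layout for these artifacts (directory names are illustrative, not prescribed):

workspace-deploy/
├── connectors/       # data ingestion programs
├── endpoints/        # custom endpoint and AI tool code exported from the UI
├── interfaces/       # Tesserae / H5 bundles
├── schema/           # migrations and ingestion pipeline definitions
├── config/           # search index, NLP pipeline, and workspace configuration exports
└── deploy/           # Docker Compose / Kubernetes / Helm / Terraform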

Production checklist

Before flipping a workspace to "production":

Image and runtime

  • Versioned image tag (curiosityai/curiosity:vX.Y.Z), not :latest.
  • Container memory and CPU sized for embeddings (start at 16 GB / 8 vCPU; bigger for large corpora).
  • Healthcheck on /api/login/check.
  • terminationGracePeriodSeconds ≥ 60 so the workspace can flush before being killed.
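As a sketch, these runtime items map onto a Kubernetes container spec like the following. The resource names are placeholders, and the workspace is assumed to listen on port 8080, matching the proxy example later on this page:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: curiosity-workspace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: curiosity-workspace
  template:
    metadata:
      labels:
        app: curiosity-workspace
    spec:
      terminationGracePeriodSeconds: 60        # let the workspace flush before SIGKILL
      containers:
        - name: workspace
          image: curiosityai/curiosity:v1.2.3  # pinned version tag, never :latest
          resources:
            requests:
              cpu: "8"
              memory: 16Gi
            limits:
              memory: 16Gi
          livenessProbe:
            httpGet:
              path: /api/login/check
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 15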

Storage

  • Persistent volume on SSD-backed block storage, mounted at the MSK_GRAPH_STORAGE path.
  • Separate volume (or directory) for MSK_GRAPH_BACKUP_FOLDER.
  • Backups scheduled, off-host, and tested by restoring to a sandbox.
  • Volume expansion enabled so you can grow without downtime.
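A minimal Kubernetes sketch of this storage layout, assuming MSK_GRAPH_STORAGE and MSK_GRAPH_BACKUP_FOLDER take container paths (the paths, claim name, storage class, and size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: curiosity-graph
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ssd          # SSD-backed class with allowVolumeExpansion: true
  resources:
    requests:
      storage: 200Gi
---
# In the workspace container spec:
env:
  - name: MSK_GRAPH_STORAGE
    value: /data/graph           # placeholder path on the graph volume
  - name: MSK_GRAPH_BACKUP_FOLDER
    value: /backups              # placeholder path on the backup volume
volumeMounts:
  - name: graph
    mountPath: /data/graph
  - name: backups
    mountPath: /backups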

Networking and TLS

  • TLS terminated at the proxy or inside the container; HSTS enabled.
  • MSK_PUBLIC_ADDRESS set to the user-facing URL.
  • No 0.0.0.0 exposure without an authenticating front-end.
  • Egress allowlist documented (Docker registry, NuGet, your LLM provider).
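For the egress allowlist, a plain Kubernetes NetworkPolicy is one option. Note that NetworkPolicy matches IP ranges, not hostnames, so the CIDR below is a placeholder; hostname-based allowlists for the registry, NuGet, or your LLM provider need an egress proxy or a CNI with FQDN rules:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: workspace-egress
spec:
  podSelector:
    matchLabels:
      app: curiosity-workspace
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector: {}          # allow cluster DNS
      ports:
        - port: 53
          protocol: UDP
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24         # placeholder: resolved provider ranges
      ports:
        - port: 443
          protocol: TCP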

Identity and secrets

  • MSK_ADMIN_PASSWORD set explicitly (default admin/admin never used).
  • MSK_JWT_KEY set explicitly so tokens survive restarts.
  • MSK_GRAPH_MASTER_KEY set explicitly and backed up — losing it means losing encrypted content.
  • All secrets injected from a secret manager (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager, Vault).
  • At least one SSO provider configured.
  • Admin sign-in via SSO only; the local admin account disabled after onboarding.
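A sketch of the secret wiring on Kubernetes, with the Secret object itself synced from your secret manager (for example via External Secrets Operator). The Secret name and keys are placeholders; the three MSK_* variables come from this checklist:

env:
  - name: MSK_ADMIN_PASSWORD
    valueFrom:
      secretKeyRef:
        name: curiosity-secrets
        key: admin-password
  - name: MSK_JWT_KEY
    valueFrom:
      secretKeyRef:
        name: curiosity-secrets
        key: jwt-key
  - name: MSK_GRAPH_MASTER_KEY          # back this key up separately
    valueFrom:
      secretKeyRef:
        name: curiosity-secrets
        key: graph-master-key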

Permissions and tokens

  • Connectors run on dedicated tokens with ingestion scope only.
  • External integrations use endpoint tokens scoped to specific endpoints.
  • Token rotation documented and scheduled.

Observability

  • Stdout logs routed to your aggregator; audit log forwarded to your SIEM.
  • Alerts on liveness, latency regressions, ingestion failures, container restart rate.
  • Per-endpoint and per-tool metrics scraped into your monitoring system.
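If you run Prometheus, the liveness and restart-rate alerts might look like the following sketch; the job label and the kube-state-metrics metric name are assumptions about your monitoring setup:

groups:
  - name: curiosity-workspace
    rules:
      - alert: WorkspaceDown
        expr: up{job="curiosity-workspace"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Workspace scrape target down; check the /api/login/check healthcheck
      - alert: WorkspaceRestartLoop
        expr: increase(kube_pod_container_status_restarts_total{container="workspace"}[30m]) > 3
        labels:
          severity: warning
        annotations:
          summary: Workspace container restarting repeatedly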

Disaster recovery

  • Documented RPO and RTO targets.
  • Restore drill completed within the past quarter.
  • Secrets manager backups verified.

See the per-page details: Security, Backup and restore, Monitoring, Upgrades and migrations.

Reverse proxy patterns

Most production deployments terminate TLS at a proxy and forward HTTP to the workspace. A minimal NGINX server block:

server {
    listen 443 ssl http2;
    server_name workspace.example.com;

    ssl_certificate     /etc/letsencrypt/live/workspace.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/workspace.example.com/privkey.pem;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    client_max_body_size 100m;    # allow uploads up to 100 MB
    proxy_read_timeout   300s;    # don't cut off long-running API calls

    location / {
        proxy_pass         http://127.0.0.1:8080;   # workspace container's HTTP port
        proxy_set_header   Host              $host;
        proxy_set_header   X-Real-IP         $remote_addr;
        proxy_set_header   X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto $scheme;
    }
}

Set MSK_PUBLIC_ADDRESS=https://workspace.example.com on the workspace so generated links use the proxy's hostname.

Rolling out a change

Recommended sequence for a non-trivial production change (image upgrade, schema migration, new SSO provider, …):

  1. Take a backup of the graph volume.
  2. Apply the change in staging; walk the post-restore validation checklist in Backup and restore.
  3. Promote to production during a low-traffic window.
  4. Watch Monitoring for 30 minutes after the rollout.
  5. Be prepared to roll back: revert the image tag (and configuration), restart, and restore the backup if data shape changed.
