Curiosity

Deploying on Kubernetes

Kubernetes is the recommended platform for production Curiosity Workspace deployments. The image is the same one used in Docker; what changes is how persistence, secrets, and traffic ingress are wired up.

This page walks through a complete StatefulSet-based deployment with TLS at an Ingress, plus options for Helm and Kustomize.

Why a StatefulSet

Workspace is a stateful single-writer application. It holds the graph in memory and persists it to disk under MSK_GRAPH_STORAGE. A StatefulSet gives you:

  • a stable pod name (curiosity-0) you can reference in backups and runbooks;
  • a PersistentVolumeClaim that follows the pod across reschedules;
  • ordered, controlled restarts on updates.

Deployment + ReadWriteOnce PVC also works for a single replica, but StatefulSet is the convention.
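
For comparison, the Deployment alternative might look roughly like the sketch below. It is not part of the manifest on this page: the PVC name curiosity-data is hypothetical and assumed to be created separately, and the environment, probes, and resources are omitted for brevity. The Recreate strategy is there so that the old pod releases the volume before the new one starts.

```yaml
# Sketch of the Deployment alternative (assumption: a ReadWriteOnce PVC
# with the hypothetical name "curiosity-data" already exists).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: curiosity
  namespace: curiosity
spec:
  replicas: 1
  strategy:
    type: Recreate          # stop the old pod before starting the new one
  selector:
    matchLabels:
      app: curiosity
  template:
    metadata:
      labels:
        app: curiosity
    spec:
      containers:
        - name: curiosity
          image: curiosityai/curiosity:v1.42.0
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: curiosity-data   # hypothetical PVC name
```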

Prerequisites

  • A working Kubernetes cluster (kubectl context configured).
  • A storage class that supports ReadWriteOnce block storage. AWS EBS, Azure Disk, GCP Persistent Disk all work.
  • An Ingress controller (NGINX, Traefik, ALB, AGIC, etc.) and a way to terminate TLS in front of the workspace.
  • A secret store (sealed secrets, External Secrets Operator, Vault Agent, AWS Secrets Manager integration, …).
  • At least 16 GB of RAM free on the target node (the manifest below requests 16Gi and allows up to 32Gi), ideally with anti-affinity rules to keep the pod off nodes with noisy neighbors.
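
As an illustration of the anti-affinity point, a soft rule could be added under spec.template.spec of the StatefulSet. This is a sketch under assumptions: the workload-class label and its values are hypothetical, so substitute whatever labels your cluster uses to mark heavy workloads.

```yaml
# Fragment for spec.template.spec (sketch; the label key and values are hypothetical).
affinity:
  podAntiAffinity:
    # "preferred" keeps scheduling possible when no quiet node is available
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname   # avoid sharing a node
          namespaceSelector: {}                 # consider pods in all namespaces
          labelSelector:
            matchExpressions:
              - key: workload-class             # hypothetical label
                operator: In
                values: ["batch", "memory-intensive"]
```

Switching to requiredDuringSchedulingIgnoredDuringExecution makes the rule strict, at the cost of leaving the pod Pending when no node qualifies.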

Complete manifest

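This manifest deploys one Workspace pod with persistent storage, a Service, an Ingress with TLS, secrets, readiness and liveness probes, and resource requests and limits. Replace the angle-bracket placeholders and the example hostname, save the file as curiosity.yaml, and apply it with kubectl apply -f curiosity.yaml.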

apiVersion: v1
kind: Namespace
metadata:
  name: curiosity
---
apiVersion: v1
kind: Secret
metadata:
  name: curiosity-secrets
  namespace: curiosity
type: Opaque
stringData:
  MSK_ADMIN_PASSWORD: <generate-with-secret-store>
  MSK_JWT_KEY:        <32+-bytes-from-secret-store>
  MSK_GRAPH_MASTER_KEY: <32+-bytes-from-secret-store>
  MSK_LICENSE:        <your-license-token>
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: curiosity-config
  namespace: curiosity
data:
  MSK_GRAPH_STORAGE:        "/data/curiosity"
  MSK_GRAPH_BACKUP_FOLDER:  "/data/backups"
  MSK_GRAPH_JOURNAL_FOLDER: "/data/journal"
  MSK_PUBLIC_ADDRESS:       "https://workspace.example.com"
  MSK_USE_HSTS:             "true"
  MSK_REDIRECT_TO_HTTPS:    "false"   # TLS is terminated at the Ingress
---
apiVersion: v1
kind: Service
metadata:
  name: curiosity
  namespace: curiosity
spec:
  selector:
    app: curiosity
  ports:
    - name: http
      port: 8080
      targetPort: 8080
  clusterIP: None     # headless: gives the pod a stable DNS name
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: curiosity
  namespace: curiosity
spec:
  serviceName: curiosity
  replicas: 1
  podManagementPolicy: OrderedReady
  updateStrategy:
    type: OnDelete         # require an explicit delete to roll the pod
  selector:
    matchLabels:
      app: curiosity
  template:
    metadata:
      labels:
        app: curiosity
    spec:
      terminationGracePeriodSeconds: 120
      securityContext:
        runAsUser: 10000
        runAsGroup: 10000
        fsGroup: 10000
        runAsNonRoot: true
      containers:
        - name: curiosity
          image: curiosityai/curiosity:v1.42.0
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
          envFrom:
            - configMapRef:
                name: curiosity-config
            - secretRef:
                name: curiosity-secrets
          volumeMounts:
            - name: data
              mountPath: /data
          resources:
            requests:
              cpu: "4"
              memory: 16Gi
            limits:
              cpu: "8"
              memory: 32Gi
          readinessProbe:
            httpGet:
              path: /api/login/check
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 6
          livenessProbe:
            httpGet:
              path: /api/login/check
              port: http
            initialDelaySeconds: 120
            periodSeconds: 30
            failureThreshold: 5
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: gp3           # change per platform
        resources:
          requests:
            storage: 200Gi
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: curiosity
  namespace: curiosity
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  ingressClassName: nginx
  tls:
    - hosts: [workspace.example.com]
      secretName: curiosity-tls
  rules:
    - host: workspace.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: curiosity
                port:
                  number: 8080

Apply, then watch readiness:

kubectl apply -f curiosity.yaml
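# rollout status only supports RollingUpdate StatefulSets; with OnDelete, wait on the pod instead
# (re-run the command if the pod has not been created yet)
kubectl -n curiosity wait --for=condition=Ready pod/curiosity-0 --timeout=10m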
kubectl -n curiosity get pods,svc,ingress
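
If you use External Secrets Operator, the placeholder Secret in the manifest could instead be generated from your secret store. The following is a sketch, not a tested configuration: the ClusterSecretStore name platform-secrets and the remote keys are hypothetical, and the apiVersion depends on the operator release you run (newer releases also serve external-secrets.io/v1).

```yaml
# Sketch only: assumes External Secrets Operator is installed and a
# ClusterSecretStore named "platform-secrets" (hypothetical) exists.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: curiosity-secrets
  namespace: curiosity
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: platform-secrets
  target:
    name: curiosity-secrets        # the Secret the StatefulSet reads via envFrom
    creationPolicy: Owner
  data:
    - secretKey: MSK_ADMIN_PASSWORD
      remoteRef:
        key: curiosity/admin-password     # hypothetical remote key
    - secretKey: MSK_JWT_KEY
      remoteRef:
        key: curiosity/jwt-key            # hypothetical remote key
    - secretKey: MSK_GRAPH_MASTER_KEY
      remoteRef:
        key: curiosity/graph-master-key   # hypothetical remote key
    - secretKey: MSK_LICENSE
      remoteRef:
        key: curiosity/license            # hypothetical remote key
```

With this approach, drop the Secret object from curiosity.yaml so the two definitions do not overwrite each other.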

Tuning per platform

| Platform            | Recommended storageClassName                      | Notes                                                                  |
|---------------------|---------------------------------------------------|------------------------------------------------------------------------|
| AWS EKS (EC2 nodes) | gp3 (EBS CSI)                                     | Provision the EBS CSI driver. See AWS.                                 |
| AWS EKS (Fargate)   | (none)                                            | Fargate has no EBS; fall back to EFS (ReadWriteMany) and a Deployment. |
| Azure AKS           | managed-csi                                       | Default storage class works. See Azure.                                |
| Google GKE          | premium-rwo or standard-rwo                       | Use the regional disk classes for HA. See GCP.                         |
| OpenShift           | ocs-storagecluster-ceph-rbd (or platform default) | Add an appropriate SecurityContextConstraint. See OpenShift.           |
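
If your cluster does not already define the class, a gp3 StorageClass for EKS could look like the sketch below. It assumes the EBS CSI driver is installed. The encryption and reclaim settings are illustrative choices rather than Curiosity requirements.

```yaml
# Sketch for EKS with the EBS CSI driver; adjust provisioner and parameters per platform.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"                        # illustrative; follow your own policy
volumeBindingMode: WaitForFirstConsumer    # create the disk in the zone where the pod lands
allowVolumeExpansion: true                 # needed to grow the PVC later
reclaimPolicy: Retain                      # keep the disk if the PVC is deleted
```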

Helm and Kustomize

Curiosity does not yet ship an official Helm chart. Two community-friendly approaches:

  • Kustomize: keep the manifest above as a base, and overlay per environment with kustomization.yaml patches that change the image tag, hostname, storage size, secrets, and resource limits. Leave replicas at 1, since the Workspace is a single writer.
  • Helm: wrap the manifest in a chart of your own. The variables that change between environments are the image tag, storage class and size, ingress hostname, secret references, and resource requests/limits.
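
A Kustomize overlay along these lines is one way to do it. The directory layout and the production hostname are hypothetical. The sketch assumes the base directory holds curiosity.yaml plus a kustomization.yaml that lists it under resources.

```yaml
# overlays/production/kustomization.yaml (hypothetical layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: curiosityai/curiosity
    newTag: v1.43.0                 # pin the image tag per environment
patches:
  - target:
      kind: StatefulSet
      name: curiosity
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: 32Gi
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 64Gi
  - target:
      kind: ConfigMap
      name: curiosity-config
    patch: |-
      - op: replace
        path: /data/MSK_PUBLIC_ADDRESS
        value: https://workspace.prod.example.com
  - target:
      kind: Ingress
      name: curiosity
    patch: |-
      - op: replace
        path: /spec/rules/0/host
        value: workspace.prod.example.com
      - op: replace
        path: /spec/tls/0/hosts/0
        value: workspace.prod.example.com
```

Render it with kubectl kustomize overlays/production to review the result before applying it with kubectl apply -k overlays/production.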

Operations

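  • Backups: snapshot the PVC via your CSI driver's VolumeSnapshot support, or point MSK_GRAPH_BACKUP_FOLDER at a separate PVC that you can replicate off-cluster. The manifest above keeps /data/backups on the same volume as the graph, so mount a second volume there if you rely on the latter. See Backup and restore.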
  • Logs: the container writes to stdout/stderr; collect via your platform's aggregator (CloudWatch, Stackdriver, ELK).
  • Metrics: per-endpoint and per-tool metrics are available via /api/endpoints/metrics and /api/chatai/tools/metrics (admin-scoped). See Monitoring.
  • Scaling: the Workspace runs as a single writer per graph. Scale up the pod (more CPU/RAM) before scaling out.
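
For the snapshot route, a VolumeSnapshot of the data volume could be sketched as follows. It assumes the snapshot CRDs and controller are installed and that a VolumeSnapshotClass exists; the class name csi-snapclass and the snapshot name are hypothetical. The PVC name follows from the manifest above (claim template data, pod curiosity-0).

```yaml
# Sketch only: requires the CSI snapshot controller and a VolumeSnapshotClass.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: curiosity-data-snapshot            # hypothetical snapshot name
  namespace: curiosity
spec:
  volumeSnapshotClassName: csi-snapclass   # hypothetical class name
  source:
    persistentVolumeClaimName: data-curiosity-0
```

A snapshot taken while the pod is running captures the disk as it is at that instant. Whether that is sufficient for a clean restore is covered in Backup and restore.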

Upgrades

Roll the pod by setting a new image tag and deleting the existing pod (because updateStrategy: OnDelete):

kubectl -n curiosity set image statefulset/curiosity \
  curiosity=curiosityai/curiosity:v1.43.0
kubectl -n curiosity delete pod curiosity-0

The new pod starts, attaches the existing PVC, and replays its journal before becoming ready. Always take a snapshot first and follow Upgrades and migrations.

Common pitfalls

  • ReadWriteMany storage with this manifest: the workspace is a single writer. Pin to ReadWriteOnce block storage unless you're explicitly running on a shared filesystem like EFS, and even then use a Deployment with 1 replica.
  • PVC that's too small: graph storage grows with data + indexes + embeddings. Resize the PVC ahead of capacity events; expanding a PVC online requires the storage class to allow it (allowVolumeExpansion: true). With a StatefulSet, resize the PVC itself (data-curiosity-0), because volumeClaimTemplates cannot be edited after creation.
  • Missing MSK_PUBLIC_ADDRESS: when behind an Ingress, generated links (SSO callbacks, email links) will be wrong without this set.
  • Aggressive liveness probe: boot can take a minute or two on cold caches. Keep initialDelaySeconds generous (120s) to avoid restart loops.
  • RollingUpdate with one replica in a Deployment: by default the new pod is created before the old one stops, and it may be unable to attach the ReadWriteOnce PVC while the old pod still holds it. Use the Recreate strategy, or set maxSurge: 0 with maxUnavailable: 1. A StatefulSet replaces its pod in place and is not affected; this manifest uses OnDelete so that restarts stay explicit.
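
As an alternative to a long initialDelaySeconds, a startupProbe can absorb slow boots while the liveness probe stays responsive once the pod is up. This is a sketch with illustrative thresholds, reusing the /api/login/check path from the manifest. Tune the numbers to your observed boot times.

```yaml
# Fragment for the curiosity container (sketch; thresholds are illustrative).
startupProbe:
  httpGet:
    path: /api/login/check
    port: http
  periodSeconds: 10
  failureThreshold: 30      # up to 30 x 10s = 300s to finish booting
livenessProbe:              # only starts running after the startup probe succeeds
  httpGet:
    path: /api/login/check
    port: http
  periodSeconds: 30
  failureThreshold: 5
```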

© 2026 Curiosity. All rights reserved.