Backup and restore

A Curiosity Workspace deployment has three things you need to be able to restore from cold storage:

  1. The graph — the typed nodes, edges, and indexes that live under MSK_GRAPH_STORAGE. This is the database.
  2. The workspace configuration — search indexes, NLP pipelines, embedding/LLM provider settings, SSO config, scheduled tasks, custom endpoints, AI tools. Stored inside the graph, so it's covered by the graph backup.
  3. The secrets — MSK_ADMIN_PASSWORD, MSK_JWT_KEY, MSK_LICENSE, MSK_GRAPH_MASTER_KEY, model-provider API keys, certificate material. Stored outside the graph in your secret manager.

You need a working copy of all three to restore a Workspace from scratch.

Strategy at a glance

| Tier | Target RPO | Target RTO | Approach |
| --- | --- | --- | --- |
| Local dev | 24h | best-effort | Periodic tar of MSK_GRAPH_STORAGE. |
| Staging | 24h | 1h | Daily volume snapshot + secrets in a secret manager. |
| Production | 1h | 15 min | Hourly snapshot + 15-min journal sync + redundant secrets manager + tested restore drill. |
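The production tier's cadence can be sketched as a crontab; both script names are hypothetical wrappers around the steps described below:

```shell
# Sketch of the production-tier cadence as a crontab.
# Both script names are hypothetical wrappers around the steps below.
0 * * * *     /usr/local/bin/curiosity-snapshot.sh      # hourly snapshot
*/15 * * * *  /usr/local/bin/curiosity-journal-sync.sh  # 15-min journal sync
```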

What to back up

The graph (MSK_GRAPH_STORAGE)

Snapshot the directory MSK_GRAPH_STORAGE points at. The graph supports lock-free reads, so a snapshot taken with a CSI volume snapshot, an EBS snapshot, or a filesystem-level snapshot (LVM, ZFS, Btrfs) is consistent.

If your platform doesn't support snapshots, the workspace can write a consistent point-in-time backup to MSK_GRAPH_BACKUP_FOLDER:

  1. Set MSK_GRAPH_BACKUP_FOLDER to a path that's mounted to durable storage (a separate volume, an S3 bucket via a filesystem driver, etc.).
  2. Schedule a backup task under Settings → Tasks with type Backup. Recommended frequency: hourly for production.
  3. Copy the resulting backup files off-host on a schedule.

The journal (MSK_GRAPH_JOURNAL_FOLDER)

If set, the journal contains the write log used to recover from crashes. Including it in your backup tightens your RPO between snapshots — restore can replay journal entries created after the last snapshot.

Secrets and configuration

The graph backup includes most workspace configuration. Outside of the graph, back up:

  • All MSK_* secrets — MSK_ADMIN_PASSWORD, MSK_JWT_KEY, MSK_LICENSE, MSK_GRAPH_MASTER_KEY, and any provider-side secrets you reference via *_FILE variables.
  • TLS material if certificates are mounted into the container (MSK_CERT_FILE, MSK_CERT_FILE_PRIVATE_KEY).
  • The Docker/Helm/Compose manifest that runs the workspace, so the restored environment looks the same.
  • Custom interface and connector source code — these live in your own git repositories; make sure those are mirrored.

Daily backup procedure (containerized)

#!/usr/bin/env bash
set -euo pipefail

TS=$(date -u +%Y-%m-%dT%H%M%SZ)
DEST=/srv/backups/curiosity/$TS
mkdir -p "$DEST"

# 1) No quiesce needed: graph snapshots are lock-free. Flushing the page
#    cache is still worth doing; under heavy write activity you may also
#    want to pause ingestion for the duration.
docker exec curiosity sync || true

# 2) Snapshot
tar -C /srv/curiosity -czf "$DEST/graph.tar.gz" curiosity

# 3) Capture secrets from the secret manager
vault kv get -format=json secret/curiosity > "$DEST/secrets.json"

# 4) Verify integrity
test -s "$DEST/graph.tar.gz" && test -s "$DEST/secrets.json"

# 5) Off-host
aws s3 cp --recursive "$DEST" "s3://my-curiosity-backups/$TS/"

For Kubernetes deployments, the equivalent is a VolumeSnapshot plus a Secret snapshot. For cloud-managed disks (EBS, Azure Disk, Persistent Disk), use the platform's snapshot API instead of tar.
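The Kubernetes path might be sketched like this; the namespace, PVC, Secret, and VolumeSnapshotClass names are all assumptions:

```shell
# Sketch: CSI snapshot of the graph volume plus an export of the Secret
# holding MSK_* values. All resource names are assumptions.
kubectl apply -f - <<'EOF'
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: curiosity-graph-snap
  namespace: curiosity
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: curiosity-graph
EOF

# Export the Secret for off-cluster storage (encrypt it at rest).
kubectl -n curiosity get secret curiosity-secrets -o yaml > curiosity-secrets.yaml
```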

Restore procedure

You're restoring three things in this order: secrets, graph storage, then the running workspace.

  1. Provision secrets in the destination secret manager, matching the names referenced by your manifest.
  2. Restore the graph volume:
    mkdir -p /srv/curiosity
    tar -C /srv/curiosity -xzf /srv/backups/curiosity/<timestamp>/graph.tar.gz
    
  3. Start the workspace with the same MSK_GRAPH_STORAGE path and the same secrets:
    docker run --name curiosity \
      -p 127.0.0.1:8080:8080 \
      -v /srv/curiosity:/data \
      -e MSK_GRAPH_STORAGE=/data/curiosity \
      -e MSK_GRAPH_MASTER_KEY="$(vault kv get -field=master_key secret/curiosity)" \
      -e MSK_JWT_KEY="$(vault kv get -field=jwt_key secret/curiosity)" \
      -e MSK_ADMIN_PASSWORD="$(vault kv get -field=admin_password secret/curiosity)" \
      curiosityai/curiosity:<same-version-as-source>
    
  4. Wait for startup — the workspace replays journal entries before accepting traffic. Watch docker logs -f curiosity.

Restore on the same Workspace version

Always restore onto the same container image version the backup was taken on, then upgrade afterward. Restoring across major versions is not supported.
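To make the same-version rule easy to follow, the backup step can record the running image next to each backup. The container name "curiosity" and the DEST path are assumptions:

```shell
# Sketch: record the exact image alongside each backup so the restore
# can pin it. Container name "curiosity" and DEST are assumptions.
DEST=/srv/backups/curiosity/latest
mkdir -p "$DEST"
docker inspect --format '{{.Config.Image}}' curiosity > "$DEST/image.txt"

# At restore time, substitute the recorded tag for <same-version-as-source>:
#   docker run ... "$(cat "$DEST/image.txt")"
```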

Validating a restore

After restore, walk this checklist before declaring the environment ready:

  • Sign in works with the admin account from the backup.
  • Users and teams are present under Settings → Accounts.
  • Node counts match the source for each major type:
    return Q().StartAt("Ticket").Count();
    
  • Search returns results for a smoke query you know should match.
  • Vector search returns results — embeddings survived the snapshot.
  • SSO works — sign in via each configured provider.
  • Scheduled tasks are present and enabled at the expected cadence.
  • Custom endpoints compile and respond on POST /api/endpoints/run/<name>.
  • AI tools respond inside the chat view.

Restore drills

A restore you've never tested is not a restore. We recommend a quarterly drill for production:

  1. Spin up an isolated staging cluster.
  2. Restore the latest backup into it.
  3. Walk the validation checklist.
  4. Tear it down.

A documented, dated drill — even one page — is what auditors will ask for.
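One low-effort way to keep that record, assuming a log file path:

```shell
#!/usr/bin/env bash
# Sketch: append a dated entry after each successful drill.
# The log path is an assumption.
set -euo pipefail

LOG=${LOG:-/srv/backups/curiosity/drill-log.txt}
mkdir -p "$(dirname "$LOG")"
printf '%s  drill OK: backup restored, validation checklist passed\n' \
  "$(date -u +%F)" >> "$LOG"
```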

Cross-environment migration

The same procedure works for promoting data from staging to production, or for cloning production into a sandbox. Two caveats:

  • Re-encrypt secrets for the destination environment. Don't share MSK_JWT_KEY between environments — a session token minted in one would be valid in the other.
  • Strip personally identifiable information before cloning production into shared dev environments. The graph has no built-in PII redaction; you can run a one-off cleanup endpoint after the restore.
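Minting a fresh signing key for the destination, rather than copying MSK_JWT_KEY across, can be sketched as follows; the 48-byte length and the secret-manager path are assumptions:

```shell
# Sketch: generate a fresh signing key for the destination environment.
# The 48-byte length and the vault path are assumptions.
NEW_JWT_KEY=$(openssl rand -base64 48)
vault kv put secret/curiosity-staging jwt_key="$NEW_JWT_KEY"
```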

© 2026 Curiosity. All rights reserved.