# Architecture

Curiosity Workspace is a single product that brings together three layers that are often separate in modern data stacks:

  • Graph layer: stores a knowledge graph (nodes + edges) with schemas, properties, and traversals.
  • Search layer: provides text retrieval, ranking, filtering, and query-time constraints.
  • AI layer: embeddings, NLP extraction, and LLM-driven features that use graph + search as grounding.

The platform is designed so these layers reinforce each other:

  • Graph relationships improve navigation, filtering, and context building
  • Search provides fast retrieval and ranking at scale
  • AI adds semantic recall (embeddings) and reasoning/synthesis (LLMs) where appropriate

# System Internals

Under the hood, Curiosity Workspace operates as a highly optimized, single-deployment system designed for performance and ease of management.

# High-Performance Graph Engine

At the core is a purpose-built in-memory graph engine. It is optimized for low-latency traversals and high-throughput updates.

  • Efficient Storage: Data structures are packed for memory efficiency.
  • ACID Transactions: Updates to the graph are atomic and consistent, ensuring data integrity even during massive ingestion workloads.
  • Lock-Free Reads: Read operations are designed to be non-blocking, allowing heavy query loads to coexist with real-time ingestion.
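The engine's internals are not public; as an illustration only, here is a minimal sketch of one common pattern that yields non-blocking reads, copy-on-write snapshots with an atomic reference swap. All names (`SnapshotGraph`, `neighbors`, `add_edge`) are hypothetical, not the product's API:

```python
import threading

class SnapshotGraph:
    """Illustrative sketch: readers see an immutable snapshot while a
    writer prepares the next version, so reads never take a lock."""

    def __init__(self):
        self._snapshot = {}                  # node id -> set of neighbor ids
        self._write_lock = threading.Lock()  # serializes writers only

    def neighbors(self, node):
        # Lock-free read: capture the current snapshot reference once,
        # then read from it without coordinating with writers.
        snap = self._snapshot
        return snap.get(node, set())

    def add_edge(self, src, dst):
        # Writers copy, mutate, then atomically swap in the new version.
        with self._write_lock:
            new = {k: set(v) for k, v in self._snapshot.items()}
            new.setdefault(src, set()).add(dst)
            self._snapshot = new  # atomic reference swap

g = SnapshotGraph()
g.add_edge("alice", "doc-1")
g.add_edge("alice", "doc-2")
print(sorted(g.neighbors("alice")))  # → ['doc-1', 'doc-2']
```

The trade-off is that writers pay a copy cost so that readers pay nothing, which fits a workload of heavy query loads coexisting with ingestion.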

# Ingestion and Indexing Pipeline

Data ingestion is handled by an asynchronous background process that separates parsing from linking:

  1. Parsing: Incoming files and data streams are processed by Parsers to extract text and metadata, creating intermediate Document nodes.
  2. Linking: The Linker process analyzes these documents to materialize edges, connecting entities based on content, references, and heuristics.
  3. Indexing: Text and vectors are automatically indexed, making new data searchable immediately after processing.
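The three stages above can be sketched as plain functions. This is an illustrative toy, not the actual pipeline; the function names, the entity-matching heuristic, and the inverted index are all assumptions:

```python
def parse(raw):
    """Stage 1 (hypothetical): extract text + metadata into a Document node."""
    return {"uid": raw["id"], "text": raw["bytes"].decode(), "type": "Document"}

def link(doc, graph):
    """Stage 2: materialize edges connecting the document to known entities
    based on its content (here, a naive substring heuristic)."""
    edges = []
    for entity in graph["entities"]:
        if entity.lower() in doc["text"].lower():
            edges.append((doc["uid"], "mentions", entity))
    return edges

def index(doc, text_index):
    """Stage 3: add the document to a toy inverted index so it is
    searchable as soon as processing completes."""
    for token in set(doc["text"].lower().split()):
        text_index.setdefault(token, set()).add(doc["uid"])

graph = {"entities": ["Turbine", "Rotor"]}
text_index = {}
doc = parse({"id": "d1", "bytes": b"Inspection report: turbine blade wear"})
edges = link(doc, graph)
index(doc, text_index)
print(edges)                    # [('d1', 'mentions', 'Turbine')]
print("turbine" in text_index)  # True
```

Because parsing and linking are separate stages, a re-run of the Linker can add new edges without re-parsing the source files.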

# Security and Access Control (ReBAC)

Curiosity implements a robust Relationship-Based Access Control (ReBAC) model. Unlike traditional role-based access lists, permissions in Curiosity are defined by graph paths.

  • Access is determined by the relationship between a user or team node and the target data.
  • This access model is highly customizable, allowing organizations to model complex permission structures.
  • Curiosity enforces these checks automatically and universally across all interactions, including direct queries, search results, and AI-generated responses.
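To make the path-based model concrete, here is a minimal sketch of a relationship check as a breadth-first search over edges. The edge names (`member-of`, `can-read`, `contains`) and node identifiers are invented for illustration; real deployments define their own relationship types:

```python
from collections import deque

# Hypothetical edge list: permissions follow graph paths, not role lists.
EDGES = {
    ("user:ana", "member-of"): ["team:engineering"],
    ("team:engineering", "can-read"): ["folder:designs"],
    ("folder:designs", "contains"): ["doc:blueprint"],
}

def can_access(user, target, edges=EDGES):
    """Breadth-first search: access is granted iff a relationship
    path connects the user node to the target node."""
    seen, queue = {user}, deque([user])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for (src, _rel), dsts in edges.items():
            if src == node:
                for dst in dsts:
                    if dst not in seen:
                        seen.add(dst)
                        queue.append(dst)
    return False

print(can_access("user:ana", "doc:blueprint"))  # True
print(can_access("user:bob", "doc:blueprint"))  # False
```

In this model, revoking access is a graph edit (removing an edge) rather than an ACL update, which is why the same check can be enforced uniformly for queries, search results, and AI responses.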

# Core building blocks

  • Workspace
    • an environment that contains data, configuration, and extensibility artifacts
  • Schemas
    • node schemas and edge schemas define the types and constraints of the graph
  • Indexes
    • text indexes and embedding indexes over selected fields
  • Pipelines
    • NLP processing that turns text into structured signals and graph links
  • Extensibility
    • custom endpoints (server-side logic)
    • custom interfaces (front-end apps)

# Deployment Model

Curiosity Workspace is designed to be deployed as a self-contained unit, simplifying operations compared to distributed microservices.

  • Single-Container Architecture: The core services (Graph, Search, API, UI) run within a single process inside the container. This eliminates network overhead between components and simplifies debugging.
  • Sandboxed Processing: Only file processing tasks (such as parsing complex document formats) are performed out-of-process in a secure, sandboxed environment (using Landlock on Linux) to ensure system stability and security.
  • Storage: Data is persisted to a local volume. For cloud deployments (Kubernetes), this leverages standard CSI drivers (e.g., EBS for AWS, Azure Disk, Persistent Disk for GCP) to attach reliable block storage.
  • Scalability: While the architecture is monolithic, it is designed to scale vertically. The efficient memory model allows it to handle millions of nodes and edges on standard hardware.

# Technology Stack

Curiosity Workspace is built on a modern, high-performance stack:

  • Runtime: Utilizes the latest .NET runtime for high-throughput execution and cross-platform compatibility.
  • Storage Engine: Leverages RocksDB for efficient, reliable on-disk storage of graph data and indexes.

# Typical request flows

# Search and discovery flow

  1. A user searches for a term (keywords and/or semantic query).
  2. The search engine retrieves candidates (text and/or vector).
  3. The system applies filters (properties and/or graph-related facets).
  4. Results are ranked and returned; graph neighbors can be fetched for previews and navigation.
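The four steps above can be sketched end to end. The corpus, the 2-d "embeddings", the scoring functions, and the blend of keyword and vector scores are all toy assumptions standing in for the real retrieval stack:

```python
# Hypothetical corpus: each record carries text, a toy 2-d embedding,
# and properties used for query-time filtering.
DOCS = [
    {"uid": "d1", "text": "turbine maintenance manual", "vec": (1.0, 0.0), "lang": "en"},
    {"uid": "d2", "text": "rotor blade inspection",      "vec": (0.9, 0.1), "lang": "en"},
    {"uid": "d3", "text": "manuel de maintenance",       "vec": (0.8, 0.2), "lang": "fr"},
]

def search(query_terms, query_vec, filters):
    # 1. Retrieve candidates by keyword overlap and/or vector similarity.
    def keyword_score(d):
        return sum(t in d["text"] for t in query_terms)
    def vector_score(d):
        return sum(a * b for a, b in zip(d["vec"], query_vec))
    candidates = [d for d in DOCS if keyword_score(d) > 0 or vector_score(d) > 0.5]
    # 2. Apply property filters at query time.
    candidates = [d for d in candidates
                  if all(d.get(k) == v for k, v in filters.items())]
    # 3. Rank by a blended score and return.
    candidates.sort(key=lambda d: keyword_score(d) + vector_score(d), reverse=True)
    return [d["uid"] for d in candidates]

print(search(["maintenance"], (1.0, 0.0), {"lang": "en"}))  # → ['d1', 'd2']
```

Note that `d2` matches no keyword yet is still retrieved via vector similarity, which is the "semantic recall" role the AI layer plays in the search flow.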

# AI-assisted flow (retrieval + reasoning)

  1. A user asks a question or starts a workflow.
  2. The system retrieves grounding context from search/graph.
  3. The LLM generates a response using that context.
  4. Optional: the result is saved back into the workspace (notes, links, tags) via endpoints/tasks.
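A retrieve-then-generate loop of this shape can be sketched as follows. The retrieval heuristic is a toy, and `answer_with_llm` is a stub standing in for a real model call; every name here is hypothetical:

```python
def retrieve(question, store):
    """Hypothetical grounding step: pick documents that share terms
    with the question (a real system would use search + graph context)."""
    terms = set(question.lower().split())
    return [d for d in store if terms & set(d["text"].lower().split())]

def answer_with_llm(question, context):
    """Stub standing in for an LLM call: a real system would send the
    question plus grounding context to a model and return its completion."""
    sources = ", ".join(d["uid"] for d in context)
    return f"Answer grounded in: {sources}"

def save_note(workspace, text):
    """Optional write-back: persist the result as a note in the workspace."""
    workspace.append({"uid": f"note-{len(workspace)}", "text": text})

store = [{"uid": "d1", "text": "turbine blade wear limits"},
         {"uid": "d2", "text": "cafeteria menu"}]
ctx = retrieve("what are the turbine wear limits?", store)
reply = answer_with_llm("what are the turbine wear limits?", ctx)
save_note(store, reply)
print(reply)  # → Answer grounded in: d1
```

The write-back step is what closes the loop: AI output becomes new graph data, subject to the same schemas, indexing, and access control as everything else.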

# Design goals

  • Schema-first clarity: you control what types exist and how they relate.
  • Configurable retrieval: tune relevance without rewriting your app.
  • Safe extensibility: move business logic into versionable endpoints and controlled interfaces.
  • Operational control: deployments can be monitored, secured, and promoted across environments.

# Next steps

  • Understand how data moves through the system: Data Flow
  • Learn the foundational data structure: Graph Model