# Architecture

Curiosity Workspace is a single product that brings together three layers that are often separate in modern data stacks:

  • Graph layer: stores a knowledge graph (nodes + edges) with schemas, properties, and traversals.
  • Search layer: provides text retrieval, ranking, filtering, and query-time constraints.
  • AI layer: embeddings, NLP extraction, and LLM-driven features that use graph + search as grounding.

The platform is designed so these layers reinforce each other:

  • Graph relationships improve navigation, filtering, and context building
  • Search provides fast retrieval and ranking at scale
  • AI adds semantic recall (embeddings) and reasoning/synthesis (LLMs) where appropriate

# System Internals

Under the hood, Curiosity Workspace operates as a highly optimized, single-deployment system designed for performance and ease of management.

# High-Performance Graph Engine

At the core is a purpose-built in-memory graph engine. It is optimized for low-latency traversals and high-throughput updates.

  • Efficient Storage: Data structures are packed for memory efficiency.
  • ACID Transactions: Updates to the graph are atomic and consistent, ensuring data integrity even during massive ingestion workloads.
  • Lock-Free Reads: Read operations are designed to be non-blocking, allowing heavy query loads to coexist with real-time ingestion.
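The engine's internals are not public; as an illustration only, here is a minimal sketch of one common pattern that yields non-blocking reads, copy-on-write snapshots with an atomic reference swap. All names (`SnapshotGraph`, `neighbors`, `add_edge`) are hypothetical, not the product's API:

```python
import threading

class SnapshotGraph:
    """Illustrative sketch: readers see an immutable snapshot while a
    writer prepares the next version, so reads never take a lock."""

    def __init__(self):
        self._snapshot = {}                  # node id -> set of neighbor ids
        self._write_lock = threading.Lock()  # serializes writers only

    def neighbors(self, node):
        # Lock-free read: capture the current snapshot reference once,
        # then read from it without coordinating with writers.
        snap = self._snapshot
        return snap.get(node, set())

    def add_edge(self, src, dst):
        # Writers copy, mutate, then atomically swap in the new version.
        with self._write_lock:
            new = {k: set(v) for k, v in self._snapshot.items()}
            new.setdefault(src, set()).add(dst)
            self._snapshot = new  # atomic reference swap

g = SnapshotGraph()
g.add_edge("alice", "doc-1")
g.add_edge("alice", "doc-2")
print(sorted(g.neighbors("alice")))  # → ['doc-1', 'doc-2']
```

The trade-off is that writers pay a copy cost so that readers pay nothing, which fits a workload of heavy query loads coexisting with ingestion.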

# Ingestion and Indexing Pipeline

Data ingestion is handled by an asynchronous background process that separates parsing from linking:

  1. Parsing: Incoming files and data streams are processed by Parsers to extract text and metadata, creating intermediate Document nodes.
  2. Linking: The Linker process analyzes these documents to materialize edges, connecting entities based on content, references, and heuristics.
  3. Indexing: Text and vectors are automatically indexed, making new data searchable immediately after processing.
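The three stages above can be sketched as plain functions. This is an illustrative toy, not the actual pipeline; the function names, the entity-matching heuristic, and the inverted index are all assumptions:

```python
def parse(raw):
    """Stage 1 (hypothetical): extract text + metadata into a Document node."""
    return {"uid": raw["id"], "text": raw["bytes"].decode(), "type": "Document"}

def link(doc, graph):
    """Stage 2: materialize edges connecting the document to known entities
    based on its content (here, a naive substring heuristic)."""
    edges = []
    for entity in graph["entities"]:
        if entity.lower() in doc["text"].lower():
            edges.append((doc["uid"], "mentions", entity))
    return edges

def index(doc, text_index):
    """Stage 3: add the document to a toy inverted index so it is
    searchable as soon as processing completes."""
    for token in set(doc["text"].lower().split()):
        text_index.setdefault(token, set()).add(doc["uid"])

graph = {"entities": ["Turbine", "Rotor"]}
text_index = {}
doc = parse({"id": "d1", "bytes": b"Inspection report: turbine blade wear"})
edges = link(doc, graph)
index(doc, text_index)
print(edges)                    # [('d1', 'mentions', 'Turbine')]
print("turbine" in text_index)  # True
```

Because parsing and linking are separate stages, a re-run of the Linker can add new edges without re-parsing the source files.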

# Security and Access Control (ReBAC)

Curiosity implements a robust Relationship-Based Access Control (ReBAC) model. Unlike traditional role-based access lists, permissions in Curiosity are defined by graph paths.

  • Access is determined by the relationship between a user or team node and the target data.
  • This access model is highly customizable, allowing organizations to model complex permission structures.
  • Curiosity enforces these checks automatically and universally across all interactions, including direct queries, search results, and AI-generated responses.
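To make the path-based model concrete, here is a minimal sketch of a relationship check as a breadth-first search over edges. The edge names (`member-of`, `can-read`, `contains`) and node identifiers are invented for illustration; real deployments define their own relationship types:

```python
from collections import deque

# Hypothetical edge list: permissions follow graph paths, not role lists.
EDGES = {
    ("user:ana", "member-of"): ["team:engineering"],
    ("team:engineering", "can-read"): ["folder:designs"],
    ("folder:designs", "contains"): ["doc:blueprint"],
}

def can_access(user, target, edges=EDGES):
    """Breadth-first search: access is granted iff a relationship
    path connects the user node to the target node."""
    seen, queue = {user}, deque([user])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for (src, _rel), dsts in edges.items():
            if src == node:
                for dst in dsts:
                    if dst not in seen:
                        seen.add(dst)
                        queue.append(dst)
    return False

print(can_access("user:ana", "doc:blueprint"))  # True
print(can_access("user:bob", "doc:blueprint"))  # False
```

In this model, revoking access is a graph edit (removing an edge) rather than an ACL update, which is why the same check can be enforced uniformly for queries, search results, and AI responses.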

# Core building blocks

  • Workspace
    • an environment that contains data, configuration, and extensibility artifacts
  • Schemas
    • node schemas and edge schemas define the types and constraints of the graph
  • Indexes
    • text indexes and embedding indexes over selected fields
  • Pipelines
    • NLP processing that turns text into structured signals and graph links
  • Extensibility
    • custom endpoints (server-side logic)
    • custom interfaces (front-end apps)

# Deployment Model

Curiosity Workspace is designed to be deployed as a self-contained unit, simplifying operations compared to distributed microservices.

  • Single-Container Architecture: The core services (Graph, Search, API, UI) run within a single process inside the container. This eliminates network overhead between components and simplifies debugging.
  • Sandboxed Processing: Only file processing tasks (such as parsing complex document formats) are performed out-of-process in a secure, sandboxed environment (using Landlock on Linux) to ensure system stability and security.
  • Storage: Data is persisted to a local volume. For cloud deployments (Kubernetes), this leverages standard CSI drivers (e.g., EBS for AWS, Azure Disk, Persistent Disk for GCP) to attach reliable block storage.
  • Scalability: While the architecture is monolithic, it is designed to scale vertically. The efficient memory model allows it to handle millions of nodes and edges on standard hardware.

# Technology Stack

Curiosity Workspace is built on a modern, high-performance stack:

  • Runtime: Utilizes the latest .NET runtime for high-throughput execution and cross-platform compatibility.
  • Storage Engine: Leverages RocksDB for efficient, reliable on-disk storage of graph data and indexes.

# Typical request flows

# Search and discovery flow

  1. A user searches for a term (keywords and/or semantic query).
  2. The search engine retrieves candidates (text and/or vector).
  3. The system applies filters (properties and/or graph-related facets).
  4. Results are ranked and returned; graph neighbors can be fetched for previews and navigation.
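The four steps above can be sketched end to end. The corpus, the 2-d "embeddings", the scoring functions, and the blend of keyword and vector scores are all toy assumptions standing in for the real retrieval stack:

```python
# Hypothetical corpus: each record carries text, a toy 2-d embedding,
# and properties used for query-time filtering.
DOCS = [
    {"uid": "d1", "text": "turbine maintenance manual", "vec": (1.0, 0.0), "lang": "en"},
    {"uid": "d2", "text": "rotor blade inspection",      "vec": (0.9, 0.1), "lang": "en"},
    {"uid": "d3", "text": "manuel de maintenance",       "vec": (0.8, 0.2), "lang": "fr"},
]

def search(query_terms, query_vec, filters):
    # 1. Retrieve candidates by keyword overlap and/or vector similarity.
    def keyword_score(d):
        return sum(t in d["text"] for t in query_terms)
    def vector_score(d):
        return sum(a * b for a, b in zip(d["vec"], query_vec))
    candidates = [d for d in DOCS if keyword_score(d) > 0 or vector_score(d) > 0.5]
    # 2. Apply property filters at query time.
    candidates = [d for d in candidates
                  if all(d.get(k) == v for k, v in filters.items())]
    # 3. Rank by a blended score and return.
    candidates.sort(key=lambda d: keyword_score(d) + vector_score(d), reverse=True)
    return [d["uid"] for d in candidates]

print(search(["maintenance"], (1.0, 0.0), {"lang": "en"}))  # → ['d1', 'd2']
```

Note that `d2` matches no keyword yet is still retrieved via vector similarity, which is the "semantic recall" role the AI layer plays in the search flow.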

# AI-assisted flow (retrieval + reasoning)

  1. A user asks a question or starts a workflow.
  2. The system retrieves grounding context from search/graph.
  3. The LLM generates a response using that context.
  4. Optional: the result is saved back into the workspace (notes, links, tags) via endpoints/tasks.
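A retrieve-then-generate loop of this shape can be sketched as follows. The retrieval heuristic is a toy, and `answer_with_llm` is a stub standing in for a real model call; every name here is hypothetical:

```python
def retrieve(question, store):
    """Hypothetical grounding step: pick documents that share terms
    with the question (a real system would use search + graph context)."""
    terms = set(question.lower().split())
    return [d for d in store if terms & set(d["text"].lower().split())]

def answer_with_llm(question, context):
    """Stub standing in for an LLM call: a real system would send the
    question plus grounding context to a model and return its completion."""
    sources = ", ".join(d["uid"] for d in context)
    return f"Answer grounded in: {sources}"

def save_note(workspace, text):
    """Optional write-back: persist the result as a note in the workspace."""
    workspace.append({"uid": f"note-{len(workspace)}", "text": text})

store = [{"uid": "d1", "text": "turbine blade wear limits"},
         {"uid": "d2", "text": "cafeteria menu"}]
ctx = retrieve("what are the turbine wear limits?", store)
reply = answer_with_llm("what are the turbine wear limits?", ctx)
save_note(store, reply)
print(reply)  # → Answer grounded in: d1
```

The write-back step is what closes the loop: AI output becomes new graph data, subject to the same schemas, indexing, and access control as everything else.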

# Design goals

  • Schema-first clarity: you control what types exist and how they relate.
  • Configurable retrieval: tune relevance without rewriting your app.
  • Safe extensibility: move business logic into versionable endpoints and controlled interfaces.
  • Operational control: deployments can be monitored, secured, and promoted across environments.

# Next steps

  • Understand how data moves through the system: Data Flow
  • Learn the foundational data structure: Graph Model