# Architecture
Curiosity Workspace is a single product that brings together three layers that are often separate in modern data stacks:
- Graph layer: stores a knowledge graph (nodes + edges) with schemas, properties, and traversals.
- Search layer: provides text retrieval, ranking, filtering, and query-time constraints.
- AI layer: embeddings, NLP extraction, and LLM-driven features that use graph + search as grounding.
The platform is designed so these layers reinforce each other:
- Graph relationships improve navigation, filtering, and context building
- Search provides fast retrieval and ranking at scale
- AI adds semantic recall (embeddings) and reasoning/synthesis (LLMs) where appropriate
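To make the layering concrete, here is a minimal, hypothetical sketch of how one query can draw on all three layers at once: keyword ranking (search), an edge acting as a filter (graph), and embedding similarity (AI). All names and data here are illustrative, not the actual Curiosity API.

```python
# Hypothetical sketch: one query combining graph, search, and AI layers.
from math import sqrt

# Graph layer: a toy in-memory store of nodes and typed edges.
nodes = {
    "doc:1": {"text": "quarterly revenue report", "vec": [0.9, 0.1]},
    "doc:2": {"text": "engineering roadmap", "vec": [0.1, 0.9]},
}
edges = {("doc:1", "authored_by", "user:alice")}

def text_score(query, text):
    """Search layer: crude keyword overlap as a stand-in for real ranking."""
    q, t = set(query.split()), set(text.split())
    return len(q & t) / len(q)

def cosine(a, b):
    """AI layer: embedding similarity for semantic recall."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def hybrid_search(query, qvec, author=None):
    results = []
    for nid, n in nodes.items():
        # Graph layer: a relationship acts as a query-time filter.
        if author and (nid, "authored_by", author) not in edges:
            continue
        score = 0.5 * text_score(query, n["text"]) + 0.5 * cosine(qvec, n["vec"])
        results.append((nid, round(score, 3)))
    return sorted(results, key=lambda r: -r[1])

print(hybrid_search("revenue report", [1.0, 0.0], author="user:alice"))
```

The blend weights (0.5/0.5) are arbitrary here; the point is that each layer contributes a signal the others lack.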
# System Internals
Under the hood, Curiosity Workspace operates as a highly optimized, single-deployment system designed for performance and ease of management.
## High-Performance Graph Engine
At the core is a purpose-built in-memory graph engine. It is optimized for low-latency traversals and high-throughput updates.
- Efficient Storage: Data structures are packed for memory efficiency.
- ACID Transactions: Updates to the graph are atomic and consistent, ensuring data integrity even under heavy ingestion workloads.
- Lock-Free Reads: Read operations are designed to be non-blocking, allowing heavy query loads to coexist with real-time ingestion.
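One common way to get non-blocking reads alongside ongoing writes is snapshot swapping: readers grab an immutable snapshot reference, while writers build a new version and publish it atomically. The sketch below illustrates that idea only; it is an assumption about the technique, not the engine's actual implementation.

```python
# Minimal sketch of non-blocking reads via atomically swapped snapshots.
import threading

class SnapshotGraph:
    def __init__(self):
        # The published snapshot is immutable; readers never see it mutate.
        self._snapshot = {"nodes": frozenset(), "edges": frozenset()}
        self._write_lock = threading.Lock()  # writers serialize; readers do not

    def read(self):
        # Lock-free read: a single reference fetch, never blocks on writers.
        return self._snapshot

    def add_edge(self, src, dst):
        with self._write_lock:  # build a new snapshot, then publish it
            snap = self._snapshot
            self._snapshot = {
                "nodes": snap["nodes"] | {src, dst},
                "edges": snap["edges"] | {(src, dst)},
            }

g = SnapshotGraph()
g.add_edge("a", "b")
before = g.read()   # this snapshot stays stable even as writes continue
g.add_edge("b", "c")
after = g.read()
print(len(before["edges"]), len(after["edges"]))  # → 1 2
```

Because a reader holds a complete immutable version, heavy query loads can coexist with real-time ingestion without read locks.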
## Ingestion and Indexing Pipeline
Data ingestion is handled by an asynchronous background process that separates parsing from linking:
- Parsing: Incoming files and data streams are processed by Parsers to extract text and metadata, creating intermediate Document nodes.
- Linking: The Linker process analyzes these documents to materialize edges, connecting entities based on content, references, and heuristics.
- Indexing: Text and vectors are automatically indexed, making new data searchable immediately after processing.
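The three stages above can be sketched as a small pipeline. Everything here (function names, the entity-matching heuristic, the inverted index) is an illustrative assumption, not the real Parser/Linker API.

```python
# Hypothetical sketch of the parse -> link -> index stages.
import re

def parse(raw_files):
    """Parsing: turn raw inputs into intermediate Document nodes."""
    return [{"id": f"doc:{i}", "text": text} for i, text in enumerate(raw_files)]

def link(documents, known_entities):
    """Linking: materialize edges from documents to entities they mention."""
    edges = []
    for doc in documents:
        for entity in known_entities:
            if entity.lower() in doc["text"].lower():
                edges.append((doc["id"], "mentions", entity))
    return edges

def index(documents):
    """Indexing: a toy inverted index so new data is searchable immediately."""
    inverted = {}
    for doc in documents:
        for token in re.findall(r"\w+", doc["text"].lower()):
            inverted.setdefault(token, set()).add(doc["id"])
    return inverted

docs = parse(["ACME signed a contract with Globex", "Quarterly report for ACME"])
doc_edges = link(docs, ["ACME", "Globex"])
inv = index(docs)
print(sorted(inv["acme"]))  # → ['doc:0', 'doc:1']
```

Separating parsing from linking, as the real pipeline does, means expensive document extraction can run asynchronously while edge materialization stays incremental.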
## Security and Access Control (ReBAC)
Curiosity implements a robust Relationship-Based Access Control (ReBAC) model. Unlike traditional role-based access lists, permissions in Curiosity are defined by graph paths.
- Access is determined by the relationship between a user or team node and the target data.
- This access model is highly customizable, allowing organizations to model complex permission structures.
- Curiosity enforces these checks automatically and universally across all interactions, including direct queries, search results, and AI-generated responses.
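A path-based check can be sketched as a graph traversal: access is granted if and only if a permission-bearing path exists from the user (possibly via team membership) to the target. The edge names below are illustrative assumptions, not the actual schema.

```python
# Minimal ReBAC sketch: access == reachability over permission edges.
from collections import deque

edges = {
    ("user:alice", "member_of", "team:research"),
    ("team:research", "can_read", "project:x"),
    ("project:x", "contains", "doc:42"),
}

def can_access(user, target):
    """BFS over the graph: any path of ReBAC edges implies access."""
    adjacency = {}
    for src, _rel, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    queue, seen = deque([user]), {user}
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(can_access("user:alice", "doc:42"))  # True: path via team membership
print(can_access("user:bob", "doc:42"))    # False: no path, no access
```

A real implementation would also distinguish edge semantics (read vs. write) rather than treating every edge as granting access, but the reachability idea is the core of the model.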
# Core building blocks
- Workspace: an environment that contains data, configuration, and extensibility artifacts.
- Schemas: node schemas and edge schemas define the types and constraints of the graph.
- Indexes: text indexes and embedding indexes over selected fields.
- Pipelines: NLP processing that turns text into structured signals and graph links.
- Extensibility: custom endpoints (server-side logic) and custom interfaces (front-end apps).
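As a rough illustration of schema-first modeling, node and edge schemas declare which types exist and what they may connect. The class and field names below are hypothetical stand-ins, not Curiosity's schema API.

```python
# Illustrative sketch of node/edge schemas constraining the graph.
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeSchema:
    name: str
    properties: tuple  # property names this node type allows

@dataclass(frozen=True)
class EdgeSchema:
    name: str
    source: str  # allowed source node type
    target: str  # allowed target node type

PERSON = NodeSchema("Person", ("name", "email"))
DOCUMENT = NodeSchema("Document", ("title", "text"))
AUTHORED = EdgeSchema("AuthoredBy", source="Document", target="Person")

def validate_edge(edge: EdgeSchema, src_type: str, dst_type: str) -> bool:
    """Constraint check: an edge is valid only between its declared types."""
    return edge.source == src_type and edge.target == dst_type

print(validate_edge(AUTHORED, "Document", "Person"))  # → True
print(validate_edge(AUTHORED, "Person", "Document"))  # → False
```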
# Deployment Model
Curiosity Workspace is designed to be deployed as a self-contained unit, simplifying operations compared to distributed microservices.
- Single-Container Architecture: The core services (Graph, Search, API, UI) run within a single process inside the container. This eliminates network overhead between components and simplifies debugging.
- Sandboxed Processing: Only file processing tasks (such as parsing complex document formats) are performed out-of-process in a secure, sandboxed environment (using Landlock on Linux) to ensure system stability and security.
- Storage: Data is persisted to a local volume. For cloud deployments (Kubernetes), this leverages standard CSI drivers (e.g., EBS for AWS, Azure Disk, Persistent Disk for GCP) to attach reliable block storage.
- Scalability: While the architecture is monolithic, it is designed to scale vertically. The efficient memory model allows it to handle millions of nodes and edges on standard hardware.
# Technology Stack
Curiosity Workspace is built on a modern, high-performance stack:
- Runtime: Utilizes the latest .NET runtime for high-throughput execution and cross-platform compatibility.
- Storage Engine: Leverages RocksDB for efficient, reliable on-disk storage of graph data and indexes.
# Typical request flows
## Search and discovery flow
- A user searches for a term (keywords and/or semantic query).
- The search engine retrieves candidates (text and/or vector).
- The system applies filters (properties and/or graph-related facets).
- Results are ranked and returned; graph neighbors can be fetched for previews and navigation.
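The steps above compose into a small pipeline: retrieve candidates, apply filters, rank, then fetch graph neighbors for previews. The data and function names below are illustrative assumptions, not the actual API.

```python
# Sketch of the search flow: retrieve -> filter -> rank -> expand.
DOCS = {
    "doc:1": {"text": "graph engine design notes", "team": "eng"},
    "doc:2": {"text": "graph database benchmarks", "team": "research"},
    "doc:3": {"text": "hiring plan", "team": "eng"},
}
NEIGHBORS = {"doc:1": ["user:alice"], "doc:2": ["user:bob"]}

def retrieve(query):
    """Candidate retrieval: any document containing a query term."""
    terms = query.split()
    return [d for d, v in DOCS.items() if any(t in v["text"] for t in terms)]

def filter_by(candidates, team):
    """Query-time constraint: keep only documents matching the facet."""
    return [d for d in candidates if DOCS[d]["team"] == team]

def rank(candidates, query):
    """Rank by term overlap (a stand-in for real relevance scoring)."""
    terms = set(query.split())
    return sorted(candidates,
                  key=lambda d: -len(terms & set(DOCS[d]["text"].split())))

def expand(results):
    """Attach graph neighbors for previews and navigation."""
    return [(d, NEIGHBORS.get(d, [])) for d in results]

hits = expand(rank(filter_by(retrieve("graph engine"), "eng"), "graph engine"))
print(hits)  # → [('doc:1', ['user:alice'])]
```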
## AI-assisted flow (retrieval + reasoning)
- A user asks a question or starts a workflow.
- The system retrieves grounding context from search/graph.
- The LLM generates a response using that context.
- Optional: the result is saved back into the workspace (notes, links, tags) via endpoints/tasks.
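The retrieval-plus-reasoning loop can be sketched as below. The LLM call is a stub that echoes its grounded context; in practice it would be a real model invocation, and the write-back would go through an endpoint or task. All names here are assumptions for illustration.

```python
# Sketch of the AI-assisted flow: retrieve context, generate, write back.
KNOWLEDGE = {
    "doc:1": "The graph engine stores nodes and edges in memory.",
    "doc:2": "RocksDB persists graph data and indexes on disk.",
}

def retrieve_context(question):
    """Ground the model: pull matching snippets from search/graph."""
    terms = set(question.lower().split())
    return [t for t in KNOWLEDGE.values()
            if terms & set(t.lower().rstrip(".").split())]

def llm_answer(question, context):
    """Stub LLM: returns grounded context instead of free generation."""
    if not context:
        return "I don't have enough context to answer."
    return "Based on the workspace: " + " ".join(context)

def ai_flow(question, workspace_notes):
    context = retrieve_context(question)
    answer = llm_answer(question, context)
    workspace_notes.append(answer)  # optional write-back to the workspace
    return answer

notes = []
answer = ai_flow("where is graph data persisted", notes)
print(answer)
```

Grounding the generation step in retrieved workspace content is what lets the AI layer lean on the graph and search layers rather than answering from model memory alone.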
# Design goals
- Schema-first clarity: you control what types exist and how they relate.
- Configurable retrieval: tune relevance without rewriting your app.
- Safe extensibility: move business logic into versionable endpoints and controlled interfaces.
- Operational control: deployments can be monitored, secured, and promoted across environments.
# Next steps
- Understand how data moves through the system: Data Flow
- Learn the foundational data structure: Graph Model