# Schema Design

Schema design is the most important step in building a successful Curiosity Workspace application. It determines:

  • how users navigate and explore data
  • how search is scoped and filtered
  • how AI features can ground and enrich results
  • how connectors keep data consistent over time

# The three layers of a good schema

  • Entities (nodes): the “things” in your domain
  • Relationships (edges): the meaningful links between those things
  • Attributes (properties): the descriptive fields used for display, filtering, and retrieval
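
As a rough illustration of how the three layers fit together, the sketch below models them with plain Python dataclasses. It is not the Curiosity Workspace API: the `Node` and `Edge` classes and the sample customer and ticket values are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity: a typed 'thing' with a stable key and descriptive properties."""
    type: str                                        # entity type, e.g. "Customer"
    key: str                                         # stable identity within the type
    properties: dict = field(default_factory=dict)   # attributes for display and filtering


@dataclass(frozen=True)
class Edge:
    """A relationship: a typed, directed link between two nodes."""
    source: tuple   # (node type, node key)
    type: str       # edge type, e.g. "HasTicket"
    target: tuple   # (node type, node key)


# Entities with their attributes (illustrative data only)
customer = Node("Customer", "C-001", {"name": "Acme Corp", "region": "EMEA"})
ticket = Node("Ticket", "T-42", {"subject": "Login fails", "priority": "high"})

# Relationship between the two entities
has_ticket = Edge((customer.type, customer.key), "HasTicket", (ticket.type, ticket.key))
```

Edges refer to nodes by type and key rather than by object reference, which is why the choice of key matters so much for repeatable ingestion.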

# Start from user journeys

Ask these questions before you write your first node schema:

  • What are the top 5 questions users ask?
  • What are the top 5 workflows users execute?
  • Which objects do they search for first?
  • What do they click on next?

Those answers typically map directly to:

  • primary node types
  • the edges between them
  • the filters and facets you must support

# Keys: pick stable identity early

For each node type, define a stable key:

  • Prefer stable IDs from the source system.
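  • If the source system has no stable ID, derive a deterministic key: canonicalize the identifying fields, then hash the result.
  • Avoid random IDs unless you never need to re-run ingestion safely: a re-run generates new keys, so existing nodes cannot be matched and updated.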
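
The "canonicalization + hash" strategy can be sketched as follows. This is a minimal example under stated assumptions, not a prescribed implementation: the function names `canonicalize` and `deterministic_key`, the specific normalization rules (NFKC, whitespace collapsing, case folding), and the 16-character truncation are all choices made for illustration.

```python
import hashlib
import unicodedata


def canonicalize(value: str) -> str:
    """Normalize Unicode, trim and collapse whitespace, and fold case."""
    normalized = unicodedata.normalize("NFKC", value)
    return " ".join(normalized.split()).casefold()


def deterministic_key(node_type: str, *parts: str) -> str:
    """Build a repeatable key from the fields that identify a record.

    The unit separator (\\x1f) keeps ("ab", "c") distinct from ("a", "bc").
    """
    canonical = "\x1f".join(canonicalize(p) for p in parts)
    digest = hashlib.sha256(f"{node_type}\x1f{canonical}".encode("utf-8")).hexdigest()
    # Truncated for readability (assumption); keep the full digest if collisions are a concern.
    return f"{node_type}:{digest[:16]}"


# Two spellings of the same record produce the same key.
key_a = deterministic_key("Customer", "  Acme   Corp ", "DE")
key_b = deterministic_key("Customer", "acme corp", "de")
```

Because the key depends only on the canonical form of the identifying fields, re-running ingestion over the same source data yields the same keys, so existing nodes can be updated instead of duplicated.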

# When to make something a node vs a property

Use a property when:

  • the value is only displayed or filtered on the current node
  • you do not need to navigate to it as an entity

Use a node + edge when:

  • you need cross-cutting filters (e.g., status across multiple types)
  • you need navigation and context building (“show all tickets for this customer”)
  • the value needs its own metadata, or that metadata changes over time
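
To make the trade-off concrete, the sketch below contrasts the two options for a status value. The data layout (dicts and tuples) and the `nodes_with_status` helper are hypothetical and only meant to show why a shared node enables cross-cutting filters.

```python
# Option A: status as a property. It is only reachable through the node that carries it.
ticket_with_property = {"type": "Ticket", "key": "T-1", "status": "Open"}

# Option B: status as a shared node, linked by edges from several node types.
# Each edge is (source, edge type, target), with nodes written as (type, key).
edges = [
    (("Ticket", "T-1"), "HasStatus", ("Status", "Open")),
    (("Ticket", "T-2"), "HasStatus", ("Status", "Closed")),
    (("Task", "K-7"), "HasStatus", ("Status", "Open")),
]


def nodes_with_status(status_key: str) -> list:
    """Return every node, of any type, linked to the given Status node."""
    target = ("Status", status_key)
    return [src for src, edge_type, dst in edges
            if edge_type == "HasStatus" and dst == target]


open_items = nodes_with_status("Open")  # spans both Ticket and Task
```

With option A, the same question requires knowing which node types carry a `status` property and querying each one separately.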

# Relationship modeling patterns

Common patterns:

  • Ownership / membership: Customer -> HasTicket -> Ticket
  • Attribute as node: Ticket -> HasStatus -> Status
  • Mentions / linking: Document -> Mentions -> Entity
  • Bipartite linking: avoid duplicating properties by linking to shared nodes
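
The patterns above can be tried out with a small in-memory graph. This is an illustrative sketch only: the `Graph` class, its method names, and the sample node identifiers are assumptions, and a real workspace would use its own graph store and query API.

```python
from collections import defaultdict


class Graph:
    """Minimal directed graph indexed by edge type in both directions."""

    def __init__(self):
        self._out = defaultdict(set)  # (source, edge type) -> targets
        self._in = defaultdict(set)   # (target, edge type) -> sources

    def add_edge(self, source: str, edge_type: str, target: str) -> None:
        self._out[(source, edge_type)].add(target)
        self._in[(target, edge_type)].add(source)

    def targets(self, source: str, edge_type: str) -> list:
        """Follow an edge type forward from a node."""
        return sorted(self._out.get((source, edge_type), ()))

    def sources(self, target: str, edge_type: str) -> list:
        """Follow an edge type backward to the nodes that point at a node."""
        return sorted(self._in.get((target, edge_type), ()))


g = Graph()
g.add_edge("Customer:acme", "HasTicket", "Ticket:1")         # ownership / membership
g.add_edge("Customer:acme", "HasTicket", "Ticket:2")
g.add_edge("Ticket:1", "HasStatus", "Status:open")           # attribute as node
g.add_edge("Document:faq", "Mentions", "Customer:acme")      # mentions / linking
g.add_edge("Document:contract", "Mentions", "Customer:acme")

tickets = g.targets("Customer:acme", "HasTicket")   # "show all tickets for this customer"
mentions = g.sources("Customer:acme", "Mentions")   # documents that mention the customer
```

Indexing edges in both directions is a design choice for this sketch: it makes "what does this node link to" and "what links to this node" equally cheap, which is what navigation and context building rely on.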

# Schema evolution

Expect schema evolution in real systems:

  • add properties as new data becomes available
  • introduce new node/edge types for new workflows
  • backfill or reparse content when pipelines change

Operational advice:

  • version your connector logic and treat schema updates as deployments
  • plan reindex and reparse windows for large changes
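
One way to treat schema updates as deployments is to stamp each stored record with the connector version that produced it, then backfill anything older. The sketch below assumes a hypothetical setup: `CONNECTOR_VERSION`, the `_connector_version` field, the `priority` property, and the dict-based `store` and `source` are illustrative and not part of any Curiosity Workspace API.

```python
CONNECTOR_VERSION = 2  # bump whenever the mapping logic or the schema changes


def map_ticket(raw: dict) -> dict:
    """Version 2 mapping: adds a 'priority' property that version 1 did not have."""
    return {
        "key": raw["id"],
        "subject": raw["subject"],
        "priority": raw.get("priority", "unknown"),
        "_connector_version": CONNECTOR_VERSION,
    }


def needs_backfill(stored: dict) -> bool:
    """A record is stale if an older connector version produced it."""
    return stored.get("_connector_version", 0) < CONNECTOR_VERSION


def backfill(store: dict, source: dict) -> int:
    """Re-map stale records from the source system; return how many were updated."""
    updated = 0
    for key, stored in list(store.items()):
        if needs_backfill(stored) and key in source:
            store[key] = map_ticket(source[key])
            updated += 1
    return updated


# Illustrative data: T-1 was ingested by version 1, T-2 by the current version.
source = {
    "T-1": {"id": "T-1", "subject": "Login fails", "priority": "high"},
    "T-2": {"id": "T-2", "subject": "Slow search"},
}
store = {
    "T-1": {"key": "T-1", "subject": "Login fails", "_connector_version": 1},
    "T-2": map_ticket(source["T-2"]),
}

first_run = backfill(store, source)   # re-maps only the stale record
second_run = backfill(store, source)  # nothing left to do
```

Because the backfill is idempotent, it can be run inside a planned reindex window and safely repeated if it is interrupted.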

# Next steps