Curiosity - Property Graph Model

Curiosity Workspace stores data as a labeled property graph: a directed, typed multigraph in which both entities (nodes) and the relationships between them carry structured information. This page is the formal reference for the model — what is allowed, what is not, and what the engine guarantees.

For practical schema-design advice, see Graph Model and Schema Design. This page describes the model itself.

Assumptions and invariants

The graph engine enforces a small, deliberate set of rules. Everything else builds on top of them.

Single type per node. A node has exactly one type, fixed at creation. There is no multi-label set.
Single type per edge. Each edge has exactly one type, fixed at creation.
Edges are directed. Every edge has a source node and a target node; the two endpoints are not interchangeable. Bidirectional semantics are expressed with two reciprocal edges (see Reciprocal edges).
Edges have no user-defined properties. Properties live on nodes. To attach data to a relationship, model it as an intermediate node.
Edges always have both endpoints. An edge cannot exist without an existing target node; orphaned edges are not representable.
Each node has one stable identity. A node's identifier is a 128-bit value (UID128) that is fixed for the lifetime of the node.
Each schema has one key field. For externally defined schemas, the key and the node type together determine the node's identifier.
Property keys are unique within a node. A node cannot carry two properties with the same name.
Timestamps are immutable. Nodes and edges record a creation timestamp that is set once and not modified.
Reads are lock-free; writes are ACID. A commit either applies in full or not at all. Two writers committing the same logical entity concurrently get one winner; the other receives a commit_aborted error and can retry.
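The last invariant implies a retry loop on the losing writer's side. The following is a minimal sketch, assuming a hypothetical `CommitAborted` exception as a stand-in for the commit_aborted error and a caller-supplied `commit` callable; neither name comes from the SDK.

```python
import time


class CommitAborted(Exception):
    """Hypothetical stand-in for the engine's commit_aborted error."""


def commit_with_retry(commit, max_attempts=3, backoff_seconds=0.0):
    """Call commit() until it succeeds or max_attempts is exhausted.

    `commit` is any zero-argument callable that rebuilds and submits the
    write, so a losing writer re-reads the entity before trying again.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return commit()
        except CommitAborted:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            time.sleep(backoff_seconds * attempt)  # simple linear backoff
```

Because a commit applies in full or not at all, retrying the whole callable is safe in this sketch: an aborted attempt leaves nothing behind to clean up.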

Multigraph

The engine permits multiple edges of the same type between the same pair of nodes. Whether a given write keeps or collapses duplicates is an application-level decision (see Edge uniqueness).

Core elements

flowchart LR
    subgraph N1[Node A: Customer]
        N1P["properties:<br/>Id, Name, Tier"]
    end
    subgraph N2[Node B: Ticket]
        N2P["properties:<br/>Id, Subject, Body, CreatedAt"]
    end
    N1 -- "HasTicket" --> N2
    N2 -- "TicketOf" --> N1

A property graph in Curiosity is made of three things:

Element	What it is	What it carries
Node	An entity in the domain (`Customer`, `Ticket`, `Document`).	A type, a stable identifier, a creation timestamp, and a set of properties.
Edge	A directed, typed link from one node to another.	A type, the target node's identifier, and a creation timestamp. No user-defined properties.
Property	A typed field on a node.	A name and a value of one of the supported scalar, list, or dictionary types.

A path is an alternating sequence of nodes and edges that follows edge direction; its length is the number of edges traversed.
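The three elements and the path definition can be mirrored in a few lines of Python. This is an illustrative in-memory shape only: the class and field names are assumptions made for this sketch, not the engine's storage layout or SDK types.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class Edge:
    type: str        # exactly one type, fixed at creation
    target: int      # identifier of the target node (a 128-bit value)
    created_at: int  # creation timestamp, never modified
    # Deliberately no user-defined properties.


@dataclass
class Node:
    type: str
    uid: int
    created_at: int
    properties: Dict[str, Any] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)  # outgoing edges only


def path_length(path):
    """Length of a path given as [node, edge, node, ..., node]."""
    if len(path) % 2 == 0:
        raise ValueError("a path alternates nodes and edges and ends on a node")
    return len(path) // 2  # number of edges traversed
```

A path of one node and no edges has length 0; `[customer, has_ticket_edge, ticket]` has length 1.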

Internal vs. external schemas

Curiosity ships with a large set of built-in node and edge types that power core functionality (users, files, access control, NLP enrichment, notifications, and so on). Every workspace also defines its own application-specific types on top. The distinction matters operationally.

Internal schemas

Internal schemas are reserved for the platform itself. They follow a strict naming rule and a stricter lifecycle:

Names always start with an underscore (_) — for example, _User, _FileEntry, _AccessGroup, _Document, _Blob, _Message. This is enforced by the engine: an internal type whose name does not begin with _ is rejected at registration time.
They are protected from modification or deletion through the public API. The platform owns their structure; upgrades may add fields, but applications must not redefine them.
Their identifiers are server-assigned, not derived from a domain key. The deterministic-hash identity rule that applies to external schemas does not apply to internal ones.
Their display name may differ from their schema name. For example, the _AccessGroup node type is presented to end users as Team in the built-in UI.

Internal node types are organized by domain (access, files, NLP, notes, issues, calendar, CRM, messaging, automation, search, usage, and system); the current platform has roughly 100 of them. Internal edge types follow the same _ convention and number roughly 150.

External (application-defined) schemas

External schemas are the ones you define for your domain — Customer, Ticket, Product, Policy, Document, and so on.

Names must not start with an underscore. The leading underscore is reserved for the platform.
You define them in code (for connectors) or register them dynamically. Each schema declares a name, a key field, and a set of typed fields.
Their identifiers are deterministic. The platform derives the node's UID128 by combining a 128-bit hash of the node type with a 128-bit hash of the node's key value (see Identity and keys). This is what makes ingestion idempotent: re-running a connector against the same source produces the same node, not a duplicate.
They can evolve. New properties can be added; renames and removals require migration. See Schema Design → Schema evolution.

When to use which

Application logic should add nodes of external types and link them — to each other and, where appropriate, to internal types (for example, attaching a _FileEntry to a domain entity, or restricting a node's visibility to an _AccessGroup). Application logic should not create or modify internal-type nodes directly except through their published APIs.

Nodes

A node represents a single entity. It has:

A type — a non-empty string registered in the schema (Customer, _FileEntry, ...). The type is fixed for the lifetime of the node.
An identifier — a UID128, the node's permanent identity. Used by every edge that points to the node.
A creation timestamp — milliseconds since the epoch, set at creation, never updated.
Properties — zero or more typed fields (see Properties).
Outgoing edges — zero or more directed edges to other nodes.

Two nodes are equal if and only if they have the same identifier. Nodes are addressed by identifier in every internal API; lookup by (type, key) resolves to an identifier and then proceeds.

Edges

An edge is a directed link from one node to another. It has:

A type — a non-empty string registered in the schema. Application edge types use natural names (HasTicket, ForProduct, MentionsEntity); internal edge types follow the _ prefix rule (_OwnedBy, _MemberOf, _HasMember).
A target node — referenced by UID128. The source is implicit: the node that owns the edge.
A creation timestamp — set at creation, immutable.

Edges do not carry user-defined properties. This is a deliberate constraint that keeps traversals fast and edge storage compact. When you need to attach data to a relationship — a comment text, a score, a date range — model the relationship as a node and connect both endpoints to it.
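As an illustration of the relationship-as-node pattern, the sketch below attaches a score and a comment to a customer/product relationship by introducing an intermediate node. The `Review` type, the `Wrote` and `About` edge types, and the dictionary layout are all hypothetical names chosen for this example.

```python
# Plain dictionaries stand in for the graph; Review, Wrote, and About
# are illustrative names, not built-in types.
nodes = {
    "customer:42": {"type": "Customer", "props": {"Name": "Acme"}},
    "product:7": {"type": "Product", "props": {"Name": "Widget"}},
    # The relationship itself becomes a node, so it can hold properties.
    "review:42-7": {"type": "Review", "props": {"Score": 4, "Comment": "Solid"}},
}

# Edges are (source, edge type, target) triples with no properties.
edges = [
    ("customer:42", "Wrote", "review:42-7"),
    ("review:42-7", "About", "product:7"),
]


def reviews_by(customer_id):
    """Yield (product id, score) for every review a customer wrote."""
    for src, etype, review_id in edges:
        if src == customer_id and etype == "Wrote":
            for r_src, r_type, product_id in edges:
                if r_src == review_id and r_type == "About":
                    yield product_id, nodes[review_id]["props"]["Score"]
```

The cost is one extra hop per traversal; the benefit is that the data on the relationship is typed, keyed, and searchable like any other node.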

Reciprocal edges

Edges are directed, but most relationships are useful to traverse from either side. The standard pattern is to create two edges, one in each direction, named to read naturally:

Customer --HasTicket--> Ticket --TicketOf--> Customer

Both edges are first-class. The engine does not infer one from the other — your connector creates both — but it provides a single linking call that emits the pair atomically.
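A sketch of what such a paired linking call does, using an in-memory adjacency map. The `link` function and its `reverse_type` parameter are assumptions for illustration; the actual SDK signature is not shown on this page.

```python
from collections import defaultdict

# source id -> list of (edge type, target id); an in-memory stand-in.
outgoing = defaultdict(list)


def link(source, edge_type, target, reverse_type):
    """Add an edge and its reciprocal in one call.

    Both appends happen together so callers never observe only half of
    the pair; a real engine would do this inside a single commit.
    """
    outgoing[source].append((edge_type, target))
    outgoing[target].append((reverse_type, source))


link("customer:42", "HasTicket", "ticket:1001", reverse_type="TicketOf")
```

After the call, the customer holds a HasTicket edge to the ticket and the ticket holds a TicketOf edge back to the customer.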

Edge uniqueness

Whether duplicate edges are allowed is decided per write, not in the schema. A unique edge is identified by the triple (edge type, source node, target node); the two add paths differ in how they treat that triple:

Non-unique edge (AddEdge): every add creates a new edge, even if an identical one already exists. Multiple edges of the same type between the same pair of nodes are kept. Useful when each occurrence carries meaning — for example one edge per timestamped event modeled as a relationship node.
Unique edge (AddUniqueEdge): the add is idempotent. If an edge with the same type, source, and target already exists, the engine returns it instead of creating a duplicate. Useful for membership-style relationships ("user is in team") where the relationship is set-valued rather than event-valued.

The same source node can still hold many unique edges of one type to different targets — uniqueness is per target, not per type. Higher-level linking calls (the SDK Link(...) helpers) default to the unique path; opt into non-unique only when you need to retain repeated occurrences.
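The difference between the two add paths can be sketched with an in-memory edge list. The `EdgeStore` class and its snake_case method names are hypothetical; they only mirror the described behavior of AddEdge and AddUniqueEdge.

```python
class EdgeStore:
    """In-memory sketch of the two add paths; not the engine's storage."""

    def __init__(self):
        self.edges = []  # list of (edge_type, source, target) triples

    def add_edge(self, edge_type, source, target):
        """Non-unique path: always appends, so duplicates are kept."""
        triple = (edge_type, source, target)
        self.edges.append(triple)
        return triple

    def add_unique_edge(self, edge_type, source, target):
        """Unique path: return the existing edge if the triple is present."""
        triple = (edge_type, source, target)
        if triple in self.edges:
            return triple  # idempotent: nothing new is stored
        self.edges.append(triple)
        return triple
```

Calling `add_unique_edge` twice with the same triple leaves one edge; calling `add_edge` twice leaves two. A unique add to a different target still creates a new edge, because uniqueness is per target.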

Properties

Properties are typed fields stored on nodes. They are addressed by name within a node, and a node cannot carry two properties with the same name.

Property value types

Curiosity supports a fixed set of property value types:

Family	Types
Strings	`string`
Booleans	`bool`
Integers	`byte`, `sbyte`, `short`, `ushort`, `int`, `uint`, `long`, `ulong`
Floating point	`float`, `double`, `decimal`
Characters	`char`
Identifiers	`UID128`
Temporal	`Time` (timestamp)
Geographic	`GeoPoint`
Linguistic	`Language`
Collections	`List<T>`, `Dictionary<K, V>`, `T[]` of any supported scalar

A few notes on the type system:

No nested objects. A property value is a scalar, a list of scalars, or a key/value dictionary of scalars. Complex shapes are modeled as their own nodes connected by edges.
Vectors are a distinct concept. Embedding vectors are stored as float[] on embedding-capable node types and used by the similarity indexes, but they are not exposed as ordinary properties for filtering or sorting.
Binary content is stored separately. Large blobs (file contents, images, attachments) are persisted in blob storage and referenced from nodes by a UID128 (typically pointing to a _Blob node).
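The "no nested objects" rule can be expressed as a small shape check. In this sketch Python's `str`, `bool`, `int`, and `float` stand in for the scalar families in the table above; the function names are illustrative and the check is an approximation, not the engine's validator.

```python
SCALARS = (str, bool, int, float)  # Python stand-ins for the scalar families


def is_scalar(value):
    return isinstance(value, SCALARS)


def is_valid_property_value(value):
    """True for a scalar, a list of scalars, or a dict of scalars."""
    if is_scalar(value):
        return True
    if isinstance(value, (list, tuple)):
        return all(is_scalar(item) for item in value)
    if isinstance(value, dict):
        return all(is_scalar(k) and is_scalar(v) for k, v in value.items())
    return False  # nested objects, None, and other shapes are rejected
```

A value such as `{"address": {"city": "Berlin"}}` fails the check; under the model it would become its own node type connected by an edge.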

Declared vs. dynamic fields

Properties are declared in the node schema. Each declared field has a name and a value type. The schema is the source of truth for:

which fields exist on a node type,
the type of each field,
which field is the key (see below).

Everything else is layered on top: optionality, search indexing, encryption at rest, and access control are configured separately from the schema definition. See Search Model and Access Control Model.

Identity and keys

Every node has a 128-bit identifier (UID128). For external (application-defined) schemas, the identifier is derived deterministically from the node type and the value of the key field:

UID128 = hash128(typeName) ⊕ hash128(keyValue)

This has two important consequences:

Ingestion is idempotent. Re-running a connector with the same (type, key) resolves to the same node. Updated property values are written in place; the identifier does not change.
External systems can compute the identifier offline. Given the type and key, any client can derive the UID128 without round-tripping to the server.

Rules and recommendations:

Each external schema must declare exactly one key field. Composite keys are not supported directly — pick one identity and model the rest as properties or edges.
Key values must be non-empty strings. Empty or null keys are rejected.
Pick a stable key. Source-system IDs are usually the best choice. Deterministic hashes of canonical records work, but any change to the canonicalization will recreate the affected nodes. See Schema Design → Keys.
Internal nodes do not follow this rule. Their identifiers are assigned by the server; computing them offline is not supported.
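The derivation formula above can be sketched offline as follows. The platform's actual 128-bit hash function is not specified on this page, so this example substitutes BLAKE2b truncated to 16 bytes as an assumption; identifiers it produces will not match server-side values.

```python
import hashlib


def hash128(text: str) -> int:
    """Stand-in 128-bit hash (BLAKE2b with a 16-byte digest).

    Assumption: the real hash128 is not documented here, so this is
    only a placeholder with the right output width.
    """
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=16).digest()
    return int.from_bytes(digest, "big")


def derive_uid(type_name: str, key_value: str) -> int:
    """UID128 = hash128(typeName) XOR hash128(keyValue)."""
    if not key_value:
        raise ValueError("key values must be non-empty strings")
    return hash128(type_name) ^ hash128(key_value)
```

Calling `derive_uid("Customer", "42")` twice returns the same value, which is the property that makes re-ingestion idempotent; changing either the type or the key yields a different identifier.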

Labels and types

Some property graph models distinguish "labels" (a multi-valued set on nodes) from "types" (a single value on relationships). Curiosity uses a simpler model:

A node has one type. It is fixed at creation.
An edge has one type. It is fixed at creation.

Multi-label semantics ("this node is also a Premium and an EU-Customer") are modeled by linking the node to category nodes with typed edges:

Customer --HasLabel--> Label{Premium}
Customer --HasLabel--> Label{EU-Customer}

This shape gives you free faceting ("show all customers labeled Premium") and lets each label carry its own metadata (description, owner, lifecycle), which a tag list on the node would not.
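A small sketch of the faceting this enables, using edge triples in memory. The identifiers and the metadata values on the label nodes are made up for illustration.

```python
# (source, edge type, target) triples; Label nodes are ordinary nodes.
edges = [
    ("customer:1", "HasLabel", "label:Premium"),
    ("customer:1", "HasLabel", "label:EU-Customer"),
    ("customer:2", "HasLabel", "label:Premium"),
]

# Because a label is a node, it can carry its own metadata.
labels = {
    "label:Premium": {"Description": "Paid tier", "Owner": "sales"},
    "label:EU-Customer": {"Description": "Billed in the EU", "Owner": "finance"},
}


def nodes_with_label(label_id):
    """Facet query: every source with a HasLabel edge to label_id."""
    return sorted(src for src, etype, dst in edges
                  if etype == "HasLabel" and dst == label_id)
```

Here `nodes_with_label("label:Premium")` returns both customers, while the label's description and owner live on the label node rather than being repeated on each customer.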

Access control as graph edges

Permissions in Curiosity are themselves expressed in the graph. There is no separate "ACL property" you put on a node. Instead:

A user is a _User node.
A team is an _AccessGroup node (presented in the UI as Team).
Membership is an edge: _User --_MemberOf--> _AccessGroup (with the reciprocal _HasMember).
Ownership and visibility are edges: _OwnedBy, _Owns, _HasAdmin, _AdminOf.
Node-type-level and field-level restrictions reference an _AccessGroup: at query time, a user is allowed to read a restricted type or field only if they are a member of the configured group.

Because access control lives in the graph, the same traversal primitives that answer "what tickets belong to this customer" also answer "what teams can read this document". For the full data flow, see Access Control Model.
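The type-level check described above reduces to a membership lookup. This sketch assumes a hypothetical restriction table keyed by node type and assumes that unrestricted types are readable by any user; the real evaluation is covered in Access Control Model.

```python
# Membership edges as (user, group) pairs, standing in for
# _User --_MemberOf--> _AccessGroup.
member_of = {
    ("user:alice", "group:support"),
    ("user:bob", "group:sales"),
}

# Hypothetical restriction table: node type -> group required to read it.
restricted_types = {"Ticket": "group:support"}


def can_read_type(user_id, node_type):
    """Restricted types require membership in the configured group."""
    required_group = restricted_types.get(node_type)
    if required_group is None:
        return True  # assumption: unrestricted types are readable
    return (user_id, required_group) in member_of
```

With this data, `can_read_type("user:alice", "Ticket")` is true and `can_read_type("user:bob", "Ticket")` is false.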

Paths and traversals

A path is an alternating sequence of nodes and edges that follows edge direction. Traversals are the primary way applications read the graph:

start from a known node or a node type,
step out of a node along one or more edge types (or in, to follow incoming edges),
filter by property, type, or timestamp,
return nodes, paths, neighbors, or aggregates.

Traversals are lock-free with respect to writes, so concurrent ingestion does not block reads. Search facets that filter by graph relationship (for example, "tickets whose customer is Acme") compile into traversal predicates evaluated alongside text and vector scoring.
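The start, step, filter, return sequence can be sketched over an in-memory adjacency map. The helper names (`add`, `out`) and the `Status` property are assumptions for this example and do not reflect the query API.

```python
from collections import defaultdict

# source id -> list of (edge type, target id)
outgoing = defaultdict(list)
props = {}


def add(source, edge_type, target):
    outgoing[source].append((edge_type, target))


def out(node_ids, edge_type):
    """Step out of each node along edge_type, preserving order."""
    return [dst for nid in node_ids
            for etype, dst in outgoing[nid] if etype == edge_type]


add("customer:acme", "HasTicket", "ticket:1")
add("customer:acme", "HasTicket", "ticket:2")
props["ticket:1"] = {"Status": "open"}
props["ticket:2"] = {"Status": "closed"}

# Start from a known node, step out, then filter by property.
open_tickets = [t for t in out(["customer:acme"], "HasTicket")
                if props[t]["Status"] == "open"]
```

Stepping in the opposite direction works the same way over the reciprocal edge type, for example following TicketOf from a ticket back to its customer.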

What is intentionally not modeled

Knowing what the model does not support is as important as knowing what it does.

Properties on edges. Use a relationship node when you need them.
Multiple types per node. Use edges to category nodes for set-valued categorization.
Composite keys. Choose one identity per schema; model alternates as properties or edges.
Nested objects in properties. Promote the nested object to its own node type when the structure matters.
Undirected edges. Use reciprocal edges when both directions are meaningful.
Cross-workspace edges. A graph is the unit of consistency; edges always connect nodes within the same workspace.

Limits

Aspect	Limit
Distinct node types	16,777,215 (24-bit type identifier)
Distinct edge types	16,777,215 (24-bit type identifier)
Node identifier width	128 bits
Edge size on disk	32 bytes (packed)
Properties per node	bounded by the schema definition
Multiplicity of edges between two nodes	unbounded (multigraph)

Next steps

Apply the model to a concrete domain: Schema Design.
See how search and ranking compose on top of the graph: Search Model.
Understand how data moves through the system end-to-end: Data Flow.
Read the permission story in depth: Access Control Model.