Property Graph Model
Curiosity Workspace stores data as a labeled property graph: a directed, typed multigraph in which both entities (nodes) and the relationships between them carry structured information. This page is the formal reference for the model — what is allowed, what is not, and what the engine guarantees.
For practical schema-design advice, see Graph Model and Schema Design. This page describes the model itself.
Assumptions and invariants
The graph engine enforces a small, deliberate set of rules. Everything else builds on top of them.
- Single type per node. A node has exactly one type, fixed at creation. There is no multi-label set.
- Single type per edge. Each edge has exactly one type, fixed at creation.
- Edges are directed. Every edge has a source node and a target node; the two endpoints are not interchangeable. Bidirectional semantics are expressed with two reciprocal edges (see Reciprocal edges).
- Edges have no user-defined properties. Properties live on nodes. To attach data to a relationship, model it as an intermediate node.
- Edges always have both endpoints. An edge cannot exist without an existing target node; orphaned edges are not representable.
- Each node has one stable identity. A node's identifier is a 128-bit value (UID128) that is fixed for the lifetime of the node.
- Each schema has one key field. The key, together with the node type, deterministically derives the node's identifier for externally-defined schemas.
- Property keys are unique within a node. A node cannot carry two properties with the same name.
- Timestamps are immutable. Nodes and edges record a creation timestamp that is set once and not modified.
- Reads are lock-free; writes are ACID. A commit either applies in full or not at all. Two writers committing the same logical entity concurrently get one winner; the other receives a commit_aborted error and can retry.
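The retry behavior in the last rule can be sketched as follows. This is a minimal illustration, not the platform's client API: the CommitAborted exception and the commit_with_retry helper are hypothetical names standing in for whatever the real client raises and exposes.

```python
import random
import time


class CommitAborted(Exception):
    """Hypothetical stand-in for the engine's commit_aborted error."""


def commit_with_retry(commit, max_attempts=5, base_delay=0.01):
    """Call commit() until it succeeds or the attempts run out.

    A commit applies in full or not at all, so re-running the whole
    unit of work after an abort is safe.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return commit()
        except CommitAborted:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter to reduce repeated collisions.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())


# Simulated writer that loses the race twice, then wins.
attempts = []


def flaky_commit():
    attempts.append(1)
    if len(attempts) < 3:
        raise CommitAborted("another writer committed first")
    return "committed"


result = commit_with_retry(flaky_commit)
```

The backoff policy is a design choice of this sketch; the source only states that the losing writer can retry.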
Multigraph
Multiple edges of the same type between the same pair of nodes are allowed by default. Uniqueness is an application-level decision (see Edge uniqueness).
Core elements
A property graph in Curiosity is made of three things:
| Element | What it is | What it carries |
|---|---|---|
| Node | An entity in the domain (Customer, Ticket, Document). | A type, a stable identifier, a creation timestamp, and a set of properties. |
| Edge | A directed, typed link from one node to another. | A type, the target node's identifier, and a creation timestamp. No user-defined properties. |
| Property | A typed field on a node. | A name and a value of one of the supported scalar, list, or dictionary types. |
A path is an alternating sequence of nodes and edges that follows edge direction; its length is the number of edges traversed.
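As a rough mental model, the three elements and the path definition can be written down as plain data structures. The class and field names below are illustrative assumptions, not the engine's storage layout or SDK types.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class Edge:
    type: str        # exactly one type, fixed at creation
    target: int      # identifier of the target node (a 128-bit integer here)
    created_ms: int  # creation timestamp, set once
    # Deliberately no properties field: edges carry no user-defined data.


@dataclass
class Node:
    type: str
    uid: int
    created_ms: int
    # A dict keeps property names unique within the node.
    properties: Dict[str, Any] = field(default_factory=dict)
    # A list (not a set) so repeated edges are representable (multigraph).
    out_edges: List[Edge] = field(default_factory=list)


def path_length(path):
    """A path alternates node, edge, node, ...; its length is the edge count."""
    return sum(1 for element in path if isinstance(element, Edge))


customer = Node("Customer", uid=1, created_ms=1_700_000_000_000,
                properties={"name": "Acme"})
ticket = Node("Ticket", uid=2, created_ms=1_700_000_000_500)
has_ticket = Edge("HasTicket", target=ticket.uid, created_ms=1_700_000_001_000)
customer.out_edges.append(has_ticket)

length = path_length([customer, has_ticket, ticket])
```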
Internal vs. external schemas
Curiosity ships with a large set of built-in node and edge types that power core functionality (users, files, access control, NLP enrichment, notifications, and so on). Every workspace also defines its own application-specific types on top. The distinction matters operationally.
Internal schemas
Internal schemas are reserved for the platform itself. They follow a strict naming rule and a stricter lifecycle:
- Names always start with an underscore (_), for example _User, _FileEntry, _AccessGroup, _Document, _Blob, _Message. This is enforced by the engine: an internal type whose name does not begin with _ is rejected at registration time.
- They are protected from modification or deletion through the public API. The platform owns their structure; upgrades may add fields, but applications must not redefine them.
- Their identifiers are server-assigned, not derived from a domain key. The deterministic-hash identity rule that applies to external schemas does not apply to internal ones.
- Their display name may differ from their schema name. For example, the _AccessGroup node type is presented to end users as Team in the built-in UI.
Internal node types are organized by domain (access, files, NLP, notes, issues, calendar, CRM, messaging, automation, search, usage, and system) and there are ~100 of them in the current platform. Internal edge types follow the same _ convention and number ~150.
External (application-defined) schemas
External schemas are the ones you define for your domain — Customer, Ticket, Product, Policy, Document, and so on.
- Names must not start with an underscore. The leading underscore is reserved for the platform.
- You define them in code (for connectors) or register them dynamically. Each schema declares a name, a key field, and a set of typed fields.
- Their identifiers are deterministic. The platform derives the node's UID128 from a 128-bit hash combining the schema's type identifier and the node's key value (see Identity and keys). This is what makes ingestion idempotent: re-running a connector against the same source produces the same node, not a duplicate.
- They can evolve. New properties can be added; renames and removals require migration. See Schema Design → Schema evolution.
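The naming rule that separates the two kinds of schema is simple enough to check on the client side before registration. The function below is a hypothetical pre-flight check written for illustration; the engine performs its own enforcement, and its actual error messages are not documented here.

```python
def validate_schema_name(name: str, internal: bool) -> None:
    """Raise ValueError if the name breaks the underscore convention."""
    if not name:
        raise ValueError("schema name must be a non-empty string")
    if internal and not name.startswith("_"):
        raise ValueError(f"internal type {name!r} must start with '_'")
    if not internal and name.startswith("_"):
        raise ValueError(f"external type {name!r} must not start with '_'")


def rejected(name: str, internal: bool) -> bool:
    """Convenience wrapper: True if the name would be refused."""
    try:
        validate_schema_name(name, internal)
    except ValueError:
        return True
    return False


checks = {
    "Customer as external": rejected("Customer", internal=False),
    "_Customer as external": rejected("_Customer", internal=False),
    "_FileEntry as internal": rejected("_FileEntry", internal=True),
    "FileEntry as internal": rejected("FileEntry", internal=True),
}
```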
When to use which
Application logic should add nodes of external types and link them — to each other and, where appropriate, to internal types (for example, attaching a _FileEntry to a domain entity, or restricting a node's visibility to an _AccessGroup). Application logic should not create or modify internal-type nodes directly except through their published APIs.
Nodes
A node represents a single entity. It has:
- A type: a non-empty string registered in the schema (Customer, _FileEntry, ...). The type is fixed for the lifetime of the node.
- An identifier: a UID128, the node's permanent identity. Used by every edge that points to the node.
- A creation timestamp: milliseconds since the epoch, set at creation, never updated.
- Properties: zero or more typed fields (see Properties).
- Outgoing edges: zero or more directed edges to other nodes.
Two nodes are equal if and only if they have the same identifier. Nodes are addressed by identifier in every internal API; lookup by (type, key) resolves to an identifier and then proceeds.
Edges
An edge is a directed link from one node to another. It has:
- A type: a non-empty string registered in the schema. Application edge types use natural names (HasTicket, ForProduct, MentionsEntity); internal edge types follow the _ prefix rule (_OwnedBy, _MemberOf, _HasMember).
- A target node: referenced by UID128. The source is implicit: the node that owns the edge.
- A creation timestamp: set at creation, immutable.
Edges do not carry user-defined properties. This is a deliberate constraint that keeps traversals fast and edge storage compact. When you need to attach data to a relationship — a comment text, a score, a date range — model the relationship as a node and connect both endpoints to it.
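A small in-memory sketch of the relationship-node pattern follows. The Review type, the WroteReview and ReviewOf edge names, and the helper functions are all hypothetical and exist only to show the shape: the score that another model would put on a Customer-to-Product edge lives on an intermediate node instead.

```python
nodes = {}   # uid -> {"type": str, "properties": dict}
edges = []   # (source uid, edge type, target uid)


def add_node(uid, node_type, **properties):
    nodes[uid] = {"type": node_type, "properties": properties}
    return uid


def add_edge(source, edge_type, target):
    # Both endpoints must already exist; orphaned edges are not representable.
    if source not in nodes or target not in nodes:
        raise KeyError("both endpoints must exist")
    edges.append((source, edge_type, target))


customer = add_node("c1", "Customer", name="Acme")
product = add_node("p1", "Product", name="Widget")

# The relationship itself becomes a node that carries the data.
review = add_node("r1", "Review", score=4, comment="Works well")
add_edge(customer, "WroteReview", review)
add_edge(review, "ReviewOf", product)


def scores_for(product_uid):
    """Collect scores from every Review node that points at the product."""
    return [
        nodes[source]["properties"]["score"]
        for (source, edge_type, target) in edges
        if edge_type == "ReviewOf" and target == product_uid
    ]


widget_scores = scores_for("p1")
```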
Reciprocal edges
Edges are directed, but most relationships are useful to traverse from either side. The standard pattern is to create two edges, one in each direction, named to read naturally:
Customer --HasTicket--> Ticket --TicketOf--> Customer
Both edges are first-class. The engine does not infer one from the other — your connector creates both — but it provides a single linking call that emits the pair atomically.
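The idea of a single call that emits both directions can be illustrated with a toy adjacency map. The link function here is a hypothetical stand-in, not the platform's linking call, and an in-memory dictionary gives none of the engine's atomicity guarantees.

```python
from collections import defaultdict

out_edges = defaultdict(list)  # source uid -> [(edge type, target uid)]


def link(source, forward_type, target, reverse_type):
    """Stage the forward and reverse edge together, then apply both."""
    pair = [
        (source, forward_type, target),
        (target, reverse_type, source),
    ]
    for src, edge_type, dst in pair:
        out_edges[src].append((edge_type, dst))


link("customer:42", "HasTicket", "ticket:7", "TicketOf")

from_customer = out_edges["customer:42"]
from_ticket = out_edges["ticket:7"]
```

Keeping the two edge names in one call site makes it harder for a connector to create one direction and forget the other.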
Edge uniqueness
By default, multiple edges of the same type between the same pair of nodes are allowed. Two common patterns:
- Multi-edge (default): repeated occurrences are kept. Useful when each occurrence carries meaning, for example one edge per timestamped event modeled as a relationship node.
- Unique edge: enforced at write time via the explicit "add unique" / "remove unique" path. Useful for membership-style relationships ("user is in team") where the relationship is set-valued rather than event-valued.
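The difference between the two patterns can be shown with a list-backed edge store. The function names add_edge, add_unique_edge, and remove_unique_edge are assumptions chosen to mirror the "add unique" / "remove unique" wording above; they are not the SDK's method names.

```python
edges = []  # (source uid, edge type, target uid); a list permits duplicates


def add_edge(source, edge_type, target):
    """Multi-edge (default): every call records another occurrence."""
    edges.append((source, edge_type, target))


def add_unique_edge(source, edge_type, target):
    """Add the edge only if an identical one is absent. Returns True if added."""
    triple = (source, edge_type, target)
    if triple in edges:
        return False
    edges.append(triple)
    return True


def remove_unique_edge(source, edge_type, target):
    """Remove the edge if present. Returns True if something was removed."""
    triple = (source, edge_type, target)
    if triple in edges:
        edges.remove(triple)
        return True
    return False


add_edge("user:1", "Viewed", "doc:9")
add_edge("user:1", "Viewed", "doc:9")            # kept: two occurrences
first = add_unique_edge("user:1", "_MemberOf", "group:a")
second = add_unique_edge("user:1", "_MemberOf", "group:a")  # ignored

viewed_count = edges.count(("user:1", "Viewed", "doc:9"))
member_count = edges.count(("user:1", "_MemberOf", "group:a"))
```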
Properties
Properties are typed fields stored on nodes. They are addressed by name within a node, and a node cannot carry two properties with the same name.
Property value types
Curiosity supports a fixed set of property value types:
| Family | Types |
|---|---|
| Strings | string |
| Booleans | bool |
| Integers | byte, sbyte, short, ushort, int, uint, long, ulong |
| Floating point | float, double, decimal |
| Characters | char |
| Identifiers | UID128 |
| Temporal | Time (timestamp) |
| Geographic | GeoPoint |
| Linguistic | Language |
| Collections | List<T>, Dictionary<K, V>, T[] of any supported scalar |
A few notes on the type system:
- No nested objects. A property value is a scalar, a list of scalars, or a key/value dictionary of scalars. Complex shapes are modeled as their own nodes connected by edges.
- Vectors are a distinct concept. Embedding vectors are stored as float[] on embedding-capable node types and used by the similarity indexes, but they are not exposed as ordinary properties for filtering or sorting.
- Binary content is stored separately. Large blobs (file contents, images, attachments) are persisted in blob storage and referenced from nodes by a UID128 (typically pointing to a _Blob node).
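The "no nested objects" rule amounts to a one-level shape check. The sketch below approximates it with Python types; the mapping from the platform's scalar types to str, bool, int, float, and Decimal is an assumption made for illustration, and types such as UID128, Time, and GeoPoint are left out.

```python
from decimal import Decimal

# Python stand-ins for the scalar families (an assumption of this sketch).
SCALARS = (str, bool, int, float, Decimal)


def is_scalar(value):
    return isinstance(value, SCALARS)


def is_valid_property_value(value):
    """Accept a scalar, a list of scalars, or a dictionary of scalars."""
    if is_scalar(value):
        return True
    if isinstance(value, (list, tuple)):
        return all(is_scalar(item) for item in value)
    if isinstance(value, dict):
        return all(is_scalar(k) and is_scalar(v) for k, v in value.items())
    return False


results = {
    "scalar": is_valid_property_value("Acme"),
    "list": is_valid_property_value(["a", "b"]),
    "dict": is_valid_property_value({"tier": "gold"}),
    "nested dict": is_valid_property_value({"address": {"city": "Oslo"}}),
    "nested list": is_valid_property_value([[1, 2], [3]]),
}
```

A value that fails this check, such as the nested address above, is a candidate for promotion to its own node type.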
Declared vs. dynamic fields
Properties are declared in the node schema. Each declared field has a name and a value type. The schema is the source of truth for:
- which fields exist on a node type,
- the type of each field,
- which field is the key (see below).
Optionality and indexing are layered on top: search indexing, encrypted-at-rest properties, and access control are configured separately from the schema definition. See Search Model and Access Control Model.
Identity and keys
Every node has a 128-bit identifier (UID128). For external (application-defined) schemas, the identifier is derived deterministically from the node type and the value of the key field:
UID128 = hash128(typeName) ⊕ hash128(keyValue)
This has two important consequences:
- Ingestion is idempotent. Re-running a connector with the same (type, key) resolves to the same node. New properties update in place; the identifier does not change.
- External systems can compute the identifier offline. Given the type and key, any client can derive the UID128 without round-tripping to the server.
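The derivation can be sketched as below. The source does not specify which 128-bit hash function the platform uses, so this example substitutes BLAKE2b truncated to 16 bytes purely as a stand-in; identifiers computed this way will not match the server's. Only the structure (hash both inputs, combine with XOR, reject empty keys) follows the formula and rules on this page.

```python
import hashlib


def hash128(text: str) -> int:
    """Stand-in 128-bit hash (BLAKE2b, 16-byte digest). Not the platform's."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=16).digest()
    return int.from_bytes(digest, "big")


def derive_uid(type_name: str, key_value: str) -> int:
    """UID128 = hash128(typeName) XOR hash128(keyValue)."""
    if not key_value:
        raise ValueError("key values must be non-empty strings")
    return hash128(type_name) ^ hash128(key_value)


first_run = derive_uid("Customer", "CUST-0042")
second_run = derive_uid("Customer", "CUST-0042")   # same inputs, same node
other_key = derive_uid("Customer", "CUST-0043")
other_type = derive_uid("Ticket", "CUST-0042")
```

Because the function depends only on its two inputs, re-running ingestion with the same type and key lands on the same identifier, which is the idempotency property described above.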
Rules and recommendations:
- Each external schema must declare exactly one key field. Composite keys are not supported directly — pick one identity and model the rest as properties or edges.
- Key values must be non-empty strings. Empty or null keys are rejected.
- Pick a stable key. Source-system IDs are usually the best choice. Deterministic hashes of canonical records work, but any change to the canonicalization will recreate the affected nodes. See Schema Design → Keys.
- Internal nodes do not follow this rule. Their identifiers are assigned by the server; computing them offline is not supported.
Labels and types
Some property graph models distinguish "labels" (a multi-valued set on nodes) from "types" (a single value on relationships). Curiosity uses a simpler model:
- A node has one type. It is fixed at creation.
- An edge has one type. It is fixed at creation.
Multi-label semantics — "this node is also a Premium and an EU-Customer" — are modeled by linking the node to category nodes via labeled edges:
Customer --HasLabel--> Label{Premium}
Customer --HasLabel--> Label{EU-Customer}
This shape gives you free faceting ("show all customers labeled Premium") and lets each label carry its own metadata (description, owner, lifecycle), which a tag list on the node would not.
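The faceting claim can be illustrated over a flat edge list. The identifiers and helper names are made up for the example; in the platform this would be a traversal or a search facet rather than a list comprehension.

```python
from collections import Counter

# (source uid, edge type, target uid)
edges = [
    ("customer:1", "HasLabel", "label:Premium"),
    ("customer:1", "HasLabel", "label:EU-Customer"),
    ("customer:2", "HasLabel", "label:EU-Customer"),
]


def nodes_with_label(label_uid):
    """'Show all customers labeled X': follow HasLabel edges backwards."""
    return sorted(
        source for (source, edge_type, target) in edges
        if edge_type == "HasLabel" and target == label_uid
    )


def facet_counts():
    """Count how many nodes point at each label node."""
    return Counter(
        target for (_, edge_type, target) in edges if edge_type == "HasLabel"
    )


premium = nodes_with_label("label:Premium")
eu = nodes_with_label("label:EU-Customer")
counts = facet_counts()
```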
Access control as graph edges
Permissions in Curiosity are themselves expressed in the graph. There is no separate "ACL property" you put on a node. Instead:
- A user is a _User node.
- A team is an _AccessGroup node (presented in the UI as Team).
- Membership is an edge: _User --_MemberOf--> _AccessGroup (with the reciprocal _HasMember).
- Ownership and visibility are edges: _OwnedBy, _Owns, _HasAdmin, _AdminOf.
- Node-type-level and field-level restrictions reference an _AccessGroup: at query time, a user is allowed to read a restricted type or field only if they are a member of the configured group.
Because access control lives in the graph, the same traversal primitives that answer "what tickets belong to this customer" also answer "what teams can read this document". For the full data flow, see Access Control Model.
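A type-level read check reduces to a membership lookup along _MemberOf edges. The sketch below is a simplified assumption of how such a check could look: the data, the can_read_type helper, and the choice to treat types without a configured group as readable by everyone are all illustrative and not taken from the platform.

```python
# _User --_MemberOf--> _AccessGroup, stored as user uid -> set of group uids.
member_of = {
    "user:alice": {"group:support"},
    "user:bob": set(),
}

# Restricted node types and the _AccessGroup each one references.
restricted_types = {
    "Ticket": "group:support",
    "Contract": "group:legal",
}


def can_read_type(user_uid, node_type):
    """Allow reads of a restricted type only to members of its group."""
    group = restricted_types.get(node_type)
    if group is None:
        return True  # assumption: no configured group means unrestricted
    return group in member_of.get(user_uid, set())


alice_ticket = can_read_type("user:alice", "Ticket")
alice_contract = can_read_type("user:alice", "Contract")
bob_ticket = can_read_type("user:bob", "Ticket")
bob_customer = can_read_type("user:bob", "Customer")
```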
Paths and traversals
A path is an alternating sequence of nodes and edges that follows edge direction. Traversals are the primary way applications read the graph:
- start from a known node or a node type,
- step out of a node along one or more edge types (or in, to follow incoming edges),
- filter by property, type, or timestamp,
- return nodes, paths, neighbors, or aggregates.
Traversals are lock-free with respect to writes, so concurrent ingestion does not block reads. Search facets that filter by graph relationship (for example, "tickets whose customer is Acme") compile into traversal predicates evaluated alongside text and vector scoring.
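The start, step, filter, return sequence can be sketched over an in-memory adjacency map. The step_out and where helpers are hypothetical names for this example and do not reflect the platform's query API.

```python
# source uid -> [(edge type, target uid)]
out_edges = {
    "customer:acme": [("HasTicket", "ticket:1"), ("HasTicket", "ticket:2")],
    "ticket:1": [("ForProduct", "product:widget")],
    "ticket:2": [("ForProduct", "product:gadget")],
}
properties = {
    "ticket:1": {"status": "open"},
    "ticket:2": {"status": "closed"},
}


def step_out(uids, edge_type):
    """Follow outgoing edges of one type from each node, preserving order."""
    return [
        target
        for uid in uids
        for (kind, target) in out_edges.get(uid, [])
        if kind == edge_type
    ]


def where(uids, name, value):
    """Keep only the nodes whose property equals the given value."""
    return [uid for uid in uids if properties.get(uid, {}).get(name) == value]


# Start from a known node, step out, filter, step out again, return nodes.
tickets = step_out(["customer:acme"], "HasTicket")
open_tickets = where(tickets, "status", "open")
products = step_out(open_tickets, "ForProduct")
```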
What is intentionally not modeled
Knowing what the model does not support is as important as knowing what it does.
- Properties on edges. Use a relationship node when you need them.
- Multiple types per node. Use edges to category nodes for set-valued categorization.
- Composite keys. Choose one identity per schema; model alternates as properties or edges.
- Nested objects in properties. Promote the nested object to its own node type when the structure matters.
- Undirected edges. Use reciprocal edges when both directions are meaningful.
- Cross-workspace edges. A graph is the unit of consistency; edges always connect nodes within the same workspace.
Limits
| Aspect | Limit |
|---|---|
| Distinct node types | 16,777,215 (24-bit type identifier) |
| Distinct edge types | 16,777,215 (24-bit type identifier) |
| Node identifier width | 128 bits |
| Edge size on disk | 32 bytes (packed) |
| Properties per node | bounded by the schema definition |
| Multiplicity of edges between two nodes | unbounded (multigraph) |
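The type-count figure in the table follows from the identifier width. The short check below reproduces the arithmetic; the page does not say why the limit is one less than the full 24-bit range, so no reason is assumed here.

```python
TYPE_ID_BITS = 24
UID_BITS = 128

# 2**24 bit patterns; the documented limit is one less than that.
max_types = 2 ** TYPE_ID_BITS - 1

# A 128-bit identifier occupies 16 bytes.
uid_bytes = UID_BITS // 8
```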
Next steps
- Apply the model to a concrete domain: Schema Design.
- See how search and ranking compose on top of the graph: Search Model.
- Understand how data moves through the system end-to-end: Data Flow.
- Read the permission story in depth: Access Control Model.