# Sample Datasets

Sample datasets are a practical way to evaluate Curiosity Workspace and to teach teams the core patterns:

  • ingest data into a graph schema
  • configure search and facets
  • enable embeddings and AI-assisted workflows
  • build endpoints and interfaces

# What makes a good sample dataset

A good sample dataset has:

  • Multiple entity types (at least 3–5 node types)
  • Relationships that support navigation and filtering (edges)
  • Text fields suitable for search (titles, summaries, descriptions)
  • A time dimension (timestamps) to test recency and time filters
  • Enough volume that differences in ranking quality and query performance become visible
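
The checklist above can be turned into a quick automated check before a dataset is adopted. The following is a minimal sketch, assuming nodes and edges are held as plain Python dicts; the field names (`id`, `type`, `source`, `target`, `title`, `created`, and so on), the helper `check_sample_dataset`, and the default thresholds are all hypothetical and are not part of the Curiosity Workspace API.

```python
from collections import Counter
from typing import Any

# Hypothetical in-memory shapes, for illustration only.
# Node example: {"id": "t1", "type": "Ticket", "title": "...", "created": "2024-01-05T10:00:00"}
# Edge example: {"source": "t1", "target": "c1", "type": "RaisedBy"}
Node = dict[str, Any]
Edge = dict[str, Any]

TEXT_FIELDS = ("title", "summary", "description")  # assumed text field names
TIME_FIELDS = ("created", "updated", "timestamp")  # assumed timestamp field names


def check_sample_dataset(
    nodes: list[Node],
    edges: list[Edge],
    min_node_types: int = 3,   # lower bound of the "3-5 node types" guideline
    min_nodes: int = 100,      # arbitrary volume threshold; tune for your case
) -> dict[str, bool]:
    """Return one pass/fail flag per criterion in the checklist."""
    types = Counter(n["type"] for n in nodes)
    ids = {n["id"] for n in nodes}
    return {
        "enough_node_types": len(types) >= min_node_types,
        # Every edge must connect two nodes that exist in the dataset.
        "has_valid_edges": bool(edges)
        and all(e["source"] in ids and e["target"] in ids for e in edges),
        "has_text_fields": any(
            isinstance(n.get(f), str) and n[f].strip()
            for n in nodes
            for f in TEXT_FIELDS
        ),
        "has_timestamps": any(n.get(f) for n in nodes for f in TIME_FIELDS),
        "enough_volume": len(nodes) >= min_nodes,
    }
```

Any flag that comes back `False` points at the criterion the dataset fails, which is usually easier to fix before ingestion than after.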

# Recommended sample dataset categories

  • Support and case management
    • tickets/cases, products, customers, teams, statuses
  • Compliance and audit
    • policies, controls, evidence, owners, exceptions
  • Engineering knowledge base
    • docs, code artifacts, services, incidents, runbooks
  • Research
    • papers, authors, topics, citations, institutions
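
If no real data is available, a small synthetic dataset in one of these categories is often enough to start. The sketch below generates a toy "support and case management" dataset as plain dicts. Everything in it is an assumption made for illustration: the function name `make_support_dataset`, the node and edge type names, the product names, and the 90-day time window. It does not call any Curiosity Workspace API; the output would still need to be mapped to your graph schema during ingestion.

```python
import random
from datetime import datetime, timedelta


def make_support_dataset(n_tickets: int = 50, seed: int = 0):
    """Build a toy support dataset: (nodes, edges) as lists of dicts."""
    rng = random.Random(seed)  # seeded so repeated runs give identical data

    products = [
        {"id": f"product-{i}", "type": "Product", "title": name}
        for i, name in enumerate(["Gateway", "Mobile App", "Billing API"])
    ]
    customers = [
        {"id": f"customer-{i}", "type": "Customer", "title": f"Customer {i}"}
        for i in range(10)
    ]
    statuses = [
        {"id": f"status-{s}", "type": "Status", "title": s}
        for s in ("open", "pending", "closed")
    ]

    nodes = products + customers + statuses
    edges = []
    start = datetime(2024, 1, 1)

    for i in range(n_tickets):
        product = rng.choice(products)
        customer = rng.choice(customers)
        status = rng.choice(statuses)
        ticket = {
            "id": f"ticket-{i}",
            "type": "Ticket",
            "title": f"Issue with {product['title']}",
            # Spread tickets over roughly 90 days to exercise time filters.
            "created": (start + timedelta(hours=rng.randrange(24 * 90))).isoformat(),
        }
        nodes.append(ticket)
        edges.append({"source": ticket["id"], "target": product["id"], "type": "About"})
        edges.append({"source": ticket["id"], "target": customer["id"], "type": "RaisedBy"})
        edges.append({"source": ticket["id"], "target": status["id"], "type": "HasStatus"})

    return nodes, edges
```

Seeding the generator keeps test runs reproducible, so a change in search results can be attributed to configuration rather than to different data.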

# How to use sample datasets in documentation and testing

Use sample datasets to validate:

  • Graph navigation: do the relationships enable the workflows you want?
  • Search relevance: can users find the right objects by keywords?
  • Semantic recall: can vector search find “similar meaning” content?
  • Facet usefulness: do the chosen facets match how users refine results?
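
Search relevance and semantic recall can be checked with a small set of judged queries: for each query, list the object IDs that should come back, then measure how many of them appear in the top results. The sketch below computes recall@k against any search function. The names `recall_at_k` and `keyword_search` are hypothetical, and the toy keyword search is only a stand-in so the example runs on its own; in practice you would pass a function that queries your own workspace, whether keyword or vector based.

```python
from typing import Callable


def recall_at_k(
    search: Callable[[str], list[str]],
    judged_queries: dict[str, set[str]],
    k: int = 10,
) -> float:
    """Mean fraction of relevant IDs found in the top-k results per query."""
    scores = []
    for query, relevant in judged_queries.items():
        if not relevant:
            continue  # skip queries without any judged-relevant objects
        top = set(search(query)[:k])
        scores.append(len(top & relevant) / len(relevant))
    return sum(scores) / len(scores) if scores else 0.0


def keyword_search(docs: dict[str, str]) -> Callable[[str], list[str]]:
    """Toy stand-in for a real search backend: rank by term frequency."""

    def search(query: str) -> list[str]:
        terms = query.lower().split()
        scored = [
            (sum(text.lower().count(t) for t in terms), doc_id)
            for doc_id, text in docs.items()
        ]
        # Highest score first; ties broken by ID so results are deterministic.
        ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for score, doc_id in ranked if score > 0]

    return search


docs = {
    "d1": "Password reset fails",
    "d2": "Invoice export",
    "d3": "Reset password email missing",
}
judged = {"password reset": {"d1", "d3"}, "invoice": {"d2"}}
print(recall_at_k(keyword_search(docs), judged, k=10))
```

Running the same judged queries against keyword search and vector search gives a like-for-like comparison, which helps show where semantic matching finds content that keywords miss.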

# Next steps