# Embeddings

Embeddings are vector representations of content that enable semantic similarity and vector search. In Curiosity Workspace, embeddings are commonly used to:

  • power vector search (semantic retrieval)
  • find similar items (related cases/documents)
  • provide candidate context for LLM workflows
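Under the hood, semantic similarity between two embeddings is usually measured with cosine similarity. As an illustrative sketch (the tiny 3-dimensional vectors and document names below are made up; real models produce hundreds of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 means similar direction, near 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for three documents.
doc_a = [0.9, 0.1, 0.0]
doc_b = [0.8, 0.2, 0.1]
doc_c = [0.0, 0.1, 0.9]

print(cosine_similarity(doc_a, doc_b))  # high: semantically similar
print(cosine_similarity(doc_a, doc_c))  # low: unrelated
```

The same measure drives vector search, similar-item lookups, and candidate selection for LLM context.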

# How embeddings are used

Typical workflow:

  1. Choose which fields should be embedded (usually longer, descriptive text).
  2. Configure embedding index creation for those fields.
  3. Run similarity queries to retrieve nearest neighbors for a user query or a node’s content.
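The retrieval step (3) can be sketched as a brute-force nearest-neighbor search over an in-memory index. The index contents and document IDs here are hypothetical; production systems typically use an approximate nearest-neighbor (ANN) index rather than a linear scan:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest_neighbors(query_vec, index, k=2):
    """Brute-force k-NN: fine for small sets; use an ANN index at scale."""
    scored = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Pretend these vectors came from embedding each document's description field.
index = {
    "case-101": [0.9, 0.1, 0.0],
    "case-102": [0.8, 0.3, 0.1],
    "case-103": [0.0, 0.2, 0.9],
}
query = [0.85, 0.2, 0.05]  # embedding of the user's query text
print(nearest_neighbors(query, index, k=2))
```

A query is embedded with the same model as the indexed fields, then its nearest neighbors are returned as search results.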

# Choosing what to embed

Good candidates:

  • description/body fields
  • conversations and transcripts
  • summaries (if they contain meaningful signal)

Poor candidates:

  • IDs and codes
  • short labels (often better handled by keyword search)
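One way to apply these guidelines programmatically is a simple heuristic filter. The function below is an assumption-laden sketch (the length threshold and code-detection rule are illustrative, not a Curiosity Workspace feature):

```python
def is_embedding_candidate(value: str, min_length: int = 50) -> bool:
    """Heuristic: embed longer descriptive text; skip IDs, codes, short labels."""
    text = value.strip()
    if len(text) < min_length:
        return False  # short labels are better served by keyword search
    # Tokens containing digits (e.g. "INC-2024-0042") often indicate codes.
    tokens = text.split()
    code_like = sum(1 for t in tokens if any(c.isdigit() for c in t))
    return code_like / len(tokens) < 0.5

print(is_embedding_candidate("INC-2024-0042"))  # False: an ID/code
print(is_embedding_candidate(
    "The server failed after the nightly backup job "
    "exhausted disk space on the primary volume."))  # True: descriptive text
```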

# Chunking (important for long text)

If a field can be very long:

  • enable chunking
  • ensure chunks align with semantic units (paragraphs, messages)

Chunking typically improves recall, but chunk size and boundaries may need tuning so that each retrieved chunk is still interpretable on its own.
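A minimal paragraph-aligned chunker might look like the following sketch (the `max_chars` budget is an illustrative parameter; split on blank lines so chunks follow semantic units, and merge short paragraphs so chunks stay reasonably sized):

```python
def chunk_by_paragraph(text: str, max_chars: int = 500) -> list[str]:
    """Split text on blank lines, merging adjacent short paragraphs
    into chunks of at most max_chars characters."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

doc = ("First paragraph about the incident.\n\n"
       "Second paragraph with details.\n\n"
       "Third paragraph.")
print(chunk_by_paragraph(doc, max_chars=60))
```

Chunking on message or paragraph boundaries, as here, keeps each chunk a coherent unit, which matters because each chunk is embedded and retrieved independently.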

# Evaluation

To validate embeddings:

  • prepare a list of “similarity questions” (e.g., “find similar incidents”)
  • confirm results are relevant and diverse
  • tune cutoffs (how similar is “similar enough”?)
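Cutoff tuning can be made concrete by scoring hand-labeled query results at several thresholds. The labeled pairs below are invented for illustration (each is a similarity score plus a human relevance judgment):

```python
def precision_recall_at_cutoff(scored_pairs, cutoff):
    """scored_pairs: (similarity, is_relevant) tuples from labeled queries.
    Returns (precision, recall) among results at or above the cutoff."""
    retrieved = [rel for score, rel in scored_pairs if score >= cutoff]
    relevant_total = sum(rel for _, rel in scored_pairs)
    if not retrieved:
        return 0.0, 0.0
    precision = sum(retrieved) / len(retrieved)
    recall = sum(retrieved) / relevant_total
    return precision, recall

# Hand-labeled results for a few "find similar incidents" queries.
pairs = [(0.95, 1), (0.88, 1), (0.84, 0), (0.71, 1), (0.60, 0), (0.42, 0)]
for cutoff in (0.5, 0.7, 0.85):
    p, r = precision_recall_at_cutoff(pairs, cutoff)
    print(f"cutoff={cutoff}: precision={p:.2f} recall={r:.2f}")
```

Raising the cutoff trades recall for precision; pick the threshold where the balance matches how the results will be used.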

# Next steps