# Embeddings
Embeddings are numeric vector representations of content that make semantic similarity comparisons and vector search possible. In Curiosity Workspace, embeddings are commonly used to:
- power vector search (semantic retrieval)
- find similar items (related cases/documents)
- provide candidate context for LLM workflows
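The similarity behind all of these uses is typically cosine similarity between embedding vectors. A minimal illustrative sketch (plain Python, not a Curiosity Workspace API):

```python
import math

def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 1.0], [2.0, 2.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

Scores near 1.0 mean semantically similar content; scores near 0.0 mean unrelated content.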
## How embeddings are used
Typical workflow:
- Choose which fields should be embedded (usually longer, descriptive text).
- Configure embedding index creation for those fields.
- Run similarity queries to retrieve nearest neighbors for a user query or a node’s content.
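The last step — retrieving nearest neighbors — can be sketched as a brute-force scan over an in-memory index. This is illustrative only (a real deployment uses the platform's vector index, and the item IDs below are made up):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def nearest_neighbors(query_vec, index, k=3):
    """index: list of (item_id, vector) pairs; returns the k best matches,
    highest similarity first."""
    scored = [(item_id, cosine(query_vec, vec)) for item_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical pre-computed embeddings for three cases.
index = [
    ("case-1", [0.9, 0.1]),
    ("case-2", [0.1, 0.9]),
    ("case-3", [0.7, 0.3]),
]
print(nearest_neighbors([1.0, 0.0], index, k=2))  # case-1 first, then case-3
```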
## Choosing what to embed
Good candidates:
- description/body fields
- conversations and transcripts
- summaries (if they contain meaningful signal)
Poor candidates:
- IDs and codes
- short labels (often better handled by keyword search)
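One simple way to apply this distinction is an average-length heuristic over sample values of a field. The function and the 80-character threshold below are illustrative assumptions, not product settings:

```python
def looks_embeddable(sample_values, min_avg_chars=80):
    """Heuristic sketch: fields whose sample values average at least
    min_avg_chars characters are embedding candidates; shorter fields
    (IDs, codes, labels) are usually better served by keyword search."""
    if not sample_values:
        return False
    avg = sum(len(v) for v in sample_values) / len(sample_values)
    return avg >= min_avg_chars

print(looks_embeddable(["INC-1042", "INC-1043"]))  # → False (IDs)
print(looks_embeddable(
    ["The customer reported intermittent timeouts on the checkout page "
     "after the latest deployment, affecting roughly 5% of sessions."]
))  # → True (descriptive body text)
```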
## Chunking (important for long text)
If a field can be very long:
- enable chunking
- ensure chunks align with semantic units (paragraphs, messages)
Chunking typically improves recall but may require careful tuning so results remain interpretable.
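A sketch of paragraph-aligned chunking (illustrative; the platform's own chunking settings may differ, and `max_chars` is an assumed parameter):

```python
def chunk_by_paragraphs(text, max_chars=500):
    """Split text on blank lines and greedily pack whole paragraphs
    into chunks of at most max_chars characters, so chunk boundaries
    align with semantic units rather than cutting mid-sentence."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # +2 accounts for the blank line re-inserted between paragraphs.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

text = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
print(chunk_by_paragraphs(text, max_chars=20))  # one paragraph per chunk
print(chunk_by_paragraphs(text, max_chars=500))  # everything fits in one chunk
```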
## Evaluation
To validate embeddings:
- prepare a list of “similarity questions” (e.g., “find similar incidents”)
- confirm results are relevant and diverse
- tune cutoffs (how similar is “similar enough”?)
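Cutoff tuning can be made concrete by measuring precision at several candidate thresholds against a small hand-labeled set. A minimal sketch (the result scores and relevance labels below are made-up examples):

```python
def precision_at_cutoff(scored_results, relevant_ids, cutoff):
    """scored_results: (item_id, similarity) pairs from a similarity query.
    Returns (precision, kept_count) for results at or above the cutoff."""
    kept = [item_id for item_id, score in scored_results if score >= cutoff]
    if not kept:
        return 0.0, 0
    hits = sum(1 for item_id in kept if item_id in relevant_ids)
    return hits / len(kept), len(kept)

# Hypothetical query results and the items a reviewer judged relevant.
results = [("a", 0.92), ("b", 0.81), ("c", 0.55), ("d", 0.40)]
relevant = {"a", "b"}

print(precision_at_cutoff(results, relevant, 0.8))  # strict: fewer, cleaner results
print(precision_at_cutoff(results, relevant, 0.5))  # loose: more results, more noise
```

Raising the cutoff trades recall for precision; sweep a few values and pick the point where results stop being relevant.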
## Next steps
- Use embeddings for retrieval: Search → Vector Search
- Combine with keyword retrieval: Search → Hybrid Search