#
Entity Extraction
Entity extraction identifies meaningful spans of text (entities) and turns them into structured outputs. Examples:
- products and device names
- people and organizations
- identifiers (ticket IDs, asset tags)
- locations and topics (depending on your domain)
In Curiosity Workspace, extraction is typically configured via NLP pipelines and models, then optionally linked into the graph.
#
Extraction vs linking
- Extraction finds entities in text.
- Linking connects extracted entities to graph nodes (or creates nodes) so you can:
  - navigate from text to entities
  - filter by entities
  - use entities as grounding context for AI workflows
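To make the distinction concrete, here is a minimal sketch of linking in plain Python. It is not the Curiosity Workspace API: `TinyGraph`, `Node`, and `link` are hypothetical names for an in-memory stand-in, and it assumes that case-insensitive name matching is enough to decide whether two mentions refer to the same node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                      # entity type, e.g. "Product"
    name: str                       # display name taken from the first mention
    mentions: list = field(default_factory=list)  # ids of documents that mention it


class TinyGraph:
    """Hypothetical in-memory graph, used only to illustrate linking."""

    def __init__(self):
        self.nodes = {}

    def link(self, label, surface_text, doc_id, create_missing=True):
        # Normalize the surface form so "Acme Router X2" and "acme router x2"
        # resolve to the same node.
        key = (label, surface_text.strip().casefold())
        node = self.nodes.get(key)
        if node is None:
            if not create_missing:
                return None         # link-only mode: unknown entities are skipped
            node = Node(label, surface_text.strip())
            self.nodes[key] = node
        if doc_id not in node.mentions:
            node.mentions.append(doc_id)
        return node

    def documents_mentioning(self, label, name):
        # Filtering by entity: which documents mention this node?
        node = self.nodes.get((label, name.strip().casefold()))
        return list(node.mentions) if node else []


graph = TinyGraph()
graph.link("Product", "Acme Router X2", "doc-1")
graph.link("Product", "acme router x2", "doc-2")
print(graph.documents_mentioning("Product", "Acme Router X2"))
```

The `create_missing` flag mirrors the choice described above: either connect mentions to nodes that already exist, or create nodes as new entities appear.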
#
Common model types (conceptual)
- Dictionary/spotter models
  - match known terms (product catalog, customer names)
- Pattern models
  - capture structured forms (IDs, serial formats, codes)
- ML models
  - detect entities that cannot be enumerated (optional, domain-dependent)
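The first two model types can be illustrated with a short, generic Python sketch. It does not use any Curiosity Workspace API; the catalog terms, the `TICKET-` format, and the function names are all assumptions made up for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str    # matched span as it appears in the input
    label: str   # entity type
    start: int   # character offsets into the input
    end: int


# Hypothetical catalog and ID format, for illustration only.
PRODUCT_TERMS = ["Acme Router X2", "Acme Hub Mini"]
TICKET_PATTERN = re.compile(r"\bTICKET-\d{4,6}\b")


def spot_terms(text, terms, label):
    """Dictionary/spotter: match known terms, case-insensitively, on word boundaries."""
    entities = []
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            entities.append(Entity(m.group(0), label, m.start(), m.end()))
    return entities


def match_pattern(text, pattern, label):
    """Pattern model: capture structured forms such as IDs."""
    return [Entity(m.group(0), label, m.start(), m.end())
            for m in pattern.finditer(text)]


def extract(text):
    found = (spot_terms(text, PRODUCT_TERMS, "Product")
             + match_pattern(text, TICKET_PATTERN, "TicketId"))
    return sorted(found, key=lambda e: e.start)


for entity in extract("The acme router x2 failed again, see TICKET-10234."):
    print(entity.label, entity.text)
```

An ML model would fill the same role as `spot_terms` or `match_pattern` but return spans predicted by a trained model, which is why it is worth keeping the output shape (text, label, offsets) uniform across model types.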
#
Recommended workflow
- Start with extraction on a single high-value field (e.g., `Summary`).
- Run experiments to evaluate:
  - precision (how many captures are correct?)
  - recall (how much is missed?)
- Iterate on model coverage and exclusions.
- Enable linking into the graph only when extraction is reliable enough.
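The evaluation step in this workflow can be sketched as a small helper that compares extracted entities against a hand-labeled sample. The function name, the `(document id, entity text)` pair format, and the sample data are assumptions for illustration; returning 0.0 when a set is empty is a convention chosen here, not a standard.

```python
def precision_recall(predicted, expected):
    """Compare extracted entities with hand-labeled ones.

    Both arguments are iterables of (document_id, entity_text) pairs.
    """
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    # Precision: share of captures that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of labeled entities that were captured.
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Made-up sample: 4 captures, 3 of them correct, 6 entities labeled by hand.
predicted = [("doc-1", "TICKET-1001"), ("doc-1", "TICKET-2002"),
             ("doc-2", "TICKET-3003"), ("doc-2", "TICKET-9999")]
expected = [("doc-1", "TICKET-1001"), ("doc-1", "TICKET-2002"),
            ("doc-2", "TICKET-3003"), ("doc-3", "TICKET-4004"),
            ("doc-3", "TICKET-5005"), ("doc-4", "TICKET-6006")]

precision, recall = precision_recall(predicted, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.50
```

Tracking both numbers across iterations shows whether a change to coverage or exclusions improved one at the cost of the other, which helps decide when extraction is reliable enough to enable linking.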
#
Common pitfalls
- High false positives: pattern models can over-capture; add constraints and test broadly.
- Ambiguous names: dictionary models need aliases and a disambiguation strategy.
- No evaluation loop: extraction needs iteration with real examples.
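As an illustration of the first pitfall, the sketch below contrasts a loose pattern with a constrained one on the same text. The `INC-`/`REQ-` ID formats, the exclusion list, and the sample sentence are hypothetical; the point is only that word boundaries, known prefixes, fixed lengths, and exclusions each remove a class of false positives.

```python
import re

# Loose: any run of capitals followed by digits. Over-captures.
LOOSE = re.compile(r"[A-Z]{2,}-?\d+")

# Constrained: known prefixes, fixed digit count, word boundaries.
STRICT = re.compile(r"\b(?:INC|REQ)-\d{6}\b")

# Values that match the format but are never real entities (e.g., template placeholders).
EXCLUDED = {"REQ-000000"}


def capture(pattern, text):
    return [m.group(0) for m in pattern.finditer(text)
            if m.group(0) not in EXCLUDED]


text = ("See INC-004217 (template REQ-000000). "
        "Device runs on USB3 with ISO-8601 timestamps.")

print(capture(LOOSE, text))   # ['INC-004217', 'USB3', 'ISO-8601']
print(capture(STRICT, text))  # ['INC-004217']
```

Testing against a broad sample of real text, rather than a handful of known-good IDs, is what surfaces matches like `USB3` before they reach the graph.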
#
Next steps
- Add semantic retrieval to complement extraction: Embeddings
- Implement domain-specific extraction rules: Custom NLP Rules