# Entity Extraction
Entity extraction identifies meaningful spans of text (entities) and turns them into structured outputs. Examples:
- products and device names
- people and organizations
- identifiers (ticket IDs, asset tags)
- locations and topics (depending on your domain)
In Curiosity Workspace, extraction is typically configured via NLP pipelines and models, then optionally linked into the graph.
## Extraction vs linking
- Extraction finds entities in text.
- Linking connects extracted entities to graph nodes (or creates nodes) so you can:
  - navigate from text to entities
  - filter by entities
  - use entities as grounding context for AI workflows
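
The split between the two steps can be sketched in plain Python. This is a minimal illustration, not the Curiosity Workspace API: the `TCK-` ticket format, the `Mention` type, and the dictionary standing in for the graph are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    start: int
    end: int
    label: str


# Hypothetical ticket-ID format, assumed for illustration only.
TICKET_RE = re.compile(r"\bTCK-\d{4}\b")


def extract(text: str) -> list[Mention]:
    """Extraction: find entity spans in raw text."""
    return [
        Mention(m.group(), m.start(), m.end(), "Ticket")
        for m in TICKET_RE.finditer(text)
    ]


def link(mentions: list[Mention], graph: dict) -> list[tuple[str, str]]:
    """Linking: attach each mention to a graph node, creating it if missing."""
    keys = []
    for mention in mentions:
        key = (mention.label, mention.text)
        node = graph.setdefault(key, {"mention_count": 0})
        node["mention_count"] += 1
        keys.append(key)
    return keys


# Toy in-memory "graph" keyed by (node type, node id); one node already exists.
graph = {("Ticket", "TCK-1042"): {"mention_count": 3}}
mentions = extract("Customer reported TCK-1042 again; see also TCK-2001.")
linked = link(mentions, graph)
```

Extraction only produces spans; nothing changes in the graph until `link` runs, which is why the two steps can be enabled and evaluated separately.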
## Common model types (conceptual)
- Dictionary/spotter models
  - match known terms (product catalog, customer names)
- Pattern models
  - capture structured forms (IDs, serial formats, codes)
- ML models
  - detect entities that cannot be enumerated (optional, domain-dependent)
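
To make the first two model types concrete, here is a rough Python sketch of a dictionary spotter and a pattern model. It does not use Curiosity Workspace's own model configuration; the catalog entries, the `AT-` asset-tag format, and the function names are hypothetical.

```python
import re

# Hypothetical catalog: lowercase alias -> canonical product name.
CATALOG = {
    "aurora router": "Aurora Router",
    "aurora r2": "Aurora Router",
    "nimbus hub": "Nimbus Hub",
}

# Hypothetical asset-tag format: "AT-" followed by exactly six digits.
ASSET_TAG_RE = re.compile(r"\bAT-\d{6}\b")


def build_spotter(catalog: dict[str, str]) -> re.Pattern:
    """Dictionary model: one alternation over all known terms, longest first."""
    terms = sorted(catalog, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def spot_products(text: str, catalog: dict[str, str]) -> list[tuple[str, str]]:
    """Return (matched text, canonical name) pairs for known catalog terms."""
    spotter = build_spotter(catalog)
    return [(m.group(), catalog[m.group().lower()]) for m in spotter.finditer(text)]


def find_asset_tags(text: str) -> list[str]:
    """Pattern model: capture identifiers by their structure, not by a list."""
    return ASSET_TAG_RE.findall(text)


products = spot_products("The aurora r2 and the Nimbus Hub both rebooted.", CATALOG)
tags = find_asset_tags("Replace AT-004211 (not AT-42).")
```

The dictionary model can only find what is in the catalog, while the pattern model finds any string with the right shape. An ML model would cover the remaining case, where entities can be neither listed nor described by a fixed format.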
## Recommended workflow
- Start with extraction on a single high-value field (e.g., Summary).
- Run experiments to evaluate:
  - precision (how many captures are correct?)
  - recall (how much is missed?)
- Iterate on model coverage and exclusions.
- Enable linking into the graph only when extraction is reliable enough.
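
The precision and recall checks in the workflow above can be computed against a small hand-labeled sample. The sketch below assumes each capture is represented as a `(document id, entity text)` pair; that representation and the sample data are illustrative, not something Curiosity Workspace prescribes.

```python
def precision_recall(predicted: set, expected: set) -> tuple[float, float]:
    """Compare extracted entities against a hand-labeled reference set."""
    true_positives = len(predicted & expected)
    # Precision: share of captures that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of labeled entities that were captured.
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical sample: what the model captured vs. what reviewers labeled.
predicted = {("doc1", "TCK-1042"), ("doc1", "TCK-9999"), ("doc2", "TCK-2001")}
expected = {
    ("doc1", "TCK-1042"),
    ("doc2", "TCK-2001"),
    ("doc2", "TCK-2002"),
    ("doc3", "TCK-3000"),
}

precision, recall = precision_recall(predicted, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of three captures are correct and two of four labeled entities are found. Tracking both numbers across iterations gives a concrete basis for deciding when extraction is reliable enough to enable linking.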
## Common pitfalls
- High false positives: pattern models can over-capture; add constraints and test broadly.
- Ambiguous names: dictionary models need aliases and a disambiguation strategy.
- No evaluation loop: extraction needs iteration with real examples.
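
As an illustration of the first pitfall, the following Python sketch contrasts a loose pattern with a constrained one. The `TCK` and `INC` prefixes and the four-to-six digit length are assumed formats chosen for the example.

```python
import re

# Loose pattern: any uppercase word, a hyphen, then digits. Over-captures.
LOOSE_ID_RE = re.compile(r"\b[A-Z]+-\d+\b")

# Constrained pattern: known prefixes and a fixed digit range
# (hypothetical formats, assumed for illustration).
STRICT_ID_RE = re.compile(r"\b(?:TCK|INC)-\d{4,6}\b")

text = "See TCK-1042; the file is UTF-8 and dated per ISO-8601."

loose_hits = LOOSE_ID_RE.findall(text)
strict_hits = STRICT_ID_RE.findall(text)

# Everything the loose pattern captured that the strict one rejected.
false_positives = [hit for hit in loose_hits if hit not in strict_hits]
```

The loose pattern also captures the encoding name and the standard number, which is the kind of noise that only shows up when the pattern is tested against varied real text.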
## Next steps
- Add semantic retrieval to complement extraction: Embeddings
- Implement domain-specific extraction rules: Custom NLP Rules