# Entity Extraction

Entity extraction identifies meaningful spans of text (entities) and turns them into structured outputs. Examples:

- products and device names
- people and organizations
- identifiers (ticket IDs, asset tags)
- locations and topics (depending on your domain)

In Curiosity Workspace, extraction is typically configured through NLP pipelines and models; the extracted entities can then optionally be linked into the graph.

# Extraction vs linking

Extraction finds entities in text.
Linking connects extracted entities to graph nodes (or creates nodes) so you can:
- navigate from text to entities
- filter by entities
- use entities as grounding context for AI workflows
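
The split between the two steps can be sketched in plain Python (3.9+). This is a minimal illustration under stated assumptions, not the Curiosity Workspace API: the `Extracted` record, the `link` function, and the dictionary standing in for the graph are all hypothetical, and nodes are keyed by (label, lowercased name) only to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extracted:
    """Extraction output: a typed span of text with no graph identity yet."""
    label: str  # entity type, e.g. "Device"
    text: str   # surface form as it appeared in the text
    start: int  # character offset where the span begins
    end: int    # character offset just past the span


# Hypothetical stand-in for the graph: node key -> node properties.
Graph = dict[tuple[str, str], dict]


def link(entities: list[Extracted], graph: Graph,
         create_missing: bool = False) -> list[tuple[str, str]]:
    """Linking: map each extracted span to a graph node key.

    Existing nodes are reused. Unknown entities are skipped unless
    create_missing is True, in which case a new node is added.
    """
    linked = []
    for e in entities:
        key = (e.label, e.text.strip().lower())  # naive normalization (assumption)
        if key not in graph:
            if not create_missing:
                continue
            graph[key] = {"name": e.text}
        linked.append(key)
    return linked


text = "Customer reports that the X200 router reboots; see ticket TCK-1042."
found = [
    Extracted(label, s, text.index(s), text.index(s) + len(s))
    for label, s in [("Device", "X200"), ("Ticket", "TCK-1042")]
]
graph: Graph = {("Device", "x200"): {"name": "X200"}}

print(link(found, graph))                       # [('Device', 'x200')]
print(link(found, graph, create_missing=True))  # [('Device', 'x200'), ('Ticket', 'tck-1042')]
```

Keeping the steps separate means extraction can be evaluated and tuned on its own before any nodes are created or connected.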

# Common model types (conceptual)

- Dictionary/spotter models: match known terms (product catalog, customer names).
- Pattern models: capture structured forms (IDs, serial formats, codes).
- ML models: detect entities that cannot be enumerated (optional, domain-dependent).
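
As a rough illustration of the first two model types, the sketch below pairs a dictionary spotter with a pattern model using Python's standard `re` module. The catalog entries, the labels, and the `TCK-` ticket ID format are invented for the example, and the functions are hypothetical helpers, not the Curiosity Workspace API.

```python
import re

# Dictionary/spotter model: known terms mapped to an entity label.
# These catalog entries are hypothetical.
CATALOG = {"x200": "Product", "acme corp": "Organization"}

# Pattern model: "TCK-" followed by 4 to 6 digits (an assumed ticket ID format).
TICKET_ID = re.compile(r"\bTCK-\d{4,6}\b")


def spot(text: str, catalog: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return (label, matched text, start offset) for every catalog term in text.

    Matching is case-insensitive and anchored on word boundaries, so the
    term "x200" does not fire inside "X2000".
    """
    hits = []
    for term, label in catalog.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((label, m.group(0), m.start()))
    return sorted(hits, key=lambda h: h[2])


def match_patterns(text: str) -> list[tuple[str, str, int]]:
    """Return (label, matched text, start offset) for every ticket ID in text."""
    return [("TicketId", m.group(0), m.start()) for m in TICKET_ID.finditer(text)]


sample = "Acme Corp says the X200 and X2000 fail; tracked in TCK-1042."
print(spot(sample, CATALOG))   # Acme Corp and X200, but not X2000
print(match_patterns(sample))  # TCK-1042
```

A dictionary only finds what it lists, and a pattern only finds what its shape allows; entities that fit neither are where an ML model may be worth the extra setup.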

# Recommended workflow

1. Start with extraction on a single high-value field (e.g., Summary).
2. Run experiments to evaluate:
   - precision: what fraction of the captured entities are correct?
   - recall: what fraction of the entities actually present are captured?
3. Iterate on model coverage and exclusions.
4. Enable linking into the graph only once extraction is reliable enough.
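
One way to run the precision and recall experiments is to compare predicted spans against a small hand-labeled sample. The sketch below assumes exact-match scoring: a prediction counts only if the document, offsets, and label all agree with a gold span, so partial overlaps count as errors. The span tuples, the `evaluate` helper, and the sample data are hypothetical.

```python
Span = tuple[str, int, int, str]  # (document id, start, end, label)


def evaluate(predicted: set[Span], gold: set[Span]):
    """Exact-match precision and recall, plus the errors behind them."""
    correct = predicted & gold
    false_positives = predicted - gold  # candidates for exclusions or tighter patterns
    missed = gold - predicted           # candidates for new dictionary terms or patterns
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall, false_positives, missed


# Hypothetical hand-labeled spans from the Summary field of three tickets.
gold = {
    ("t1", 0, 4, "Product"),
    ("t1", 20, 28, "TicketId"),
    ("t2", 5, 9, "Product"),
    ("t3", 0, 9, "Organization"),
}
predicted = {
    ("t1", 0, 4, "Product"),
    ("t1", 20, 28, "TicketId"),
    ("t2", 5, 9, "Product"),
    ("t2", 30, 34, "Product"),   # over-capture
    ("t3", 12, 16, "TicketId"),  # over-capture
}

p, r, fp, missed = evaluate(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=0.75
print("false positives:", sorted(fp))
print("missed:", sorted(missed))
```

Reviewing the false positives and misses, not just the two scores, is what tells you whether the next iteration should add exclusions or extend coverage.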

# Common pitfalls

- High false positives: pattern models can over-capture; add constraints and test on a broad sample of text.
- Ambiguous names: dictionary models need aliases and a disambiguation strategy.
- No evaluation loop: extraction quality only improves through iteration against real examples.
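
The first two pitfalls can be sketched briefly. The first part contrasts a loose pattern with a constrained one; the second resolves surface forms through an alias table and leaves a name unlinked when the context does not separate its candidates. The `TCK-` format, the alias entries, and the word-overlap disambiguation rule are illustrative assumptions, not a Curiosity Workspace feature.

```python
import re
from typing import Optional

# Loose pattern: any letters, a dash, then digits. It over-captures tokens
# such as "COVID-19" or "utf-8" that are not ticket IDs.
LOOSE = re.compile(r"[A-Za-z]+-\d+")
# Constrained pattern: fixed prefix, fixed digit count, word boundaries.
STRICT = re.compile(r"\bTCK-\d{4}\b")

text = "After the COVID-19 rollout, utf-8 errors hit TCK-1042 and legacy XTCK-1042."
print(LOOSE.findall(text))   # ['COVID-19', 'utf-8', 'TCK-1042', 'XTCK-1042']
print(STRICT.findall(text))  # ['TCK-1042']

# Alias table: several surface forms resolve to canonical names.
# "apollo" is deliberately ambiguous. All entries are hypothetical.
ALIASES = {
    "x200": ["X200 Router"],
    "x-200": ["X200 Router"],
    "apollo": ["Apollo Router", "Apollo Project"],
}


def resolve(surface: str, context: str) -> Optional[str]:
    """Map a surface form to one canonical name, or None if unknown or still ambiguous."""
    candidates = ALIASES.get(surface.lower(), [])
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    # Naive strategy: score each candidate by word overlap with the context.
    words = set(context.lower().split())
    scored = sorted(
        ((len(set(c.lower().split()) & words), c) for c in candidates),
        reverse=True,
    )
    if scored[0][0] == scored[1][0]:
        return None  # tie: leave unlinked rather than guess
    return scored[0][1]


print(resolve("X-200", "x-200 keeps rebooting"))        # X200 Router
print(resolve("Apollo", "the apollo project kickoff"))  # Apollo Project
print(resolve("Apollo", "apollo is down"))              # None
```

Returning `None` on a tie keeps an uncertain mention out of the graph, trading a little recall for precision.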

# Next steps

- Add semantic retrieval to complement extraction: Embeddings
- Implement domain-specific extraction rules: Custom NLP Rules

# See also

- Custom NLP Rules: the mechanisms for tuning extraction and linking to your domain
- Entity Extraction and NLP Tuning: tuning NLP pipelines for high-quality entity extraction and search relevance
- Ranking tuning: making the right results appear first for your users