Advanced Extraction

Beyond basic spotters and patterns, Curiosity's NLP configuration supports several additional capabilities.

Entity Capture Experiments

Before deploying a new model, use the Experiment and Capture interface to validate what the model would extract from your real data. This lets you iterate on patterns and spotter settings without writing anything to the graph.

Navigate to Management > Data, select a node type, and click Experiment and Capture. You can run experiments on pattern spotters and review what entities would be captured before committing to a final configuration.
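The idea behind an experiment is a dry run: apply the candidate patterns to sample text, collect what they would capture, and leave the graph alone. The sketch below shows that idea outside Curiosity, assuming regex-based patterns. The `PATTERNS` labels, the `TCK-`/`SN` formats, and the `experiment` helper are hypothetical illustrations, not Curiosity's configuration format or API.

```python
import re

# Hypothetical candidate patterns (label -> regex), for illustration only;
# this is not Curiosity's pattern spotter configuration format.
PATTERNS = {
    "TicketId": re.compile(r"\bTCK-\d{4,}\b"),
    "SerialNumber": re.compile(r"\bSN[0-9A-F]{8}\b"),
}

def experiment(texts, patterns):
    """Dry run: report what each pattern would capture. Nothing is written anywhere."""
    captures = {label: [] for label in patterns}
    for text in texts:
        for label, regex in patterns.items():
            captures[label].extend(m.group(0) for m in regex.finditer(text))
    return captures

sample = [
    "Customer reported TCK-10234 after replacing unit SN00A1B2C3.",
    "Follow-up on TCK-10234; no serial provided.",
]
result = experiment(sample, PATTERNS)
print(result)
# {'TicketId': ['TCK-10234', 'TCK-10234'], 'SerialNumber': ['SN00A1B2C3']}
```

Reviewing the captured lists this way makes it easy to spot patterns that are too broad or too narrow before anything is committed.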

Minimum Size Filtering

When creating spotter models from a node type, you can set a minimum character size for matched entities. This avoids capturing very short abbreviations or noise that would pollute the graph with low-quality links.
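To make the effect concrete, here is a minimal sketch of a minimum-size filter applied to candidate matches. It assumes the threshold is an inclusive character count on the matched text; the exact semantics in Curiosity are not stated here, and `filter_by_min_size` is a hypothetical helper.

```python
def filter_by_min_size(matches, min_size):
    """Keep only matches with at least min_size characters (assumed inclusive)."""
    return [m for m in matches if len(m) >= min_size]

matches = ["AI", "ML", "Kubernetes", "SQL", "PostgreSQL"]
kept = filter_by_min_size(matches, min_size=4)
print(kept)  # ['Kubernetes', 'PostgreSQL']
```

Note the trade-off: a threshold of 4 also drops a legitimate short term like "SQL", so the value should be chosen against your own vocabulary.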

Entity Linking Options

For pattern-based spotters, you can enable the "If missing node, Create new node" option. With this option enabled, when a pattern matches text that doesn't correspond to an existing node, a new node is created automatically and linked. This is useful for extracting identifiers (such as serial numbers or ticket IDs) that may not be pre-loaded into the graph.
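The behavior can be pictured as a link-or-create step for each match. The sketch below models it with a toy in-memory graph; the `Graph` class, `link_matches`, and the `create_if_missing` flag are hypothetical stand-ins for illustration, not Curiosity's SDK.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Graph:
    # Hypothetical in-memory graph: node type -> {key: node}, plus a list of edges.
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def get(self, node_type, key):
        return self.nodes.get(node_type, {}).get(key)

    def create(self, node_type, key):
        node = {"type": node_type, "key": key}
        self.nodes.setdefault(node_type, {})[key] = node
        return node

def link_matches(graph, source_key, text, node_type, pattern, create_if_missing):
    """Link each pattern match to an existing node, or create one if allowed."""
    for match in pattern.finditer(text):
        key = match.group(0)
        node = graph.get(node_type, key)
        if node is None:
            if not create_if_missing:
                continue  # no matching node and creation disabled: skip this match
            node = graph.create(node_type, key)
        graph.edges.append((source_key, node_type, key))
    return graph

g = Graph()
g.create("TicketId", "TCK-1001")  # the only ticket pre-loaded into the graph
pattern = re.compile(r"\bTCK-\d{4}\b")
text = "See TCK-1001 and TCK-2002."

link_matches(g, "doc-1", text, "TicketId", pattern, create_if_missing=False)
link_matches(g, "doc-2", text, "TicketId", pattern, create_if_missing=True)
print(sorted(g.nodes["TicketId"]))  # ['TCK-1001', 'TCK-2002']
print(g.edges)
```

With creation disabled, `doc-1` only links to the pre-loaded `TCK-1001`; with it enabled, `doc-2` also gets a newly created `TCK-2002` node.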

Reparsing Existing Data

After updating your NLP configuration (for example, adding a model to a pipeline or changing entity linking rules), trigger a reparse of existing nodes so the new settings apply retroactively. Navigate to the node schema page under Management > Data and select "Reparse this data" to queue all nodes of that type for reprocessing.

Multi-language Pipelines

Each pipeline is associated with a language. If your data contains content in multiple languages, create a separate pipeline for each language and configure the relevant node fields under Used for in each pipeline. The workspace supports 21 languages, including English, French, German, Spanish, Italian, Portuguese, and Dutch.
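When planning a multi-language setup, it can help to write down which node fields each language pipeline will be "Used for" and check the plan before configuring it. The sketch below does that, assuming each field should be handled by exactly one language pipeline. The node types, field names such as `Body_fr`, and both helper functions are hypothetical, not part of Curiosity's API.

```python
# Hypothetical plan: one pipeline per language, each listing the
# (node type, field) pairs it is "Used for". All names are illustrative.
PIPELINES = {
    "English": [("SupportCase", "Summary_en"), ("Article", "Body_en")],
    "French": [("SupportCase", "Summary_fr"), ("Article", "Body_fr")],
    "German": [("Article", "Body_de")],
}

def build_routing(pipelines):
    """Map each (node type, field) pair to the single language pipeline that handles it."""
    routing = {}
    for language, fields in pipelines.items():
        for node_field in fields:
            if node_field in routing:
                # Assumption of this sketch: a field belongs to one language pipeline.
                raise ValueError(
                    f"{node_field} assigned to both {routing[node_field]} and {language}"
                )
            routing[node_field] = language
    return routing

def uncovered(routing, text_fields):
    """Return text fields that no pipeline in the plan would process."""
    return [f for f in text_fields if f not in routing]

routing = build_routing(PIPELINES)
print(routing[("Article", "Body_fr")])  # French
print(uncovered(routing, [("Article", "Body_de"), ("SupportCase", "Summary_de")]))
# [('SupportCase', 'Summary_de')]
```

A check like this surfaces fields that would be left unprocessed, such as a German summary field with no pipeline assigned.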