NLP Pipelines
A pipeline is the per-language NLP stack the workspace runs over a field's text — tokenizer, POS tagger, spotter models, pattern spotters, entity linker, and any post-processing. You assign the pipeline to specific node-field pairs and the workspace processes them automatically at ingest and reparse time.
Create a pipeline
- Settings → Languages — enable the languages you'll process (English, French, etc.). One pipeline per language.
- Settings → NLP → Pipelines → + New pipeline.
- Pick the language and the mode:
| Mode | When to use |
|---|---|
| Data Parsing | Standard ingestion: extract entities, link to graph, store annotations. |
| Conversational | Chat-style content where speaker turns and intent matter. |
| Custom | Mix-and-match: disable the parts you don't need. |
Assign a pipeline to fields
Under the pipeline's Used for tab, pick the node types and fields to process:
- SupportCase → Summary
- SupportCase → Content
- KbArticle → Body
Every value of those fields flows through the pipeline. New values are processed during the ingest commit; values that already exist are backfilled when you run Reparse this data.
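One way to picture the assignment is as a lookup from node-field pairs to pipelines. The sketch below is illustrative Python only, not the workspace's API: `ASSIGNMENTS`, `assign`, `pipelines_for`, and the pipeline name `english-data-parsing` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical model of the "Used for" tab: each (node type, field) pair
# maps to the pipelines assigned to it. All names are illustrative.
ASSIGNMENTS: dict[tuple[str, str], list[str]] = defaultdict(list)

def assign(pipeline: str, node_type: str, field: str) -> None:
    """Record that `pipeline` should process `node_type.field` (no duplicates)."""
    key = (node_type, field)
    if pipeline not in ASSIGNMENTS[key]:
        ASSIGNMENTS[key].append(pipeline)

def pipelines_for(node_type: str, field: str) -> list[str]:
    """Return the pipelines assigned to a field (empty list if unassigned)."""
    return list(ASSIGNMENTS.get((node_type, field), []))

for node_type, field in [("SupportCase", "Summary"),
                         ("SupportCase", "Content"),
                         ("KbArticle", "Body")]:
    assign("english-data-parsing", node_type, field)
```

A field with no entry in the mapping is simply never processed, which matches the idea that only assigned node-field pairs go through a pipeline.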
Multilingual content
If you ingest data in multiple languages, create one pipeline per language and assign the same field to all of them. The language detector routes each ingested value to the pipeline for its language. For mixed-language fields (e.g. a forum thread with English and French posts), the workspace splits the text at sentence level and routes each sentence independently.
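To make sentence-level routing concrete, here is a minimal Python sketch of the idea. It does not reproduce the workspace's detector or sentence splitter, which are not described here: the stopword lists, `detect_language`, and `route_sentences` are assumptions chosen for illustration, and a real detector would be far more robust.

```python
import re

# Toy stopword lists standing in for a real language detector (assumption).
STOPWORDS = {
    "en": {"the", "is", "and", "not", "my", "does", "this"},
    "fr": {"le", "la", "est", "et", "ne", "pas", "mon", "cette"},
}

def detect_language(sentence: str) -> str:
    """Pick the language whose stopwords overlap the sentence the most."""
    words = set(re.findall(r"[a-zà-ÿ']+", sentence.lower()))
    return max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))

def route_sentences(text: str) -> list[tuple[str, str]]:
    """Split after sentence-ending punctuation and tag each sentence."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [(detect_language(s), s.strip()) for s in parts if s.strip()]

thread = "My device does not start. Mon appareil ne démarre pas."
for lang, sentence in route_sentences(thread):
    print(lang, "->", sentence)
```

Each tagged sentence would then be handed to the pipeline for its language, so the English and French halves of the same field value are processed by different stacks.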
Inspecting what a pipeline produced
Annotations are stored as edges from the source node. From the shell:
// All entities mentioned in a specific case.
return Q().StartAt(Node.FromKey("SupportCase", "CS-0142"))
    .Out("_Mentions")
    .Emit("E");
Or, for a sanity-check across the corpus:
// How many cases mention at least one Device?
return Q().StartAt("SupportCase")
    .Where(c => c.Out("_Mentions").OfType("Device").Count() > 0)
    .EmitCount("C");
When to reparse
Reparse whenever the pipeline's behavior changes:
- A new spotter or pattern is added.
- An existing model is updated.
- Linking rules change.
- A field is assigned to a different (or additional) pipeline.
Trigger from the node schema page: Settings → Schema → [type] → Reparse this data. The workspace queues every existing node of that type for reprocessing; track progress under Settings → Tasks.
On large corpora, reparsing is expensive — it walks every document and re-runs the full pipeline. Pause non-essential ingestion during a reparse so background queues don't compete.
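Because a reparse is costly, it can help to confirm that something actually changed before triggering one. The Python sketch below shows one possible approach, kept entirely outside the workspace: fingerprint a description of the pipeline and compare it with the one recorded at the last reparse. The configuration shape (`spotters`, `models`, `linking_rules`, `fields`) and both function names are hypothetical; the workspace is not known to expose its configuration in this form.

```python
import hashlib
import json

def fingerprint(config: dict) -> str:
    """Stable hash of a pipeline description (key order does not matter)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def needs_reparse(last_applied: dict, current: dict) -> bool:
    """True when the description changed since the last reparse."""
    return fingerprint(last_applied) != fingerprint(current)

# Hypothetical description recorded at the last reparse.
before = {
    "spotters": ["Device"],
    "models": {"ner": "v1"},
    "linking_rules": ["exact-match"],
    "fields": [["SupportCase", "Summary"]],
}
# Same pipeline after a new spotter was added.
after = {**before, "spotters": ["Device", "ErrorCode"]}

print(needs_reparse(before, before))
print(needs_reparse(before, after))
```

Any of the listed triggers (a new spotter, an updated model, changed linking rules, a new field assignment) alters the description and therefore the fingerprint, while an unchanged pipeline compares equal and needs no reparse.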