Curiosity

NLP Pipelines

A pipeline is the per-language NLP stack the workspace runs over a field's text — tokenizer, POS tagger, spotter models, pattern spotters, entity linker, and any post-processing. You assign the pipeline to specific node-field pairs and the workspace processes them automatically at ingest and reparse time.
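
The idea of a stack of stages can be shown with a small, self-contained C# sketch. This is not how the workspace implements pipelines. The stage names, the (Kind, Text, Start) annotation shape, the regex tokenizer, and the CS-#### pattern rule are all hypothetical. The sketch only shows a pipeline as an ordered list of stages. Each stage reads the field text and adds annotations. A `disabled` set stands in for switching stages off, as Custom mode allows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical model: a pipeline is an ordered list of named stages. Each stage
// reads the field text and appends annotations (Kind, Text, Start) to a shared list.
var stages = new List<(string Name, Action<string, List<(string Kind, string Text, int Start)>> Run)>
{
    ("tokenizer", (text, anns) =>
    {
        // Toy tokenizer: every run of word characters becomes a Token.
        foreach (Match m in Regex.Matches(text, @"\w+"))
            anns.Add(("Token", m.Value, m.Index));
    }),
    ("pattern-spotter", (text, anns) =>
    {
        // Toy pattern spotter with a made-up rule: case keys such as "CS-0142".
        foreach (Match m in Regex.Matches(text, @"\bCS-\d{4}\b"))
            anns.Add(("CaseKey", m.Value, m.Index));
    }),
};

// Runs the stages in order, skipping any stage named in `disabled`.
List<(string Kind, string Text, int Start)> RunPipeline(string text, ISet<string> disabled)
{
    var anns = new List<(string Kind, string Text, int Start)>();
    foreach (var stage in stages)
        if (!disabled.Contains(stage.Name))
            stage.Run(text, anns);
    return anns;
}

var result = RunPipeline("Customer reopened CS-0142 today.", new HashSet<string>());
foreach (var hit in result.Where(a => a.Kind == "CaseKey"))
    Console.WriteLine($"{hit.Kind}: {hit.Text} @ {hit.Start}");
```

Running it prints `CaseKey: CS-0142 @ 18`. If you pass `new HashSet<string> { "pattern-spotter" }` instead, that stage is skipped and only Token annotations come back.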

Create a pipeline

  1. Settings → Languages — enable the languages you'll process (English, French, etc.). One pipeline per language.
  2. Settings → NLP → Pipelines → + New pipeline.
  3. Pick the language and the mode:
     Mode              When to use
     Data Parsing      Standard ingestion: extract entities, link to graph, store annotations.
     Conversational    Chat-style content where speaker turns and intent matter.
     Custom            Mix-and-match: disable the parts you don't need.

Assign a pipeline to fields

Under the pipeline's Used for tab, pick the node types and fields to process:

  • SupportCase → Summary
  • SupportCase → Content
  • KbArticle → Body

Every value of those fields flows through the pipeline, both past and future values. New ingest is processed during the commit. Existing data is backfilled when you run Reparse this data.
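
The assignment can be pictured as a lookup keyed by node type and field. This C# sketch is illustrative only. The `usedFor` table, the `JobsFor` helper, and the pipeline name `en-data-parsing` are hypothetical and are not workspace APIs. The sketch shows that new ingest and backfill use the same assignment. They differ only in which nodes are fed in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical "Used for" table: (node type, field) -> assigned pipelines.
// The pipeline name "en-data-parsing" is made up for illustration.
var usedFor = new Dictionary<(string Type, string Field), List<string>>
{
    [("SupportCase", "Summary")] = new List<string> { "en-data-parsing" },
    [("SupportCase", "Content")] = new List<string> { "en-data-parsing" },
    [("KbArticle", "Body")] = new List<string> { "en-data-parsing" },
};

// One job per (assigned pipeline, field) present on the node; unassigned fields are skipped.
List<(string Pipeline, string Field)> JobsFor(string type, IDictionary<string, string> fields) =>
    fields.Keys
          .Where(f => usedFor.ContainsKey((type, f)))
          .SelectMany(f => usedFor[(type, f)].Select(p => (Pipeline: p, Field: f)))
          .ToList();

// New ingest: jobs are computed for the node being committed.
var commitJobs = JobsFor("SupportCase", new Dictionary<string, string>
{
    ["Summary"] = "Printer offline", ["Status"] = "Open",
});

// Backfill ("Reparse this data"): the same lookup over every existing node.
var existing = new List<(string Type, Dictionary<string, string> Fields)>
{
    ("SupportCase", new Dictionary<string, string> { ["Summary"] = "Battery drains", ["Content"] = "Drains overnight." }),
    ("KbArticle", new Dictionary<string, string> { ["Title"] = "Reset guide", ["Body"] = "Hold the power button." }),
};
var backfillJobs = existing.SelectMany(n => JobsFor(n.Type, n.Fields)).ToList();

Console.WriteLine($"commit: {commitJobs.Count} job(s), backfill: {backfillJobs.Count} job(s)");
```

With the sample data it prints `commit: 1 job(s), backfill: 3 job(s)`. The unassigned `Status` and `Title` fields are never processed.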

Multilingual content

If you ingest data in multiple languages, create one pipeline per language and assign the same field to each of them. The language detector routes each incoming value to the pipeline for its language. Some fields mix languages, such as a forum thread with English and French posts. For these, the workspace splits the text at sentence level and routes each sentence independently.
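
Sentence-level routing for a mixed-language field has three steps. Split the text into sentences, detect the language of each sentence, and group the sentences by the pipeline they go to. In this C# sketch, the stopword-counting `Detect` function is a deliberately crude, hypothetical stand-in for the workspace's language detector. The regex sentence splitter is also an assumption. Only the split, detect, and route shape is the point.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the language detector: counts a few common
// stopwords per language and picks the language with the most hits.
var stopwords = new Dictionary<string, string[]>
{
    ["en"] = new[] { "the", "is", "and", "my", "it" },
    ["fr"] = new[] { "le", "la", "est", "et", "mon", "ne", "pas" },
};

string Detect(string sentence)
{
    var words = Regex.Matches(sentence.ToLowerInvariant(), @"\p{L}+").Select(m => m.Value).ToList();
    // Ties (including zero hits) fall back to the first language listed.
    return stopwords.OrderByDescending(kv => words.Count(w => kv.Value.Contains(w))).First().Key;
}

// Split after sentence-ending punctuation, then route each sentence on its own.
Dictionary<string, List<string>> Route(string text) =>
    Regex.Split(text, @"(?<=[.!?])\s+")
         .Where(s => s.Length > 0)
         .GroupBy(Detect)
         .ToDictionary(g => g.Key, g => g.ToList());

var thread = "My printer is offline and the light blinks. Mon imprimante ne marche pas. It worked yesterday.";
foreach (var (lang, sentences) in Route(thread))
    Console.WriteLine($"{lang}: {string.Join(" | ", sentences)}");
```

For the sample thread, the two English sentences go to `en` and the French one goes to `fr`.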

Inspecting what a pipeline produced

Annotations are stored as _Mentions edges from the source node to the entities it mentions. To inspect them from the shell:

// All entities mentioned in a specific case.
return Q().StartAt(Node.FromKey("SupportCase", "CS-0142"))
          .Out("_Mentions")
          .Emit("E");

Or, as a sanity check across the whole corpus:

// How many cases mention at least one Device?
return Q().StartAt("SupportCase")
          .Where(c => c.Out("_Mentions").OfType("Device").Count() > 0)
          .EmitCount("C");

When to reparse

Existing annotations reflect the pipeline as it was when they were produced. Trigger a reparse whenever the pipeline's behavior changes:

  • A new spotter or pattern is added.
  • An existing model is updated.
  • Linking rules change.
  • A field is assigned to a different (or additional) pipeline.

Trigger from the node schema page: Settings → Schema → [type] → Reparse this data. The workspace queues every existing node of that type for reprocessing; track progress under Settings → Tasks.
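
A reparse task can be modeled as a queue that is drained one node at a time. This C# sketch assumes a simple in-memory queue and a stub `Reparse` function that only records the node key. The node keys are made up, and nothing here is the workspace's task API. The model shows how the cost scales: every node of the chosen type is visited once, so the work grows linearly with the number of those nodes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative nodes; the keys are made up.
var nodes = new List<(string Type, string Key)>
{
    ("SupportCase", "CS-0140"), ("SupportCase", "CS-0141"),
    ("SupportCase", "CS-0142"), ("KbArticle", "KB-0007"),
};

// Hypothetical stand-in for re-running the full pipeline on one node.
var reparsedKeys = new List<string>();
void Reparse((string Type, string Key) node) => reparsedKeys.Add(node.Key);

// Queue every existing node of the chosen type, then drain the queue and report progress.
var queue = new Queue<(string Type, string Key)>(nodes.Where(n => n.Type == "SupportCase"));
int total = queue.Count;
while (queue.Count > 0)
{
    Reparse(queue.Dequeue());
    int done = total - queue.Count;
    Console.WriteLine($"Reparse SupportCase: {done}/{total} ({100 * done / total}%)");
}
```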

On large corpora, reparsing is expensive because it walks every document and re-runs the full pipeline. During a reparse, pause non-essential ingestion so the two workloads don't compete for the same background queues.

© 2026 Curiosity. All rights reserved.