# 🚀 Catalyst: High-Performance NLP for .NET
Catalyst is a pure C# Natural Language Processing (NLP) library built for speed and efficiency. Inspired by the design of spaCy, it brings production-grade NLP capabilities, including pre-trained models, word embeddings, and entity recognition, to the .NET ecosystem.
# ⚡ Why Catalyst?
- Built for Speed: Process over 1,000,000 tokens/s on a modern CPU.
- Pure C#: No Python dependencies or heavy wrappers. Native, modern .NET support.
- Cross-Platform: Runs on Windows, Linux, and macOS, including ARM hardware.
- Non-Destructive: Tokenization preserves the original text, so every token maps back to its exact position in the raw input.
- spaCy-Inspired Pipeline: A familiar "Pipeline" architecture for tokenization, lemmatization, POS tagging, and NER.
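
To make the "non-destructive" point concrete, here is a minimal sketch of the general idea in plain C#. It is not Catalyst's tokenizer: `TokenOffset` and `OffsetTokenizer` are hypothetical names invented for this illustration, and splitting on whitespace is a deliberate simplification. Each token is stored as a pair of offsets into the untouched input rather than as a modified copy of the text.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: NOT Catalyst's tokenizer. Tokens are stored as offsets
// into the original string, so the input is never copied or altered.
public readonly struct TokenOffset
{
    public TokenOffset(int begin, int length) { Begin = begin; Length = length; }
    public int Begin { get; }   // index of the first character in the raw text
    public int Length { get; }  // number of characters covered by the token
}

public static class OffsetTokenizer
{
    public static List<TokenOffset> Tokenize(string input)
    {
        var result = new List<TokenOffset>();
        int i = 0;
        while (i < input.Length)
        {
            // Skip whitespace; the raw text itself is left untouched.
            while (i < input.Length && char.IsWhiteSpace(input[i])) i++;
            int begin = i;
            // Consume characters up to the next whitespace.
            while (i < input.Length && !char.IsWhiteSpace(input[i])) i++;
            if (i > begin) result.Add(new TokenOffset(begin, i - begin));
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        var text = "The quick  brown fox";
        foreach (var t in OffsetTokenizer.Tokenize(text))
        {
            // Every token maps back to an exact slice of the raw input.
            Console.WriteLine($"{t.Begin,2}: '{text.Substring(t.Begin, t.Length)}'");
        }
    }
}
```

With the input above, the four tokens start at offsets 0, 4, 11, and 17. The double space between "quick" and "brown" is still present in `text`, which a tokenizer that normalizes or rebuilds its input would lose.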
# 🛠 Core Features
# 🏁 Getting Started
# 1. Installation
Install the core library and the language package for your target language via NuGet:
dotnet add package Catalyst
dotnet add package Catalyst.Models.English
# 2. Basic Usage
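The following example registers the English model, sets a local folder for model storage, loads the pipeline, and prints each token with its part-of-speech tag and lemma.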
using Catalyst;
using Catalyst.Models;
using Mosaik.Core;
// 1. Register the language and set storage location
Catalyst.Models.English.Register();
Storage.Current = new DiskStorage("catalyst-models");
// 2. Load the NLP pipeline
var nlp = await Pipeline.ForAsync(Language.English);
// 3. Process your document
var doc = new Document("The quick brown fox jumps over the lazy dog", Language.English);
nlp.ProcessSingle(doc);
// 4. Access results (tokens, POS tags, lemmas)
foreach (var span in doc)
{
    foreach (var token in span)
    {
        Console.WriteLine($"{token.Value} [{token.POS}] -> {token.Lemma}");
    }
}
# 🌍 Language Support
Catalyst provides pre-trained models for a wide variety of languages, trained on data from the Universal Dependencies project. Language data is distributed as separate NuGet packages, so your application only ships the models it needs.
- Available Packages: English, French, German, Spanish, Italian, and more.
# 🧠 Advanced Capabilities
# Multi-threaded Processing
Leverage .NET's native multi-threading to process large collections of documents efficiently:
var docs = GetLargeDocumentCollection(); // placeholder for your own document source
var processedDocs = nlp.Process(docs);   // internally parallelized and lazily evaluated
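
Because the result is lazily evaluated, the work should only run once the returned sequence is enumerated. The sketch below shows one way to drive it, using only the calls that appear elsewhere in this README. `GetLargeDocumentCollection` is a hypothetical helper written for this example, and the `IEnumerable<IDocument>` return type is an assumption about what `Process` accepts.

```csharp
using System;
using System.Collections.Generic;
using Catalyst;
using Catalyst.Models;
using Mosaik.Core;

Catalyst.Models.English.Register();
Storage.Current = new DiskStorage("catalyst-models");
var nlp = await Pipeline.ForAsync(Language.English);

// Nothing is processed yet: Process is described as lazy-evaluated.
var processedDocs = nlp.Process(GetLargeDocumentCollection());

long tokenCount = 0;
foreach (var doc in processedDocs)   // enumerating the sequence drives the work
{
    foreach (var span in doc)
    {
        foreach (var token in span)
        {
            tokenCount++;
        }
    }
}
Console.WriteLine($"Processed {tokenCount} tokens");

// Hypothetical helper: yields documents one at a time instead of
// building the whole collection in memory first.
static IEnumerable<IDocument> GetLargeDocumentCollection()
{
    for (int i = 0; i < 10_000; i++)
    {
        yield return new Document($"Sample sentence number {i}.", Language.English);
    }
}
```

Streaming the input through an iterator keeps memory use flat for large corpora. If you need the results more than once, materialize them (for example with `ToList()`) rather than enumerating `processedDocs` twice.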
# Pattern Matching (Entity Spotting)
Create complex rule-based entity recognizers using the PatternSpotter:
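// Assumption: P is a using-alias for Catalyst.PatternUnitPrototype, declared at the top of the file:
// using P = Catalyst.PatternUnitPrototype;
var spotter = new PatternSpotter(Language.English, 0, tag: "tech-stack", captureTag: "Tech");
spotter.NewPattern("C#", mp => mp.Add(new PatternUnit(P.Single().WithToken("C#"))));
nlp.Add(spotter); // documents processed after this point pass through the spotter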
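
Once a spotter is part of the pipeline, captured entities can be read back from processed documents. The following is a sketch rather than a definitive recipe: the `P` alias, `span.GetEntities()`, and `entity.EntityType.Type` are assumptions not confirmed by this README, and the sample sentence is made up. Whether "C#" is captured also depends on how the tokenizer splits it, so no output is shown.

```csharp
using System;
using Catalyst;
using Catalyst.Models;
using Mosaik.Core;
using P = Catalyst.PatternUnitPrototype; // assumed alias for the P shorthand

Catalyst.Models.English.Register();
Storage.Current = new DiskStorage("catalyst-models");
var nlp = await Pipeline.ForAsync(Language.English);

// Same spotter as in the README snippet.
var spotter = new PatternSpotter(Language.English, 0, tag: "tech-stack", captureTag: "Tech");
spotter.NewPattern("C#", mp => mp.Add(new PatternUnit(P.Single().WithToken("C#"))));
nlp.Add(spotter);

// Made-up sample text for illustration.
var doc = new Document("We build our services in C# on .NET", Language.English);
nlp.ProcessSingle(doc);

var found = 0;
foreach (var span in doc)
{
    // Assumption: spans expose captured entities through GetEntities().
    foreach (var entity in span.GetEntities())
    {
        found++;
        Console.WriteLine($"{entity.Value} [{entity.EntityType.Type}]");
    }
}
Console.WriteLine($"Entities found: {found}");
```

If nothing is printed for a pattern you expect to match, print the tokens first (as in the basic usage example) to check how the text was split before adjusting the pattern.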
# 📖 Learn More
- Tutorials: Deep dives into NER, Embeddings, and Training.
- Contributing: We welcome PRs! Help us make .NET NLP even faster.
Maintained by Curiosity. Licensed under MIT.