# Indexes in Curiosity
Indexes in Curiosity are responsible for processing nodes to enable search, querying, and advanced analytics. Unlike traditional databases where indexing might happen synchronously with writes, indexing in Curiosity is an asynchronous background job.
## Indexing Process
When you commit a node (e.g., via `Graph.CommitAsync` or an ingestion pipeline), the transaction completes immediately. The node is then placed into the queues of the various indexes, and background workers pick up and process the queued nodes. This keeps write throughput high, but indexing is eventually consistent: there is a short delay before a new node appears in search results.
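The commit-then-index flow described above can be illustrated with a minimal, self-contained sketch. It is not Curiosity's implementation: `IndexingQueueSketch`, `Commit`, `IsSearchable`, and `DrainAsync` are hypothetical names, and `System.Threading.Channels` merely stands in for the internal index queues.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical model of "commit now, index later"; not Curiosity's actual code.
public sealed class IndexingQueueSketch
{
    private readonly ConcurrentDictionary<string, string> _store = new();  // committed nodes
    private readonly ConcurrentDictionary<string, bool> _indexed = new();  // nodes visible to search
    private readonly Channel<string> _queue = Channel.CreateUnbounded<string>();
    private readonly Task _worker;

    public IndexingQueueSketch()
    {
        // Background worker: drains the queue and "indexes" each node.
        _worker = Task.Run(async () =>
        {
            await foreach (var id in _queue.Reader.ReadAllAsync())
            {
                await Task.Delay(50);   // simulated indexing cost
                _indexed[id] = true;
            }
        });
    }

    // The write returns as soon as the node is stored and queued.
    public void Commit(string id, string text)
    {
        _store[id] = text;
        _queue.Writer.TryWrite(id);
    }

    public bool IsCommitted(string id) => _store.ContainsKey(id);

    public bool IsSearchable(string id) => _indexed.ContainsKey(id);

    // Stops accepting work and waits until everything queued has been indexed.
    public Task DrainAsync()
    {
        _queue.Writer.Complete();
        return _worker;
    }
}
```

In this sketch, `IsCommitted` is true as soon as `Commit` returns, while `IsSearchable` only becomes true once the background worker has processed the node. That gap is the delay the paragraph above refers to.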
## Index Types
Curiosity supports a wide range of indexes, visible in the Indexes view in the admin UI.
### Full Text Search
These indexes enable text-based retrieval.
- Lucene Text Index: The standard full-text search index (based on Lucene.NET). It supports tokenization, stemming, and boolean queries.
- Fuzzy Full Text (Command Score): A fuzzy search index designed for auto-complete style text indexing.
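To make "tokenization" and "boolean queries" concrete, here is a toy inverted index. It is only a sketch under simplifying assumptions (lowercasing and punctuation splitting, no stemming, no scoring); `InvertedIndexSketch` and its methods are hypothetical names and do not reflect how the Lucene Text Index is implemented or exposed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy inverted index: lowercase tokenization plus boolean AND. Hypothetical names;
// Lucene.NET adds analyzers, stemming, scoring, and much more.
public sealed class InvertedIndexSketch
{
    // token -> ids of the nodes whose text contains that token
    private readonly Dictionary<string, HashSet<string>> _postings = new();

    private static IEnumerable<string> Tokenize(string text) =>
        text.ToLowerInvariant()
            .Split(new[] { ' ', ',', '.', ';', ':', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);

    public void Add(string nodeId, string text)
    {
        foreach (var token in Tokenize(text))
        {
            if (!_postings.TryGetValue(token, out var ids))
                _postings[token] = ids = new HashSet<string>();
            ids.Add(nodeId);
        }
    }

    // Returns the ids of nodes containing every query term (boolean AND), sorted by id.
    public List<string> SearchAll(string query)
    {
        List<string> tokens = Tokenize(query).Distinct().ToList();
        if (tokens.Count == 0) return new List<string>();
        if (!_postings.TryGetValue(tokens[0], out var first)) return new List<string>();

        var result = new HashSet<string>(first);
        foreach (var token in tokens.Skip(1))
        {
            if (!_postings.TryGetValue(token, out var ids)) return new List<string>();
            result.IntersectWith(ids);
        }
        return result.OrderBy(id => id, StringComparer.Ordinal).ToList();
    }
}
```

A query matches a node only when every term appears in that node's text, which is the simplest form of the boolean matching a full-text engine performs.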
### Embeddings & Vector Search
All embedding indexes in Curiosity use HNSW (Hierarchical Navigable Small World) graphs for efficient approximate nearest-neighbor search.
- Page Space Embeddings: Generates embeddings based on the graph structure (link analysis). Nodes that are connected or structurally similar will have similar vectors.
- Sentence Embeddings: Converts text content into vectors using Transformer models (e.g., MiniLM, ArcticXS). Useful for semantic search.
- Raw Embeddings: Allows you to supply your own pre-computed vectors (e.g., from an external API) and index them for similarity search.
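As a point of reference for what "similarity search" computes, the sketch below performs an exact, brute-force nearest-neighbor search using cosine similarity. An HNSW index approximates this ranking without scanning every vector. The class and method names are hypothetical, and the choice of cosine similarity as the metric is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Exact (brute-force) nearest-neighbor search; HNSW approximates this ranking
// without scanning every vector. All names here are hypothetical.
public static class VectorSearchSketch
{
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;   // zero vector: treat similarity as 0
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Returns the ids of the k vectors most similar to the query, best first.
    public static List<string> NearestNeighbors(
        float[] query, IReadOnlyDictionary<string, float[]> vectors, int k)
    {
        return vectors
            .OrderByDescending(kv => CosineSimilarity(query, kv.Value))
            .Take(k)
            .Select(kv => kv.Key)
            .ToList();
    }
}
```

The brute-force scan costs time proportional to the number of vectors per query, which is why large collections rely on an approximate structure such as HNSW instead.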
### NLP & Graph Construction
- Custom Code Index: Runs arbitrary C# code against nodes. Used for custom logic, validation, or data enrichment.
- Field To Document Index (Parser): Takes raw text from a node property, parses it (extracting entities and phrases), and creates a temporary `_Document` node.
- Document To Graph Index (Linker): Takes the `_Document` node created by the Parser and "materializes" it into the graph. It creates edges between the parent node and the entities found (e.g., `Person`, `Location`, `Organization`), effectively linking unstructured text to structured graph nodes.
### Filtering Indexes
- Simple Text Index: A property value index used for filtering and facets.
- Numeric Index: Optimized for range queries on numbers.
- Time Index: Optimized for time-based queries.
- Geo Index: Optimized for spatial queries (radius, bounding box).
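To illustrate why a dedicated index helps with range filters, the following sketch keeps values sorted and uses binary search to find the start of a range. It is a generic technique shown under stated assumptions, not a description of how the Numeric Index is built; `NumericRangeIndexSketch` and its members are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sorted-value index; illustrates why range filters can be cheap.
public sealed class NumericRangeIndexSketch
{
    private readonly (double Value, string NodeId)[] _entries;

    public NumericRangeIndexSketch(IEnumerable<(double Value, string NodeId)> entries)
    {
        // Sort once at build time so that queries can use binary search.
        _entries = entries.OrderBy(e => e.Value).ToArray();
    }

    // Returns ids of nodes whose value lies in [min, max], in ascending value order.
    public List<string> Range(double min, double max)
    {
        var result = new List<string>();
        for (int i = LowerBound(min); i < _entries.Length && _entries[i].Value <= max; i++)
            result.Add(_entries[i].NodeId);
        return result;
    }

    // Binary search for the position of the first entry with Value >= min.
    private int LowerBound(double min)
    {
        int lo = 0, hi = _entries.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (_entries[mid].Value < min) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
}
```

Finding the start of the range takes logarithmic time, after which only matching entries are visited, instead of testing every node's property value.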
## Manual Interaction
You can interact with the index manager programmatically using `Graph.Internals.Indexes`. This is useful if you need to force re-indexing of specific nodes or trigger custom workflows.
### Methods
- `OfType<T>()`: Selects a specific category of indexes (e.g., `OfType<LuceneTextIndex>()`).
- `OfType<T>(string nodeType)`: Selects indexes of a specific type that target a specific node type.
- `Enqueue(Node node)`: Adds a node to the processing queue.
- `Enqueue(UID128 uid)`: Adds a node UID to the processing queue.
### Example: Reprocessing a Node
    // 1. Get the specific index instance (e.g., Lucene Text Index for "Person")
    var personIndex = Graph.Internals.Indexes.OfType<LuceneTextIndex>("Person").FirstOrDefault();

    if (personIndex != null)
    {
        // Case A: Enqueue a single node
        personIndex.Enqueue(myNode);

        // Case B: Enqueue ALL "Person" nodes
        foreach (var uid in Query().StartAt("Person").AsEnumerableUIDs())
        {
            personIndex.Enqueue(uid);
        }
    }