Building a graph
SmallWorld<TItem, TDistance>.AddItems inserts one batch of items into the graph. Calling it again grows the same graph — there is no "freeze" step, no separate index/query mode.
Construct the graph
using System;
using HNSW.Net;

var graph = new SmallWorld<float[], float>(
    distance: CosineDistance.SIMDForUnits,
    generator: DefaultRandomGenerator.Instance,
    parameters: new SmallWorldParameters
    {
        M = 16,
        LevelLambda = 1 / Math.Log(16),
        EfSearch = 64,
        ConstructionPruning = 200,
        InitialItemsSize = 1_000_000, // pre-size if you know the dataset
    });
InitialItemsSize only affects allocation strategy — it doesn't pre-reserve graph nodes.
Add items
IReadOnlyList<int> ids = graph.AddItems(vectors);
AddItems returns the integer IDs HNSW assigns to each input item. These IDs are returned from KNNSearch and used everywhere the library refers to an item. They are stable for the lifetime of the graph.
You can call AddItems multiple times to grow the graph incrementally:
graph.AddItems(initialBatch); // ids 0..N-1
graph.AddItems(laterBatch); // ids N..N+M-1
The graph stays valid and queryable between calls.
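Because the IDs are stable, you can keep a side table that maps them back to your own keys (document IDs, file paths, and so on). The sketch below is one way to do that; `IdMap<TKey>` and its members are hypothetical names, not part of the library, and it assumes you pass keys in the same order as the items you inserted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the library): maps graph IDs to your own keys.
public sealed class IdMap<TKey>
{
    private readonly Dictionary<int, TKey> _keysById = new Dictionary<int, TKey>();

    public int Count => _keysById.Count;

    // ids:  the list returned by AddItems for one batch.
    // keys: your own keys, in the same order as the items of that batch.
    public void Register(IReadOnlyList<int> ids, IReadOnlyList<TKey> keys)
    {
        if (ids.Count != keys.Count)
            throw new ArgumentException("ids and keys must have the same length.");

        for (int i = 0; i < ids.Count; i++)
            _keysById.Add(ids[i], keys[i]); // throws if an ID is registered twice
    }

    // Looks up the key for an ID that came back from a search.
    public bool TryGetKey(int id, out TKey key) => _keysById.TryGetValue(id, out key);
}
```

Call `Register` once per `AddItems` call, passing the returned IDs together with the keys for that batch.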
Report progress
For large batches, pass an IProgressReporter:
class ConsoleProgress : IProgressReporter
{
    public void Progress(int current, int total)
        => Console.WriteLine($" {current,8}/{total} ({100.0 * current / total:F1}%)");
}

graph.AddItems(vectors, new ConsoleProgress());
The reporter is called from the inserting thread, not pumped to a UI thread — marshal yourself if needed.
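One possible way to do the marshalling is to forward each call to .NET's `IProgress<T>`. `System.Progress<T>` captures the `SynchronizationContext` that is current when it is constructed and invokes its handler there. The sketch below is an assumption-laden example: `ForwardingProgress` is a hypothetical name, and the class deliberately does not declare `IProgressReporter` so that it compiles without the library. Add `: IProgressReporter` to the class declaration before passing it to `AddItems`.

```csharp
using System;

// Hypothetical adapter (not part of the library). Declare it as
// `class ForwardingProgress : IProgressReporter` to use it with AddItems.
public sealed class ForwardingProgress
{
    private readonly IProgress<(int Current, int Total)> _target;

    public ForwardingProgress(IProgress<(int Current, int Total)> target)
        => _target = target ?? throw new ArgumentNullException(nameof(target));

    // Same shape as the Progress(int, int) method shown above.
    // Called on the inserting thread; the target decides where the handler runs.
    public void Progress(int current, int total) => _target.Report((current, total));
}
```

If you construct `new Progress<(int Current, int Total)>(handler)` on the UI thread and wrap it in this adapter, `handler` runs on the UI thread's context rather than on the inserting thread.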
Pick the right batch size
HNSW-Sharp inserts items sequentially; large batches don't fan out across threads. Batch size still affects peak memory and locking overhead:
- Smaller batches: easier to bound peak memory and to interleave with KNNSearch calls.
- Larger batches: slightly faster overall because of fewer write-lock acquisitions.
A batch of 10k–100k items is usually a good compromise.
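A minimal sketch of splitting a large input into fixed-size batches follows. `Batching.InBatches` is a hypothetical helper, not a library API. Each batch is a new list holding references to the same vectors, so the extra memory is only the list of references.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the library).
public static class Batching
{
    // Lazily yields consecutive batches of at most batchSize items, in order.
    public static IEnumerable<IReadOnlyList<T>> InBatches<T>(IReadOnlyList<T> items, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (int start = 0; start < items.Count; start += batchSize)
        {
            int count = Math.Min(batchSize, items.Count - start);
            var batch = new List<T>(count);
            for (int i = 0; i < count; i++)
                batch.Add(items[start + i]); // copies the reference, not the vector
            yield return batch;
        }
    }
}
```

Assuming `AddItems` accepts an `IReadOnlyList<float[]>`, usage would look like `foreach (var batch in Batching.InBatches(vectors, 50_000)) graph.AddItems(batch);`, with searches free to run between iterations.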
Removing items
The HNSW algorithm doesn't support deletion of individual nodes. If you need to remove items, either:
- Soft-delete: keep the items in the graph and filter them out at query time with a filterItem predicate. See Filtering.
- Rebuild periodically: recreate the graph from the live set on a schedule appropriate to your churn rate.
For most workloads, combining soft-delete with a periodic rebuild is the right pattern.
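The bookkeeping for that pattern can be sketched as a small tombstone set. `Tombstones` and its members are hypothetical names, and the rebuild threshold is an arbitrary illustration. How `IsLive` plugs into the filterItem predicate depends on that predicate's signature, which is covered in Filtering. The class is not thread-safe as written.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the library): tracks soft-deleted IDs
// and signals when enough of the graph is dead to justify a rebuild.
public sealed class Tombstones
{
    private readonly HashSet<int> _deleted = new HashSet<int>();

    public int DeletedCount => _deleted.Count;

    // Marking the same ID twice is harmless.
    public void MarkDeleted(int id) => _deleted.Add(id);

    // Use this from your query-time filter to skip deleted items.
    public bool IsLive(int id) => !_deleted.Contains(id);

    // True once the deleted fraction exceeds maxDeadFraction (e.g. 0.2 for 20%).
    public bool ShouldRebuild(int totalItems, double maxDeadFraction)
        => totalItems > 0 && (double)_deleted.Count / totalItems > maxDeadFraction;
}
```

After a rebuild, the new graph assigns fresh IDs to the live set, so start again with an empty `Tombstones` instance.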
Common pitfalls
Don't change vectors after insertion
HNSW-Sharp stores references to the vectors, not copies. Mutating a float[] after insertion silently corrupts the graph. Treat embeddings as immutable.
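If the arrays come from code you don't control, or from buffers that get reused, one defensive option is to insert copies instead. The sketch below uses a hypothetical `VectorCopy.Snapshot` helper; it trades extra memory (one more copy of every vector) for safety.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not part of the library).
public static class VectorCopy
{
    // Returns a new list of cloned arrays, so later writes to the
    // originals cannot reach the vectors you hand to the graph.
    public static List<float[]> Snapshot(IReadOnlyList<float[]> vectors)
    {
        var copies = new List<float[]>(vectors.Count);
        foreach (var v in vectors)
            copies.Add((float[])v.Clone());
        return copies;
    }
}
```

Pass the result of `Snapshot` to `AddItems` and let the originals go.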
Pre-normalise for `SIMDForUnits`
If your distance function is CosineDistance.SIMDForUnits, normalise vectors before calling AddItems. The function doesn't check and silently returns wrong distances on non-unit input. See Distance functions.
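A minimal sketch of L2 normalisation is shown below. `VectorMath.Normalized` is a hypothetical helper, not a library function. It returns a new array rather than modifying the input, and it rejects zero vectors because they have no direction to normalise.

```csharp
using System;

// Hypothetical helper (not part of the library).
public static class VectorMath
{
    // Returns a unit-length (L2 norm = 1) copy of v.
    public static float[] Normalized(float[] v)
    {
        // Accumulate in double to limit rounding error on long vectors.
        double sumSq = 0;
        foreach (float x in v)
            sumSq += (double)x * x;

        if (sumSq == 0)
            throw new ArgumentException("Cannot normalise a zero vector.", nameof(v));

        float inv = (float)(1.0 / Math.Sqrt(sumSq));
        var result = new float[v.Length];
        for (int i = 0; i < v.Length; i++)
            result[i] = v[i] * inv;
        return result;
    }
}
```

Apply the same normalisation to query vectors before searching, since the distance function makes the same unit-length assumption about both arguments.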