HNSW-Sharp

Distance functions

SmallWorld<TItem, TDistance> is generic over both item type and distance type. You pass a Func<TItem, TItem, TDistance> to the constructor and the library calls it everywhere it needs to compare items.

var graph = new SmallWorld<float[], float>(
    distance: CosineDistance.SIMDForUnits,   // ← the distance function
    generator: DefaultRandomGenerator.Instance,
    parameters: parameters);

Built-in: cosine distance

HNSW-Sharp ships four CosineDistance variants in a single static class. They trade universality for performance:

Variant Precondition Throughput When to use
CosineDistance.NonOptimized None — works for any float[]. Lowest. Reference, correctness checks, or non-unit vectors.
CosineDistance.ForUnits Vectors must be unit-normalised. Moderate. Pre-normalised embeddings without SIMD overhead.
CosineDistance.SIMD None. High. General-purpose, on any modern .NET runtime.
CosineDistance.SIMDForUnits Vectors must be unit-normalised. Highest. The common case — pre-normalised embeddings on modern .NET.

If you don't know whether your vectors are unit-normalised, treat them as non-unit and use CosineDistance.SIMD. If you control the embedding pipeline, normalise once and use CosineDistance.SIMDForUnits — typically 1.5–3× faster.

Unit-vector functions are unchecked

CosineDistance.ForUnits and CosineDistance.SIMDForUnits do not verify the input. Passing non-unit vectors produces wrong distances silently. Normalise on the way in.

Bring your own distance

The constructor takes any Func<TItem, TItem, TDistance>. Two rules apply:

  1. Smaller distances must mean "closer." HNSW always treats lower values as better neighbours.
  2. The function should be metric-like — symmetric, non-negative, and distance(x, x) == 0. The algorithm doesn't strictly require the triangle inequality, but recall degrades for distances that violate it badly.

Example: Euclidean (L2)

static float L2(float[] u, float[] v)
{
    float acc = 0f;
    for (int i = 0; i < u.Length; i++)
    {
        float d = u[i] - v[i];
        acc += d * d;
    }
    return MathF.Sqrt(acc);
}

var graph = new SmallWorld<float[], float>(
    L2,
    DefaultRandomGenerator.Instance,
    parameters);

Example: Hamming distance on byte[]

static int Hamming(byte[] u, byte[] v)
{
    int acc = 0;
    for (int i = 0; i < u.Length; i++)
        acc += System.Numerics.BitOperations.PopCount((uint)(u[i] ^ v[i]));
    return acc;
}

var graph = new SmallWorld<byte[], int>(
    Hamming,
    DefaultRandomGenerator.Instance,
    parameters);

TDistance is int here — the type just needs to be a numeric struct that implements IComparable<TDistance>.

Example: mixed-feature distance with a weighted blend

record Doc(float[] Embedding, int Year);

static float Mixed(Doc a, Doc b)
{
    float vec = CosineDistance.SIMDForUnits(a.Embedding, b.Embedding);
    float year = MathF.Abs(a.Year - b.Year) / 100f;
    return 0.8f * vec + 0.2f * year;
}

var graph = new SmallWorld<Doc, float>(
    Mixed,
    DefaultRandomGenerator.Instance,
    parameters);

Notes on performance

  • HNSW evaluates the distance function a lot — easily millions of times during a single graph build. Anything you can hoist out of the function (allocation, branching, pointer indirection) pays back many times over.
  • If you can express the metric in System.Numerics.Vector<T>, do — the built-in SIMD cosine functions are a useful reference.
  • The library caches distance computations during build when EnableDistanceCacheForConstruction = true. If your distance function is already nearly free (e.g. a precomputed table lookup), disable the cache.

For more on tuning, see Performance.

© 2026 HNSW-Sharp. All rights reserved.