Distance functions
SmallWorld<TItem, TDistance> is generic over both item type and distance type. You pass a Func<TItem, TItem, TDistance> to the constructor and the library calls it everywhere it needs to compare items.
var graph = new SmallWorld<float[], float>(
distance: CosineDistance.SIMDForUnits, // ← the distance function
generator: DefaultRandomGenerator.Instance,
parameters: parameters);
Built-in: cosine distance
HNSW-Sharp ships four CosineDistance variants in a single static class. They trade universality for performance:
| Variant | Precondition | Throughput | When to use |
|---|---|---|---|
CosineDistance.NonOptimized |
None — works for any float[]. |
Lowest. | Reference, correctness checks, or non-unit vectors. |
CosineDistance.ForUnits |
Vectors must be unit-normalised. | Moderate. | Pre-normalised embeddings without SIMD overhead. |
CosineDistance.SIMD |
None. | High. | General-purpose, on any modern .NET runtime. |
CosineDistance.SIMDForUnits |
Vectors must be unit-normalised. | Highest. | The common case — pre-normalised embeddings on modern .NET. |
If you don't know whether your vectors are unit-normalised, treat them as non-unit and use CosineDistance.SIMD. If you control the embedding pipeline, normalise once and use CosineDistance.SIMDForUnits — typically 1.5–3× faster.
Unit-vector functions are unchecked
CosineDistance.ForUnits and CosineDistance.SIMDForUnits do not verify the input. Passing non-unit vectors produces wrong distances silently. Normalise on the way in.
Bring your own distance
The constructor takes any Func<TItem, TItem, TDistance>. Two rules apply:
- Smaller distances must mean "closer." HNSW always treats lower values as better neighbours.
- The function should be metric-like — symmetric, non-negative, and
distance(x, x) == 0. The algorithm doesn't strictly require the triangle inequality, but recall degrades for distances that violate it badly.
Example: Euclidean (L2)
static float L2(float[] u, float[] v)
{
float acc = 0f;
for (int i = 0; i < u.Length; i++)
{
float d = u[i] - v[i];
acc += d * d;
}
return MathF.Sqrt(acc);
}
var graph = new SmallWorld<float[], float>(
L2,
DefaultRandomGenerator.Instance,
parameters);
Example: Hamming distance on byte[]
static int Hamming(byte[] u, byte[] v)
{
int acc = 0;
for (int i = 0; i < u.Length; i++)
acc += System.Numerics.BitOperations.PopCount((uint)(u[i] ^ v[i]));
return acc;
}
var graph = new SmallWorld<byte[], int>(
Hamming,
DefaultRandomGenerator.Instance,
parameters);
TDistance is int here — the type just needs to be a numeric struct that implements IComparable<TDistance>.
Example: mixed-feature distance with a weighted blend
record Doc(float[] Embedding, int Year);
static float Mixed(Doc a, Doc b)
{
float vec = CosineDistance.SIMDForUnits(a.Embedding, b.Embedding);
float year = MathF.Abs(a.Year - b.Year) / 100f;
return 0.8f * vec + 0.2f * year;
}
var graph = new SmallWorld<Doc, float>(
Mixed,
DefaultRandomGenerator.Instance,
parameters);
Notes on performance
- HNSW evaluates the distance function a lot — easily millions of times during a single graph build. Anything you can hoist out of the function (allocation, branching, pointer indirection) pays back many times over.
- If you can express the metric in
System.Numerics.Vector<T>, do — the built-in SIMD cosine functions are a useful reference. - The library caches distance computations during build when
EnableDistanceCacheForConstruction = true. If your distance function is already nearly free (e.g. a precomputed table lookup), disable the cache.
For more on tuning, see Performance.