Distance functions
UMAP needs a way to measure how "close" two input vectors are. UMAP-Sharp ships three built-in distance functions and lets you supply your own.
public delegate float DistanceCalculation(float[] x, float[] y);
Built-in functions live on Umap.DistanceFunctions:
| Function | When to use |
|---|---|
| Cosine (default) | General-purpose. Works for most embeddings without preprocessing. |
| CosineForNormalizedVectors | Faster equivalent when every input vector is already unit-normalised. |
| Euclidean | When magnitude matters (raw image pixels, geometric coordinates, audio frames). |
| Custom | Hamming, Manhattan, Jaccard, dot-product distance: any function matching float (float[], float[]). |
Cosine
Cosine distance treats vectors as directions in space — magnitude is ignored, only the angle matters. This is the right default for text embeddings, word vectors, and most learned representations.
var umap = new Umap(distance: Umap.DistanceFunctions.Cosine);
The implementation uses SIMD dot-product and magnitude kernels:
public static float Cosine(float[] lhs, float[] rhs)
{
return 1 - (SIMD.DotProduct(ref lhs, ref rhs) / (SIMD.Magnitude(ref lhs) * SIMD.Magnitude(ref rhs)));
}
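To make clear what the SIMD kernels compute, here is a minimal scalar sketch of the same formula. CosineReference is a hypothetical name used only for illustration and is not part of UMAP-Sharp; the sketch assumes both vectors have the same length and a non-zero magnitude.

```csharp
using System;

public static class CosineReference
{
    // Scalar equivalent of 1 - dot(x, y) / (|x| * |y|).
    // Assumes x and y have equal length and neither is the zero vector.
    public static float Distance(float[] x, float[] y)
    {
        float dot = 0f, magX = 0f, magY = 0f;
        for (var i = 0; i < x.Length; i++)
        {
            dot += x[i] * y[i];
            magX += x[i] * x[i];
            magY += y[i] * y[i];
        }
        return 1f - dot / (MathF.Sqrt(magX) * MathF.Sqrt(magY));
    }
}
```

Vectors pointing the same way give 0 whatever their lengths, orthogonal vectors give 1, and opposite vectors give 2.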
CosineForNormalizedVectors
If every input vector has unit length (i.e. |x| == 1), you can skip the magnitude calculation entirely. This is roughly twice as fast as Cosine for moderate-dimensional vectors.
var umap = new Umap(distance: Umap.DistanceFunctions.CosineForNormalizedVectors);
Vectors really must be normalised
This function does not check that vectors are normalised — it just assumes it. Feeding non-unit vectors will produce nonsense distances and a poor embedding. Normalise upstream:
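using System;
using System.Linq;

// Returns a unit-length copy of v; a zero vector is returned unchanged.
static float[] Normalize(float[] v)
{
    var mag = MathF.Sqrt(v.Sum(x => x * x));
    if (mag == 0) return v;
    return v.Select(x => x / mag).ToArray();
}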
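Because the function skips the check, it can be worth verifying the data once before fitting. The sketch below shows an allocation-free way to normalise a vector plus a one-off validation helper. UnitVectors, NormalizeInPlace, IsUnitLength and the 1e-4 tolerance are illustrative assumptions, not part of UMAP-Sharp.

```csharp
using System;

public static class UnitVectors
{
    // Scales v to unit length in place; a zero vector is left unchanged.
    public static void NormalizeInPlace(float[] v)
    {
        var sumSq = 0f;
        for (var i = 0; i < v.Length; i++) sumSq += v[i] * v[i];
        if (sumSq == 0f) return;

        var inv = 1f / MathF.Sqrt(sumSq);
        for (var i = 0; i < v.Length; i++) v[i] *= inv;
    }

    // Returns true when |v| is within the given tolerance of 1.
    public static bool IsUnitLength(float[] v, float tolerance = 1e-4f)
    {
        var sumSq = 0f;
        for (var i = 0; i < v.Length; i++) sumSq += v[i] * v[i];
        return MathF.Abs(MathF.Sqrt(sumSq) - 1f) <= tolerance;
    }
}
```

Running IsUnitLength over the dataset before choosing CosineForNormalizedVectors is a cheap way to catch vectors that slipped through unnormalised, including all-zero rows.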
Euclidean
Straight-line distance in the original space. Use this when vector magnitudes carry meaning — for example, raw pixel intensities, signal amplitudes, or coordinates that you do not want to normalise away.
var umap = new Umap(distance: Umap.DistanceFunctions.Euclidean);
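For reference, the textbook definition of straight-line distance can be sketched as follows. EuclideanReference is a hypothetical name for illustration only; this is not the library's SIMD code, and it assumes equal-length inputs.

```csharp
using System;

public static class EuclideanReference
{
    // sqrt(sum((x_i - y_i)^2)); assumes x and y have the same length.
    public static float Distance(float[] x, float[] y)
    {
        var sumSq = 0f;
        for (var i = 0; i < x.Length; i++)
        {
            var d = x[i] - y[i];
            sumSq += d * d;
        }
        return MathF.Sqrt(sumSq);
    }
}
```

The vectors (1, 0) and (3, 0) point in the same direction, so their cosine distance is 0, but their Euclidean distance is 2. That difference is exactly the magnitude information this metric preserves.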
Custom distance functions
Any delegate matching the DistanceCalculation signature works. The function will be called many times during fit, so keep it fast — avoid LINQ in the hot path.
Hamming distance (binary vectors)
float Hamming(float[] x, float[] y)
{
var different = 0;
for (var i = 0; i < x.Length; i++)
{
if (x[i] != y[i]) different++;
}
return different;
}
var umap = new Umap(distance: Hamming);
Manhattan (L1) distance
float Manhattan(float[] x, float[] y)
{
var sum = 0f;
for (var i = 0; i < x.Length; i++)
{
sum += MathF.Abs(x[i] - y[i]);
}
return sum;
}
var umap = new Umap(distance: Manhattan);
Dot-product distance
For embedding models trained with a dot-product loss (e.g. some retrieval models), maximising similarity is equivalent to minimising the negative dot product, so the negated dot product can serve as the distance:
float DotDistance(float[] x, float[] y)
{
var dot = 0f;
for (var i = 0; i < x.Length; i++)
{
dot += x[i] * y[i];
}
return -dot;
}
var umap = new Umap(distance: DotDistance);
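Jaccard distance is mentioned in the table at the top of this page but has no example, so here is a minimal sketch for binary vectors. JaccardExample is a hypothetical name; treating any non-zero entry as "set" and returning 0 for two all-zero vectors are assumptions made for this illustration.

```csharp
public static class JaccardExample
{
    // Jaccard distance for binary vectors: 1 - |x AND y| / |x OR y|.
    // Any non-zero entry counts as "set"; two all-zero vectors get distance 0.
    // Assumes x and y have the same length.
    public static float Distance(float[] x, float[] y)
    {
        int intersection = 0, union = 0;
        for (var i = 0; i < x.Length; i++)
        {
            var a = x[i] != 0f;
            var b = y[i] != 0f;
            if (a && b) intersection++;
            if (a || b) union++;
        }
        return union == 0 ? 0f : 1f - (float)intersection / union;
    }
}
```

Since the method matches the DistanceCalculation signature, it should be usable in the same way as the examples above, for instance new Umap(distance: JaccardExample.Distance).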
Picking the right metric
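A short decision tree, following the table at the top of this page:
- Are all input vectors already unit-normalised? Use CosineForNormalizedVectors.
- Does magnitude carry meaning (raw pixels, coordinates, audio frames)? Use Euclidean.
- Is the data binary, or does it need a domain-specific measure? Supply a custom function.
- Otherwise, use Cosine (the default).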
When in doubt, start with Cosine. The biggest visual differences in your output usually come from numberOfNeighbors, not the metric.
Performance notes
- All three built-ins use System.Numerics.Vector<float> for SIMD acceleration.
- Custom delegates are called from inside the hot inner loop, so write them with allocation-free, branchless code where possible.
- The distance function is invoked only during InitializeFit (specifically inside the nearest-neighbor descent). It does not run during Step, which uses a reduced Euclidean kernel on the low-dimensional embedding.
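A custom delegate can use the same System.Numerics.Vector<float> type. The following is a sketch of a vectorised Manhattan distance, not the library's own code; VectorizedManhattan is a hypothetical name, and whether it beats the scalar loop depends on the hardware and on vector length.

```csharp
using System;
using System.Numerics;

public static class VectorizedManhattan
{
    // L1 distance computed Vector<float>.Count lanes at a time,
    // with a scalar loop for the leftover tail elements.
    // Assumes x and y have the same length.
    public static float Distance(float[] x, float[] y)
    {
        var width = Vector<float>.Count;
        var acc = Vector<float>.Zero;
        var i = 0;

        for (; i <= x.Length - width; i += width)
        {
            var diff = new Vector<float>(x, i) - new Vector<float>(y, i);
            acc += Vector.Abs(diff);
        }

        // Horizontal sum of the accumulator lanes.
        var sum = Vector.Dot(acc, Vector<float>.One);

        for (; i < x.Length; i++)
        {
            sum += MathF.Abs(x[i] - y[i]);
        }
        return sum;
    }
}
```

The method allocates nothing on the heap and matches the DistanceCalculation signature, so it can be passed wherever the scalar Manhattan example is used.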