UMAP

Distance functions

UMAP needs a way to measure how "close" two input vectors are. UMAP-Sharp ships three built-in distance functions and lets you supply your own.

public delegate float DistanceCalculation(float[] x, float[] y);

Built-in functions live on Umap.DistanceFunctions:

Function | When to use
--- | ---
Cosine (default) | General-purpose. Works for most embeddings without preprocessing.
CosineForNormalizedVectors | Faster equivalent when every input vector is already unit-normalised.
Euclidean | When magnitude matters (raw image pixels, geometric coordinates, audio frames).
Custom | Hamming, Manhattan, Jaccard, dot-product distance: anything with the signature float (float[], float[]).

Cosine

Cosine distance treats vectors as directions in space — magnitude is ignored, only the angle matters. This is the right default for text embeddings, word vectors, and most learned representations.

var umap = new Umap(distance: Umap.DistanceFunctions.Cosine);

The implementation uses SIMD dot-product and magnitude kernels:

public static float Cosine(float[] lhs, float[] rhs)
{
    return 1 - (SIMD.DotProduct(ref lhs, ref rhs) / (SIMD.Magnitude(ref lhs) * SIMD.Magnitude(ref rhs)));
}
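
For readers who want to see what the SIMD kernels compute, here is a minimal scalar sketch of the same formula. The name CosineScalar and the sample vectors are assumptions made for illustration; this is not the library's code, only a plain-loop equivalent of 1 - (x · y) / (|x| |y|).

```csharp
using System;

// Scalar reference for cosine distance: 1 - (x . y) / (|x| * |y|).
// CosineScalar is a hypothetical name used only for this sketch.
float CosineScalar(float[] x, float[] y)
{
    float dot = 0f, magX = 0f, magY = 0f;
    for (var i = 0; i < x.Length; i++)
    {
        dot += x[i] * y[i];
        magX += x[i] * x[i];
        magY += y[i] * y[i];
    }
    return 1f - dot / (MathF.Sqrt(magX) * MathF.Sqrt(magY));
}

var a = new[] { 1f, 2f, 3f };
var scaled = new[] { 10f, 20f, 30f };   // same direction as a, ten times longer
var orthogonal = new[] { 0f, 3f, -2f }; // dot product with a is 0

Console.WriteLine(CosineScalar(a, scaled));     // approximately 0: same direction
Console.WriteLine(CosineScalar(a, orthogonal)); // 1: perpendicular
```

Scaling a vector leaves the distance essentially unchanged, which is the "magnitude is ignored" property described above. In this sketch an all-zero vector produces NaN, because the denominator is zero.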

CosineForNormalizedVectors

If every input vector has unit length (i.e. |x| == 1), you can skip the magnitude calculation entirely. This is roughly twice as fast as Cosine for moderate-dimensional vectors.

var umap = new Umap(distance: Umap.DistanceFunctions.CosineForNormalizedVectors);

Vectors really must be normalised

This function does not check that vectors are normalised — it just assumes it. Feeding non-unit vectors will produce nonsense distances and a poor embedding. Normalise upstream:

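// Requires: using System; using System.Linq;
static float[] Normalize(float[] v)
{
    var mag = MathF.Sqrt(v.Sum(x => x * x));
    // A zero vector has no direction; return it unchanged to avoid dividing by zero.
    if (mag == 0) return v;
    return v.Select(x => x / mag).ToArray();
}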
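
To illustrate why normalisation matters, the sketch below compares the full cosine formula with a dot-product-only shortcut. It assumes the normalised variant reduces to 1 - (x · y); the helper names (Dot, ToUnit, FullCosine, DotOnlyCosine) are hypothetical and not part of the library.

```csharp
using System;

// Plain-loop helpers; the names are hypothetical and exist only for this sketch.
float Dot(float[] x, float[] y)
{
    var sum = 0f;
    for (var i = 0; i < x.Length; i++) sum += x[i] * y[i];
    return sum;
}

float[] ToUnit(float[] v)
{
    var mag = MathF.Sqrt(Dot(v, v));
    if (mag == 0) return v;
    var result = new float[v.Length];
    for (var i = 0; i < v.Length; i++) result[i] = v[i] / mag;
    return result;
}

float FullCosine(float[] x, float[] y) =>
    1f - Dot(x, y) / (MathF.Sqrt(Dot(x, x)) * MathF.Sqrt(Dot(y, y)));

// What a normalised-only variant is assumed to compute: no magnitude terms.
float DotOnlyCosine(float[] x, float[] y) => 1f - Dot(x, y);

var p = new[] { 3f, 4f };
var q = new[] { 4f, 3f };

// Raw vectors: the shortcut gives a meaningless (here negative) value.
Console.WriteLine(FullCosine(p, q));    // about 0.04
Console.WriteLine(DotOnlyCosine(p, q)); // -23

// Unit vectors: both formulas agree.
Console.WriteLine(DotOnlyCosine(ToUnit(p), ToUnit(q))); // about 0.04
```

Once both inputs have unit length the two formulas agree up to floating-point rounding, so the magnitude work can be skipped safely.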

Euclidean

Straight-line distance in the original space. Use this when vector magnitudes carry meaning — for example, raw pixel intensities, signal amplitudes, or coordinates that you do not want to normalise away.

var umap = new Umap(distance: Umap.DistanceFunctions.Euclidean);
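
As a rough illustration of magnitude sensitivity, here is a scalar sketch of Euclidean distance. EuclideanScalar is a hypothetical name and the code is a plain-loop stand-in, not the library's SIMD implementation.

```csharp
using System;

// Scalar Euclidean distance: sqrt(sum((x_i - y_i)^2)).
// EuclideanScalar is a hypothetical name used only for this sketch.
float EuclideanScalar(float[] x, float[] y)
{
    var sum = 0f;
    for (var i = 0; i < x.Length; i++)
    {
        var diff = x[i] - y[i];
        sum += diff * diff;
    }
    return MathF.Sqrt(sum);
}

var dim = new[] { 3f, 4f };
var bright = new[] { 6f, 8f }; // same direction as dim, twice the magnitude

Console.WriteLine(EuclideanScalar(dim, bright)); // 5
```

The two vectors point in the same direction, so cosine distance would treat them as nearly identical, while Euclidean distance keeps them 5 units apart.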

Custom distance functions

Any delegate matching the DistanceCalculation signature works. The function will be called many times during fit, so keep it fast — avoid LINQ in the hot path.

Hamming distance (binary vectors)

float Hamming(float[] x, float[] y)
{
    var different = 0;
    for (var i = 0; i < x.Length; i++)
    {
        if (x[i] != y[i]) different++;
    }
    return different;
}

var umap = new Umap(distance: Hamming);
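
Jaccard distance, listed among the custom options at the top of this page, is a common alternative to Hamming for sparse binary vectors because shared zeros do not count as agreement. The following is a minimal sketch; treating any non-zero entry as "set" and returning 0 for two all-zero vectors are conventions assumed here, not library behaviour.

```csharp
using System;

// Jaccard distance for 0/1 vectors: 1 - |intersection| / |union|.
float Jaccard(float[] x, float[] y)
{
    var intersection = 0;
    var union = 0;
    for (var i = 0; i < x.Length; i++)
    {
        var inX = x[i] != 0f;
        var inY = y[i] != 0f;
        if (inX && inY) intersection++;
        if (inX || inY) union++;
    }
    // Two all-zero vectors have an empty union; treating them as identical
    // (distance 0) is a convention chosen for this sketch.
    return union == 0 ? 0f : 1f - (float)intersection / union;
}

var u = new[] { 1f, 1f, 0f, 0f };
var v = new[] { 1f, 0f, 1f, 0f };
Console.WriteLine(Jaccard(u, v)); // intersection 1, union 3, so 1 - 1/3
```

Because it matches the DistanceCalculation signature, it can be passed in the same way as Hamming: new Umap(distance: Jaccard).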

Manhattan (L1) distance

float Manhattan(float[] x, float[] y)
{
    var sum = 0f;
    for (var i = 0; i < x.Length; i++)
    {
        sum += MathF.Abs(x[i] - y[i]);
    }
    return sum;
}

var umap = new Umap(distance: Manhattan);

Dot-product distance

For embedding models trained with a dot-product loss (e.g. some retrieval models), maximising similarity is equivalent to minimising the negative dot product. Note that this value can be negative and is not a true metric:

float DotDistance(float[] x, float[] y)
{
    var dot = 0f;
    for (var i = 0; i < x.Length; i++)
    {
        dot += x[i] * y[i];
    }
    return -dot;
}

var umap = new Umap(distance: DotDistance);

Picking the right metric

A short decision tree:

flowchart TD
    A[Vectors from an embedding model] -->|already unit-normalised| B[CosineForNormalizedVectors]
    A -->|not normalised| C[Cosine]
    D[Raw geometric / signal data] --> E[Euclidean]
    F[Binary / categorical vectors] --> G[Custom: Hamming]
    H[Dot-product trained retrieval model] --> I[Custom: DotDistance]

When in doubt, start with Cosine. The biggest visual differences in your output usually come from numberOfNeighbors, not the metric.

Performance notes

  • All three built-ins use System.Numerics.Vector<float> for SIMD acceleration.
  • Custom delegates are called from inside the hot inner loop — write them with allocation-free, branchless code where possible.
  • The distance function is invoked only during InitializeFit (specifically inside the nearest-neighbor descent). It does not run during Step, which uses a reduced Euclidean kernel on the low-dimensional embedding.
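
A custom delegate can use the same System.Numerics.Vector<float> type as the built-ins. The sketch below vectorises the Manhattan distance from earlier; ManhattanSimd is a hypothetical name, and the loop structure is one reasonable way to do it rather than a copy of the library's kernels.

```csharp
using System;
using System.Numerics;

// Manhattan (L1) distance using Vector<float>; hypothetical name.
float ManhattanSimd(float[] x, float[] y)
{
    var width = Vector<float>.Count; // lanes per hardware vector
    var acc = Vector<float>.Zero;
    var i = 0;
    for (; i <= x.Length - width; i += width)
    {
        var diff = new Vector<float>(x, i) - new Vector<float>(y, i);
        acc += Vector.Abs(diff);
    }
    // Horizontal sum of the accumulator lanes.
    var sum = Vector.Dot(acc, Vector<float>.One);
    // Scalar tail for lengths that are not a multiple of the vector width.
    for (; i < x.Length; i++) sum += MathF.Abs(x[i] - y[i]);
    return sum;
}

var a = new float[19];
var b = new float[19];
for (var k = 0; k < a.Length; k++) { a[k] = k; b[k] = -k; }

Console.WriteLine(ManhattanSimd(a, b)); // 342
```

Vector<float> is a struct, so the loop performs no heap allocation, and the only branches are the loop conditions. Whether this is actually faster than the scalar version depends on vector length and hardware, so it is worth measuring before adopting it.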