UMAP

Basic Usage

UMAP-Sharp exposes one class — UMAP.Umap — and three methods that you call in order: InitializeFit, Step, and GetEmbedding. The full lifecycle of a projection is captured in a single short program.

The three-step pipeline

using UMAP;

// 1. Build your input matrix as float[][].
//    Every row MUST be the same length.
float[][] vectors = LoadVectors();

// 2. Construct a Umap instance. Defaults give a 2-D cosine projection.
var umap = new Umap();

// 3. Initialise — returns the recommended number of optimization epochs.
var epochs = umap.InitializeFit(vectors);

// 4. Run the SGD loop.
for (var i = 0; i < epochs; i++)
{
    umap.Step();
}

// 5. Read the projected embedding (same order as the input rows).
float[][] embedding = umap.GetEmbedding();
Order is preserved

The vectors returned by GetEmbedding() are in the same order as the rows you passed to InitializeFit. If you have labels associated with the input vectors, you can zip them with the embedding directly:

var labelled = embedding.Zip(labels, (point, label) => (point, label));

Input shape

UMAP needs a rectangular float[][]:

float[][] vectors = new float[N][];
for (var i = 0; i < N; i++)
{
    vectors[i] = new float[D]; // every row has length D
}
  • N is the number of items you want to project.
  • D is the dimensionality of each input vector.
  • Every nested array must have the same length.

In practice the rows come from an embedding model — sentence embeddings, image embeddings, word2vec vectors, hidden-layer activations, and so on. UMAP does not care what the vectors mean, only that they live in a common metric space.

Output shape

GetEmbedding() returns a float[][] where each row has length equal to the dimensions constructor argument (default 2). For a 2-D projection of 10,000 input vectors:

var embedding = umap.GetEmbedding(); // float[10000][2]

Why two phases — InitializeFit and Step?

UMAP performs two distinct kinds of work:

  1. Fit — compute approximate nearest neighbors, build a fuzzy simplicial set, and initialise the low-dimensional embedding. Done once, inside InitializeFit.
  2. Optimize — run stochastic gradient descent over the embedding for a number of epochs. Each call to Step performs one epoch.

Splitting the loop out gives you control over progress reporting, cancellation, and partial visualisations. InitializeFit returns the recommended epoch count, but you can stop early, run extra epochs, or report progress between iterations.

var epochs = umap.InitializeFit(vectors);

for (var i = 0; i < epochs; i++)
{
    umap.Step();

    if (i % 25 == 0)
    {
        Console.WriteLine($"Epoch {i}/{epochs}");
    }

    if (cancellationToken.IsCancellationRequested)
    {
        break;
    }
}

Epoch counts

The number of epochs returned by InitializeFit is heuristic — it scales down as the dataset grows so total wall time stays sensible:

Input rows Default epochs
≤ 2,500 500
≤ 5,000 400
≤ 7,500 300
> 7,500 200

To override it, pass customNumberOfEpochs to the constructor. See Configuration for details.

A complete minimal program

using System;
using System.Linq;
using UMAP;

var rng = new Random(42);

// 200 random 50-D vectors
var vectors = Enumerable.Range(0, 200)
    .Select(_ => Enumerable.Range(0, 50)
        .Select(_ => (float)rng.NextDouble())
        .ToArray())
    .ToArray();

var umap = new Umap();
var epochs = umap.InitializeFit(vectors);

Console.WriteLine($"Running {epochs} epochs...");
for (var i = 0; i < epochs; i++)
{
    umap.Step();
}

var embedding = umap.GetEmbedding();

Console.WriteLine($"Projected {embedding.Length} vectors to {embedding[0].Length}D");
Console.WriteLine($"First point: ({embedding[0][0]:F3}, {embedding[0][1]:F3})");

Next

© 2026 UMAP. All rights reserved.