Basic Usage
UMAP-Sharp exposes one class — UMAP.Umap — and three methods that you call in order: InitializeFit, Step, and GetEmbedding. The full lifecycle of a projection is captured in a single short program.
The three-step pipeline
using UMAP;
// 1. Build your input matrix as float[][].
// Every row MUST be the same length.
float[][] vectors = LoadVectors();
// 2. Construct a Umap instance. Defaults give a 2-D cosine projection.
var umap = new Umap();
// 3. Initialise — returns the recommended number of optimization epochs.
var epochs = umap.InitializeFit(vectors);
// 4. Run the SGD loop.
for (var i = 0; i < epochs; i++)
{
    umap.Step();
}
// 5. Read the projected embedding (same order as the input rows).
float[][] embedding = umap.GetEmbedding();
Order is preserved
The vectors returned by GetEmbedding() are in the same order as the rows you passed to InitializeFit. If you have labels associated with the input vectors, you can zip them with the embedding directly:
var labelled = embedding.Zip(labels, (point, label) => (point, label));
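One thing to keep in mind is that Enumerable.Zip stops at the shorter of its two sequences, so a missing label goes unnoticed. The sketch below checks the lengths first. PairWithLabels is a hypothetical helper written for this page, not part of UMAP-Sharp, and the sample points are made up so the snippet runs without the library.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of UMAP-Sharp): pairs each embedded point with
// its label by index, and throws on a length mismatch instead of truncating.
static (float[] Point, string Label)[] PairWithLabels(float[][] embedding, string[] labels)
{
    if (embedding.Length != labels.Length)
    {
        throw new ArgumentException(
            $"Got {embedding.Length} points but {labels.Length} labels.");
    }

    return embedding
        .Zip(labels, (point, label) => (Point: point, Label: label))
        .ToArray();
}

// Made-up 2-D points standing in for the result of GetEmbedding().
var samplePoints = new[] { new[] { 0.1f, 0.2f }, new[] { 0.3f, 0.4f } };
var sampleLabels = new[] { "cat", "dog" };

foreach (var (point, label) in PairWithLabels(samplePoints, sampleLabels))
{
    Console.WriteLine($"{label}: ({point[0]}, {point[1]})");
}
```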
Input shape
UMAP needs a rectangular float[][]:
float[][] vectors = new float[N][];
for (var i = 0; i < N; i++)
{
    vectors[i] = new float[D]; // every row has length D
}
- N is the number of items you want to project.
- D is the dimensionality of each input vector.
- Every nested array must have the same length.
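If the vectors come from several sources, it can be worth checking the shape yourself before calling InitializeFit. The following is a minimal sketch; RequireRectangular is a hypothetical helper, not part of UMAP-Sharp, and this page does not say what the library itself does with jagged input.

```csharp
using System;

// Hypothetical helper (not part of UMAP-Sharp): returns the shared row length D,
// or throws if the input is empty, contains a null row, or is jagged.
static int RequireRectangular(float[][] vectors)
{
    if (vectors is null || vectors.Length == 0 || vectors[0] is null)
    {
        throw new ArgumentException("Input must start with a non-null row.", nameof(vectors));
    }

    var d = vectors[0].Length;
    for (var i = 1; i < vectors.Length; i++)
    {
        if (vectors[i] is null || vectors[i].Length != d)
        {
            throw new ArgumentException($"Row {i} does not have length {d}.", nameof(vectors));
        }
    }

    return d;
}

var rectangular = new[] { new[] { 1f, 2f, 3f }, new[] { 4f, 5f, 6f } };
Console.WriteLine($"D = {RequireRectangular(rectangular)}"); // prints "D = 3"
```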
In practice the rows come from an embedding model — sentence embeddings, image embeddings, word2vec vectors, hidden-layer activations, and so on. UMAP does not care what the vectors mean, only that they live in a common metric space.
Output shape
GetEmbedding() returns a float[][] where each row has length equal to the dimensions constructor argument (default 2). For a 2-D projection of 10,000 input vectors:
var embedding = umap.GetEmbedding(); // float[10000][2]
Why two phases — InitializeFit and Step?
UMAP performs two distinct kinds of work:
- Fit: compute approximate nearest neighbors, build a fuzzy simplicial set, and initialise the low-dimensional embedding. Done once, inside InitializeFit.
- Optimize: run stochastic gradient descent over the embedding for a number of epochs. Each call to Step performs one epoch.
Splitting the loop out gives you control over progress reporting, cancellation, and partial visualisations. InitializeFit returns the recommended epoch count, but you can stop early, run extra epochs, or report progress between iterations.
// cancellationToken is a System.Threading.CancellationToken supplied by the caller.
var epochs = umap.InitializeFit(vectors);
for (var i = 0; i < epochs; i++)
{
    umap.Step();
    if (i % 25 == 0)
    {
        Console.WriteLine($"Epoch {i}/{epochs}");
    }
    if (cancellationToken.IsCancellationRequested)
    {
        break;
    }
}
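The same loop can be packaged as a reusable function. The sketch below is one possible shape, not something the library provides: RunEpochs is a hypothetical helper, and its step parameter stands in for umap.Step so the example runs without UMAP-Sharp. A counter plays the role of the optimizer and requests cancellation during the 60th step.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of UMAP-Sharp). `step` stands in for umap.Step.
// Reports progress every `reportEvery` epochs and on the final epoch, checks the
// token before each step, and returns the number of epochs actually completed.
static int RunEpochs(
    int epochs,
    Action step,
    CancellationToken token,
    Action<int, int> onProgress,
    int reportEvery = 25)
{
    var completed = 0;
    while (completed < epochs && !token.IsCancellationRequested)
    {
        step();
        completed++;
        if (completed % reportEvery == 0 || completed == epochs)
        {
            onProgress(completed, epochs);
        }
    }

    return completed;
}

using var cts = new CancellationTokenSource();
var stepCalls = 0;

var done = RunEpochs(
    epochs: 200,
    step: () => { if (++stepCalls == 60) cts.Cancel(); },
    token: cts.Token,
    onProgress: (i, total) => Console.WriteLine($"Epoch {i}/{total}"));

Console.WriteLine($"Stopped after {done} epochs"); // prints "Stopped after 60 epochs"
```

With a real instance, the step argument would be `() => umap.Step()` and epochs would be the value returned by InitializeFit.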
Epoch counts
The epoch count returned by InitializeFit is a heuristic. It decreases as the dataset grows so that total wall time stays sensible:
| Input rows | Default epochs |
|---|---|
| ≤ 2,500 | 500 |
| ≤ 5,000 | 400 |
| ≤ 7,500 | 300 |
| > 7,500 | 200 |
To override it, pass customNumberOfEpochs to the constructor. See Configuration for details.
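For planning purposes, the table can be restated as a small function. This is only a transcription of the thresholds listed above, not the library's source code, and DefaultEpochsFor is a hypothetical name. The value returned by InitializeFit remains the authoritative one.

```csharp
using System;

// Hypothetical helper: mirrors the table above and nothing more.
static int DefaultEpochsFor(int rows) => rows switch
{
    <= 2500 => 500,
    <= 5000 => 400,
    <= 7500 => 300,
    _ => 200,
};

foreach (var n in new[] { 200, 2_500, 6_000, 10_000 })
{
    Console.WriteLine($"{n} rows -> {DefaultEpochsFor(n)} epochs");
}
// prints:
// 200 rows -> 500 epochs
// 2500 rows -> 500 epochs
// 6000 rows -> 300 epochs
// 10000 rows -> 200 epochs
```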
A complete minimal program
using System;
using System.Linq;
using UMAP;
var rng = new Random(42);
// 200 random 50-D vectors
var vectors = Enumerable.Range(0, 200)
    .Select(_ => Enumerable.Range(0, 50)
        .Select(_ => (float)rng.NextDouble())
        .ToArray())
    .ToArray();
var umap = new Umap();
var epochs = umap.InitializeFit(vectors);
Console.WriteLine($"Running {epochs} epochs...");
for (var i = 0; i < epochs; i++)
{
    umap.Step();
}
var embedding = umap.GetEmbedding();
Console.WriteLine($"Projected {embedding.Length} vectors to {embedding[0].Length}D");
Console.WriteLine($"First point: ({embedding[0][0]:F3}, {embedding[0][1]:F3})");
Next
- Configuration — change dimensions, neighbors, epochs, and randomness.
- Distance Functions — pick the right similarity metric for your vectors.
- 3-D Projections — render embeddings in three dimensions.
- MNIST Example — a full end-to-end example with visualisation.