UMAP

Configuration

The Umap constructor takes six optional arguments. The defaults produce a 2-D cosine projection with thread-safe randomness — enough for most workloads — but every knob is there when you need it.

public Umap(
    DistanceCalculation distance = null,
    IProvideRandomValues random = null,
    int dimensions = 2,
    int numberOfNeighbors = 15,
    int? customNumberOfEpochs = null,
    ProgressReporter progressReporter = null)

Constructor arguments

Argument | Type | Default | Description
distance | DistanceCalculation | Umap.DistanceFunctions.Cosine | Metric used to compute distances between input vectors. See Distance Functions.
random | IProvideRandomValues | DefaultRandomGenerator.Instance | Random number source. Controls reproducibility and whether the SGD step runs in parallel. See Parallelization and Reproducibility.
dimensions | int | 2 | Number of output dimensions. Typically 2 or 3. See 3-D Projections.
numberOfNeighbors | int | 15 | Local neighborhood size used to build the fuzzy simplicial set. Smaller values preserve local structure; larger values preserve more global structure.
customNumberOfEpochs | int? | null | Overrides the recommended number of SGD epochs. Must be positive when set.
progressReporter | ProgressReporter (delegate taking a float) | null | Optional callback that receives a 0.0 → 1.0 progress value during InitializeFit and Step. See Progress Reporting.

Defaults summary

Calling new Umap() with no arguments is equivalent to:

var umap = new Umap(
    distance: Umap.DistanceFunctions.Cosine,
    random: DefaultRandomGenerator.Instance,
    dimensions: 2,
    numberOfNeighbors: 15,
    customNumberOfEpochs: null,
    progressReporter: null);

dimensions

dimensions is the length of every row in the returned embedding. UMAP is mostly used with 2 (for scatter plots) or 3 (for interactive 3-D views), but higher values are valid if you want to feed the result into downstream models. The cost of optimisation grows with this value, because every SGD update adjusts each output coordinate.

// 3-D projection
var umap = new Umap(dimensions: 3);
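Because every row of the embedding has exactly dimensions entries, downstream code can check the shape before using it. A minimal sketch, assuming the embedding is available as a float[][] with one row per input vector; the helper name EmbeddingShape.ToPoints3D is hypothetical and not part of the library.

```csharp
using System;
using System.Linq;

static class EmbeddingShape
{
    // Hypothetical helper: converts a 3-D embedding (one float[] row per input
    // vector) into (X, Y, Z) tuples, e.g. for an interactive 3-D scatter plot.
    public static (float X, float Y, float Z)[] ToPoints3D(float[][] embedding)
    {
        // Every row should contain exactly `dimensions` values (3 here);
        // anything else means the embedding came from a different configuration.
        if (embedding.Any(row => row.Length != 3))
            throw new ArgumentException(
                "Expected every row to have 3 values (dimensions: 3).", nameof(embedding));

        return embedding.Select(row => (row[0], row[1], row[2])).ToArray();
    }
}
```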

numberOfNeighbors

numberOfNeighbors (often written n_neighbors in the UMAP literature) controls the balance between local and global structure:

  • Small values (5 – 15) — preserve fine local structure, separate small clusters cleanly, can fragment large ones.
  • Larger values (30 – 100) — preserve global structure, produce smoother overall shapes, may bleed nearby clusters together.

The default 15 is a good general choice. Increase it when the projection looks shattered; decrease it when distinct categories blur together.

var umap = new Umap(numberOfNeighbors: 30);
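To make "neighbourhood size" concrete: UMAP starts from a graph that links each input vector to its numberOfNeighbors nearest neighbours, and builds the fuzzy simplicial set from that graph. The brute-force sketch below illustrates the idea only. It is O(n²), uses Euclidean distance for simplicity, excludes each point from its own list, and is not how the library searches for neighbours (the reference UMAP implementation uses approximate search, NN-descent, instead). NeighborhoodSketch.KNearest is a hypothetical name.

```csharp
using System;
using System.Linq;

static class NeighborhoodSketch
{
    // For each row of `vectors`, returns the indices of its k nearest other rows,
    // nearest first (Euclidean distance, brute force). Illustrative only.
    public static int[][] KNearest(float[][] vectors, int k)
    {
        if (k < 1 || k >= vectors.Length)
            throw new ArgumentOutOfRangeException(nameof(k), "k must be between 1 and vectors.Length - 1.");

        return vectors
            .Select((v, i) => Enumerable.Range(0, vectors.Length)
                .Where(j => j != i)                          // exclude the point itself
                .OrderBy(j => SquaredDistance(v, vectors[j]))
                .Take(k)
                .ToArray())
            .ToArray();
    }

    // Squared distance ranks neighbours the same way as Euclidean distance.
    static float SquaredDistance(float[] x, float[] y)
    {
        float sum = 0f;
        for (var d = 0; d < x.Length; d++)
        {
            var diff = x[d] - y[d];
            sum += diff * diff;
        }
        return sum;
    }
}
```

With k = 2, the point at 0 on the line {0, 1, 3, 10, 12} only "sees" the points at 1 and 3; a larger k would pull the distant points into its neighbourhood as well.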

customNumberOfEpochs

By default InitializeFit returns a heuristic epoch count based on dataset size (see Basic Usage). You can override this with customNumberOfEpochs:

var umap = new Umap(customNumberOfEpochs: 1000);
var epochs = umap.InitializeFit(vectors); // == 1000

More epochs generally produce a more refined embedding at the cost of wall-clock time, but returns diminish steeply past the default: doubling the count rarely changes the visualisation noticeably.

Must be positive

Passing 0 or a negative value throws ArgumentOutOfRangeException.
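The documented rule amounts to: use the override when one is supplied, reject non-positive overrides, and otherwise fall back to the dataset-size heuristic. A minimal sketch of that logic; ResolveEpochCount and its heuristicEpochs parameter are hypothetical names, and the library may perform the check at a different point (for example in the constructor rather than in InitializeFit).

```csharp
using System;

static class EpochSketch
{
    // Hypothetical helper mirroring the documented behaviour of customNumberOfEpochs.
    public static int ResolveEpochCount(int? customNumberOfEpochs, int heuristicEpochs)
    {
        if (customNumberOfEpochs is int custom)
        {
            // 0 or negative overrides are rejected, as documented.
            if (custom <= 0)
                throw new ArgumentOutOfRangeException(
                    nameof(customNumberOfEpochs), custom, "Must be positive.");
            return custom;
        }

        // No override: use the heuristic value computed from the dataset size.
        return heuristicEpochs;
    }
}
```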

distance

Pick a metric that matches how your input vectors were produced:

  • Cosine (default) — general-purpose, good for most text/embedding workloads.
  • CosineForNormalizedVectors — faster equivalent when your vectors are already unit-normalised.
  • Euclidean — use when magnitude matters (raw pixel values, geometric coordinates).
  • Custom delegate — supply your own float (float[], float[]) function.

See the dedicated Distance Functions page for examples and benchmarks.
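As a sketch of what a custom metric looks like, and of why the normalised variant can be faster: for unit-length vectors both norms in the cosine formula are 1, so the distance reduces to 1 - dot(x, y) and the two square roots can be skipped. The DistanceCalculation delegate below is a local stand-in with the float (float[], float[]) shape described above; these are the textbook formulas, not the library's own implementations, and returning 1 for a zero vector is a choice made only in this sketch.

```csharp
using System;

// Local stand-in with the documented float (float[], float[]) shape.
delegate float DistanceCalculation(float[] x, float[] y);

static class DistanceSketch
{
    // Cosine distance: 1 - (x · y) / (|x| |y|).
    public static float Cosine(float[] x, float[] y)
    {
        float dot = 0f, normX = 0f, normY = 0f;
        for (var i = 0; i < x.Length; i++)
        {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        // Treat a zero vector as orthogonal to everything (sketch-specific choice).
        if (normX == 0f || normY == 0f) return 1f;
        return 1f - dot / (MathF.Sqrt(normX) * MathF.Sqrt(normY));
    }

    // Same result as Cosine for unit-length inputs, without computing the norms.
    public static float CosineForNormalizedVectors(float[] x, float[] y)
    {
        float dot = 0f;
        for (var i = 0; i < x.Length; i++) dot += x[i] * y[i];
        return 1f - dot;
    }

    // A custom metric: Manhattan (L1) distance.
    public static float Manhattan(float[] x, float[] y)
    {
        float sum = 0f;
        for (var i = 0; i < x.Length; i++) sum += MathF.Abs(x[i] - y[i]);
        return sum;
    }
}
```

With the library referenced (and the stand-in delegate removed), a method group such as DistanceSketch.Manhattan should convert to the library's distance delegate, assuming it has the float (float[], float[]) shape, e.g. new Umap(distance: DistanceSketch.Manhattan).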

random

random controls two things at once:

  1. Reproducibility — whether two runs with the same input produce the same output.
  2. Parallelism — UMAP only multi-threads the SGD step if random.IsThreadSafe is true.

Generator | Thread-safe | Reproducible
DefaultRandomGenerator.Instance (default) | Yes | No
DefaultRandomGenerator.DisableThreading | No | No
Custom seeded IProvideRandomValues | Up to you | Yes, if seeded

// Production: multi-threaded, non-deterministic
var umap = new Umap();

// Single-threaded (for example, in a shared service where you don't want
// one request to monopolise all CPUs)
var singleThreadedUmap = new Umap(random: DefaultRandomGenerator.DisableThreading);

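For deterministic results, typically needed in unit tests, implement IProvideRandomValues with a seeded generator and report IsThreadSafe as false. When several SGD threads draw from one shared sequence, the order of the draws depends on thread scheduling, so a seed alone does not guarantee identical output. See Reproducibility.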
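A minimal sketch of such a generator, built on System.Random with a fixed seed. The IProvideRandomValues interface below is a local stand-in: its members (IsThreadSafe, Next, NextFloat, NextFloats) are an assumption about the library's interface, so check the real definition before implementing it. SeededRandomGenerator is a hypothetical name. Reporting IsThreadSafe as false is deliberate: System.Random is not safe for concurrent use, and a single SGD thread consumes the random values in the same order on every run.

```csharp
using System;

// Stand-in for the library's interface; the member list is an assumption.
interface IProvideRandomValues
{
    bool IsThreadSafe { get; }
    int Next(int minValue, int maxValue);   // minValue inclusive, maxValue exclusive
    float NextFloat();                      // in [0, 1)
    void NextFloats(Span<float> buffer);
}

sealed class SeededRandomGenerator : IProvideRandomValues
{
    private readonly Random _random;

    public SeededRandomGenerator(int seed) => _random = new Random(seed);

    // System.Random is not thread-safe, and single-threaded use keeps the
    // sequence of draws (and therefore the output) identical between runs.
    public bool IsThreadSafe => false;

    public int Next(int minValue, int maxValue) => _random.Next(minValue, maxValue);

    // 24 random bits scaled by 2^-24: exactly representable and always < 1,
    // unlike (float)NextDouble(), which can round up to 1.0f.
    public float NextFloat() => _random.Next(1 << 24) / (float)(1 << 24);

    public void NextFloats(Span<float> buffer)
    {
        for (var i = 0; i < buffer.Length; i++)
            buffer[i] = NextFloat();
    }
}
```

With the library referenced (and the stand-in interface removed), this would be passed as new Umap(random: new SeededRandomGenerator(42)).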

progressReporter

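Pass a callback that takes a float and it will be invoked with a value between 0.0 and 1.0 throughout the fit and step phases. The parameter's type is the ProgressReporter delegate from the constructor signature, so a lambda works directly; an existing Action<float> variable has to be wrapped, as in progressReporter: p => action(p).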

var umap = new Umap(progressReporter: progress =>
    Console.WriteLine($"{progress:P0}"));

For large datasets InitializeFit accounts for roughly 80% of total time, with the Step loop covering the remaining 20%. See Progress Reporting for a fuller treatment, including integration with IProgress<T>.
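If the callback fires more often than you want to log, one option is to wrap it so that it only forwards whole-percent increases. A sketch; ProgressThrottle.EveryPercent is a hypothetical helper, and the lock is a precaution in case the callback is invoked from more than one thread.

```csharp
using System;

static class ProgressThrottle
{
    // Wraps `onPercent` so it is called only when the whole-number
    // percentage (0..100) increases.
    public static Action<float> EveryPercent(Action<int> onPercent)
    {
        var last = -1;
        var gate = new object();
        return progress =>
        {
            var percent = (int)Math.Floor(Math.Clamp(progress, 0f, 1f) * 100f);
            lock (gate)
            {
                if (percent <= last) return;   // nothing new to report
                last = percent;
                onPercent(percent);            // called under the lock to keep order
            }
        };
    }
}
```

Usage would look like var report = ProgressThrottle.EveryPercent(pct => Console.WriteLine($"{pct}%")); followed by new Umap(progressReporter: p => report(p)), where the lambda adapts the Action<float> to the constructor's delegate type.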

Putting it together

A typical "production" configuration for 3-D projection of pre-normalised embedding vectors with progress reporting:

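// vectors: the float[][] of pre-normalised embedding vectors to project (see Basic Usage).
// Progress<T> implements IProgress<T>.Report explicitly, so declare the interface type.
IProgress<float> progress = new Progress<float>(p => Console.WriteLine($"{p:P0}"));

var umap = new Umap(
    distance: Umap.DistanceFunctions.CosineForNormalizedVectors,
    dimensions: 3,
    numberOfNeighbors: 30,
    progressReporter: p => progress.Report(p));

var epochs = umap.InitializeFit(vectors);
for (var i = 0; i < epochs; i++)
{
    umap.Step();
}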
© 2026 UMAP. All rights reserved.