Configuration
The Umap constructor takes six optional arguments. The defaults produce a 2-D cosine projection with thread-safe randomness — enough for most workloads — but every knob is there when you need it.
```csharp
public Umap(
    DistanceCalculation distance = null,
    IProvideRandomValues random = null,
    int dimensions = 2,
    int numberOfNeighbors = 15,
    int? customNumberOfEpochs = null,
    ProgressReporter progressReporter = null)
```
Constructor arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| distance | DistanceCalculation | Umap.DistanceFunctions.Cosine | Metric used to compute distances between input vectors. See Distance Functions. |
| random | IProvideRandomValues | DefaultRandomGenerator.Instance | Random number source. Controls reproducibility and whether the SGD step runs in parallel. See Parallelization and Reproducibility. |
| dimensions | int | 2 | Number of output dimensions, typically 2 or 3. See 3-D Projections. |
| numberOfNeighbors | int | 15 | Local neighborhood size used to build the fuzzy simplicial set. Smaller values preserve local structure; larger values preserve more global structure. |
| customNumberOfEpochs | int? | null | Overrides the recommended number of SGD epochs. Must be positive when set. |
| progressReporter | ProgressReporter (delegate taking a float) | null | Optional callback that receives a progress value from 0.0 to 1.0 during InitializeFit and Step. See Progress Reporting. |
Defaults summary
Calling new Umap() with no arguments is equivalent to:
```csharp
var umap = new Umap(
    distance: Umap.DistanceFunctions.Cosine,
    random: DefaultRandomGenerator.Instance,
    dimensions: 2,
    numberOfNeighbors: 15,
    customNumberOfEpochs: null,
    progressReporter: null);
```
dimensions
dimensions is the size of every row in the returned embedding. UMAP is mostly used with 2 (for scatter plots) or 3 (for interactive 3-D views), but higher values are also valid, for example when the result feeds a downstream model. Optimisation cost grows with this value.
```csharp
// 3-D projection
var umap = new Umap(dimensions: 3);
```
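When the embedding feeds a downstream model that expects one flat, row-major buffer, a small helper can check the shape while flattening it. This is a minimal sketch: EmbeddingShape.Flatten is a hypothetical helper, not part of the library, and it assumes the embedding arrives as a float[][] with one row per input vector.

```csharp
using System;

public static class EmbeddingShape
{
    // Hypothetical helper (not part of the UMAP library): copies a jagged
    // embedding (one row per input vector) into a single row-major array,
    // checking that every row has exactly `dimensions` values.
    public static float[] Flatten(float[][] embedding, int dimensions)
    {
        if (dimensions <= 0)
            throw new ArgumentOutOfRangeException(nameof(dimensions));

        var flat = new float[embedding.Length * dimensions];
        for (var row = 0; row < embedding.Length; row++)
        {
            if (embedding[row].Length != dimensions)
                throw new ArgumentException(
                    $"Row {row} has {embedding[row].Length} values; expected {dimensions}.");

            // Row r occupies flat[r * dimensions] up to flat[(r + 1) * dimensions - 1].
            Array.Copy(embedding[row], 0, flat, row * dimensions, dimensions);
        }
        return flat;
    }
}
```

Failing fast on a mismatched row length surfaces a wrong dimensions setting before it silently corrupts the downstream model's input.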
numberOfNeighbors
numberOfNeighbors (often written n_neighbors in the UMAP literature) controls the balance between local and global structure:
- Small values (5 to 15): preserve fine local structure and separate small clusters cleanly, but can fragment large ones.
- Larger values (30 to 100): preserve global structure and produce smoother overall shapes, but may bleed nearby clusters together.
The default 15 is a good general choice. Increase it when the projection looks shattered; decrease it when distinct categories blur together.
```csharp
var umap = new Umap(numberOfNeighbors: 30);
```
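To make the parameter concrete: for each input vector, numberOfNeighbors is how many of its closest other vectors take part in building the fuzzy simplicial set. The brute-force sketch below finds those neighbors exactly with cosine distance. It is for illustration only (NeighborSketch is a hypothetical name), and the library is not assumed to search this way, since an exact search costs O(n²) distance evaluations.

```csharp
using System;
using System.Linq;

public static class NeighborSketch
{
    // Cosine distance: 1 - (x . y) / (|x| |y|).
    public static float CosineDistance(float[] x, float[] y)
    {
        float dot = 0, nx = 0, ny = 0;
        for (var i = 0; i < x.Length; i++)
        {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return 1f - dot / (MathF.Sqrt(nx) * MathF.Sqrt(ny));
    }

    // Exact k-nearest neighbors by brute force (O(n^2) distance evaluations).
    // Returns, for every vector, the indices of its k closest other vectors.
    public static int[][] NearestNeighbors(float[][] vectors, int k)
    {
        return vectors
            .Select((v, i) => Enumerable.Range(0, vectors.Length)
                .Where(j => j != i)                            // skip the point itself
                .OrderBy(j => CosineDistance(v, vectors[j]))   // closest first
                .Take(k)
                .ToArray())
            .ToArray();
    }
}
```

Raising k widens each point's neighborhood, which is why larger values pull in more global structure.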
customNumberOfEpochs
By default InitializeFit returns a heuristic epoch count based on dataset size (see Basic Usage). You can override this with customNumberOfEpochs:
```csharp
var umap = new Umap(customNumberOfEpochs: 1000);
var epochs = umap.InitializeFit(vectors); // == 1000
```
More epochs generally give a more refined embedding at the cost of wall time. Returns diminish steeply past the default: doubling the count rarely changes the visualisation noticeably.
Must be positive
Passing 0 or a negative value throws ArgumentOutOfRangeException.
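The documented rule (use the recommended count when customNumberOfEpochs is null, use the override when it is positive, reject anything else) can be sketched as follows. EpochSettings.Resolve and its recommended parameter are hypothetical stand-ins; the library validates the value itself, and its heuristic is not reproduced here.

```csharp
using System;

public static class EpochSettings
{
    // Hypothetical sketch of the documented behaviour:
    //   null      -> fall back to the recommended (heuristic) epoch count
    //   positive  -> use the override as-is
    //   0 or less -> ArgumentOutOfRangeException
    public static int Resolve(int? customNumberOfEpochs, int recommended)
    {
        if (customNumberOfEpochs is null)
            return recommended;

        if (customNumberOfEpochs.Value <= 0)
            throw new ArgumentOutOfRangeException(
                nameof(customNumberOfEpochs), "Must be positive when set.");

        return customNumberOfEpochs.Value;
    }
}
```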
distance
Pick a metric that matches how your input vectors were produced:
- Cosine (default): general-purpose, good for most text and embedding workloads.
- CosineForNormalizedVectors: a faster equivalent when your vectors are already unit-normalised.
- Euclidean: use when magnitude matters (raw pixel values, geometric coordinates).
- Custom delegate: supply your own float (float[], float[]) function.
See the dedicated Distance Functions page for examples and benchmarks.
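A custom metric only needs the float (float[], float[]) shape. The sketch below shows general cosine distance, the cheaper dot-product form that gives the same result once both vectors have unit length (which is why a normalised-vector variant can skip the norm computation), and a Manhattan (L1) distance as an example of a custom delegate. The class and method names are illustrative, not the library's implementations.

```csharp
using System;

public static class MetricSketch
{
    // General cosine distance: 1 - (x . y) / (|x| |y|).
    public static float Cosine(float[] x, float[] y)
    {
        float dot = 0, nx = 0, ny = 0;
        for (var i = 0; i < x.Length; i++)
        {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return 1f - dot / MathF.Sqrt(nx * ny);
    }

    // When |x| = |y| = 1 the denominator is 1, so only the dot product is needed.
    public static float CosineForUnitVectors(float[] x, float[] y)
    {
        float dot = 0;
        for (var i = 0; i < x.Length; i++) dot += x[i] * y[i];
        return 1f - dot;
    }

    // Example custom metric: Manhattan (L1) distance.
    public static float Manhattan(float[] x, float[] y)
    {
        float sum = 0;
        for (var i = 0; i < x.Length; i++) sum += MathF.Abs(x[i] - y[i]);
        return sum;
    }

    // Returns a new array scaled to unit length.
    public static float[] Normalize(float[] v)
    {
        float norm = 0;
        foreach (var value in v) norm += value * value;
        norm = MathF.Sqrt(norm);
        var result = new float[v.Length];
        for (var i = 0; i < v.Length; i++) result[i] = v[i] / norm;
        return result;
    }
}
```

Because Manhattan matches the float (float[], float[]) shape, it should be passable as a method group, for example new Umap(distance: MetricSketch.Manhattan).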
random
random controls two things at once:
- Reproducibility: whether two runs with the same input produce the same output.
- Parallelism: UMAP only multi-threads the SGD step if random.IsThreadSafe is true.
| Generator | Thread-safe | Reproducible |
|---|---|---|
| DefaultRandomGenerator.Instance (default) | Yes | No |
| DefaultRandomGenerator.DisableThreading | No | No |
| Custom seeded IProvideRandomValues | Up to you | Yes, if seeded |
```csharp
// Production: multi-threaded, non-deterministic
var parallelUmap = new Umap();

// Single-threaded (for example, in a shared service where you don't want
// one request to monopolise all CPUs)
var singleThreadedUmap = new Umap(random: DefaultRandomGenerator.DisableThreading);
```
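Why the thread-safety flag decides parallelism: if several worker threads draw from one random source at once, that source must tolerate concurrent calls. The sketch below illustrates the gating pattern with Parallel.For. It is an assumption about the general shape, not the library's SGD loop, and RunEpoch and its updateVertex callback are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class SgdGatingSketch
{
    // Hypothetical: runs one "epoch" of per-vertex updates. Work is spread
    // across threads only when the random source is safe to share.
    public static void RunEpoch(int vertexCount, bool randomIsThreadSafe, Action<int> updateVertex)
    {
        if (randomIsThreadSafe)
        {
            // Vertex updates may run concurrently on different threads.
            Parallel.For(0, vertexCount, updateVertex);
        }
        else
        {
            // Sequential fallback: one thread, one update at a time.
            for (var i = 0; i < vertexCount; i++) updateVertex(i);
        }
    }
}
```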
For deterministic results — typically used in unit tests — implement IProvideRandomValues with a seeded generator. See Reproducibility.
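A minimal seeded generator might look like the following. It assumes IProvideRandomValues declares IsThreadSafe, Next(int, int), NextFloat() and NextFloats(Span<float>); check the interface in your version of the library before relying on this. To keep the sketch compilable on its own it does not reference the interface; in real code you would declare the class as implementing IProvideRandomValues. It reports IsThreadSafe as false because System.Random is not safe for concurrent use, which also keeps the SGD step on a single thread.

```csharp
using System;

// Sketch of a reproducible random source. In real code, declare it as
// `public sealed class SeededRandom : IProvideRandomValues` (member list assumed).
public sealed class SeededRandom
{
    private const int FloatSteps = 1 << 24; // 24 bits fit exactly in a float mantissa
    private readonly Random _random;

    public SeededRandom(int seed) => _random = new Random(seed);

    // System.Random is not safe for concurrent use, so report false.
    public bool IsThreadSafe => false;

    // Integer in [minValue, maxValue).
    public int Next(int minValue, int maxValue) => _random.Next(minValue, maxValue);

    // Float in [0, 1): dividing a 24-bit integer avoids rounding up to 1.0f.
    public float NextFloat() => _random.Next(FloatSteps) / (float)FloatSteps;

    // Fills the buffer with floats in [0, 1).
    public void NextFloats(Span<float> buffer)
    {
        for (var i = 0; i < buffer.Length; i++) buffer[i] = NextFloat();
    }
}
```

Two instances built with the same seed produce the same sequence, which is what makes test runs repeatable.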
progressReporter
Pass a callback (a lambda is the simplest option) and it will be called with a value between 0.0 and 1.0 throughout the fit and step phases.
```csharp
var umap = new Umap(progressReporter: progress =>
    Console.WriteLine($"{progress:P0}"));
```
For large datasets InitializeFit accounts for roughly 80% of total time, with the Step loop covering the remaining 20%. See Progress Reporting for a fuller treatment, including integration with IProgress<T>.
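If the callback fires more often than you want to log, a small wrapper can forward only changes in the whole-number percentage. ThrottledProgress is a hypothetical helper, not part of the library; the sketch assumes only that progress values arrive in the 0.0 to 1.0 range.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: forwards a 0.0 to 1.0 progress value to `onPercent`
// only when the whole-number percentage changes.
public sealed class ThrottledProgress
{
    private readonly Action<int> _onPercent;
    private int _lastPercent = -1;

    public ThrottledProgress(Action<int> onPercent) => _onPercent = onPercent;

    public void Report(float progress)
    {
        // Clamp defensively, then truncate to a whole percentage.
        var percent = (int)(Math.Clamp(progress, 0f, 1f) * 100f);
        if (percent == _lastPercent) return;   // unchanged: skip
        _lastPercent = percent;
        _onPercent(percent);
    }
}
```

Wire it in with a lambda, for example new Umap(progressReporter: p => throttled.Report(p)), where throttled is a ThrottledProgress that writes each percentage to the console.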
Putting it together
A typical "production" configuration for 3-D projection of pre-normalised embedding vectors with progress reporting:
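```csharp
// vectors: your pre-normalised embedding vectors (float[][])
IProgress<float> progress = new Progress<float>(p => Console.WriteLine($"{p:P0}"));

var umap = new Umap(
    distance: Umap.DistanceFunctions.CosineForNormalizedVectors,
    dimensions: 3,
    numberOfNeighbors: 30,
    progressReporter: p => progress.Report(p));

var epochs = umap.InitializeFit(vectors);
for (var i = 0; i < epochs; i++)
{
    umap.Step();
}
```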