Filtering
Plain k-NN search returns the items closest to the query — without regard for any business-level predicate. Real workloads usually need both: "the 10 nearest neighbours that match this filter." HNSW-Sharp supports two strategies.
Strategy 1 — query-time predicate
Pass a filterItem delegate to KNNSearch. The library evaluates it on each candidate during the walk and discards anything that returns false.
var hits = graph.KNNSearch(query, k: 10,
filterItem: doc => doc.Tenant == currentTenant && !doc.IsArchived);
This works well when the filter accepts a large fraction of items — anywhere from 100% down to about 20–30%. The graph stays mostly relevant; the predicate just trims candidates.
It breaks down when the filter is very selective. If only 1% of items match, most of the candidates HNSW walks to are filtered out, the search runs out of work, and you get back fewer than k results — or you have to crank EfSearch so high that you've lost the speed advantage over bruteforce.
Strategy 2 — ACORN-γ at build time
For selective filters, build the graph with OptimizeForFiltering = true. This activates ACORN-γ, a construction strategy designed for filtered nearest-neighbor search. ACORN keeps more diverse edges at layer 0 so the graph stays well-connected even after most nodes are filtered out.
var parameters = new SmallWorldParameters
{
M = 16,
LevelLambda = 1 / Math.Log(16),
EfSearch = 64,
OptimizeForFiltering = true, // ACORN-γ
Gamma = 3, // neighbour expansion factor
Mb = 10, // layer-0 compression
};
var graph = new SmallWorld<Doc, float>(
distance: DocDistance,
generator: DefaultRandomGenerator.Instance,
parameters: parameters);
graph.AddItems(corpus);
var hits = graph.KNNSearch(query, k: 10, filterItem: HardPredicate);
Gamma (γ) is the main ACORN dial — higher values build a denser graph that survives more selective filters at the cost of memory and build time. Gamma = 3 to 5 is a reasonable starting range.
Mb controls layer-0 compression. Higher values make layer 0 less dense and recover memory at the cost of recall under filters.
ACORN is build-time
OptimizeForFiltering must be set before calling AddItems. You can't switch a built graph between standard and ACORN modes — rebuild if you need to change strategies.
Choosing between the two
| Filter selectivity | Strategy |
|---|---|
| Accepts > 30% of items | Strategy 1 — query-time predicate, no ACORN. |
| Accepts 5–30% | Strategy 1 with a higher EfSearch, or Strategy 2 with low Gamma. |
| Accepts < 5% | Strategy 2 — ACORN-γ, tune Gamma upward. |
| Accepts < 0.1% | Bruteforce. Even ACORN starts to lose at this point. |
If you can't predict the selectivity in advance, default to Strategy 1 and only build with OptimizeForFiltering = true if you measure recall problems.
Combining filters with cancellation
When a filter is very selective, the walk can spend a long time looking for matches. Pair filterItem with a CancellationToken to bound the latency:
using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(100));
try
{
var hits = graph.KNNSearch(query, 10,
filterItem: HardPredicate,
cancellationToken: cts.Token);
}
catch (OperationCanceledException)
{
// partial / no results — handle as appropriate
}
Further reading
- ACORN paper (arXiv) — "ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data."