Serialization
Building an HNSW graph over millions of vectors takes time, and rebuilding it every time the process restarts wastes that time. SerializeGraph and DeserializeGraph let you build once and reload in seconds.
Serializing
SerializeGraph(Stream) writes the entire graph structure (parameters and edges) to a Stream:
using var fs = File.Create("index.hnsw");
graph.SerializeGraph(fs);
That's the whole API. The output is a binary MessagePack blob with a small header (HNSW) so corrupted files can be detected on load.
The format does not include the vectors themselves. This is by design — vectors are usually held by the application in a more convenient format (a List<>, a database, a memory-mapped file). Re-supplying them on load also lets you upgrade the storage format without invalidating built indexes.
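Because the index file holds no vectors, the application has to persist them itself and hand them back in the same order. The following is a minimal sketch of one way to do that with only the .NET base class library, assuming fixed-dimension float[] vectors. The file layout and the SaveVectors/LoadVectors helpers are hypothetical and not part of the library.

```csharp
using System;
using System.IO;

// Hypothetical layout: [int32 count][int32 dimension][count * dimension float32 values].
static void SaveVectors(string path, float[][] vectors)
{
    using var writer = new BinaryWriter(File.Create(path));
    int dimension = vectors.Length == 0 ? 0 : vectors[0].Length;
    writer.Write(vectors.Length);
    writer.Write(dimension);
    foreach (var vector in vectors)
    {
        if (vector.Length != dimension)
            throw new ArgumentException("All vectors must share one dimension.");
        foreach (var value in vector)
            writer.Write(value);
    }
}

// Reads the vectors back in exactly the order they were written.
static float[][] LoadVectors(string path)
{
    using var reader = new BinaryReader(File.OpenRead(path));
    int count = reader.ReadInt32();
    int dimension = reader.ReadInt32();
    var vectors = new float[count][];
    for (int i = 0; i < count; i++)
    {
        vectors[i] = new float[dimension];
        for (int j = 0; j < dimension; j++)
            vectors[i][j] = reader.ReadSingle();
    }
    return vectors;
}

// Usage: position i in the reloaded array holds the vector that was at position i when saved.
string vectorFile = Path.Combine(Path.GetTempPath(), "vectors.bin");
SaveVectors(vectorFile, new[] { new[] { 1f, 2f }, new[] { 3f, 4f } });
float[][] reloaded = LoadVectors(vectorFile);
Console.WriteLine(reloaded[1][0]); // 3
```

Any storage works as long as it returns the items in a stable order; a flat binary file is just the simplest thing to show.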
Deserializing
DeserializeGraph is a static method on the SmallWorld type. It takes the original items, the same distance function used at build time, a random generator, and the input stream:
using var fs = File.OpenRead("index.hnsw");
var (graph, missing) = SmallWorld<float[], float>.DeserializeGraph(
    items: vectors,                         // same vectors, in the same order
    distance: CosineDistance.SIMDForUnits,  // same as at build time
    generator: DefaultRandomGenerator.Instance,
    stream: fs);
if (missing.Length > 0)
{
    Console.WriteLine($"{missing.Length} items missing: graph references vectors not in supplied list.");
}
The missing array reports any items the file references that weren't supplied by the caller — typically zero. A non-empty missing array is a sign that the vector storage drifted out of sync with the index file.
Round-trip rules
For a clean reload, three things must match between the build and the load:
- Same distance function. The serialized file does not name it — the caller is responsible for passing the same one. Passing a different metric on reload produces a syntactically valid graph that returns wrong neighbours.
- Same vectors, in the same order. HNSW refers to items by integer ID, which equals the position in the items list. Re-ordering the input shifts every ID.
- Same item type. Generic instantiations are not serialized: SmallWorld<float[], float> and SmallWorld<MyVec, float> are different graphs.
If you can't guarantee these, version your storage and rebuild on mismatch.
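One way to "version your storage" is to keep a fingerprint next to the index file and compare it before loading. The sketch below is an assumption about how that could be done, not something the library provides: the Fingerprint helper and the metric name string are hypothetical, and it hashes the metric name, the item type, and every vector in list order with SHA-256.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: any change to the metric name, item type, vector values,
// or vector order produces a different hex digest.
static string Fingerprint(string metricName, Type itemType, float[][] vectors)
{
    using var hash = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
    hash.AppendData(Encoding.UTF8.GetBytes(metricName + "\n" + itemType.FullName + "\n"));
    foreach (var vector in vectors)
    {
        // Length prefix keeps [1,2],[3] distinct from [1],[2,3].
        hash.AppendData(BitConverter.GetBytes(vector.Length));
        hash.AppendData(MemoryMarshal.AsBytes(vector.AsSpan()));
    }
    return Convert.ToHexString(hash.GetHashAndReset());
}

// Usage: the same inputs match, a re-ordered list does not.
var original = new[] { new[] { 1f, 0f }, new[] { 0f, 1f } };
var reordered = new[] { original[1], original[0] };
string atBuild = Fingerprint("cosine", typeof(float[]), original);
Console.WriteLine(atBuild == Fingerprint("cosine", typeof(float[]), original));  // True
Console.WriteLine(atBuild == Fingerprint("cosine", typeof(float[]), reordered)); // False
```

Store the digest in a sidecar file at build time; if the digest computed at startup differs, skip the load and rebuild.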
A startup pattern
const string IndexPath = "index.hnsw";
SmallWorld<float[], float> LoadOrBuild(float[][] vectors)
{
    if (File.Exists(IndexPath))
    {
        using var fs = File.OpenRead(IndexPath);
        var (graph, missing) = SmallWorld<float[], float>.DeserializeGraph(
            vectors,
            CosineDistance.SIMDForUnits,
            DefaultRandomGenerator.Instance,
            fs);
        if (missing.Length == 0)
            return graph;
        // fall through to rebuild
    }
    var fresh = new SmallWorld<float[], float>(
        CosineDistance.SIMDForUnits,
        DefaultRandomGenerator.Instance,
        new SmallWorldParameters { M = 16, EfSearch = 64, LevelLambda = 1 / Math.Log(16) });
    fresh.AddItems(vectors);
    using (var fs = File.Create(IndexPath))
        fresh.SerializeGraph(fs);
    return fresh;
}
This is the canonical "ship a pre-built index alongside the binary, fall back to building from data if anything looks wrong" pattern.
Streaming to anything
SerializeGraph and DeserializeGraph take any Stream. That means you can ship the index over the network, write it into a blob store, or compress it:
using var ms = new MemoryStream();
graph.SerializeGraph(ms);
using var gz = new GZipStream(File.Create("index.hnsw.gz"), CompressionLevel.Optimal);
ms.Position = 0;
ms.CopyTo(gz);
Compression typically saves 30–50% on HNSW indexes.
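Loading a compressed index is the mirror image: wrap the file in a GZipStream in decompress mode and pass the result on. The sketch below uses only System.IO.Compression and a stand-in byte payload in place of a real graph. The CompressToFile and DecompressFromFile helpers are hypothetical names, and buffering into a MemoryStream is a cautious assumption in case the loader wants a seekable stream.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Writes everything in 'source' to a gzip file.
// Disposing the GZipStream flushes the gzip trailer and closes the file.
static void CompressToFile(Stream source, string path)
{
    using var gz = new GZipStream(File.Create(path), CompressionLevel.Optimal);
    source.CopyTo(gz);
}

// Inflates a gzip file into a seekable in-memory stream positioned at the start.
static MemoryStream DecompressFromFile(string path)
{
    var buffer = new MemoryStream();
    using (var gz = new GZipStream(File.OpenRead(path), CompressionMode.Decompress))
        gz.CopyTo(buffer);
    buffer.Position = 0;
    return buffer;
}

// Stand-in payload; in practice these bytes would be whatever SerializeGraph wrote.
byte[] payload = { 1, 2, 3, 4, 5 };
string gzPath = Path.Combine(Path.GetTempPath(), "index.hnsw.gz");
CompressToFile(new MemoryStream(payload), gzPath);
using MemoryStream restored = DecompressFromFile(gzPath);
Console.WriteLine(restored.ToArray().SequenceEqual(payload)); // True
```

The stream returned by DecompressFromFile can then be passed as the stream argument of DeserializeGraph.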
Common pitfalls
Save the parameters too
SerializeGraph preserves the graph structure and its parameters, with one exception: DeserializeGraph resets InitialDistanceCacheSize to 0, because there's no value in pre-allocating a build-time cache for a pre-built graph. All other parameters survive the round trip.
Verify on reload
Wrap the call in a try/catch that handles InvalidDataException. The library validates the header (HNSW), so a corrupt or truncated file fails fast at that step rather than later during search.
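A minimal sketch of that wrapper follows. TryLoad is a hypothetical helper, not a library API: the load delegate stands in for the DeserializeGraph call so the example runs without the library, and only InvalidDataException is handled, as described above.

```csharp
using System;
using System.IO;

// Hypothetical helper: returns false instead of throwing when the loader rejects the file.
static bool TryLoad<T>(string path, Func<Stream, T> load, out T result)
{
    try
    {
        using var fs = File.OpenRead(path);
        result = load(fs);
        return true;
    }
    catch (InvalidDataException)
    {
        // Corrupt or truncated file: report failure so the caller can rebuild.
        result = default!;
        return false;
    }
}

// Stand-in loader that rejects every file; a real one would pass the stream to DeserializeGraph.
string badFile = Path.GetTempFileName();
File.WriteAllText(badFile, "not an index");
bool ok = TryLoad<string>(badFile, _ => throw new InvalidDataException("bad header"), out var graph);
Console.WriteLine(ok); // False
```

When TryLoad returns false, fall back to rebuilding the index from the vectors, as in the startup pattern.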