Managing the model cache

FlorenceModelDownloader owns the on-disk location of the ONNX model files. You construct it with a directory of your choosing, and on first use it downloads the Florence-2-base ONNX files from Hugging Face into that directory.

For production deployments, you usually want one of:

Pre-populate the cache in your container image or installer, so the first run is fast and works offline.
Share a single cache across many processes on the same machine.
Use a custom model — a different size, a fine-tune, or a quantised variant.

The default behaviour

var modelSource = new FlorenceModelDownloader("./models");
await modelSource.DownloadModelsAsync();

On first call, this downloads the Florence-2-base ONNX files into ./models. Subsequent calls re-use the cached files and complete immediately.

Shipping models with your application

For air-gapped deployments or to avoid first-run download latency, ship the model files alongside your binaries:

myapp/
├── MyApp.dll
├── MyApp.exe
└── models/
    ├── decoder_model.onnx
    ├── encoder_model.onnx
    ├── embed_tokens.onnx
    └── …

Point the downloader at the bundled folder:

string modelDir = Path.Combine(AppContext.BaseDirectory, "models");
var modelSource = new FlorenceModelDownloader(modelDir);
await modelSource.DownloadModelsAsync();  // no-op if already populated

DownloadModelsAsync is safe to call on a pre-populated cache: it checks that the files are present and skips the download.

Sharing a cache across processes

A single model directory can be read by many Florence2Model instances: across processes, across containers (with a shared volume), and across users. The files are only read, never written, at inference time.

// In every process:
var modelSource = new FlorenceModelDownloader("/var/lib/florence2/models");
await modelSource.DownloadModelsAsync();
var model = new Florence2Model(modelSource);

If several processes start against an empty directory at the same time, their first-time downloads may race. If that is a concern, gate the initial download behind a per-host lock.
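One way to implement that per-host lock is an exclusive lock file, sketched below in plain .NET. The `CacheLock` class and the `.download.lock` file name used in the usage note are hypothetical, not part of the library. The sketch relies on `FileShare.None`, which .NET enforces natively on Windows and emulates with advisory `flock` locks on Linux and macOS; whether that lock also holds across containers on a shared volume depends on the underlying file system, so treat it as an assumption to test in your environment.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Florence2 library: serializes a one-time
// action across processes on the same host by holding an exclusive lock file.
public static class CacheLock
{
    public static async Task RunExclusiveAsync(string lockPath, Func<Task> action, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            FileStream lockFile;
            try
            {
                // FileShare.None: any other open of this file fails until we dispose it.
                lockFile = new FileStream(lockPath, FileMode.OpenOrCreate,
                                          FileAccess.ReadWrite, FileShare.None);
            }
            catch (IOException) when (DateTime.UtcNow < deadline)
            {
                // Another holder has the lock: back off and retry. Once the deadline
                // has passed, the filter is false and the IOException propagates.
                await Task.Delay(TimeSpan.FromMilliseconds(250));
                continue;
            }

            // Hold the lock only while the action runs.
            // The lock file is deliberately left on disk afterwards.
            using (lockFile)
            {
                await action();
            }
            return;
        }
    }
}
```

In an application you would pass the download as the action, for example `CacheLock.RunExclusiveAsync(Path.Combine(modelDir, ".download.lock"), () => modelSource.DownloadModelsAsync(), TimeSpan.FromMinutes(30))`, with the directory created beforehand. Processes that waited for the lock then call DownloadModelsAsync on an already populated cache, which skips the download. Deleting the lock file after use is avoided on purpose, since that can let two processes lock different files with the same name.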

Using a custom checkpoint

Florence-2 is published on Hugging Face in two sizes, base and large, each with a fine-tuned (-ft) variant:

microsoft/Florence-2-base (default in this library)
microsoft/Florence-2-base-ft
microsoft/Florence-2-large
microsoft/Florence-2-large-ft

To use a non-default checkpoint:

Download or convert the model files to ONNX format yourself.
Place them in a directory.
Point FlorenceModelDownloader at that directory.

var modelSource = new FlorenceModelDownloader("./florence2-large");
// DownloadModelsAsync is still safe to call — it sees the files are there
await modelSource.DownloadModelsAsync();

var model = new Florence2Model(modelSource);

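The library doesn't know or care which checkpoint produced the ONNX files; it drives them all through the same interface, so any export with the expected file names and model inputs and outputs should load the same way.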
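Because the checkpoint is selected purely by directory, it can be convenient to make that directory configurable instead of hard-coding it. A minimal sketch, assuming a hypothetical `FLORENCE2_MODEL_DIR` environment variable; the library does not read this variable itself, and `ModelPaths` is an illustrative helper.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Florence2 library.
public static class ModelPaths
{
    // Hypothetical variable name; choose whatever fits your deployment.
    public const string EnvVar = "FLORENCE2_MODEL_DIR";

    // Returns the directory named by the environment variable if it is set,
    // otherwise the fallback, resolved to an absolute path.
    public static string Resolve(string fallback)
    {
        string? fromEnv = Environment.GetEnvironmentVariable(EnvVar);
        string dir = string.IsNullOrWhiteSpace(fromEnv) ? fallback : fromEnv;
        return Path.GetFullPath(dir);
    }
}
```

You would then construct the downloader with `new FlorenceModelDownloader(ModelPaths.Resolve("./models"))` and switch checkpoints by changing the environment rather than the code.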

Verifying the cache

A quick check that the cache contains at least one ONNX file (note that it does not catch a partially populated cache):

string modelDir = "./models";

if (Directory.Exists(modelDir) &&
    Directory.GetFiles(modelDir, "*.onnx").Length > 0)
{
    Console.WriteLine("Cache populated.");
}
else
{
    Console.WriteLine("Cache empty — will download on first use.");
}

This is useful as a startup health-check — it lets you fail fast on a missing cache before the first request rather than during it.

Storage footprint

Checkpoint	Approximate size on disk
Florence-2-base	~500 MB
Florence-2-large	~1.6 GB
Florence-2-large-ft	~1.6 GB

Plan disk quotas accordingly — especially when bundling models into container images.
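If the cache is filled at runtime rather than baked into an image, a pre-flight free-space check turns a half-finished download into a clear startup error. A sketch using `System.IO.DriveInfo`; the `DiskSpace` helper is hypothetical, and its byte constants are simply the approximate figures from the table, not exact file sizes.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Florence2 library.
public static class DiskSpace
{
    // Approximate figures from the storage table; real sizes vary by export.
    public const long BaseBytes = 500_000_000L;
    public const long LargeBytes = 1_600_000_000L;

    // True if the volume at the root of `directory` has at least `requiredBytes` free.
    // Caveats: this checks the path root ("C:\" on Windows, "/" on Linux), which on
    // Linux may not be the mount that holds the cache; DriveInfo rejects UNC paths.
    public static bool HasRoomFor(string directory, long requiredBytes)
    {
        string root = Path.GetPathRoot(Path.GetFullPath(directory))!;
        var drive = new DriveInfo(root);
        return drive.AvailableFreeSpace >= requiredBytes;
    }
}
```

For example, call `DiskSpace.HasRoomFor(modelDir, DiskSpace.LargeBytes)` before DownloadModelsAsync, and add some margin on top of these figures since they are approximate.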

Common pitfalls

Don't assume a download succeeded

On some failure paths, DownloadModelsAsync returns without throwing even though the cache is incomplete. If you want bulletproof startup, verify that the directory contains the expected files before constructing Florence2Model.
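A minimal sketch of that check. The three file names come from the example layout earlier on this page, which is itself truncated, so the list is deliberately incomplete: replace it with the full set of files your checkpoint ships. `ModelCacheCheck` is a hypothetical helper, and treating zero-byte files as missing is an extra assumption, on the grounds that an interrupted download can leave empty files behind.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the Florence2 library.
public static class ModelCacheCheck
{
    // Partial list taken from the example layout; extend it to match your checkpoint.
    public static readonly string[] KnownFiles =
    {
        "decoder_model.onnx",
        "encoder_model.onnx",
        "embed_tokens.onnx",
    };

    // Returns the expected files that are missing or zero bytes long.
    public static List<string> FindProblems(string modelDir, IEnumerable<string> expectedFiles)
    {
        return expectedFiles
            .Where(name =>
            {
                var info = new FileInfo(Path.Combine(modelDir, name));
                return !info.Exists || info.Length == 0;
            })
            .ToList();
    }

    // Throws at startup instead of failing on the first inference request.
    public static void EnsureComplete(string modelDir, IEnumerable<string> expectedFiles)
    {
        List<string> problems = FindProblems(modelDir, expectedFiles);
        if (problems.Count > 0)
        {
            throw new InvalidOperationException(
                $"Model cache '{modelDir}' is incomplete; missing or empty: {string.Join(", ", problems)}");
        }
    }
}
```

Call `ModelCacheCheck.EnsureComplete(modelDir, expectedFiles)` after DownloadModelsAsync and before `new Florence2Model(modelSource)`.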

Cache directory ownership

On Linux containers, make sure the user running the application can write to the cache directory on the first run, when the files are downloaded, and can read it afterwards. Read-only volume mounts work fine once the cache has been populated elsewhere.
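To catch a permissions problem before the first download rather than halfway through it, you can probe for write access by creating a file that is deleted as soon as it is closed. `CacheAccess.CanWrite` is a hypothetical helper; a successful probe only shows that this user could create a file at that moment, not that the volume has enough space.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Florence2 library.
public static class CacheAccess
{
    // True if the current user can create a file in `directory`.
    // The probe file is removed when the handle is disposed (DeleteOnClose).
    public static bool CanWrite(string directory)
    {
        string probe = Path.Combine(directory, ".write-probe-" + Guid.NewGuid().ToString("N"));
        try
        {
            using (new FileStream(probe, FileMode.CreateNew, FileAccess.Write,
                                  FileShare.None, 1, FileOptions.DeleteOnClose))
            {
            }
            return true;
        }
        catch (Exception e) when (e is UnauthorizedAccessException || e is IOException)
        {
            // Covers permission errors, read-only mounts and missing directories.
            return false;
        }
    }
}
```

Only the first-run path needs this probe: once the cache is populated, read access is enough, so you can skip it when the expected files are already present.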