Managing the model cache
FlorenceModelDownloader owns the on-disk location of the ONNX model files. You give it a directory of your choosing, and on first use it fetches the Florence-2-base checkpoints from Hugging Face into that directory.
For production deployments, you usually want to do one of the following:
- Pre-populate the cache in your container image or installer, so first-run is fast and offline.
- Share a single cache across many processes on the same machine.
- Use a custom model — a different size, a fine-tune, or a quantised variant.
The default behaviour
var modelSource = new FlorenceModelDownloader("./models");
await modelSource.DownloadModelsAsync();
On first call, this downloads the Florence-2-base ONNX files into ./models. Subsequent calls re-use the cached files and complete immediately.
Shipping models with your application
For air-gapped deployments or to avoid first-run download latency, ship the model files alongside your binaries:
myapp/
├── MyApp.dll
├── MyApp.exe
└── models/
    ├── decoder_model.onnx
    ├── encoder_model.onnx
    ├── embed_tokens.onnx
    └── …
Point the downloader at the bundled folder:
string modelDir = Path.Combine(AppContext.BaseDirectory, "models");
var modelSource = new FlorenceModelDownloader(modelDir);
await modelSource.DownloadModelsAsync(); // no-op if already populated
DownloadModelsAsync is safe to call on a pre-populated cache — it verifies presence and skips downloads.
Sharing a cache across processes
A single model directory can be read by many Florence2Model instances — across processes, across containers (with a shared volume), across users. The files are read-only at inference time.
// In every process:
var modelSource = new FlorenceModelDownloader("/var/lib/florence2/models");
await modelSource.DownloadModelsAsync();
var model = new Florence2Model(modelSource);
Multiple concurrent first-time downloaders against the same directory may race — gate the initial download behind a per-host lock if that's a concern.
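One way to gate the initial download is an exclusive lock file inside the cache directory. The sketch below is not part of the library: `CacheLock`, `RunExclusiveAsync`, and the `.download.lock` file name are hypothetical names chosen for illustration. It relies only on `FileShare.None`, which .NET enforces between processes on the same host. It may not hold across network file systems or across containers that share a volume, so treat it as a per-host gate only.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class CacheLock
{
    // Hypothetical helper: runs `action` while holding an exclusive lock file
    // inside `modelDir`, so only one process on this host populates the cache.
    public static async Task RunExclusiveAsync(
        string modelDir,
        Func<Task> action,
        CancellationToken cancellationToken = default)
    {
        Directory.CreateDirectory(modelDir);

        // Hypothetical lock file name; any name the library does not use will do.
        string lockPath = Path.Combine(modelDir, ".download.lock");

        while (true)
        {
            FileStream lockFile;
            try
            {
                // FileShare.None makes a second open attempt fail while we hold the file.
                lockFile = new FileStream(
                    lockPath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
            }
            catch (IOException)
            {
                // Another process holds the lock; wait briefly and retry.
                await Task.Delay(TimeSpan.FromMilliseconds(500), cancellationToken);
                continue;
            }

            using (lockFile)
            {
                // The lock is released when the stream is disposed,
                // including when `action` throws.
                await action();
            }
            return;
        }
    }
}
```

In each process you would then wrap the download call, for example `await CacheLock.RunExclusiveAsync(dir, async () => await modelSource.DownloadModelsAsync());`. Processes that lose the race wait, then find the cache populated and return quickly. The lock file itself is left in the directory after use.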
Using a custom checkpoint
Florence-2 checkpoints published on Hugging Face include:
- microsoft/Florence-2-base (default in this library)
- microsoft/Florence-2-large
- microsoft/Florence-2-large-ft
To use a non-default checkpoint:
- Download or convert the model files to ONNX format yourself.
- Place them in a directory.
- Point FlorenceModelDownloader at that directory.
var modelSource = new FlorenceModelDownloader("./florence2-large");
// DownloadModelsAsync is still safe to call — it sees the files are there
await modelSource.DownloadModelsAsync();
var model = new Florence2Model(modelSource);
The library does not check which checkpoint produced the ONNX files; it loads whatever is in the directory in the same way.
Verifying the cache
A quick check that the cache is populated:
string modelDir = "./models";
if (Directory.Exists(modelDir) &&
    Directory.GetFiles(modelDir, "*.onnx").Length > 0)
{
    Console.WriteLine("Cache populated.");
}
else
{
    Console.WriteLine("Cache empty; will download on first use.");
}
This is useful as a startup health-check — it lets you fail fast on a missing cache before the first request rather than during it.
Storage footprint
| Checkpoint | Approximate size on disk |
|---|---|
| Florence-2-base | ~500 MB |
| Florence-2-large | ~1.6 GB |
| Florence-2-large-ft | ~1.6 GB |
Plan disk quotas accordingly — especially when bundling models into container images.
Common pitfalls
Don't ignore failed downloads
On some failure paths, DownloadModelsAsync returns successfully even though the resulting cache is incomplete. For reliable startup, verify that the directory contains the expected files before constructing Florence2Model.
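A minimal sketch of such a check, using only `System.IO`. `CacheCheck`, `FindMissingFiles`, and `EnsureComplete` are hypothetical helpers, not library APIs, and the list of expected file names is supplied by the caller. The three names in the comment come from the bundled layout shown earlier on this page, which is itself abbreviated, so extend the list to match the files your checkpoint actually uses.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CacheCheck
{
    // Hypothetical helper: returns the names from `expectedFiles` that are
    // missing from `modelDir` or present but zero-length.
    public static IReadOnlyList<string> FindMissingFiles(
        string modelDir, IEnumerable<string> expectedFiles)
    {
        return expectedFiles
            .Where(name =>
            {
                var info = new FileInfo(Path.Combine(modelDir, name));
                // Length is only read when the file exists.
                return !info.Exists || info.Length == 0;
            })
            .ToList();
    }

    // Throws if any expected file is missing, so startup fails fast.
    // Example call (partial list, extend as needed):
    //   CacheCheck.EnsureComplete("./models", new[] {
    //       "decoder_model.onnx", "encoder_model.onnx", "embed_tokens.onnx" });
    public static void EnsureComplete(string modelDir, IEnumerable<string> expectedFiles)
    {
        var missing = FindMissingFiles(modelDir, expectedFiles);
        if (missing.Count > 0)
        {
            throw new InvalidOperationException(
                "Model cache incomplete, missing: " + string.Join(", ", missing));
        }
    }
}
```

This only confirms that the files exist and are non-empty. It does not detect a truncated download; comparing sizes or hashes against known-good values would be needed for that.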
Cache directory ownership
In Linux containers, ensure the user running the application has write permission on the cache directory for the initial download and read permission thereafter. Read-only volume mounts work fine once the cache has been populated elsewhere.
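If you want to detect a permissions problem before the first download, one option is to probe the directory by creating a throwaway file. This is a rough sketch with a hypothetical helper name (`CachePermissions.CanWrite`), not something the library provides. A process running as root will usually pass the probe regardless of the directory's mode bits.

```csharp
using System;
using System.IO;

public static class CachePermissions
{
    // Hypothetical helper: returns true if a file can be created in `directory`.
    // The probe file is removed automatically when the handle is closed.
    public static bool CanWrite(string directory)
    {
        string probe = Path.Combine(
            directory, ".write-probe-" + Guid.NewGuid().ToString("N"));
        try
        {
            using (File.Create(probe, 1, FileOptions.DeleteOnClose))
            {
            }
            return true;
        }
        catch (UnauthorizedAccessException)
        {
            // The current user lacks write permission.
            return false;
        }
        catch (IOException)
        {
            // Covers a missing directory and other I/O failures,
            // such as a read-only mount.
            return false;
        }
    }
}
```

A `false` result on a cache that is already populated is fine; it only matters when the directory still needs to be filled.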