Florence2-Sharp
A .NET wrapper for Microsoft's Florence-2 vision model — image captioning, OCR, object detection, and phrase grounding from C#, with models loaded locally.
What is Florence2-Sharp?
Florence2-Sharp gives .NET developers a clean C# API for Microsoft's Florence-2 vision-language model — without needing Python, transformers, or the original reference implementation. The library handles model download, tokenisation, image preprocessing, and post-processing, so calling Florence-2 looks like any other strongly-typed .NET method.
A single model checkpoint covers more than a dozen vision tasks. Pick the task from the TaskTypes enum, stream the image in, and Florence2-Sharp returns structured results — captions, bounding boxes, polygons, or quad-boxed OCR text.
Project on GitHub · Florence-2 on Hugging Face · Florence-2 paper (arXiv)
A first taste
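using System;
using System.IO;
using Florence2;
// Download the ONNX checkpoints into ./models (cached on disk after the first run).
var modelSource = new FlorenceModelDownloader("./models");
await modelSource.DownloadModelsAsync();
// Load the model from the downloaded files.
var model = new Florence2Model(modelSource);
// Run a single task against an image stream and print the caption.
using var image = File.OpenRead("car.jpg");
var results = model.Run(TaskTypes.DETAILED_CAPTION, image);
Console.WriteLine(results.PureText);
// → "A red sedan parked on a cobblestone street under late-afternoon light."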
Swap TaskTypes.DETAILED_CAPTION for OCR, OD, CAPTION_TO_PHRASE_GROUNDING, or any of the other supported tasks and the same call returns the appropriate structured result.
Why Florence2-Sharp?
Runs entirely locally
ONNX models execute on-device. No outbound requests at inference time, no API keys, no per-image cost — useful when working with sensitive imagery.
One model, many tasks
Captioning, OCR, region OCR, object detection, dense region captioning, phrase grounding, segmentation, and more — all from the same Florence2Model instance. See Supported tasks.
Automatic model download
FlorenceModelDownloader fetches the ONNX checkpoints on first use and caches them on disk. Point it at any directory or pre-populated cache.
Structured results
Every task returns a typed FlorenceResults with PureText, BoundingBoxes, OCRBBox, and Polygons — no JSON parsing, no string scraping.
Tiny surface area
Two classes you'll touch most days: FlorenceModelDownloader and Florence2Model. The rest is preprocessing and tokenisation that you don't have to think about.
Bring your own models
The downloader is optional — pass any folder containing Florence-2-base ONNX models and the library will load them directly.
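If you manage the model folder yourself, a small pre-flight check can tell you whether a directory already holds ONNX files before you decide to download or copy anything. The sketch below uses only System.IO and does not call Florence2-Sharp. The `ModelCache` class, the `HasOnnxFiles` helper, and the demo folder name are hypothetical names introduced for illustration, and matching on the `.onnx` extension is an assumption about how the checkpoints are stored, not documented library behaviour.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of Florence2-Sharp.
public static class ModelCache
{
    // Returns true when the directory exists and contains at least one
    // *.onnx file, searching subdirectories as well.
    public static bool HasOnnxFiles(string directory) =>
        Directory.Exists(directory) &&
        Directory.EnumerateFiles(directory, "*.onnx", SearchOption.AllDirectories).Any();
}

public static class Program
{
    public static void Main()
    {
        // Illustrative folder name; point this at your own model directory.
        var modelDir = Path.Combine(Path.GetTempPath(), "florence2-models-demo");

        Console.WriteLine(ModelCache.HasOnnxFiles(modelDir)
            ? "ONNX files found; the folder can be used as-is."
            : "No ONNX files yet; download or copy the models first.");
    }
}
```

The check only confirms that some ONNX files are present. It does not verify that they are the complete Florence-2-base set the library expects.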
Requirements
| Requirement | Minimum | Notes |
|---|---|---|
| .NET | .NET 6.0+ | The library uses modern Span<T> and System.IO.Stream APIs. |
| ONNX Runtime | Bundled via NuGet | CPU execution provider works out of the box; CUDA and DirectML available via the ONNX Runtime extension packages. |
| Disk | ~500 MB | Florence-2-base ONNX checkpoints, cached on first run. |
| Network | First run only | Used to download the model from Hugging Face. Subsequent runs are fully offline. |
Florence2-Sharp is released under the MIT license, which permits use inside commercial applications.
Learn more about Florence-2
- Florence-2 on Hugging Face — model card, sample notebooks, and licensing.
- Florence-2 paper (arXiv) — "Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks".
- ONNX Runtime — the inference engine underpinning this library.