Available on NuGet.

Florence2-Sharp

A .NET wrapper for Microsoft's Florence-2 vision model — image captioning, OCR, object detection, and phrase grounding from C#, with models loaded locally.


What is Florence2-Sharp?

Florence2-Sharp gives .NET developers a clean C# API for Microsoft's Florence-2 vision-language model — without needing Python, transformers, or the original reference implementation. The library handles model download, tokenisation, image preprocessing, and post-processing, so calling Florence-2 looks like any other strongly-typed .NET method.

A single model checkpoint covers more than a dozen vision tasks. Pick the task from the TaskTypes enum and pass the image in as a Stream. Florence2-Sharp then returns structured results: captions, bounding boxes, polygons, or OCR text with quadrilateral boxes.


A first taste

using Florence2;

var modelSource = new FlorenceModelDownloader("./models");
await modelSource.DownloadModelsAsync();

var model = new Florence2Model(modelSource);

using var image = File.OpenRead("car.jpg");
var results = model.Run(TaskTypes.DETAILED_CAPTION, image);

Console.WriteLine(results.PureText);
// → "A red sedan parked on a cobblestone street under late-afternoon light."

Swap TaskTypes.DETAILED_CAPTION for OCR, OD, or any of the other supported tasks and the same call returns the appropriate structured result. Grounding tasks such as CAPTION_TO_PHRASE_GROUNDING also need the phrase to locate, supplied as text input alongside the image.
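Because one Florence2Model instance serves every task, a common pattern is to build it once and loop over the tasks you need. The sketch below assumes the two-argument Run(task, stream) call from the first example and only uses task names mentioned on this page. It opens a new stream for each call because it does not assume Run rewinds the stream it was given. CAPTION_TO_PHRASE_GROUNDING is left out since it also needs a text phrase.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Florence2;

var modelSource = new FlorenceModelDownloader("./models");
await modelSource.DownloadModelsAsync();

// Construct the model once and reuse it for every task.
var model = new Florence2Model(modelSource);

// Image-only tasks; grounding tasks are skipped because they also need a text phrase.
TaskTypes[] tasks = { TaskTypes.DETAILED_CAPTION, TaskTypes.OCR, TaskTypes.OD };
var outputs = new List<string>();

foreach (var task in tasks)
{
    // Fresh stream per call rather than assuming Run rewinds the previous one.
    using var image = File.OpenRead("car.jpg");
    var results = model.Run(task, image);

    // Detection output lives in BoundingBoxes; if PureText is null, record a marker instead.
    var text = results.PureText ?? "(no text; inspect BoundingBoxes)";
    outputs.Add($"{task}: {text}");
}

outputs.ForEach(Console.WriteLine);
```

Keeping a single model instance avoids reloading the ONNX checkpoints for each task.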

Why Florence2-Sharp?

Runs entirely locally

ONNX models execute on-device. No outbound requests at inference time, no API keys, no per-image cost — useful when working with sensitive imagery.
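Since inference stays on the machine, sensitive images do not have to touch the file system either. The sketch below wraps raw image bytes in a read-only MemoryStream, assuming Run accepts any readable Stream (the FileStream in the first example suggests it does). File.ReadAllBytes is only a stand-in for wherever the bytes actually come from, such as an upload or a database blob.

```csharp
using System;
using System.IO;
using Florence2;

var modelSource = new FlorenceModelDownloader("./models");
await modelSource.DownloadModelsAsync();
var model = new Florence2Model(modelSource);

// Stand-in for bytes received from an upload, a database blob, or a queue message.
byte[] imageBytes = File.ReadAllBytes("car.jpg");

// Read-only MemoryStream over the bytes: no temporary file is written for inference.
using var image = new MemoryStream(imageBytes, writable: false);

var results = model.Run(TaskTypes.OCR, image);
Console.WriteLine(results.PureText);
```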

One model, many tasks

Captioning, OCR, region OCR, object detection, dense region captioning, phrase grounding, segmentation, and more — all from the same Florence2Model instance. See Supported tasks.

Automatic model download

FlorenceModelDownloader fetches the ONNX checkpoints the first time DownloadModelsAsync runs and caches them on disk, so later runs reuse the local files. Point it at any directory, including a pre-populated cache.
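Because the downloader accepts any directory, several apps or deployments can share one model cache instead of each downloading its own copy. The sketch below resolves a per-user cache folder and lets an environment variable override it. FLORENCE2_MODEL_DIR and the Florence2Sharp/models sub-path are hypothetical conventions chosen for illustration; Florence2-Sharp does not read them itself.

```csharp
using System;
using System.IO;
using Florence2;

// Hypothetical convention: FLORENCE2_MODEL_DIR overrides the default cache location.
// The application resolves this path; Florence2-Sharp does not read the variable.
string? overrideDir = Environment.GetEnvironmentVariable("FLORENCE2_MODEL_DIR");
string modelDir = string.IsNullOrWhiteSpace(overrideDir)
    ? Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
        "Florence2Sharp", // hypothetical folder name
        "models")
    : overrideDir;

// Make sure the target folder exists before handing it to the downloader.
Directory.CreateDirectory(modelDir);

var modelSource = new FlorenceModelDownloader(modelDir);

// Per the caching behaviour described above, the first run downloads the
// checkpoints and later runs reuse the files already in modelDir.
await modelSource.DownloadModelsAsync();

var model = new Florence2Model(modelSource);
Console.WriteLine($"Florence-2 models loaded from {modelDir}");
```

Pointing a build pipeline or container image at the same folder is one way to ship a pre-populated cache so that production machines never need network access.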

Structured results

Every task returns a typed FlorenceResults with PureText, BoundingBoxes, OCRBBox, and Polygons — no JSON parsing, no string scraping.

Tiny surface area

Two classes you'll touch most days: FlorenceModelDownloader and Florence2Model. The rest is preprocessing and tokenisation that you don't have to think about.

Bring your own models

The downloader is optional — pass any folder containing Florence-2-base ONNX models and the library will load them directly.

Pick your path

Get Started

Install the NuGet package, download the model, and run your first inference.

Core Concepts

The Florence-2 model, the task family, and how prompts map to outputs.

Guides

Captioning, OCR, object detection, phrase grounding, and region-level tasks — task by task.

Advanced Topics

Managing the model cache, GPU execution, and performance tuning.

Source & Issues

Browse the source, file issues, and check the latest releases.

Requirements

Requirement	Minimum	Notes
.NET	.NET 6.0+	The library uses modern `Span<T>` and `System.IO.Stream` APIs.
ONNX Runtime	Bundled via NuGet	The CPU execution provider works out of the box; CUDA and DirectML execution providers ship in the separate ONNX Runtime packages (Microsoft.ML.OnnxRuntime.Gpu, Microsoft.ML.OnnxRuntime.DirectML).
Disk	~500 MB	Florence-2-base ONNX checkpoints, cached on first run.
Network	First run only	Used to download the model from Hugging Face. Subsequent runs are fully offline.

Florence2-Sharp is published under the MIT License, which permits shipping it inside commercial applications.

Learn more about Florence-2

Florence-2 on Hugging Face — model card, sample notebooks, and licensing.
Florence-2 paper (arXiv) — "Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks".
ONNX Runtime — the inference engine underpinning this library.