Quick Start
Five steps from dotnet add to a generated image caption: download the model, build a Florence2Model, load an image, run a task on it, and compare a few tasks side by side.
If you haven't installed the package yet, add it first (see Installation for details):
dotnet add package Florence2
1. Download the model
FlorenceModelDownloader fetches the Florence-2-base ONNX models from Hugging Face the first time DownloadModelsAsync runs and caches them on disk. Subsequent runs reuse the cache instead of downloading again.
using Florence2;
var modelSource = new FlorenceModelDownloader("./models");
await modelSource.DownloadModelsAsync();
You can also point the downloader at a pre-populated directory containing Florence-2-base ONNX models — see Managing the model cache.
2. Build the model
var model = new Florence2Model(modelSource);
Florence2Model is thread-safe for reads — share a single instance across requests. Allocate one per process, not one per call.
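One common way to get a single shared instance is the standard Lazy<T> type. The sketch below is an illustration of that pattern only, not part of the library: a plain object stands in for the model so the snippet runs without the package, and the names sharedModel, constructions, and requests are hypothetical. In a real application the factory would return new Florence2Model(modelSource) instead.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Counts how many times the factory below actually runs.
int constructions = 0;

// Assumption: `new object()` is a stand-in for `new Florence2Model(modelSource)`.
// ExecutionAndPublication guarantees the factory runs at most once,
// even when several threads ask for the value at the same time.
var sharedModel = new Lazy<object>(() =>
{
    Interlocked.Increment(ref constructions);
    return new object();
}, LazyThreadSafetyMode.ExecutionAndPublication);

// Simulate several concurrent requests that all need the model.
var requests = new Task<object>[8];
for (int i = 0; i < requests.Length; i++)
{
    requests[i] = Task.Run(() => sharedModel.Value);
}
object[] seen = await Task.WhenAll(requests);

Console.WriteLine($"constructions: {constructions}"); // prints "constructions: 1"
```

Every request receives the same instance, and the construction cost is paid once, on first access.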
3. Load an image
The library accepts any seekable Stream containing a common raster format (PNG, JPEG, BMP, …). For an on-disk file:
using var image = File.OpenRead("car.jpg");
For an in-memory image, wrap your byte[] in a MemoryStream.
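As a minimal sketch of the in-memory case, the snippet below wraps a byte array in a read-only MemoryStream using only the standard System.IO API. The imageBytes array here is a placeholder (just the first four bytes of a JPEG header, not a decodable image); in practice it would hold the full contents of an upload, a database blob, or similar.

```csharp
using System;
using System.IO;

// Placeholder bytes for illustration only; substitute the real image data.
byte[] imageBytes = { 0xFF, 0xD8, 0xFF, 0xE0 };

// writable: false exposes the array as a read-only stream without copying it.
using var image = new MemoryStream(imageBytes, writable: false);

// A MemoryStream is seekable and starts at position 0.
Console.WriteLine($"CanSeek={image.CanSeek}, Length={image.Length}");
```

The resulting stream can then be passed wherever the file stream from File.OpenRead would be used.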
4. Pick a task and run inference
Run takes a TaskTypes value, the image stream, and (for grounding tasks only) a phrase. It returns a typed FlorenceResults object whose fields you can read directly.
var results = model.Run(TaskTypes.DETAILED_CAPTION, image);
Console.WriteLine(results.PureText);
// → "A red sedan parked on a cobblestone street under late-afternoon light."
For tasks that produce structured output — OCR, object detection, grounding — read the appropriate field instead of PureText. See Supported tasks for the full mapping.
5. Try a few tasks side by side
using System;
using System.IO;
using System.Text.Json;
using Florence2;

var modelSource = new FlorenceModelDownloader("./models");
await modelSource.DownloadModelsAsync();
var model = new Florence2Model(modelSource);

// Create the serializer options once and reuse them for every task.
var jsonOptions = new JsonSerializerOptions { WriteIndented = true };

foreach (var task in new[]
{
    TaskTypes.CAPTION,
    TaskTypes.DETAILED_CAPTION,
    TaskTypes.OCR,
    TaskTypes.OD,
})
{
    // Reopen the file for each task so every run reads the stream from the start.
    using var image = File.OpenRead("car.jpg");
    var results = model.Run(task, image);
    Console.WriteLine($"--- {task} ---");
    Console.WriteLine(JsonSerializer.Serialize(results, jsonOptions));
}
Replace car.jpg with any of your own images and Florence-2 will produce a short caption, a detailed caption, transcribed text, and object bounding boxes, all from the same model.