Object detection
Florence-2 produces bounding boxes in four modes: generic detection, dense region captioning, region proposals, and open-vocabulary detection. All four return FlorenceResults.BoundingBoxes.
Generic detection (OD)
TaskTypes.OD runs Florence-2's built-in object detector and returns labelled boxes for every detected object.
using var image = File.OpenRead("kitchen.jpg");
var results = model.Run(TaskTypes.OD, image);
foreach (var entry in results.BoundingBoxes)
{
    Console.WriteLine($"--- {entry.Label} ---");
    foreach (var box in entry.BBoxes)
        Console.WriteLine($" ({box.xmin:F0}, {box.ymin:F0}) → ({box.xmax:F0}, {box.ymax:F0})");
}
BoundingBoxes is an array of LabeledBoundingBoxes. Each entry groups all boxes that share a label — so BBoxes may contain multiple rectangles for the same class.
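Because results are grouped by label, code that wants one row per rectangle has to flatten them first. The sketch below shows one way to do that with LINQ. It uses tuples with hypothetical values as stand-ins for the library's result types, so only the Label and BBoxes names are taken from the text above.

```csharp
using System;
using System.Linq;

// Stand-in data shaped like the grouped results described above:
// one entry per label, each holding one or more boxes (hypothetical values).
var grouped = new[]
{
    (Label: "cup", BBoxes: new[]
    {
        (xmin: 10f, ymin: 20f, xmax: 60f, ymax: 90f),
        (xmin: 200f, ymin: 40f, xmax: 260f, ymax: 110f),
    }),
    (Label: "kettle", BBoxes: new[]
    {
        (xmin: 300f, ymin: 50f, xmax: 420f, ymax: 240f),
    }),
};

// Flatten to one (label, box) pair per rectangle.
var flat = grouped
    .SelectMany(e => e.BBoxes.Select(b => (e.Label, Box: b)))
    .ToList();

// Count instances per label, then report the overall total.
foreach (var e in grouped)
    Console.WriteLine($"{e.Label}: {e.BBoxes.Length}");

Console.WriteLine($"{flat.Count} boxes in total");
```

The flattened list is convenient for sorting or filtering across classes, while the grouped form is better for per-class counts.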
Coordinates are in pixel space of the original image, not the model's internal 0–999 space.
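For intuition only, here is a minimal sketch of how a coordinate in a 0–999 bin space can be mapped back to pixels. The helper name BinToPixel and the bin-centre convention are assumptions made for illustration. The library performs this conversion for you, and its exact rounding may differ.

```csharp
using System;

// Hypothetical helper: maps a bin index (0..999) to a pixel coordinate by
// taking the centre of the bin. Assumes the bins evenly divide the image side.
static float BinToPixel(int bin, int imageSize, int bins = 1000)
{
    float sizePerBin = (float)imageSize / bins;
    return (bin + 0.5f) * sizePerBin;
}

// An example box expressed in bins, on a 1920x1080 image (made-up values).
int width = 1920, height = 1080;
var (xminBin, yminBin, xmaxBin, ymaxBin) = (250, 100, 750, 900);

Console.WriteLine(
    $"({BinToPixel(xminBin, width):F0}, {BinToPixel(yminBin, height):F0}) → " +
    $"({BinToPixel(xmaxBin, width):F0}, {BinToPixel(ymaxBin, height):F0})");
```

Since the returned boxes are already in pixel space, code like this is only needed if you work with raw model output directly.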
Dense region captioning
TaskTypes.DENSE_REGION_CAPTION returns many small regions, each captioned. Useful for image-grid UIs or for building a structured description of a complex scene.
using var image = File.OpenRead("crowd.jpg");
var results = model.Run(TaskTypes.DENSE_REGION_CAPTION, image);
Console.WriteLine($"{results.BoundingBoxes.Length} regions detected.");
foreach (var entry in results.BoundingBoxes.Take(10))
{
    var box = entry.BBoxes[0];
    Console.WriteLine($"\"{entry.Label}\" at ({box.xmin:F0}, {box.ymin:F0})");
}
Each entry is a small region with its own free-form descriptive label (e.g. "a person wearing a red hat"), unlike OD, which draws its labels from a fixed class vocabulary.
Region proposals (no labels)
TaskTypes.REGION_PROPOSAL returns bounding boxes without labels — useful as a first pass before running a more expensive labelling model on each region.
using var image = File.OpenRead("scene.jpg");
var results = model.Run(TaskTypes.REGION_PROPOSAL, image);
foreach (var entry in results.BoundingBoxes)
{
    foreach (var box in entry.BBoxes)
        Console.WriteLine($"({box.xmin:F0}, {box.ymin:F0}) → ({box.xmax:F0}, {box.ymax:F0})");
}
The Label field is typically empty for proposals.
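If the proposals feed a second model, each box usually has to become an integer crop rectangle that stays inside the image. The following is a minimal sketch of that step. ToCrop is a hypothetical helper and not part of the library, and rounding the box outward is a design choice assumed here.

```csharp
using System;

// Hypothetical helper: clamps a pixel-space box to the image bounds and
// converts it to an integer crop rectangle (x, y, width, height).
// Rounds outward so the crop fully contains the box; returns null if the
// clamped box has no area.
static (int X, int Y, int W, int H)? ToCrop(
    float xmin, float ymin, float xmax, float ymax, int imageW, int imageH)
{
    int x0 = Math.Clamp((int)Math.Floor(xmin), 0, imageW);
    int y0 = Math.Clamp((int)Math.Floor(ymin), 0, imageH);
    int x1 = Math.Clamp((int)Math.Ceiling(xmax), 0, imageW);
    int y1 = Math.Clamp((int)Math.Ceiling(ymax), 0, imageH);

    if (x1 <= x0 || y1 <= y0) return null;
    return (x0, y0, x1 - x0, y1 - y0);
}

// A box that spills over the left and bottom edges of a 640x480 image.
var crop = ToCrop(-4.2f, 10.7f, 130.1f, 2000f, imageW: 640, imageH: 480);
Console.WriteLine(crop.HasValue ? crop.Value.ToString() : "empty box");
```

Boxes that collapse to zero area after clamping come back as null, so the caller can skip them before invoking the labelling model.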
Open-vocabulary detection
TaskTypes.OPEN_VOCABULARY_DETECTION accepts an arbitrary class prompt in textInput — not limited to the standard COCO-like classes that OD uses.
using var image = File.OpenRead("garage.jpg");
var results = model.Run(
    TaskTypes.OPEN_VOCABULARY_DETECTION,
    image,
    textInput: "vintage motorcycle");
foreach (var entry in results.BoundingBoxes)
{
    foreach (var box in entry.BBoxes)
        Console.WriteLine($"{entry.Label}: ({box.xmin:F0}, {box.ymin:F0}) → ({box.xmax:F0}, {box.ymax:F0})");
}
Open-vocabulary detection is the right tool when you need to look for unusual or domain-specific objects — "patient identification bracelet", "graffiti tag", "structural cracks in concrete" — that aren't in Florence-2's pre-trained class set.
Picking the right detection task
| Question | Task |
|---|---|
| "Find every object in the image, however many there are." | OD |
| "Describe every interesting region in detail." | DENSE_REGION_CAPTION |
| "Give me boxes I can label later with a different model." | REGION_PROPOSAL |
| "Find anything matching this specific description." | OPEN_VOCABULARY_DETECTION |
Drawing on the image
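If you want to overlay boxes on the original image, take the pixel coordinates as-is; no rescaling is needed. For example, with SixLabors.ImageSharp (the Draw extension used here ships in the separate SixLabors.ImageSharp.Drawing package):
using SixLabors.ImageSharp;                    // Color, RectangleF, SaveAsPng
using SixLabors.ImageSharp.Processing;         // Mutate
using SixLabors.ImageSharp.Drawing.Processing; // Draw

using var img = SixLabors.ImageSharp.Image.Load("kitchen.jpg");
foreach (var entry in results.BoundingBoxes)
{
    foreach (var b in entry.BBoxes)
    {
        // RectangleF takes x, y, width, height, so convert from corner coordinates.
        var rect = new RectangleF(b.xmin, b.ymin, b.xmax - b.xmin, b.ymax - b.ymin);
        img.Mutate(ctx => ctx.Draw(Color.Red, 2f, rect));
    }
}
img.SaveAsPng("annotated.png");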