Object detection

Florence-2 produces bounding boxes in four modes: generic detection, dense region captioning, region proposals, and open-vocabulary detection. All four return their boxes in FlorenceResults.BoundingBoxes.

Generic detection (`OD`)

TaskTypes.OD runs Florence-2's built-in object detector and returns labelled boxes for every detected object.

using var image = File.OpenRead("kitchen.jpg");
var results = model.Run(TaskTypes.OD, image);

foreach (var entry in results.BoundingBoxes)
{
    Console.WriteLine($"--- {entry.Label} ---");
    foreach (var box in entry.BBoxes)
        Console.WriteLine($"    ({box.xmin:F0}, {box.ymin:F0}) → ({box.xmax:F0}, {box.ymax:F0})");
}

BoundingBoxes is an array of LabeledBoundingBoxes. Each entry groups all boxes that share a label — so BBoxes may contain multiple rectangles for the same class.
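Because results are grouped by label, code that wants one row per rectangle (for a table, a CSV export, or a per-box filter) has to flatten the groups first. The sketch below shows one way to do that with LINQ. The `Box` and `LabeledBoxes` types are hypothetical stand-ins that mirror only the members used in this guide (`Label`, `BBoxes`, `xmin`, and so on); the library's own types may differ, so treat this as an illustration rather than the library's API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins mirroring only the members used in this guide;
// the library's real types may be shaped differently.
public readonly record struct Box(float xmin, float ymin, float xmax, float ymax);
public sealed record LabeledBoxes(string Label, Box[] BBoxes);

public static class DetectionExtensions
{
    // Turns label-grouped results into one (label, box) pair per rectangle,
    // preserving the order of groups and of boxes within each group.
    public static IEnumerable<(string Label, Box Box)> Flatten(
        this IEnumerable<LabeledBoxes> groups) =>
        groups.SelectMany(g => g.BBoxes, (g, b) => (Label: g.Label, Box: b));
}
```

With the real result type, the same `SelectMany` call should work directly on `results.BoundingBoxes`, provided its entries expose `Label` and `BBoxes` as shown above.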

Coordinates are in pixel space of the original image, not the model's internal 0–999 space.
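For context, the 0–999 space refers to the 1000 location bins the model uses per axis. The conversion to pixels is already done for you, so the sketch below is only meant to show what that step roughly looks like. It assumes a bin-centre mapping, similar to the reference Python processor; the exact rounding used by this library is an assumption and may differ. `LocationBins` and `ToPixels` are hypothetical names.

```csharp
using System;

public static class LocationBins
{
    // Number of bins per axis (indices 0..999).
    public const int BinCount = 1000;

    // Maps a bin index to the pixel coordinate at the centre of that bin.
    // imageSize is the image width for x coordinates, or the height for y coordinates.
    public static float ToPixels(int bin, int imageSize)
    {
        if (bin < 0 || bin >= BinCount)
            throw new ArgumentOutOfRangeException(nameof(bin));
        if (imageSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(imageSize));

        return (bin + 0.5f) * imageSize / BinCount;
    }
}
```

For a 640-pixel-wide image, bin 500 under this mapping lands just past the horizontal midpoint, which is why boxes reported in pixels can be drawn on the original image directly.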

Dense region captioning

TaskTypes.DENSE_REGION_CAPTION returns many small regions, each captioned. Useful for image-grid UIs or for building a structured description of a complex scene.

using var image = File.OpenRead("crowd.jpg");
var results = model.Run(TaskTypes.DENSE_REGION_CAPTION, image);

Console.WriteLine($"{results.BoundingBoxes.Length} regions detected.");

foreach (var entry in results.BoundingBoxes.Take(10))
{
    var box = entry.BBoxes[0];
    Console.WriteLine($"\"{entry.Label}\" at ({box.xmin:F0}, {box.ymin:F0})");
}

Each entry is a small region with its own descriptive label (e.g. "a person wearing a red hat") — different from OD, which uses a fixed class vocabulary.
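Dense captioning can return a lot of regions, and taking the first ten is not necessarily the same as taking the ten most prominent. One simple heuristic is to rank regions by box area. The sketch below assumes that approach; `Region` and `RegionRanking` are hypothetical helper types, not part of the library, and you would populate `Region` from each entry's `Label` and first box.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper type: one captioned region with corner coordinates in pixels.
public readonly record struct Region(string Caption, float xmin, float ymin, float xmax, float ymax)
{
    // Area in square pixels; inverted or empty boxes count as zero.
    public float Area => Math.Max(0f, xmax - xmin) * Math.Max(0f, ymax - ymin);
}

public static class RegionRanking
{
    // Returns up to `count` regions, largest area first.
    public static List<Region> LargestFirst(IEnumerable<Region> regions, int count) =>
        regions.OrderByDescending(r => r.Area).Take(count).ToList();
}
```

Area is a crude proxy for importance; for grid UIs it tends to work well, while for scene descriptions you may prefer to keep the model's original ordering.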

Region proposals (no labels)

TaskTypes.REGION_PROPOSAL returns bounding boxes without labels — useful as a first pass before running a more expensive labelling model on each region.

using var image = File.OpenRead("scene.jpg");
var results = model.Run(TaskTypes.REGION_PROPOSAL, image);

foreach (var entry in results.BoundingBoxes)
{
    foreach (var box in entry.BBoxes)
        Console.WriteLine($"({box.xmin:F0}, {box.ymin:F0}) → ({box.xmax:F0}, {box.ymax:F0})");
}

The Label field is typically empty for proposals.
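To hand each proposal to a second-stage model you usually need an integer crop rectangle that stays inside the image. The sketch below shows one way to derive it from the corner coordinates. `CropRect` and `ProposalCrops.ToCrop` are hypothetical helpers introduced for illustration; rounding outward (floor for the top-left corner, ceiling for the bottom-right) is a design choice made here, not something the library prescribes.

```csharp
using System;

// Hypothetical helper type: an integer crop rectangle in pixels.
public readonly record struct CropRect(int X, int Y, int Width, int Height);

public static class ProposalCrops
{
    // Converts corner coordinates to an integer crop rectangle clamped to the image bounds.
    // Rounds outward so the crop fully contains the proposal.
    // Returns null when nothing of the box lies inside the image.
    public static CropRect? ToCrop(
        float xmin, float ymin, float xmax, float ymax,
        int imageWidth, int imageHeight)
    {
        int left = Math.Clamp((int)Math.Floor(xmin), 0, imageWidth);
        int top = Math.Clamp((int)Math.Floor(ymin), 0, imageHeight);
        int right = Math.Clamp((int)Math.Ceiling(xmax), 0, imageWidth);
        int bottom = Math.Clamp((int)Math.Ceiling(ymax), 0, imageHeight);

        if (right <= left || bottom <= top)
            return null;

        return new CropRect(left, top, right - left, bottom - top);
    }
}
```

Skipping the `null` results avoids passing empty crops to the labelling model, which is where most of the cost of a two-stage pipeline goes.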

Open-vocabulary detection

TaskTypes.OPEN_VOCABULARY_DETECTION accepts an arbitrary class prompt in textInput — not limited to the standard COCO-like classes that OD uses.

using var image = File.OpenRead("garage.jpg");

var results = model.Run(
    TaskTypes.OPEN_VOCABULARY_DETECTION,
    image,
    textInput: "vintage motorcycle");

foreach (var entry in results.BoundingBoxes)
{
    foreach (var box in entry.BBoxes)
        Console.WriteLine($"{entry.Label}: ({box.xmin:F0}, {box.ymin:F0}) → ({box.xmax:F0}, {box.ymax:F0})");
}

Open-vocabulary detection is the right tool when you need to look for unusual or domain-specific objects — "patient identification bracelet", "graffiti tag", "structural cracks in concrete" — that aren't in Florence-2's pre-trained class set.

Picking the right detection task

Question	Task
"Find every object in the image, however many there are."	`OD`
"Describe every interesting region in detail."	`DENSE_REGION_CAPTION`
"Give me boxes I can label later with a different model."	`REGION_PROPOSAL`
"Find anything matching this specific description."	`OPEN_VOCABULARY_DETECTION`

Drawing on the image

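To overlay boxes on the original image, use the pixel coordinates as-is; no rescaling is needed as long as you draw on the same image at its original resolution. For example, with SixLabors.ImageSharp and its companion SixLabors.ImageSharp.Drawing package: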

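using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Drawing.Processing; // Draw(...) comes from the SixLabors.ImageSharp.Drawing package
using SixLabors.ImageSharp.Processing;

using var img = SixLabors.ImageSharp.Image.Load("kitchen.jpg");

// Draw every box in a single Mutate pass instead of one pass per box.
img.Mutate(ctx =>
{
    foreach (var entry in results.BoundingBoxes)
    {
        foreach (var b in entry.BBoxes)
        {
            // RectangleF takes (x, y, width, height), so convert from corner coordinates.
            var rect = new RectangleF(b.xmin, b.ymin, b.xmax - b.xmin, b.ymax - b.ymin);
            ctx.Draw(Color.Red, 2f, rect);
        }
    }
});

img.SaveAsPng("annotated.png");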