OCR
Florence-2 reads text from images in two modes — bulk plain-text extraction and per-region OCR with quad-box coordinates. Pick the one that matches how you'll consume the result downstream.
Plain OCR
TaskTypes.OCR returns every recognised word as a single concatenated string in FlorenceResults.PureText. There is no per-region structure — useful for documents or screenshots when you just want the text content.
using var image = File.OpenRead("receipt.jpg");
var results = model.Run(TaskTypes.OCR, image);
Console.WriteLine(results.PureText);
// → "GREENGROCER 12 OAK LANE total: 14.32"
OCR with regions
TaskTypes.OCR_WITH_REGION returns the same text plus a quad-box (four corner points) around each recognised region — useful for highlighting, click-targeting, or downstream layout-aware parsing.
using var image = File.OpenRead("receipt.jpg");
var results = model.Run(TaskTypes.OCR_WITH_REGION, image);
foreach (var box in results.OCRBBox)
{
Console.WriteLine($"\"{box.Text}\"");
foreach (var corner in box.QuadBox)
Console.WriteLine($" ({corner.x:F0}, {corner.y:F0})");
}
OCRBBox is an array of LabeledOCRBox. Each element has:
Text— the recognised text in that region.QuadBox— fourCoordinates<float>points, clockwise from the top-left, in pixel coordinates of the original image.
The quad-box is a quadrilateral, not an axis-aligned rectangle — it follows rotated or skewed text.
OCR a specific region only
If you already know where the text is (from a manual annotation, an object detection step, or a UI hit-test), use TaskTypes.REGION_TO_OCR to read text from just that area. Pass the region in textInput using <loc_…> tokens.
// Region encoded as <loc_x1><loc_y1><loc_x2><loc_y2> in 0..999 space
string region = "<loc_120><loc_400><loc_780><loc_460>";
using var image = File.OpenRead("receipt.jpg");
var results = model.Run(TaskTypes.REGION_TO_OCR, image, textInput: region);
Console.WriteLine(results.PureText);
Florence-2 quantises image coordinates into a 1000-bin grid (0–999), so you need to map pixel coordinates by the image dimensions before formatting the <loc_…> tokens. The library's BoxQuantizer exposes the quantisation; emit the tokens directly with string formatting once you have the bin indices.
Tips
- Pre-rotate scanned documents. Florence-2 handles small skew well, but heavily rotated text (90° or more) confuses it. Detect orientation first if you can.
- Resolution matters. Florence-2-base preprocesses images to 768×768. Tiny text becomes unreadable after the resize. For dense documents, consider tiling and running OCR per tile.
- Numbers and codes can drift. Generative OCR sometimes "corrects" unfamiliar product codes or numbers towards more common patterns. For structured fields (SKUs, invoice IDs), validate with a regex or a check-digit pass.