FiftyOne Adds Local OCR Plugins for Datasets

FiftyOne introduces two OCR plugins, GLM-OCR and LightOnOCR-2-1B, enabling developers to extract and store text from images directly within their computer vision workflows.

What It Is

FiftyOne, the open-source computer vision dataset management platform, now supports two OCR plugins that extract text directly from images in datasets. GLM-OCR and LightOnOCR-2-1B both integrate with FiftyOne’s workflow, allowing developers to process entire image collections and store extracted text as structured fields alongside the visual data. Instead of manually annotating text or routing images through external OCR services, these plugins run locally and populate dataset fields automatically.

GLM-OCR connects to the GLM-4V-9B vision-language model, while LightOnOCR-2-1B uses a different architecture optimized for optical character recognition tasks. Both handle common OCR scenarios like document scanning, receipt processing, and scene text extraction, but they differ in speed and output formatting.

Why It Matters

Dataset preparation typically consumes more time than model training itself. When working with datasets containing receipts, forms, street signs, or any text-bearing images, extracting that text manually creates a bottleneck. External OCR APIs solve the problem but introduce costs, latency, and data privacy concerns when sending images to third-party services.

These FiftyOne plugins shift OCR into the dataset preparation phase where it belongs. Computer vision teams can now extract text during initial dataset ingestion, making it searchable and analyzable alongside bounding boxes, classifications, and other annotations. This matters particularly for multimodal projects where text content influences labeling decisions or training strategies.

GLM-OCR’s speed advantage becomes significant at scale. Processing thousands of images with faster extraction means shorter iteration cycles when exploring datasets or debugging annotation pipelines. The structured output capability also reduces post-processing work: instead of parsing messy text blobs, developers get clean field data ready for analysis or model training.

Getting Started

Installing GLM-OCR requires a single command:
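Assuming the plugin follows FiftyOne's standard plugin download mechanism (the exact command isn't reproduced here), it would look something like:

```shell
# Hypothetical: FiftyOne plugins are typically installed by pointing
# the CLI at the plugin's GitHub repository
fiftyone plugins download https://github.com/harpreetsahota204/glm_ocr
```

Check the plugin's README for the authoritative command and any model weights it needs to pull on first run.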

After installation, the plugin appears in FiftyOne’s interface and can process datasets through the GUI or programmatically. The workflow typically involves loading a dataset, selecting images for OCR processing, and running the plugin to populate text fields.
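The general shape of that load-extract-store workflow can be sketched in plain Python. The `run_ocr` function below is a stand-in for whatever the plugin's operator actually does, and the field names are illustrative, not the plugin's real API:

```python
# Sketch of the load -> extract -> store pattern, with a stubbed OCR step.
# In a real FiftyOne workflow the plugin operator populates the field;
# here a placeholder function stands in so the flow is runnable anywhere.

def run_ocr(image_path: str) -> str:
    """Stand-in for the plugin's OCR call."""
    return f"text extracted from {image_path}"

# A minimal dataset: each sample pairs an image path with its annotations
dataset = [
    {"filepath": "receipts/001.jpg", "label": "receipt"},
    {"filepath": "signs/042.jpg", "label": "street_sign"},
]

# Populate an extracted-text field alongside the existing annotations
for sample in dataset:
    sample["extracted_text"] = run_ocr(sample["filepath"])

# The text is now queryable next to the visual labels
receipts = [s for s in dataset if s["label"] == "receipt"]
print(receipts[0]["extracted_text"])
```

The point of the pattern is that extracted text lives on the same sample record as the visual annotations, so later filtering and analysis can use both together.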

For hands-on exploration, the quickstart notebook at https://github.com/harpreetsahota204/glm_ocr/blob/main/glm_ocr_fiftyone_example.ipynb demonstrates the complete process with example code. The notebook shows how to configure extraction parameters, handle batch processing, and access the resulting text data.

LightOnOCR-2-1B follows a similar installation pattern and shares the same FiftyOne plugin interface. Teams can test both plugins on sample datasets to compare performance characteristics before committing to one for production workflows.

Full documentation lives at https://docs.voxel51.com/plugins/plugins_ecosystem/glm_ocr.html for GLM-OCR and https://docs.voxel51.com/plugins/plugins_ecosystem/lightonocr_2.html for LightOnOCR-2-1B.

Context

Traditional OCR tools like Tesseract remain viable for simple text extraction, but they lack integration with modern dataset management workflows. Cloud services from Google, AWS, and Azure offer robust OCR but require API authentication, incur per-image costs, and send data outside local infrastructure.

GLM-OCR’s performance edge comes from its underlying vision-language model architecture, which handles varied text orientations and complex backgrounds more gracefully than older OCR engines. The structured output capability means it can return JSON-formatted results instead of raw text strings, reducing parsing overhead.
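To illustrate why structured output lowers parsing overhead, here is a sketch that consumes a hypothetical JSON-formatted OCR result. The field names are invented for illustration; the plugin's actual output schema is defined in its documentation:

```python
import json

# Hypothetical JSON payload of the kind a structured-output OCR model
# might return, versus a raw text blob that would need regex parsing
raw_result = '{"merchant": "ACME Hardware", "total": "19.99", "date": "2024-03-01"}'

fields = json.loads(raw_result)

# Each value is immediately addressable; no post-hoc string parsing needed
print(fields["merchant"])
print(float(fields["total"]))
```

With a raw text string, the same extraction would require brittle regular expressions or heuristic splitting; JSON makes the downstream code a dictionary lookup.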

LightOnOCR-2-1B serves as a fallback option when GLM-OCR’s dependencies conflict with existing environments or when specific accuracy requirements favor its architecture. Some datasets with particular text characteristics may perform better with one model over the other, making both options valuable.

The main limitation applies to both plugins: they run locally and require sufficient GPU memory for efficient processing. Large-scale batch jobs may need workflow adjustments to manage memory usage. Neither plugin currently supports handwriting recognition or heavily degraded text, scenarios where specialized commercial services still hold advantages.
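One common workflow adjustment is to process the dataset in fixed-size chunks so that only one batch's worth of images occupies GPU memory at a time. A minimal sketch of the idea, where `ocr_batch` is a placeholder for the plugin's actual batch call:

```python
from typing import Iterator

def chunked(items: list, size: int) -> Iterator[list]:
    """Yield successive fixed-size chunks of a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def ocr_batch(paths: list) -> list:
    """Placeholder for the plugin's batch OCR call."""
    return [f"text for {p}" for p in paths]

image_paths = [f"images/{i:04d}.jpg" for i in range(10)]
results = []

# Process in chunks of 4 so memory use stays bounded regardless of
# dataset size; a real pipeline would also free GPU tensors between chunks
for batch in chunked(image_paths, 4):
    results.extend(ocr_batch(batch))

print(len(results))
```

The chunk size becomes a tuning knob: larger chunks amortize per-batch overhead, smaller chunks keep peak memory down.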