Plug and Play Active Learning for Object Detection

Chenhongyi Yang; Lichao Huang; Elliot J. Crowley

TL;DR

PPAL addresses the high annotation cost of object detection by introducing a plug-and-play two-stage active learning method that does not modify detector architectures. It combines Difficulty Calibrated Uncertainty Sampling to prioritize uncertain instances in challenging categories and Category Conditioned Matching Similarity to drive a diversity-based query selection among multi-object images, using a CCMS-guided k-centre/k-means++ process. The approach achieves state-of-the-art performance on COCO and Pascal VOC across multiple detectors, including SSD and semi-supervised settings, demonstrating strong generalization and data-efficiency without architectural changes. This work offers a practical, detector-agnostic AL framework with accessible code, enabling broader adoption for reducing labeling effort in object detection pipelines.

Abstract

Annotating datasets for object detection is an expensive and time-consuming endeavor. To minimize this burden, active learning (AL) techniques are employed to select the most informative samples for annotation within a constrained "annotation budget". Traditional AL strategies typically rely on model uncertainty or sample diversity for query sampling, while more advanced methods have focused on developing AL-specific object detector architectures to enhance performance. However, these specialized approaches are not readily adaptable to different object detectors due to the significant engineering effort required for integration. To overcome this challenge, we introduce Plug and Play Active Learning (PPAL), a simple and effective AL strategy for object detection. PPAL is a two-stage method comprising uncertainty-based and diversity-based sampling phases. In the first stage, our Difficulty Calibrated Uncertainty Sampling leverages a category-wise difficulty coefficient that combines both classification and localisation difficulties to re-weight instance uncertainties, from which we sample a candidate pool for the subsequent diversity-based sampling. In the second stage, we propose Category Conditioned Matching Similarity to better compute the similarities of multi-instance images as ensembles of their instance similarities, which is used by the k-Means++ algorithm to sample the final AL queries. PPAL makes no change to model architectures or detector training pipelines; hence it can be easily generalized to different object detectors. We benchmark PPAL on the MS-COCO and Pascal VOC datasets using different detector architectures and show that our method outperforms prior work by a large margin. Code is available at https://github.com/ChenhongyiYang/PPAL
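The first stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice to sum re-weighted instance uncertainties into one image score, and the plain top-k selection are all assumptions made for illustration, and the difficulty coefficients are passed in as given values rather than estimated from training.

```python
import numpy as np


def calibrated_image_uncertainty(uncertainties, categories, difficulty):
    """Sum an image's instance uncertainties, each scaled by its category's difficulty."""
    uncertainties = np.asarray(uncertainties, dtype=float)
    categories = np.asarray(categories, dtype=int)
    difficulty = np.asarray(difficulty, dtype=float)
    # Assumption: the image-level score is the sum over detected instances.
    return float(np.sum(difficulty[categories] * uncertainties))


def sample_candidate_pool(images, difficulty, pool_size):
    """Return indices of the pool_size images with the highest calibrated uncertainty."""
    scores = np.array(
        [calibrated_image_uncertainty(u, c, difficulty) for u, c in images]
    )
    # Assumption: plain top-k selection; a stable sort keeps ties in input order.
    order = np.argsort(-scores, kind="stable")
    return order[:pool_size].tolist()


# Toy data (hypothetical): category 1 is treated as twice as difficult as category 0.
difficulty = [1.0, 2.0]
images = [
    ([0.5, 0.5], [0, 0]),  # two easy-category objects
    ([0.6], [1]),          # one hard-category object
    ([0.1], [0]),          # one confident easy-category object
]
pool = sample_candidate_pool(images, difficulty, pool_size=2)
print(pool)
```

In this toy example the image with a single hard-category object outranks the image with two easy-category objects, which is the intended effect of re-weighting by category difficulty.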

Paper Structure (12 sections, 7 equations, 9 figures, 4 tables)

This paper contains 12 sections, 7 equations, 9 figures, 4 tables.

Introduction
Related Work
Method
Problem Statement
Difficulty Calibrated Uncertainty Sampling
Diversity Sampling for Multi-instance Images
Experiments
Experiment Settings
Main Results
Ablation Studies and Discussions
Conclusion
Acknowledgements.

Figures (9)

Figure 1: An overview of our two-stage PPAL. In the first stage, Difficulty Calibrated Uncertainty Sampling, the objects' uncertainties are re-weighted with difficulty coefficients that take both classification and localisation into account, and a candidate pool of the images on which the model is most uncertain is sampled. In the second, diversity-based stage, we run a modified k-means++ algorithm using the proposed Category Conditioned Matching Similarity (CCMS) to select a set of representative images as active learning queries for the next round of annotation.
Figure 2: Illustration of how the category-wise difficulty coefficients correspond to the evaluated detection APs on Pascal VOC at each active learning round, in which the difficulty coefficients are sorted in descending order. Objects in categories with high difficulty coefficients are harder to detect than those in categories with low difficulty coefficients.
Figure 3: Comparison of global similarity and our CCMS. The global similarity is computed using the averaged image feature maps, failing to capture the fine-grained spatial information of multi-instance images. In CCMS, by contrast, each object in an image finds its most similar counterpart of the same category in another image to compute similarities. The image-wise similarity is then computed by averaging the object similarities.
Figure 4: Comparison between the proposed method and the state-of-the-art active learning algorithm for object detection in three different benchmark settings. (a) RetinaNet on COCO; (b) RetinaNet on Pascal VOC; (c) Faster R-CNN on COCO.
Figure 5: Active learning on Pascal VOC and COCO using (a) Anchor-free FCOS; (b) Anchor-based ATSS; (c) Anchor-based DDOD.
...and 4 more figures
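
The matching step described in the Figure 3 caption can be sketched as follows. This is a simplified illustration, not the paper's code: it assumes each detected object comes with a feature vector and a predicted category, uses cosine similarity, gives a similarity of 0 to objects with no same-category counterpart, uses an unweighted average, and symmetrises the two directions. All of those choices, and the function names, are assumptions.

```python
import numpy as np


def _normalise(feats):
    """L2-normalise the rows of an (n, d) feature array."""
    feats = np.asarray(feats, dtype=float)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    return feats / np.clip(norms, 1e-12, None)


def _directed_similarity(feats_a, cats_a, feats_b, cats_b):
    """Average, over objects in A, of the best same-category cosine match in B."""
    sims = []
    for feat, cat in zip(feats_a, cats_a):
        same_category = feats_b[cats_b == cat]
        if len(same_category) == 0:
            sims.append(0.0)  # assumption: unmatched objects contribute 0
        else:
            sims.append(float(np.max(same_category @ feat)))
    return float(np.mean(sims)) if sims else 0.0


def ccms(feats_a, cats_a, feats_b, cats_b):
    """Category-conditioned matching similarity between two images (sketch)."""
    feats_a, feats_b = _normalise(feats_a), _normalise(feats_b)
    cats_a, cats_b = np.asarray(cats_a), np.asarray(cats_b)
    # Assumption: average both directions so the measure is symmetric.
    forward = _directed_similarity(feats_a, cats_a, feats_b, cats_b)
    backward = _directed_similarity(feats_b, cats_b, feats_a, cats_a)
    return 0.5 * (forward + backward)


# Toy data (hypothetical): image A has one object of category 0 and one of
# category 1; image B has a single object of category 0.
a_feats, a_cats = np.array([[1.0, 0.0], [0.0, 1.0]]), [0, 1]
b_feats, b_cats = np.array([[2.0, 0.0]]), [0]
print(ccms(a_feats, a_cats, b_feats, b_cats))
print(ccms(a_feats, a_cats, a_feats, a_cats))
```

A similarity of this kind could be converted to a distance, for example `1 - ccms(...)`, before being handed to a k-means++ style selection; that conversion is also an assumption rather than something stated in the captions.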
