Plug and Play Active Learning for Object Detection
Chenhongyi Yang, Lichao Huang, Elliot J. Crowley
TL;DR
PPAL addresses the high annotation cost of object detection by introducing a plug-and-play two-stage active learning method that does not modify detector architectures. It combines Difficulty Calibrated Uncertainty Sampling to prioritize uncertain instances in challenging categories and Category Conditioned Matching Similarity to drive a diversity-based query selection among multi-object images, using a CCMS-guided k-centre/k-means++ process. The approach achieves state-of-the-art performance on COCO and Pascal VOC across multiple detectors, including SSD and semi-supervised settings, demonstrating strong generalization and data-efficiency without architectural changes. This work offers a practical, detector-agnostic AL framework with accessible code, enabling broader adoption for reducing labeling effort in object detection pipelines.
Abstract
Annotating datasets for object detection is an expensive and time-consuming endeavor. To minimize this burden, active learning (AL) techniques are employed to select the most informative samples for annotation within a constrained "annotation budget". Traditional AL strategies typically rely on model uncertainty or sample diversity for query sampling, while more advanced methods have focused on developing AL-specific object detector architectures to enhance performance. However, these specialized approaches are not readily adaptable to different object detectors due to the significant engineering effort required for integration. To overcome this challenge, we introduce Plug and Play Active Learning (PPAL), a simple and effective AL strategy for object detection. PPAL is a two-stage method comprising uncertainty-based and diversity-based sampling phases. In the first stage, our Difficulty Calibrated Uncertainty Sampling leverage a category-wise difficulty coefficient that combines both classification and localisation difficulties to re-weight instance uncertainties, from which we sample a candidate pool for the subsequent diversity-based sampling. In the second stage, we propose Category Conditioned Matching Similarity to better compute the similarities of multi-instance images as ensembles of their instance similarities, which is used by the k-Means++ algorithm to sample the final AL queries. PPAL makes no change to model architectures or detector training pipelines; hence it can be easily generalized to different object detectors. We benchmark PPAL on the MS-COCO and Pascal VOC datasets using different detector architectures and show that our method outperforms prior work by a large margin. Code is available at https://github.com/ChenhongyiYang/PPAL
