
Aligning Data Selection with Performance: Performance-driven Reinforcement Learning for Active Learning in Object Detection

Zhixuan Liang, Xingyu Zeng, Rui Zhao, Ping Luo

TL;DR

This paper introduces Mean-AP Guided Reinforced Active Learning for Object Detection (MGRAL), an approach that uses expected model output change as the informativeness measure for deep detection networks and optimizes the sampling strategy directly against mAP.

Abstract

Active learning strategies aim to train high-performance models with minimal labeled data by selecting the most informative instances for labeling. However, existing methods for assessing data informativeness often fail to align directly with task model performance metrics, such as mean average precision (mAP) in object detection. This paper introduces Mean-AP Guided Reinforced Active Learning for Object Detection (MGRAL), a novel approach that uses expected model output change as the informativeness measure for deep detection networks and optimizes the sampling strategy directly against mAP. MGRAL employs an LSTM-based reinforcement learning agent to handle two obstacles: the combinatorial search over candidate batches, and the non-differentiable relationship between the selected batch and detector performance. The agent is trained with policy gradient, using mAP improvement as the reward signal. Because estimating mAP for batches of unlabeled samples is computationally expensive, we implement fast look-up tables that keep the method feasible in practice. We evaluate MGRAL on the PASCAL VOC and MS COCO benchmarks across various backbone architectures. Our approach demonstrates strong performance, establishing a new paradigm in reinforcement learning-based active learning for object detection.
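
The policy-gradient idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the agent here is a set of independent per-sample logits rather than an LSTM, `toy_reward` is a hypothetical stand-in for the mAP improvement (the real reward would come from a detector), the per-sample labeling `cost` is an assumption added so that the toy problem has a preferred batch, and `reward_table` is a plain memoization dictionary that may differ from the paper's look-up tables.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_reward(actions, gains, cost):
    """Hypothetical stand-in for delta-mAP: summed per-sample gain minus labeling cost."""
    return sum(a * (g - cost) for a, g in zip(actions, gains))


def train_selection_policy(gains, cost, steps=3000, lr=0.2, seed=0):
    """REINFORCE over independent Bernoulli select/skip decisions, one per sample."""
    rng = random.Random(seed)
    logits = [0.0] * len(gains)   # policy parameters, one logit per sample
    baseline = 0.0                # moving-average baseline to reduce variance
    reward_table = {}             # memoized rewards for batches already evaluated

    for _ in range(steps):
        probs = [sigmoid(z) for z in logits]
        # Sample a candidate batch: 1 = send to the oracle, 0 = skip.
        actions = tuple(1 if rng.random() < p else 0 for p in probs)

        # The reward is a black box (non-differentiable), so it is only queried.
        if actions not in reward_table:
            reward_table[actions] = toy_reward(actions, gains, cost)
        reward = reward_table[actions]

        # Gradient of log Bernoulli(a; sigmoid(z)) with respect to z is (a - p).
        advantage = reward - baseline
        for i, (a, p) in enumerate(zip(actions, probs)):
            logits[i] += lr * advantage * (a - p)

        baseline = 0.9 * baseline + 0.1 * reward

    return logits


if __name__ == "__main__":
    # Hypothetical per-sample gains, chosen only for illustration.
    logits = train_selection_policy([0.9, 0.1, 0.8, 0.0, 0.05, 0.7], cost=0.4)
    print([round(sigmoid(z), 2) for z in logits])
```

The update never differentiates through the reward; it only scales the log-probability gradient of the sampled batch by how much better than the baseline that batch scored, which is why a metric such as mAP can serve as the training signal.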

Paper Structure (19 sections, 3 equations, 4 figures, 3 tables)


Figures (4)

  • Figure 1: Training and selection pipelines of Mean-AP Guided Reinforced Active Learning (MGRAL) for object detection. (a) Data sampling agent training phase: The RL-based agent learns to select informative samples using $\Delta$mAP as reward, where performance gains are efficiently estimated through a semi-supervised detector. (b) Active selection phase: The trained agent directly processes the unlabeled pool to select samples for oracle annotation, which are then added to the labeled pool for detector improvement.
  • Figure 2: Data sampling agent architecture. The agent processes unlabeled images sequentially through three main components: (1) an Image Feature Encoder that extracts visual representations from input images, (2) a series of parameter-shared LSTM modules that process image embeddings while maintaining temporal dependencies through hidden states ($h_i$) and cell states ($c_i$), and (3) individual decoder networks that output selection scores for each image. A dummy head is used to initialize the first LSTM state. Each image embedding combines the feature representation with the previous decision, so the agent can consider both visual content and selection history.
  • Figure 3: Comparative performance of active learning methods.
  • Figure 4: Visualizations of the most representative samples selected by different methods during the first cycle on PASCAL VOC.
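
The sequential agent described in the Figure 2 caption can be sketched in a few lines of dependency-free Python. This is a rough illustration under several assumptions, not the paper's architecture: `features` stands in for the output of the Image Feature Encoder, the dummy head is modeled as one LSTM step on an all-zero input, a single shared linear decoder replaces the individual decoder networks, and the weights are random and untrained. The names `TinyLSTMCell` and `run_agent` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class TinyLSTMCell:
    """Standard LSTM cell; the same weights are reused at every step."""

    def __init__(self, input_size, hidden_size, rng):
        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One weight matrix per gate, applied to the concatenation [x, h].
        self.W = {k: init(hidden_size, input_size + hidden_size) for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}

    def step(self, x, h, c):
        xh = x + h  # list concatenation
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], xh), self.b[k])] for k in "ifog"}
        i_gate = [sigmoid(v) for v in pre["i"]]
        f_gate = [sigmoid(v) for v in pre["f"]]
        o_gate = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        c_new = [f * cv + i * g for f, cv, i, g in zip(f_gate, c, i_gate, cand)]
        h_new = [o * math.tanh(cv) for o, cv in zip(o_gate, c_new)]
        return h_new, c_new


def run_agent(features, hidden_size=8, seed=0):
    """Score images one by one, feeding each decision into the next step."""
    rng = random.Random(seed)
    feat_dim = len(features[0])
    cell = TinyLSTMCell(feat_dim + 1, hidden_size, rng)  # +1 for the previous decision
    decoder = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]

    # Assumed form of the dummy head: one step on a zero input to get (h_0, c_0).
    h, c = cell.step([0.0] * (feat_dim + 1), [0.0] * hidden_size, [0.0] * hidden_size)

    prev_decision = 0.0
    scores, decisions = [], []
    for feat in features:
        # Embedding = image feature concatenated with the previous decision.
        h, c = cell.step(list(feat) + [prev_decision], h, c)
        score = sigmoid(sum(w * v for w, v in zip(decoder, h)))
        decision = 1 if rng.random() < score else 0
        scores.append(score)
        decisions.append(decision)
        prev_decision = float(decision)
    return scores, decisions


if __name__ == "__main__":
    # Hypothetical 3-dimensional features for three unlabeled images.
    print(run_agent([[0.1, 0.2, 0.3], [0.5, 0.4, 0.1], [0.9, 0.0, 0.2]]))
```

Feeding the previous decision back into the next input is what lets the hidden state carry selection history, so the score for an image can depend on what has already been picked.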