Deep Active Learning: A Reality Check

Edrina Gashi, Jiankang Deng, Ismail Elezi

TL;DR

This paper conducts a thorough, fair empirical evaluation of state-of-the-art deep active learning methods under uniform settings. It shows that entropy-based sampling typically matches or outperforms recent deep AL methods, and that some methods underperform random sampling in general settings. It reveals how factors such as the starting budget, per-cycle budget, and pretraining materially affect results, and extends the analysis to semi-supervised learning integration and object detection. The findings yield concrete recommendations and highlight the need for rigorous evaluation practices to guide real-world annotation budgeting.

Abstract

We conduct a comprehensive evaluation of state-of-the-art deep active learning methods. Surprisingly, under general settings, no single-model method decisively outperforms entropy-based active learning, and some even fall short of random sampling. We delve into overlooked aspects like starting budget, budget step, and pretraining's impact, revealing their significance in achieving superior results. Additionally, we extend our evaluation to other tasks, exploring the active learning effectiveness in combination with semi-supervised learning, and object detection. Our experiments provide valuable insights and concrete recommendations for future active learning studies. By uncovering the limitations of current methods and understanding the impact of different experimental settings, we aim to inspire more efficient training of deep learning models in real-world scenarios with limited annotation budgets. This work contributes to advancing active learning's efficacy in deep learning and empowers researchers to make informed decisions when applying active learning to their tasks.
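
The entropy-based baseline that the abstract refers to can be illustrated with a minimal sketch. This is not the authors' code: it assumes the standard formulation in which each unlabeled sample is scored by the Shannon entropy of the model's predicted class probabilities, and the highest-scoring samples are sent for annotation. The names `predictive_entropy`, `select_by_entropy`, and `pool_probs`, as well as the toy probabilities, are hypothetical.

```python
import math
from typing import List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Zero-probability classes contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_by_entropy(pool_probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` unlabeled samples with the highest entropy."""
    scores = [predictive_entropy(p) for p in pool_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Hypothetical softmax outputs for four unlabeled samples (three classes).
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
]

# Pick the two most uncertain samples for one labeling cycle.
selected = select_by_entropy(pool_probs, budget=2)
print(selected)
```

In an active learning loop, `budget` would correspond to the per-cycle labeling budget (the "budget step" in the abstract), and the selected indices would be moved from the unlabeled pool to the labeled set before retraining.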

Paper Structure

This paper contains 18 sections, 4 equations, 4 figures, and 13 tables.

Figures (4)

  • Figure 1: Main results: Comparison of different active learning algorithms on the a) CIFAR-10, b) CIFAR-100, c) Caltech-101, and d) Caltech-256 datasets. Entropy (black) reaches the highest results on all datasets, while the random baseline (dotted red) typically reaches the worst results. Best viewed in high resolution when zoomed in.
  • Figure 2: a) The effect of the labeling budget per active learning cycle. In general, the labeling budget should be neither too large (bias towards easy samples) nor too small (bias towards hard samples). b) The effect of training from scratch in each cycle compared to finetuning the network trained in the previous cycle. Training from scratch reaches higher results, especially in the early AL cycles.
  • Figure 3: a) Ablation study on the effect of diversity. We clearly observe that accounting for diversity matters, even if the solution is a simple heuristic. b) Ablation on diversity where each sample occurs twice in the dataset. c) Ablation on diversity where each sample occurs five times in the dataset.
  • Figure 4: a) Results in object detection on the PASCAL VOC07+12 dataset. b) Results of active learning for object detection in combination with consistency-based semi-supervised learning, on the PASCAL VOC07+12 dataset. c) Results of active learning for classification in combination with consistency-based semi-supervised learning, on the CIFAR-10 dataset.