
A Survey on Deep Active Learning: Recent Advances and New Frontiers

Dongyuan Li, Zhen Wang, Yankai Chen, Renhe Jiang, Weiping Ding, Manabu Okumura

TL;DR

This survey addresses the deep active learning (DAL) problem by formalizing a pool-based acquisition loop that leverages deep models and pre-trained representations to reduce labeling costs. It provides a comprehensive taxonomy across annotation types, query strategies, architectures, learning paradigms, and training processes, and catalogs influential baselines and datasets. The authors review DAL applications in NLP, CV, and graph data mining, and discuss pipeline-, task-, and dataset-related challenges together with practical strategies such as pseudo-labeling, transfer learning, and integration of pre-trained language models (PLMs). The work aims to serve as a practical guide for researchers and practitioners, highlighting open problems and offering a GitHub resource for up-to-date DAL techniques.

Abstract

Active learning seeks to achieve strong performance with fewer training samples by iteratively asking an oracle to label newly selected samples in a human-in-the-loop manner. The technique has gained popularity because of its broad applicability, yet surveys of it, especially of deep learning-based active learning (DAL), remain scarce. We therefore conduct a comprehensive survey of DAL. First, we describe how the reviewed papers were collected and filtered. Second, we formally define the DAL task and summarize the most influential baselines and widely used datasets. Third, we provide a systematic taxonomy of DAL methods from five perspectives (annotation types, query strategies, deep model architectures, learning paradigms, and training processes) and analyze their strengths and weaknesses. Fourth, we summarize the main applications of DAL in Natural Language Processing (NLP), Computer Vision (CV), Data Mining (DM), and other areas. Finally, we discuss challenges and perspectives based on a detailed analysis of current studies. This work aims to serve as a useful and quick guide for researchers in overcoming difficulties in DAL. We hope that this survey will spur further progress in this burgeoning field.
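
The iterative oracle-querying loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the survey's own algorithm: the nearest-centroid "model" stands in for a deep network, entropy-based uncertainty sampling is only one of many possible query strategies, and every name here (`fit_centroids`, `predict_proba`, `entropy`, `active_learning_loop`, `oracle`) is hypothetical.

```python
import math
import random


def fit_centroids(labeled, n_classes):
    """Toy stand-in for model training: per-class mean of the labeled points."""
    dim = len(labeled[0][0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for x, y in labeled:
        counts[y] += 1
        for j, v in enumerate(x):
            sums[y][j] += v
    # max(c, 1) avoids division by zero for a class with no labels yet.
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


def predict_proba(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = [-sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy of a predictive distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def active_learning_loop(pool, oracle, seed_idx, n_classes, rounds, batch_size):
    """Pool-based loop: train, score the unlabeled pool, query the oracle, repeat."""
    labels = {i: oracle(i) for i in seed_idx}
    unlabeled = [i for i in range(len(pool)) if i not in labels]
    for _ in range(rounds):
        model = fit_centroids([(pool[i], y) for i, y in labels.items()], n_classes)
        # Assumed query strategy: pick the most uncertain samples first.
        unlabeled.sort(key=lambda i: entropy(predict_proba(model, pool[i])),
                       reverse=True)
        query, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        for i in query:
            labels[i] = oracle(i)  # the human-in-the-loop labeling step
    model = fit_centroids([(pool[i], y) for i, y in labels.items()], n_classes)
    return model, labels


if __name__ == "__main__":
    rng = random.Random(0)
    centers = [(0.0, 0.0), (3.0, 3.0)]
    truth = [k % 2 for k in range(40)]  # synthetic ground truth, for illustration
    pool = [(centers[t][0] + rng.gauss(0, 1), centers[t][1] + rng.gauss(0, 1))
            for t in truth]
    model, labels = active_learning_loop(
        pool, lambda i: truth[i], seed_idx=[0, 1],
        n_classes=2, rounds=3, batch_size=4)
    print(len(labels), "labels acquired out of", len(pool))
```

In this sketch the labeling budget is simply `len(seed_idx) + rounds * batch_size`; a real DAL pipeline would replace `fit_centroids` with training or fine-tuning a deep model and might choose a different acquisition function.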

Paper Structure (22 sections, 5 equations, 17 figures, 4 tables, 1 algorithm)


Figures (17)

  • Figure 1: The general pipeline in deep active learning.
  • Figure 2: Taxonomy for deep active learning methods.
  • Figure 3: Emerging challenges in deep active learning.
  • Figure 4: Keywords and publication trend on DAL.
  • Figure 5: An example for contrastive learning based query strategies.
  • ...and 12 more figures