ActiveLab: Active Learning with Re-Labeling by Multiple Annotators

Hui Wen Goh, Jonas Mueller

TL;DR

ActiveLab tackles label noise in pool-based active learning with multiple annotators by fusing out-of-sample classifier predictions and annotator labels into a per-example acquisition score s_i. It extends CROWDLAB to decide between labeling new data and re-labeling existing data: it estimates a trust weight w_j for each annotator, a global parameter P, and a model weight w_M, and calibrates the classifier's predictions with temperature scaling before scoring. The method supports single-model and ensemble configurations and, in the paper's experiments on tabular and image tasks, trains more accurate classifiers with fewer total annotations than the compared active learning methods, including when used for active label cleaning. The approach is model- and modality-agnostic, which makes it practical for real-world settings with noisy annotations and limited labeling budgets.
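The temperature-scaling step mentioned above can be illustrated with a minimal sketch. It shows only the generic technique (divide logits by a scalar T chosen to minimize negative log-likelihood); the function names, the grid search, and the choice of which labels to fit against are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(probs, labels):
    # Mean negative log-likelihood of the given integer labels.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.25, 5.0, 96)):
    # Hypothetical helper: pick the T on a fixed grid that minimizes NLL.
    # A gradient-based optimizer would work equally well here.
    losses = [nll(softmax(logits / t), labels) for t in grid]
    return float(grid[int(np.argmin(losses))])

def calibrate(logits, temperature):
    # T > 1 softens the predicted distribution, T < 1 sharpens it.
    # The predicted class (argmax) is unchanged either way.
    return softmax(logits / temperature)
```

If only predicted probabilities are available, their logarithms can stand in for the logits, since softmax is invariant to a per-row constant shift.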

Abstract

In real-world data labeling applications, annotators often provide imperfect labels. It is thus common to employ multiple annotators to label data with some overlap between their examples. We study active learning in such settings, aiming to train an accurate classifier by collecting a dataset with the fewest total annotations. Here we propose ActiveLab, a practical method to decide what to label next that works with any classifier model and can be used in pool-based batch active learning with one or multiple annotators. ActiveLab automatically estimates when it is more informative to re-label examples vs. labeling entirely new ones. This is a key aspect of producing high quality labels and trained models within a limited annotation budget. In experiments on image and tabular data, ActiveLab reliably trains more accurate classifiers with far fewer annotations than a wide variety of popular active learning methods.
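The choice between re-labeling and labeling new examples can be pictured as a single ranking over both pools. The sketch below assumes that every example, labeled or not, already has an acquisition score and that a lower score means the example is more informative to label next; the scores are placeholders rather than the paper's formula for s_i, and `select_batch` is a hypothetical helper name.

```python
import numpy as np

def select_batch(scores_labeled, scores_unlabeled, batch_size):
    """Pick the batch_size lowest-scoring examples across both pools.

    Returns (relabel_idx, new_idx): indices into the labeled pool that
    should receive an additional annotation, and indices into the
    unlabeled pool that should receive their first annotation.
    """
    scores = np.concatenate([scores_labeled, scores_unlabeled])
    # Stable sort keeps ties in their original order.
    order = np.argsort(scores, kind="stable")[:batch_size]
    n_labeled = len(scores_labeled)
    relabel_idx = [int(i) for i in order if i < n_labeled]
    new_idx = [int(i - n_labeled) for i in order if i >= n_labeled]
    return relabel_idx, new_idx

# Example: one already-labeled example and two unlabeled ones are selected.
relabel, new = select_batch(np.array([0.9, 0.2, 0.7]), np.array([0.4, 0.1]), 3)
```

Because both pools share one ranking, the split between re-labeling and new labeling is not fixed in advance; it follows from the scores in each round.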

Paper Structure (21 sections, 14 equations, 7 figures, 1 algorithm)

This paper contains 21 sections, 14 equations, 7 figures, 1 algorithm.

Figures (7)

  • Figure 1: Evaluating active learning methods on the Wall Robot dataset to train an ExtraTrees classifier (left) or an ensemble of 3 models (right). Curves show test accuracy after each active learning iteration, averaged over 5 runs with the standard deviation in results shaded.
  • Figure 2: Evaluating active learning methods on CIFAR-10H to train a ResNet-18 classifier (left) or an ensemble of ResNet-18/34/50 models (right). Curves show test accuracy after each active learning iteration, averaged over 5 runs with the standard deviation in results shaded.
  • Figure 3: Evaluating active learning methods on the Wall Robot Complete dataset to train an ExtraTrees classifier (left) or an ensemble of 3 models (right). Curves show test accuracy after each iteration of re-labeling, averaged over 5 runs with the standard deviation shaded.
  • Figure 4: Comparing active learning methods that exclusively label new examples (single labels) with methods that can also re-label examples (multiannotator labels), when annotators have different noise rates. Shown is the test accuracy of an ExtraTrees classifier trained on a given number of total labels (corresponding to each iteration of active learning) for the Wall Robot dataset. Curves are the average over 5 runs, and the standard deviation in results is shaded.
  • Figure S1: Evaluating active learning methods on the Wall Robot dataset to train an MLP classifier. Curves show test accuracy after each active learning iteration, averaged over 5 runs with the standard deviation in results shaded.
  • ...and 2 more figures