Active Learning with a Noisy Annotator
Netta Shafir, Guy Hacohen, Daphna Weinshall
TL;DR
This work tackles active learning with a noisy annotator by introducing Noise-Aware Active Sampling (NAS), a framework that augments greedy, coverage-based query strategies with a noise-filtering module suited to the low-budget regime. NAS partitions the labeled data into clean and noisy sets, retains the clean examples for query selection, and resamples from regions that noisy labels had previously misrepresented, typically using a batch size $b = C$ (the number of classes). Instantiated on ProbCover as NPC, and extended with Weighted NPC and noise dropout, NAS yields robust performance gains under symmetric, asymmetric, and real-world noise on CIFAR100, ImageNet-50, CIFAR100N, and Clothing1M, using self-supervised representations (e.g., SimCLR, DINOv2). The results suggest practical gains in annotation efficiency under label noise and point to future work on multi-annotator settings and on adaptive strategies for more robust low-budget active learning.
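The coverage-based query step with a clean/noisy partition can be sketched as follows. This is a minimal illustration, not the authors' NPC implementation. It assumes precomputed self-supervised embeddings `X`, a fixed ball radius `delta`, and a set `noisy_idx` of labeled points already flagged by some noise filter. The function name `noise_aware_probcover_query` is hypothetical. Following ProbCover, each point covers every point within distance `delta`, and the greedy step picks the unlabeled point whose ball adds the most uncovered points. The noise-aware change is that noisy labeled points contribute no coverage, so the regions they were supposed to represent become eligible for resampling.

```python
import numpy as np


def noise_aware_probcover_query(X, delta, labeled_idx, noisy_idx, budget):
    """Greedy ProbCover-style query in which only clean labeled points provide coverage.

    X           : (N, d) array of precomputed embeddings (assumed given).
    delta       : ball radius; x_j is covered by x_i when ||x_i - x_j|| <= delta.
    labeled_idx : indices that have already been annotated.
    noisy_idx   : subset of labeled_idx flagged as noisy by some filter (assumed given).
    budget      : number of new points to query, e.g. b = C.
    """
    labeled_idx = np.asarray(labeled_idx, dtype=int)
    noisy_idx = np.asarray(noisy_idx, dtype=int)

    # delta-ball graph: adj[i, j] is True when point i covers point j (dense, O(N^2) memory).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    adj = dists <= delta

    # Only clean labeled points count as covering; balls of noisy points become uncovered again.
    clean_idx = np.setdiff1d(labeled_idx, noisy_idx)
    covered = adj[clean_idx].any(axis=0)

    # Never re-query a point that already has a label (clean or noisy).
    candidate = np.ones(len(X), dtype=bool)
    candidate[labeled_idx] = False

    selected = []
    for _ in range(budget):
        # Coverage gain: number of currently uncovered points inside each candidate's ball.
        gain = (adj & ~covered[None, :]).sum(axis=1)
        gain[~candidate] = -1
        best = int(np.argmax(gain))
        if gain[best] <= 0:
            break  # every point is already covered by a clean representative
        selected.append(best)
        candidate[best] = False
        covered |= adj[best]
    return selected


# Toy pool: two tight clusters, with points 0 and 3 already labeled.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
print(noise_aware_probcover_query(X, 0.5, labeled_idx=[0, 3], noisy_idx=[3], budget=2))  # [4]
```

In the toy example, point 3 is the only labeled representative of the second cluster. Once it is flagged as noisy, the next query lands in that cluster (point 4); with `noisy_idx=[]` nothing is selected, because both clusters are already covered. The dense distance matrix is only practical for small pools; a larger-scale version would likely use a sparse nearest-neighbor graph instead.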
Abstract
Active Learning (AL) aims to reduce annotation costs by strategically selecting the most informative samples for labeling. However, most AL methods struggle in the low-budget regime, where only a few labeled examples are available, and this issue becomes even more pronounced when annotators provide noisy labels. A common AL approach for the low- and mid-budget regimes is to maximize the coverage of the labeled set across the entire dataset. We propose a novel framework called Noise-Aware Active Sampling (NAS) that extends existing greedy, coverage-based active learning strategies to handle noisy annotations. NAS identifies regions that remain uncovered because noisy representatives were selected there, and enables resampling from these areas. We also introduce a simple yet effective noise-filtering approach suited to the low-budget regime, which leverages the inner mechanism of NAS and can be applied to filter noisy labels before model training. On multiple computer vision benchmarks, including CIFAR100 and ImageNet subsets, NAS significantly improves the performance of standard active learning methods across different noise types and rates.
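Evaluating across noise types and rates requires corrupting clean labels in a controlled way. The sketch below assumes two common synthetic settings: symmetric noise, where a corrupted label is replaced by a uniformly drawn *different* class, and asymmetric noise, illustrated here with a hypothetical fixed mapping c → (c + 1) mod C. The function name `inject_label_noise` is hypothetical, and the exact corruption protocols behind the reported experiments may differ. Real-world noise (as in CIFAR100N or Clothing1M) comes from actual annotation and is not simulated this way.

```python
import numpy as np


def inject_label_noise(labels, num_classes, rate, kind="symmetric", seed=0):
    """Return a copy of `labels` in which each label is corrupted with probability `rate`.

    kind="symmetric" : a corrupted label is replaced by a different class drawn uniformly.
    kind="asymmetric": a corrupted label c becomes (c + 1) % num_classes, a fixed
                       class-to-class mapping (an illustrative convention, assumed here).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=int)
    noisy = labels.copy()
    flip = rng.random(labels.shape[0]) < rate  # which labels get corrupted

    if kind == "symmetric":
        # An offset in 1..C-1 guarantees the new label differs from the true one.
        offsets = rng.integers(1, num_classes, size=labels.shape[0])
        noisy[flip] = (labels[flip] + offsets[flip]) % num_classes
    elif kind == "asymmetric":
        noisy[flip] = (labels[flip] + 1) % num_classes
    else:
        raise ValueError(f"unknown noise kind: {kind!r}")
    return noisy


y = np.arange(10) % 5
y_sym = inject_label_noise(y, num_classes=5, rate=0.4, kind="symmetric")
print("observed noise rate:", np.mean(y_sym != y))
```

Here `rate` is the per-label corruption probability, so the observed fraction of flipped labels fluctuates around it for small datasets. Some works instead draw symmetric noise uniformly over all C classes, in which case a "corrupted" label can occasionally stay correct.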
