Compute-Efficient Active Learning
Gábor Németh, Tamás Matuszka
TL;DR
The paper tackles the computational bottleneck of active learning on large unlabeled datasets. It proposes a compute-efficient, method-agnostic framework that uses historical acquisition function values to evaluate only a candidate pool of size $\alpha \cdot N$ at each of $T$ retraining steps, rather than the full unlabeled set. On MNIST and CIFAR-10, the method improves on random sampling and often surpasses uncertainty-based baselines while reducing compute by up to about $25\%$; additional experiments on multimodal 3D object detection illustrate its broader applicability. The approach can be adapted to regression and other domains, and the code is publicly available.
Abstract
Active learning, a powerful paradigm in machine learning, aims to reduce labeling costs by selecting the most informative samples from an unlabeled dataset. However, the traditional active learning process often demands extensive computational resources, which hinders scalability and efficiency. In this paper, we address this issue by presenting a novel method designed to alleviate the computational burden of active learning on massive datasets. To this end, we introduce a simple yet effective, method-agnostic framework that outlines how to strategically choose and annotate data points, optimizing the process for efficiency while maintaining model performance. Through case studies, we demonstrate that the proposed method reduces computational costs while matching or, in some cases, even surpassing baseline model outcomes. Code is available at https://github.com/aimotive/Compute-Efficient-Active-Learning.
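The selection step summarized in the TL;DR can be illustrated with a minimal sketch. It is not the authors' implementation (the linked repository is the reference for that). It assumes that $N$ is the number of currently unlabeled samples, that candidates are ranked by their most recently computed acquisition value, and that samples never scored before are evaluated first. The names `select_batch`, `acq_fn`, and `history` are hypothetical and chosen only for illustration.

```python
import numpy as np


def select_batch(acq_fn, history, unlabeled_idx, alpha, batch_size):
    """Score only a candidate pool of the unlabeled set and pick a batch to label.

    acq_fn        -- callable mapping an index array to acquisition values (hypothetical)
    history       -- last known acquisition value per sample, np.nan if never scored;
                     updated in place for the candidates evaluated in this step
    unlabeled_idx -- indices of the currently unlabeled samples
    alpha         -- fraction of the unlabeled set to evaluate in this step
    """
    n_unlabeled = len(unlabeled_idx)
    n_candidates = int(np.ceil(alpha * n_unlabeled))
    # The pool must be large enough to fill the batch, but cannot exceed the set.
    n_candidates = min(max(n_candidates, batch_size), n_unlabeled)

    # Assumption: rank by historical value; never-scored samples get top priority.
    priority = np.where(np.isnan(history[unlabeled_idx]), np.inf, history[unlabeled_idx])
    order = np.argsort(-priority, kind="stable")
    candidates = unlabeled_idx[order[:n_candidates]]

    # The (expensive) acquisition function runs on the candidate pool only.
    scores = np.asarray(acq_fn(candidates), dtype=float)
    history[candidates] = scores

    # Label the highest-scoring candidates.
    top = np.argsort(-scores, kind="stable")[:batch_size]
    return candidates[top]


# Toy usage: the "acquisition value" of a sample is simply its index.
history = np.full(20, np.nan)
unlabeled = np.arange(20)
chosen = select_batch(lambda idx: idx.astype(float), history, unlabeled,
                      alpha=0.25, batch_size=2)
```

In a full loop, this step would run once per retraining round for $T$ rounds, with the chosen indices moved from the unlabeled set to the labeled set before the model is retrained.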
