LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning
Jifan Zhang, Yifang Chen, Gregory Canal, Stephen Mussmann, Arnav M. Das, Gantavya Bhatt, Yinglun Zhu, Jeffrey Bilmes, Simon Shaolei Du, Kevin Jamieson, Robert D. Nowak
TL;DR
LabelBench addresses the challenge of benchmarking label-efficient learning with a unified, scalable framework that jointly evaluates transfer learning, semi-supervised learning (Semi-SL), and active learning (AL) on large pretrained models. It introduces selection-via-proxy to reduce the compute cost of the iterative labeling loop: a cheaper proxy model guides which examples to label, while final performance is still reported after end-to-end fine-tuning. Through extensive experiments with CLIP and CoCa ViT-B/32 models on CIFAR-10, CIFAR-100, and ImageNet, LabelBench demonstrates substantial label savings when AL is combined with Semi-SL and large pretrained models, outperforming previously reported results in several settings. The framework is modular and open-source, so researchers can plug in new datasets, AL strategies, and Semi-SL methods to advance label-efficient learning.
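To make the selection-via-proxy idea concrete, here is a minimal sketch in Python. It is not LabelBench's code or API: the proxy (a nearest-centroid classifier on frozen embeddings), the margin-based selection rule, the toy data, and every function name (`fit_proxy`, `class_probs`, `margin`, `active_learning_with_proxy`) are illustrative assumptions. What it shows is the compute argument: each labeling round only refits a cheap model on fixed embeddings, so the expensive end-to-end fine-tuning would run once, on the final labeled set.

```python
import math
import random


def fit_proxy(emb, labels, labeled, num_classes):
    """Hypothetical cheap proxy: one centroid per class over frozen embeddings."""
    dim = len(emb[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for i in labeled:  # only labels that have already been queried are used
        c = labels[i]
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += emb[i][d]
    # Classes with no labeled example yet fall back to a zero centroid.
    return [[s / max(counts[c], 1) for s in sums[c]] for c in range(num_classes)]


def class_probs(x, centroids):
    """Softmax over negative squared distances to each class centroid."""
    logits = [-sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def margin(p):
    """Gap between the two most likely classes; a small gap means uncertain."""
    top = sorted(p, reverse=True)
    return top[0] - top[1]


def active_learning_with_proxy(emb, oracle_labels, num_classes,
                               init_size, batch_size, rounds, seed=0):
    rng = random.Random(seed)
    labeled = set(rng.sample(range(len(emb)), init_size))  # random seed set
    for _ in range(rounds):
        # Cheap step: refit only the proxy, never the full pretrained model.
        centroids = fit_proxy(emb, oracle_labels, labeled, num_classes)
        unlabeled = [i for i in range(len(emb)) if i not in labeled]
        # Margin-based uncertainty sampling using the proxy's predictions.
        unlabeled.sort(key=lambda i: margin(class_probs(emb[i], centroids)))
        labeled.update(unlabeled[:batch_size])  # "query" the oracle for these
    # The expensive end-to-end fine-tuning would run once here, on `labeled`.
    return sorted(labeled)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy 2-D "embeddings": two Gaussian blobs standing in for a pretrained encoder.
    emb = [[rng.gauss(3.0 * c, 1.0), rng.gauss(0.0, 1.0)]
           for c in (0, 1) for _ in range(50)]
    labels = [0] * 50 + [1] * 50
    picked = active_learning_with_proxy(emb, labels, num_classes=2,
                                        init_size=4, batch_size=5, rounds=3)
    print(len(picked))  # 4 seed labels + 3 rounds x 5 queries = 19
```

Swapping the nearest-centroid proxy for a linear probe or a small MLP, or the margin rule for another AL strategy, keeps the same loop shape; the savings come from nothing inside the loop updating the large model's weights.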
Abstract
Labeled data are critical to modern machine learning applications, but obtaining labels can be expensive. To mitigate this cost, machine learning methods such as transfer learning, semi-supervised learning, and active learning aim to be label-efficient: to achieve high predictive performance from relatively few labeled examples. While the best label efficiency in practice often requires combining these techniques, existing benchmarks and evaluation frameworks do not evaluate all of them in concert. This paper addresses this gap by introducing LabelBench, a new computationally efficient framework for the joint evaluation of multiple label-efficient learning techniques. As an application of LabelBench, we introduce a novel benchmark of state-of-the-art active learning methods combined with semi-supervised learning for fine-tuning pretrained vision transformers. Our benchmark demonstrates better label efficiency than previously reported in active learning. LabelBench's modular codebase is open-sourced so that the broader community can contribute label-efficient learning methods and benchmarks. The repository can be found at: https://github.com/EfficientTraining/LabelBench.
