LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning

Jifan Zhang; Yifang Chen; Gregory Canal; Stephen Mussmann; Arnav M. Das; Gantavya Bhatt; Yinglun Zhu; Jeffrey Bilmes; Simon Shaolei Du; Kevin Jamieson; Robert D Nowak

TL;DR

LabelBench addresses the challenge of benchmarking label-efficient learning in a unified, scalable framework that jointly evaluates transfer learning, semi-supervised learning, and active learning on large pretrained models. It introduces selection-via-proxy to dramatically reduce compute during iterative data labeling, while still reporting final performance via end-to-end fine-tuning. Through extensive experiments with CLIP/CoCa ViT-B32 across CIFAR-10/100 and ImageNet, LabelBench demonstrates substantial label savings when combining AL with Semi-SL and large pretrained models, outperforming prior results in several settings. The framework is modular and open-source, enabling researchers to plug in new datasets, AL strategies, and Semi-SL methods to advance the field of label-efficient learning.
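The selection-via-proxy idea mentioned above can be illustrated with a minimal sketch. It makes assumptions that are not taken from the source: the proxy is a linear softmax classifier trained on frozen, precomputed embeddings, the selection rule is margin-based uncertainty sampling, and the data are synthetic. All function names are hypothetical and are not part of the LabelBench API. The final end-to-end fine-tuning step used for reporting performance is not shown.

```python
import numpy as np

def train_linear_proxy(feats, labels, num_classes, lr=0.5, steps=200):
    """Fit a softmax-regression proxy on frozen features with gradient descent."""
    n, d = feats.shape
    W = np.zeros((d, num_classes))
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        probs = predict_proba(W, feats)
        W -= lr * feats.T @ (probs - onehot) / n  # cross-entropy gradient step
    return W

def predict_proba(W, feats):
    """Numerically stable softmax over the proxy's logits."""
    logits = feats @ W
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def margin_scores(probs):
    """Gap between the top-2 class probabilities; a small gap means uncertain."""
    ordered = np.sort(probs, axis=1)
    return ordered[:, -1] - ordered[:, -2]

def select_via_proxy(feats, oracle_labels, num_classes,
                     init_size, batch_size, rounds, seed=0):
    """Iteratively pick examples to label, retraining only the cheap proxy."""
    rng = np.random.default_rng(seed)
    n = feats.shape[0]
    labeled = rng.choice(n, size=init_size, replace=False).tolist()
    for _ in range(rounds):
        W = train_linear_proxy(feats[labeled], oracle_labels[labeled], num_classes)
        unlabeled = np.setdiff1d(np.arange(n), labeled)
        margins = margin_scores(predict_proba(W, feats[unlabeled]))
        picked = unlabeled[np.argsort(margins)[:batch_size]]
        labeled.extend(picked.tolist())  # the oracle reveals these labels
    return np.array(labeled)

# Synthetic stand-in for frozen embeddings (assumption: 200 examples, 8 dims).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))
labels = rng.integers(0, 3, size=200)
selected = select_via_proxy(features, labels, num_classes=3,
                            init_size=10, batch_size=5, rounds=4)
```

Under these assumptions, only the small proxy is retrained in each labeling round, which is where the compute saving would come from. The expensive fine-tuning of the full pretrained model would run once, on the final labeled set.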

Abstract

Labeled data are critical to modern machine learning applications, but obtaining labels can be expensive. To mitigate this cost, machine learning methods, such as transfer learning, semi-supervised learning and active learning, aim to be label-efficient: achieving high predictive performance from relatively few labeled examples. While obtaining the best label-efficiency in practice often requires combinations of these techniques, existing benchmark and evaluation frameworks do not capture a concerted combination of all such techniques. This paper addresses this deficiency by introducing LabelBench, a new computationally-efficient framework for joint evaluation of multiple label-efficient learning techniques. As an application of LabelBench, we introduce a novel benchmark of state-of-the-art active learning methods in combination with semi-supervised learning for fine-tuning pretrained vision transformers. Our benchmark demonstrates better label-efficiencies than previously reported in active learning. LabelBench's modular codebase is open-sourced for the broader community to contribute label-efficient learning methods and benchmarks. The repository can be found at: https://github.com/EfficientTraining/LabelBench.

Paper Structure (27 sections, 7 equations, 22 figures, 13 tables)

Introduction
Related Work
Semi-supervised Training
Active Learning
Label Efficient Fine-tuning Framework
Selection via Proxy
Codebase
Benchmarking Active Learning Algorithms
Experiment Setup
Performance Metrics
Datasets
Results and Discussion
Call for Contribution and Future Work
Conclusion
Definition of Metrics
...and 12 more sections

Figures (22)

Figure 1: Performance of active learning + FlexMatch (semi-supervised) retraining + CLIP ViT-B32 when given different annotation budgets. Generalization accuracy refers to the model's Top-1 test accuracy. Pool accuracy measures the labeling accuracy on the pool of examples to be labeled (see the Performance Metrics section for more details). Each CIFAR-10 curve is averaged over four trials and each ImageNet curve is averaged over two trials. The confidence intervals are based on standard error. The AL gains over passive presented here are significantly larger than typical gains observed in previous AL work where Semi-SL and pretrained models are not considered.
Figure 2: A modular framework consisting of pretrained models, Semi-SL trainer and AL strategies.
Figure 3: Our modular codebase allows one to work solely in one directory without a thorough knowledge of the entire codebase. Implementing a new dataset or semi-supervised learning trainer is as easy as implementing a single function.
Figure 4: Performance of different data selection strategies on ImageNet, fMoW and CIFAR-100. We omit GALAXY on ImageNet because of its high computational cost on large datasets. The ImageNet results differ from Figure 1 since we use a different pretrained model, CoCa ViT-B32. Each fMoW and CIFAR-100 result is averaged over four trials, and each ImageNet result is averaged over two trials due to limited computing resources. The confidence intervals are based on standard error.
Figure 5: (a) and (b): Generalization performance on CIFAR-10 when using different proxy models for data selection. (c): Generalization performance when using a supervised trainer instead of Semi-SL (we use selection with end-to-end fine-tuning here). Each result is averaged over four trials, with the standard error shown as the confidence interval.
...and 17 more figures
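
The captions above report curves averaged over a small number of trials, with confidence intervals based on standard error. A minimal sketch of that computation follows. It assumes the sample standard deviation (`ddof=1`) is used, which the captions do not specify. The accuracy values are illustrative placeholders, not results from the paper, and the function name is hypothetical.

```python
import numpy as np

def mean_and_standard_error(trial_accuracies):
    """Return the per-budget mean and standard error across trials.

    trial_accuracies: array-like of shape (num_trials, num_budgets).
    Assumption: standard error = sample std (ddof=1) / sqrt(num_trials).
    """
    acc = np.asarray(trial_accuracies, dtype=float)
    num_trials = acc.shape[0]
    mean = acc.mean(axis=0)
    sem = acc.std(axis=0, ddof=1) / np.sqrt(num_trials)
    return mean, sem

# Placeholder accuracies for two trials at two annotation budgets.
example = [[0.70, 0.80],
           [0.74, 0.82]]
mean, sem = mean_and_standard_error(example)
```

With only two trials, as in the ImageNet curves, the standard error is a rough estimate, so narrow bands in such plots should be read with caution.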
