Uncertainty Herding: One Active Learning Method for All Label Budgets

Wonho Bae, Gabriel L. Oliveira, Danica J. Sutherland

TL;DR

This work tackles the problem of selecting informative labels across varying annotation budgets in active learning. It introduces Uncertainty Coverage as a unifying objective that blends model uncertainty with distributional coverage, and develops Uncertainty Herding (UHerding) as a fast greedy optimizer with near-optimal coverage guarantees. To function across low and high budgets, the method employs adaptive parameters: temperature scaling for calibration in the low-budget regime and a decreasing kernel radius for high-budget settings, enabling seamless interpolation between representation- and uncertainty-based selection. Empirically, UHerding achieves state-of-the-art performance on a broad suite of tasks, including CIFAR-10/100, Tiny ImageNet, DomainNet, and ImageNet, as well as transfer learning with pre-trained models, demonstrating robustness and practical impact for active learning in diverse domains.
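To make the temperature-scaling idea concrete, here is a minimal sketch of how dividing logits by a temperature changes an uncertainty score. The function names are hypothetical, the margin-based score is just one possible uncertainty measure, and the temperature is taken as an input; how the paper actually selects it is not described in this summary.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature (hypothetical helper, not the paper's code)."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

def margin_uncertainty(probs):
    """One minus the gap between the two largest class probabilities."""
    top2 = np.sort(probs, axis=-1)[..., -2:]
    return 1.0 - (top2[..., 1] - top2[..., 0])

logits = np.array([[4.0, 1.0, 0.0]])
sharp = margin_uncertainty(softmax_with_temperature(logits, 1.0))
soft = margin_uncertainty(softmax_with_temperature(logits, 10.0))
print(sharp, soft)  # the higher temperature gives the larger uncertainty
```

A larger temperature flattens the predicted distribution, so the same logits yield a higher uncertainty score; a smaller one sharpens it.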

Abstract

Most active learning research has focused on methods which perform well when many labels are available, but can be dramatically worse than random selection when label budgets are small. Other methods have focused on the low-budget regime, but do poorly as label budgets increase. As the line between "low" and "high" budgets varies by problem, this is a serious issue in practice. We propose uncertainty coverage, an objective which generalizes a variety of low- and high-budget objectives, as well as natural, hyperparameter-light methods to smoothly interpolate between low- and high-budget regimes. We call greedy optimization of the estimate Uncertainty Herding; this simple method is computationally fast, and we prove that it nearly optimizes the distribution-level coverage. In experimental validation across a variety of active learning tasks, our proposal matches or beats state-of-the-art performance in essentially all cases; it is the only method of which we are aware that reliably works well in both low- and high-budget settings.
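The greedy selection described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the estimated objective is an uncertainty-weighted kernel coverage of the form mean over pool points of U(x_i) times the largest kernel value between x_i and any selected point, it uses a Gaussian kernel with values in [0, 1], and it builds a dense n-by-n kernel matrix. The names `rbf_kernel` and `greedy_uncertainty_coverage` are hypothetical.

```python
import numpy as np

def rbf_kernel(feats, sigma):
    """Gaussian kernel matrix with entries in [0, 1] (assumed kernel choice)."""
    sq = np.sum(feats ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T
    return np.exp(-np.maximum(d2, 0.0) / sigma ** 2)

def greedy_uncertainty_coverage(K, uncertainty, budget):
    """Greedily pick `budget` indices maximizing the assumed objective
    mean_i uncertainty[i] * max_{s in selected} K[i, s]."""
    n = K.shape[0]
    cover = np.zeros(n)               # current max kernel value to the selected set
    available = np.ones(n, dtype=bool)
    picked = []
    for _ in range(budget):
        # Marginal gain of adding candidate j: weighted increase in coverage.
        gains = (uncertainty[:, None] * np.maximum(K - cover[:, None], 0.0)).mean(axis=0)
        gains[~available] = -np.inf   # never re-select a point
        j = int(np.argmax(gains))
        picked.append(j)
        available[j] = False
        cover = np.maximum(cover, K[:, j])
    return picked

# Toy pool: two tight clusters in 2D.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
K = rbf_kernel(feats, sigma=1.0)
print(greedy_uncertainty_coverage(K, np.ones(6), budget=2))
```

With uniform uncertainty the selection spreads across both clusters, which is the representation-based behaviour; concentrating the uncertainty on one cluster pulls the first pick there, which is the uncertainty-based behaviour.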

Paper Structure

This paper contains 29 sections, 15 theorems, 25 equations, 8 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Let $U(\mathbf{x}; f) \in [0, U_{\max}]$, $k_\sigma(\mathbf{x}, \mathbf{x}'; g) = \tilde{k}_\sigma(g(\mathbf{x}), g(\mathbf{x}')) \in [0, 1]$, $\{ g(\mathbf{x}) : \mathbf{x} \in \mathcal{U} \} \subseteq \{ \mathbf{t} \in \mathbb{R}^d : \left\lVert\mathbf{t}\right\rVert \le R \}$, and …

Figures (8)

  • Figure 1: Left: illustration of coverages (\ref{subsec:ucoverage}). Right: parameter adaptation (\ref{subsec:adapt}).
  • Figure 2: Comparison of Margin, MaxHerding and proposed UHerding on half-moon toy data.
  • Figure 3: UHerding versus MaxHerding and uncertainty, with different uncertainty measures. Mean and standard deviation of 5 runs of the difference between a method and Random selection.
  • Figure 4: Comparison on CIFAR100 and TinyImageNet for supervised-learning tasks.
  • Figure 5: Comparison on CIFAR100 and DomainNet for transfer learning tasks.
  • ...and 3 more figures

Theorems & Definitions (25)

  • Definition 1
  • Theorem 1
  • Proposition 1
  • Proposition 1
  • Definition 2: Uncertainty Herding
  • Corollary 1
  • Proposition 1: Weighted $k$-means (Zhdanov, 2019)
  • Proposition 1: ALFA-Mix (Parvaneh et al., 2022)
  • Proposition 1: BADGE (Ash et al., 2019)
  • Theorem 2
  • ...and 15 more