Generalized Coverage for More Robust Low-Budget Active Learning

Wonho Bae; Junhyug Noh; Danica J. Sutherland

TL;DR

The paper tackles active learning in ultra-low-budget scenarios, where labels are scarce and traditional uncertainty-based methods falter. It introduces generalized coverage, a flexible objective $C_k(L)=\mathbb{E}_{x}[\max_{x' \in L} k(x, x')]$, and proposes MaxHerding, a greedy algorithm that maximizes this objective and is motivated by kernel herding. The work shows that ProbCover is a special case of MaxHerding (with a top-hat kernel) and provides a non-greedy kernel $k$-medoids variant that connects it to existing methods. MaxHerding achieves strong performance across multiple low-budget image benchmarks, at lower computational cost than most competitive methods, and is robust to kernel choice and budget size. These results offer a scalable, robust approach for learning with limited labels, particularly when leveraging self-supervised representations.
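To make the objective concrete, here is a minimal sketch of an empirical version of $C_k(L)$ and of a greedy maximizer in the spirit of MaxHerding. It is an illustration under stated assumptions, not the authors' implementation: the expectation over $x$ is replaced by an average over a finite pool, the kernel is assumed non-negative (so the coverage of the empty set is taken to be 0), and the names `gaussian_kernel`, `generalized_coverage`, and `greedy_max_coverage` are hypothetical.

```python
import numpy as np


def gaussian_kernel(X, Y, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))


def generalized_coverage(K, selected):
    """Empirical C_k(L): average over pool points of max_{x' in L} k(x, x')."""
    if len(selected) == 0:
        return 0.0  # assumption: non-negative kernel, empty set covers nothing
    return float(K[:, selected].max(axis=1).mean())


def greedy_max_coverage(K, budget):
    """Repeatedly add the pool point that most increases the empirical coverage."""
    n = K.shape[0]
    selected = []
    best_so_far = np.zeros(n)  # current max_{x' in L} k(x, x') for each pool point
    for _ in range(budget):
        # Coverage obtained if candidate j were added: mean_i max(best_so_far[i], K[i, j]).
        # This forms an n-by-n array, so it is only suitable for small pools.
        candidate_cov = np.maximum(best_so_far[:, None], K).mean(axis=0)
        candidate_cov[selected] = -np.inf  # never pick the same point twice
        j = int(np.argmax(candidate_cov))
        selected.append(j)
        best_so_far = np.maximum(best_so_far, K[:, j])
    return selected


# Usage on synthetic features (stand-ins for self-supervised embeddings).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
K = gaussian_kernel(X, X, lengthscale=2.0)
chosen = greedy_max_coverage(K, budget=10)
print(chosen, generalized_coverage(K, chosen))
```

Tracking `best_so_far` avoids recomputing the maximum over the whole selected set at every step; each greedy step then needs only one pass over the kernel matrix.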

Abstract

The ProbCover method of Yehuda et al. is a well-motivated algorithm for active learning in low-budget regimes, which attempts to "cover" the data distribution with balls of a given radius at selected data points. We demonstrate, however, that the performance of this algorithm is extremely sensitive to the choice of this radius hyper-parameter, and that tuning it is quite difficult, with the original heuristic frequently failing. We thus introduce (and theoretically motivate) a generalized notion of "coverage," including ProbCover's objective as a special case, but also allowing smoother notions that are far more robust to hyper-parameter choice. We propose an efficient greedy method to optimize this coverage, generalizing ProbCover's algorithm; due to its close connection to kernel herding, we call it "MaxHerding." The objective can also be optimized non-greedily through a variant of $k$-medoids, clarifying the relationship to other low-budget active learning methods. In comprehensive experiments, MaxHerding surpasses existing active learning methods across multiple low-budget image classification benchmarks, and does so with less computational cost than most competitive methods.
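The claim that ball coverage is a special case of the generalized objective can be checked numerically. The sketch below assumes a finite pool and a top-hat kernel $k(x, x') = \mathbb{1}[\|x - x'\| \le \delta]$, and compares the kernel-based coverage with the fraction of pool points lying inside at least one $\delta$-ball around a selected point. The function names and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def top_hat_kernel(D, radius):
    """Top-hat kernel: 1 if two points are within `radius` of each other, else 0."""
    return (D <= radius).astype(float)


def coverage(K, selected):
    """Empirical generalized coverage: mean over points of the max kernel value."""
    return float(K[:, selected].max(axis=1).mean())


def ball_coverage_fraction(D, selected, radius):
    """Fraction of points inside at least one radius-ball around a selected point."""
    return float((D[:, selected].min(axis=1) <= radius).mean())


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
D = pairwise_distances(X)
selected = [3, 17, 42]  # arbitrary indices, chosen only for illustration

for radius in (0.25, 0.5, 1.0):
    via_kernel = coverage(top_hat_kernel(D, radius), selected)
    via_balls = ball_coverage_fraction(D, selected, radius)
    print(radius, via_kernel, via_balls)
```

The two quantities agree for every radius because the maximum of the indicator values over the selected set is 1 exactly when the nearest selected point is within the radius. Swapping `top_hat_kernel` for a smooth kernel such as a Gaussian changes only the kernel matrix passed to `coverage`.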

Paper Structure (43 sections, 10 theorems, 24 equations, 10 figures, 1 table, 1 algorithm)

This paper contains 43 sections, 10 theorems, 24 equations, 10 figures, 1 table, 1 algorithm.

Introduction
Background
Active Learning
Uncertainty-based
Representation-based
Low-Budget Active Learning
Maximum coverage
Self-supervised features
Kernel Herding
Our Method
Generalized Coverage
Extension to a linear classifier
Greedy Maximization: MaxHerding
Connection to ProbCover
Connection to kernel herding
...and 28 more sections

Key Result

Proposition

Let $\mathbf{x}$ be distributed such that $k(\mathbf{x}, \mathbf{x})$ is bounded for a positive definite kernel $k$, and assume a mild regularity condition which is typically satisfied as long as $\mathbf{x}$ is not concentrated at a single point. Then, for any function $g(\mathbf{x})$ in the RKHS $

Figures (10)

Figure 1: Illustration of the connection between our proposed methods (in red), existing active learning methods (in blue), and herding methods (in green). We describe (a)--(c) in the "Our Method" section. For (d)--(f), please refer to Chen et al. (2018; Stein points), Yehuda et al. (2022; ProbCover), and Hacohen et al. (2022; TypiClust), respectively.
Figure 2: (a) Next choices for different algorithms after selecting the initial orange point. (b) Varying the radius $\delta$ in ProbCover, and Gaussian lengthscale for MaxHerding, for CIFAR10 classification. See text for the definition of purity.
Figure 3: Comparison on benchmark datasets using a 1-NN classifier.
Figure 4: Comparison on imbalanced datasets using a 1-NN classifier.
Figure 5: (a) Comparison of the greedy (MaxHerding) and non-greedy (kernel $k$-medoids) algorithms on CIFAR100. (b) Comparison of low-budget active learning methods in terms of runtime, in seconds per selection, on CIFAR100 and TinyImageNet.
...and 5 more figures

Theorems & Definitions (16)

Proposition: Kernel herding convergence (Chen et al., 2010)
Theorem
Corollary
Definition: Generalized Coverage
Proposition
Proof
Corollary
Proposition
Proposition
Proof
...and 6 more
