A Competitive Algorithm for Agnostic Active Learning

Eric Price; Yihan Zhou

TL;DR

The paper studies agnostic binary active learning and seeks an algorithm whose query complexity is within a factor logarithmic in the size of the hypothesis class of the optimum. It introduces a polynomial-time, splitting-based algorithm that maintains a Bayesian-style multiplicative-weights distribution over hypotheses and achieves a competitive bound: the number of queries $m$ is at most roughly $\big(m^*(H,\mathcal{D}_X, c_2\eta, c_3\varepsilon, 99/100) + \log(1/\delta)\big) \cdot \log\big(N(H,\mathcal{D}_X,\eta)/\delta\big)$, where $N(H,\mathcal{D}_X,\eta)$ is the $\eta$-covering number of $H$ under $\mathcal{D}_X$. The paper also proves an NP-hardness lower bound showing that, in general, the $\log|H|$ overhead cannot be avoided, even in the realizable case, and it gives improved bounds for specific problem families such as 1D threshold functions. Together, these results yield a near-optimal, noise-tolerant algorithm for active agnostic learning that connects decision-tree-style splitting to minimax sample-complexity bounds and clarifies the trade-offs between adaptivity, noise robustness, and coverage-based query strategies. The approach advances the theoretical and practical understanding of label-efficient learning under agnostic noise, with implications for uncertainty-driven querying in structured hypothesis spaces.
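The bound replaces $|H|$ by the $\eta$-covering number $N(H,\mathcal{D}_X,\eta)$. Assuming the standard definition (the fewest hypotheses such that every $h \in H$ is within distance $\eta$ of one of them, with distance $\Pr_{x\sim\mathcal{D}_X}[h(x) \ne h'(x)]$), the sketch below greedily builds an $\eta$-cover of a finite class and so gives an upper bound on $N$. The names `dist` and `greedy_cover`, the uniform distribution, and the threshold class are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction

def dist(h, g, p):
    """Disagreement distance Pr_{x ~ D_X}[h(x) != g(x)] for label vectors h, g
    over a finite support whose point probabilities are given by p."""
    return sum(px for px, a, b in zip(p, h, g) if a != b)

def greedy_cover(H, p, eta):
    """Keep a hypothesis as a new center whenever it is farther than eta from
    every existing center. Every hypothesis ends up within eta of some center,
    so the result is a valid eta-cover; its size upper-bounds the covering
    number but need not be minimal."""
    centers = []
    for h in H:
        if all(dist(h, c, p) > eta for c in centers):
            centers.append(h)
    return centers

n = 100
points = range(n)
p = [Fraction(1, n)] * n  # hypothetical D_X: uniform on n points (exact arithmetic)
H = [tuple(int(x >= t) for x in points) for t in range(n + 1)]  # 1D thresholds
eta = Fraction(1, 10)
C = greedy_cover(H, p, eta)
print(len(C))
```

For these 101 thresholds the greedy pass keeps 10 centers, while 5 suffice (a ball of radius $1/10$ contains at most 21 consecutive thresholds), which illustrates that the greedy count is only an upper bound on $N$.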

Abstract

For some hypothesis classes and input distributions, active agnostic learning needs exponentially fewer samples than passive learning; for other classes and distributions, it offers little to no improvement. The most popular algorithms for agnostic active learning express their performance in terms of a parameter called the disagreement coefficient, but it is known that these algorithms are inefficient on some inputs. We take a different approach to agnostic active learning, getting an algorithm that is competitive with the optimal algorithm for any binary hypothesis class $H$ and distribution $\mathcal{D}_X$ over $X$. In particular, if any algorithm can use $m^*$ queries to get $O(\eta)$ error, then our algorithm uses $O(m^* \log |H|)$ queries to get $O(\eta)$ error. Our algorithm lies in the vein of the splitting-based approach of Dasgupta [2004], which gets a similar result for the realizable ($\eta = 0$) setting. We also show that it is NP-hard to do better than our algorithm's $O(\log |H|)$ overhead in general.
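To illustrate the realizable ($\eta = 0$) splitting idea the abstract builds on, here is a minimal sketch of greedy version-space splitting with a uniform prior: query the point whose label splits the surviving hypotheses most evenly, then discard every hypothesis inconsistent with the answer. It is an illustration in the spirit of Dasgupta [2004], not the paper's algorithm; `greedy_split_learn`, the threshold class, and the noiseless oracle are hypothetical.

```python
import math

def greedy_split_learn(H, points, oracle):
    """Realizable-case greedy splitting: each query is the point that minimizes
    the size of the larger side of the version-space split."""
    V = list(H)
    queries = 0

    def larger_side(x):
        ones = sum(h(x) for h in V)
        return max(ones, len(V) - ones)

    # Continue while the version space still holds hypotheses that differ somewhere.
    while len({tuple(h(x) for x in points) for h in V}) > 1:
        x = min(points, key=larger_side)
        label = oracle(x)  # noiseless label, since this is the realizable setting
        queries += 1
        V = [h for h in V if h(x) == label]
    return V[0], queries

n = 64
points = list(range(n))
H = [lambda x, t=t: int(x >= t) for t in range(n + 1)]  # 65 threshold functions
learned, q = greedy_split_learn(H, points, oracle=H[37])
print(q, math.ceil(math.log2(len(H))))
```

For thresholds the greedy split reduces to binary search, so every target is identified in at most $\lceil \log_2 |H| \rceil = 7$ queries, matching the $\log |H|$-type overhead discussed above.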

Paper Structure (24 sections, 21 theorems, 81 equations, 1 figure, 1 algorithm)

Introduction
Our Results.
Extension.
Related Work
Minimax sample complexity bounds.
Future Work.
Algorithm Overview
Realizable setting.
Handling noise: initial attempt.
Handling noise: the challenge.
Remark 1:
Remark 2:
Generalization for Better Bounds.
Proof of Lemma \ref{lem:rm}
Handling $\eta > 0$.
...and 9 more sections

Key Result

Theorem 1.1

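There exist some constants $c_1, c_2$ and $c_3$ such that for any instance $(H, \mathcal{D}_X, \eta, \varepsilon, \delta)$ with $\varepsilon \ge c_1\eta$, Algorithm 1 solves the instance with sample complexity $m(H, \mathcal{D}_X, \varepsilon, \delta) \lesssim \big(m^*(H, \mathcal{D}_X, c_2\eta, c_3\varepsilon, 99/100) + \log(1/\delta)\big) \cdot \log\big(N(H, \mathcal{D}_X, \eta)/\delta\big)$ and polynomial time.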

Figures (1)

Figure 1: An example showing that the weight of the true hypothesis can decrease if $\lambda$ is concentrated on the wrong ball. Here the true labels $y$ are closest to $h_3$, but the prior $\lambda$ on hypotheses puts far more weight on $h_1$ and $h_2$, so the algorithm queries uniformly over the region where $h_1$ and $h_2$ disagree: the second half of the points. On this query distribution $h_1$ has lower error than $h_3$, so the weight of $h_3$ can actually decrease if $\lambda(h_1)$ is very large.
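The effect described in Figure 1 can be reproduced numerically. The sketch below builds a hypothetical 8-point instance with the same qualitative structure (the figure's actual labels are not available here) and applies a plain multiplicative-weights update, scaling a hypothesis's weight by $e^{-\alpha}$ for each queried point it mislabels; this simple rule is an assumption for illustration, not the paper's exact update. Querying only where $h_1$ and $h_2$ disagree lowers the weight of $h_3$, while querying every point raises it.

```python
import math

# Hypothetical instance mirroring Figure 1: 8 points, all true labels are 0.
y = [0] * 8
H = {
    "h1": [1, 1, 1, 0, 0, 0, 0, 0],  # 3 errors, all in the first half
    "h2": [1, 1, 1, 0, 1, 1, 1, 1],  # 7 errors; differs from h1 exactly on the second half
    "h3": [0, 0, 0, 0, 1, 0, 0, 0],  # 1 error: closest to y overall
}

def mw_update(prior, queries, alpha=1.0):
    """Scale each weight by exp(-alpha) per mislabeled queried point, then renormalize."""
    w = {h: p * math.exp(-alpha * sum(H[h][x] != y[x] for x in queries))
         for h, p in prior.items()}
    total = sum(w.values())
    return {h: v / total for h, v in w.items()}

prior = {"h1": 0.98, "h2": 0.01, "h3": 0.01}  # prior concentrated on h1 and h2

# Query each point where h1 and h2 disagree (the second half) once.
disagree = [x for x in range(8) if H["h1"][x] != H["h2"][x]]
post_disagree = mw_update(prior, disagree)
# For contrast, query every point once.
post_all = mw_update(prior, range(8))

print(disagree, round(post_disagree["h3"], 4), round(post_all["h3"], 4))
```

On the disagreement region $h_1$ makes no mistakes while $h_3$ makes one, so with $\lambda(h_1)$ dominant the normalizer barely shrinks and $h_3$ loses relative weight, even though $h_3$ is the best hypothesis overall.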

Theorems & Definitions (40)

Theorem 1.1: Competitive Bound
Theorem 1.2: Lower Bound
Example 1.3
Lemma 2.0: Connection to OPT
Theorem 2.1
Proof
Theorem 2.2
Corollary 2.3
Proof
Lemma 3.0: Connection to OPT
...and 30 more