A Competitive Algorithm for Agnostic Active Learning

Eric Price, Yihan Zhou

TL;DR

The paper studies agnostic binary active learning and seeks an algorithm whose query complexity matches the optimum up to a logarithmic factor in the hypothesis class size. It introduces a Bayesian, multiplicative-weights, splitting-based algorithm that runs in polynomial time and achieves a competitive bound: the number of queries $m$ satisfies $m \lesssim \big(m^*(H,\mathcal{D}_X, c_2\eta, c_3\varepsilon, 99/100) + \log(1/\delta)\big) \cdot \log\big( N(H,\mathcal{D}_X,\eta)/\delta\big)$, where $N$ is the $\eta$-covering number. The paper also proves that it is NP-hard to improve on the $\log|H|$ overhead in general, even in the realizable case, and gives improved bounds for specific problem families such as 1D threshold functions. The result is a near-optimal, noise-tolerant algorithm for agnostic active learning, with connections to decision-tree-style splitting and minimax arguments.

Abstract

For some hypothesis classes and input distributions, active agnostic learning needs exponentially fewer samples than passive learning; for other classes and distributions, it offers little to no improvement. The most popular algorithms for agnostic active learning express their performance in terms of a parameter called the disagreement coefficient, but it is known that these algorithms are inefficient on some inputs. We take a different approach to agnostic active learning, getting an algorithm that is competitive with the optimal algorithm for any binary hypothesis class $H$ and distribution $D_X$ over $X$. In particular, if any algorithm can use $m^*$ queries to get $O(\eta)$ error, then our algorithm uses $O(m^* \log |H|)$ queries to get $O(\eta)$ error. Our algorithm lies in the vein of the splitting-based approach of Dasgupta [2004], which gets a similar result for the realizable ($\eta = 0$) setting. We also show that it is NP-hard to do better than our algorithm's $O(\log |H|)$ overhead in general.
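
To make the "exponentially fewer samples" claim concrete, here is a minimal sketch of the classic realizable, noise-free example: 1D thresholds on a grid, learned by binary search. This is not the paper's algorithm; the function name `learn_threshold`, the grid size, and the labeling oracle are all assumptions chosen for illustration.

```python
def learn_threshold(label, n):
    """Identify t in [0, n] for the hypothesis h_t(x) = 1 if x >= t else 0.

    `label` is a hypothetical noise-free oracle on points 0..n-1.
    Returns (t, number_of_label_queries).
    """
    lo, hi = 0, n  # invariant: the true threshold t lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if label(mid) == 1:
            hi = mid      # label 1 at mid means t <= mid
        else:
            lo = mid + 1  # label 0 at mid means t > mid
    return lo, queries


# Usage: 1025 candidate thresholds over 1024 points.
true_t = 700
found, used = learn_threshold(lambda x: 1 if x >= true_t else 0, 1024)
print(found, used)
```

Each query halves the set of consistent thresholds, so the number of queries grows logarithmically in the number of candidates, whereas a passive learner must see labels near the threshold by chance. With label noise (the agnostic setting) this simple halving is no longer reliable, which is the regime the paper addresses.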

Paper Structure

This paper contains 24 sections, 21 theorems, 81 equations, 1 figure, 1 algorithm.

Key Result

Theorem 1.1

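There exist constants $c_1, c_2$ and $c_3$ such that for any instance $(H, \mathcal{D}_X, \eta, \varepsilon, \delta)$ with $\varepsilon \ge c_1\eta$, Algorithm 1 solves the instance in polynomial time with sample complexity $m(H, \mathcal{D}_X, \eta, \varepsilon, \delta) \lesssim \big(m^*(H,\mathcal{D}_X, c_2\eta, c_3\varepsilon, 99/100) + \log(1/\delta)\big) \cdot \log\big( N(H,\mathcal{D}_X,\eta)/\delta\big)$, where $N(H,\mathcal{D}_X,\eta)$ is the $\eta$-covering number.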

Figures (1)

  • Figure 1: An example demonstrating that the weight of the true hypothesis can decrease if $\lambda$ is concentrated on the wrong ball. In this example, the true labels $y$ are closest to $h_3$. But if the prior $\lambda$ on hypotheses puts far more weight on $h_1$ and $h_2$, the algorithm will query uniformly over where $h_1$ and $h_2$ disagree: the second half of points. Over this query distribution, $h_1$ is more correct than $h_3$, so the weight of $h_3$ can actually decrease if $\lambda(h_1)$ is very large.
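
The effect described in the Figure 1 caption can be reproduced on a toy instance. The sketch below is an illustration under stated assumptions, not the paper's update rule: the 8-point domain, the three hypotheses, the priors, and the generic multiplicative-weights penalty `exp(-beta)` per mistake are all hypothetical choices.

```python
import math

# Hypothetical toy instance: 8 points, true labels all 0.
N_POINTS = 8
y = [0] * N_POINTS
h1 = [1, 1, 1, 1, 0, 0, 0, 0]  # wrong on the first half, right on the second
h2 = [1, 1, 1, 1, 1, 1, 1, 1]  # wrong everywhere
h3 = [0, 0, 0, 0, 1, 0, 0, 0]  # wrong on one point: closest to y overall
hypotheses = [h1, h2, h3]


def error(h, points):
    """Fraction of the given points on which h disagrees with y."""
    return sum(h[x] != y[x] for x in points) / len(points)


def mw_update(weights, points, beta=0.5):
    """Query each point once; scale a hypothesis by exp(-beta) per mistake,
    then renormalize. A generic multiplicative-weights step (assumption)."""
    new = list(weights)
    for x in points:
        for i, h in enumerate(hypotheses):
            if h[x] != y[x]:
                new[i] *= math.exp(-beta)
    total = sum(new)
    return [w / total for w in new]


# Region where h1 and h2 disagree: the second half of the points.
disagreement = [x for x in range(N_POINTS) if h1[x] != h2[x]]

# Prior concentrated on h1: the weight of h3 shrinks after the update.
skewed = mw_update([0.98, 0.01, 0.01], disagreement)
# Less concentrated prior, same queries: the weight of h3 grows.
balanced = mw_update([0.1, 0.8, 0.1], disagreement)

print("h3 weight, skewed prior:  ", 0.01, "->", skewed[2])
print("h3 weight, balanced prior:", 0.1, "->", balanced[2])
```

On the queried region `h1` makes no mistakes while `h3` makes one, so when `h1` already holds almost all the mass, renormalization pushes the weight of `h3` down even though `h3` has the lowest error over the full domain.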

Theorems & Definitions (40)

  • Theorem 1.1: Competitive Bound
  • Theorem 1.2: Lower Bound
  • Example 1.3
  • Lemma 2.0: Connection to OPT
  • Theorem 2.1
  • proof
  • Theorem 2.2
  • Corollary 2.3
  • proof
  • Lemma 3.0: Connection to OPT
  • ...and 30 more