A Competitive Algorithm for Agnostic Active Learning
Eric Price, Yihan Zhou
TL;DR
The paper studies agnostic binary active learning. It seeks an algorithm whose query complexity matches the optimum up to a factor logarithmic in the size of the hypothesis class. It introduces a Bayesian, multiplicative-weights, splitting-based algorithm that achieves a competitive bound: the number of queries $m$ is at most roughly $\big(m^*(H,\mathcal{D}_X, c_2\eta, c_3\varepsilon, 99/100) + \log(1/\delta)\big) \cdot \log\big( N(H,\mathcal{D}_X,\eta)/\delta\big)$. Here $m^*$ is the optimal query complexity, evaluated at constant success probability $99/100$; $\eta$ is the error of the best hypothesis in $H$; $\varepsilon$ is the accuracy parameter; $\delta$ is the failure probability; $c_2, c_3$ are constants; and $N(H,\mathcal{D}_X,\eta)$ is the $\eta$-covering number of $H$ under $\mathcal{D}_X$. The algorithm runs in polynomial time. The paper also proves an NP-hardness result showing that, in general, the $\log|H|$ overhead cannot be avoided, even in the realizable case. It also gives improved bounds for specific problem families such as 1D threshold functions. Together, these results give a near-optimal, noise-tolerant algorithm for active agnostic learning. The algorithm builds on decision-tree-style splitting and is analyzed against the minimax-optimal query strategy, so it clarifies how much adaptivity can save when labels are noisy.
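The following is a simplified sketch of the splitting idea. It is not the authors' algorithm. It assumes a hypothetical class of 1D thresholds $h_t(x) = 1$ iff $x \ge t$ on grid points $0, \dots, n-1$. It also assumes an oracle that flips each label independently with probability $p$, which is a much easier noise model than the agnostic setting the paper handles. The names `threshold`, `mw_split_learn`, and `make_noisy_oracle`, and the default penalty `beta`, are illustrative assumptions. Each round queries the point whose label splits the current hypothesis weight most evenly. It then multiplies the weight of every hypothesis that disagrees with the answer by `beta`.

```python
import random
from typing import Callable, List


def threshold(t: int, x: int) -> int:
    """Hypothetical 1D threshold classifier: h_t(x) = 1 iff x >= t."""
    return 1 if x >= t else 0


def mw_split_learn(n_points: int, oracle: Callable[[int], int],
                   rounds: int, beta: float = 0.5) -> int:
    """Multiplicative-weights splitting over thresholds t = 0..n_points.

    Illustrative sketch only (assumed noise model: independent label flips).
    Returns the index t of the highest-weight threshold after `rounds` queries.
    """
    weights: List[float] = [1.0] * (n_points + 1)  # uniform prior over H
    for _ in range(rounds):
        total = sum(weights)
        # mass_ones[x] = total weight of thresholds t <= x (those labeling x as 1)
        mass_ones: List[float] = []
        running = 0.0
        for x in range(n_points):
            running += weights[x]
            mass_ones.append(running)
        # Query the point whose label splits the weight most evenly.
        query = min(range(n_points), key=lambda i: abs(mass_ones[i] - total / 2))
        label = oracle(query)
        # Penalize every hypothesis that disagrees with the returned label.
        for t in range(n_points + 1):
            if threshold(t, query) != label:
                weights[t] *= beta
    return max(range(n_points + 1), key=lambda t: weights[t])


def make_noisy_oracle(t_star: int, flip_prob: float,
                      seed: int = 0) -> Callable[[int], int]:
    """Oracle for h_{t_star} whose answers are flipped independently w.p. flip_prob."""
    rng = random.Random(seed)

    def oracle(x: int) -> int:
        label = threshold(t_star, x)
        return 1 - label if rng.random() < flip_prob else label

    return oracle


if __name__ == "__main__":
    p = 0.1
    estimate = mw_split_learn(64, make_noisy_oracle(37, p, seed=1),
                              rounds=200, beta=p / (1 - p))
    print("estimated threshold:", estimate)
```

With `beta = p / (1 - p)`, the update is exactly the Bayesian posterior under independent flips with rate $p$. In the noiseless case, any `0 < beta < 1` works, because each round shrinks the total weight of the wrong hypotheses by a constant factor. The paper's algorithm must handle agnostic noise, which need not be independent across queries, so this sketch should not be read as an implementation of it.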
Abstract
For some hypothesis classes and input distributions, active agnostic learning needs exponentially fewer samples than passive learning; for other classes and distributions, it offers little to no improvement. The most popular algorithms for agnostic active learning express their performance in terms of a parameter called the disagreement coefficient, but it is known that these algorithms are inefficient on some inputs. We take a different approach to agnostic active learning, getting an algorithm that is competitive with the optimal algorithm for any binary hypothesis class $H$ and distribution $\mathcal{D}_X$ over $X$. In particular, if any algorithm can use $m^*$ queries to get $O(\eta)$ error, then our algorithm uses $O(m^* \log |H|)$ queries to get $O(\eta)$ error. Our algorithm lies in the vein of the splitting-based approach of Dasgupta [2004], which gets a similar result for the realizable ($\eta = 0$) setting. We also show that it is NP-hard to do better than our algorithm's $O(\log |H|)$ overhead in general.
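The exponential gap between active and passive learning, and the realizable splitting approach attributed to Dasgupta [2004], can be illustrated with a simplified sketch. It assumes a finite pool of unlabeled points and represents each hypothesis (hypothetically) as its tuple of labels on that pool. It also assumes a noiseless oracle ($\eta = 0$). The names `greedy_split_learn` and `threshold_class` are illustrative, not code from the paper or from Dasgupta [2004]. The learner repeatedly queries the point that splits the surviving hypotheses most evenly and discards those that disagree.

```python
from typing import Callable, List, Sequence, Tuple

# A hypothesis is represented (by assumption) as its tuple of 0/1 labels
# on a finite pool of unlabeled points indexed 0..len(pool)-1.
Hypothesis = Tuple[int, ...]


def greedy_split_learn(hypotheses: Sequence[Hypothesis],
                       oracle: Callable[[int], int]) -> Tuple[Hypothesis, int]:
    """Greedy version-space splitting for the realizable (eta = 0) case.

    Repeatedly queries the pool point whose label splits the surviving
    hypotheses most evenly, then discards every hypothesis that disagrees
    with the oracle. Returns (a surviving hypothesis, number of queries).
    """
    version_space: List[Hypothesis] = list(hypotheses)
    n_points = len(version_space[0])
    queries = 0
    while len(version_space) > 1:
        size = len(version_space)

        def imbalance(x: int) -> int:
            # |#hypotheses labeling x as 1  -  #hypotheses labeling x as 0|
            ones = sum(h[x] for h in version_space)
            return abs(2 * ones - size)

        x = min(range(n_points), key=imbalance)
        if imbalance(x) == size:
            break  # every pool point gets the same label from all survivors
        y = oracle(x)
        queries += 1
        version_space = [h for h in version_space if h[x] == y]
    return version_space[0], queries


def threshold_class(n_points: int) -> List[Hypothesis]:
    """Hypothetical example class: 1D thresholds h_t(x) = 1 iff x >= t."""
    return [tuple(int(x >= t) for x in range(n_points))
            for t in range(n_points + 1)]
```

Consider 1D thresholds on 64 pool points ($|H| = 65$). Each query leaves at most $\lceil s/2 \rceil$ of the $s$ surviving thresholds. So `greedy_split_learn(threshold_class(64), lambda x: int(x >= 37))` identifies the threshold with at most $\lceil \log_2 65 \rceil = 7$ queries. A passive learner, by contrast, must happen to draw the two points adjacent to the threshold, which takes about 96 uniform random draws in expectation. For classes where no query splits the version space well, this kind of saving disappears, which is why the achievable gain depends on $H$ and $\mathcal{D}_X$.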
