Revisiting Agnostic Boosting
Arthur da Cunha, Mikael Møller Høgsgaard, Andrea Paudice, Yuxin Sun
TL;DR
The paper addresses agnostic boosting for binary classification with a three-stage approach: a reduction to realizable boosting, a margin-based pruning of the resulting hypotheses, and a validation-based extraction of a final classifier. It proves a near-optimal sample complexity bound that interpolates between the agnostic and realizable settings, together with a lower bound that matches it up to logarithmic factors. The key ideas are relabeling the data with respect to a reference class, a margin-based filtering step that shrinks an exponentially large set of candidate hypotheses to a logarithmic size, and a final selection on a third data split with rigorous generalization guarantees. Overall, the work advances the theoretical understanding of agnostic boosting and points to future work on computationally efficient implementations and on removing the remaining logarithmic factors.
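The final stage, picking one classifier from a small candidate list using a fresh data split, can be illustrated with plain hold-out selection: evaluate every candidate on the held-out examples and keep the one with the lowest empirical error. This is a minimal sketch assuming labels in {-1, +1} and a list of already-trained candidates; the names `empirical_error` and `select_by_validation` are hypothetical, and the paper's actual selection rule and its guarantees may differ.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a hypothesis maps a feature vector to a label in {-1, +1}.
Hypothesis = Callable[[Sequence[float]], int]
Example = Tuple[Sequence[float], int]


def empirical_error(h: Hypothesis, data: Sequence[Example]) -> float:
    """Fraction of labeled examples (x, y) that h misclassifies."""
    if not data:
        raise ValueError("validation set must be non-empty")
    mistakes = sum(1 for x, y in data if h(x) != y)
    return mistakes / len(data)


def select_by_validation(
    candidates: Sequence[Hypothesis], validation: Sequence[Example]
) -> Hypothesis:
    """Return the candidate with the smallest error on the held-out split.

    Python's min() keeps the first minimizer, so ties go to the earliest candidate.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return min(candidates, key=lambda h: empirical_error(h, validation))


# Usage: three illustrative threshold classifiers on a 1-D feature.
candidates: List[Hypothesis] = [
    lambda x: 1 if x[0] > 0.0 else -1,
    lambda x: 1 if x[0] > 0.5 else -1,
    lambda x: -1,
]
validation: List[Example] = [([-1.0], -1), ([0.2], 1), ([0.7], 1), ([0.9], 1)]
best = select_by_validation(candidates, validation)
```

Here the first candidate labels every held-out example correctly, so it is returned; the second errs on one of four examples and the third on three.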
Abstract
Boosting is a key method in statistical learning, allowing weak learners to be converted into strong ones. While well studied in the realizable case, the statistical properties of weak-to-strong learning remain less understood in the agnostic setting, where there are no assumptions on the distribution of the labels. In this work, we propose a new agnostic boosting algorithm with substantially improved sample complexity compared to prior works under very general assumptions. Our approach is based on a reduction to the realizable case, followed by a margin-based filtering of high-quality hypotheses. Furthermore, we show a nearly-matching lower bound, settling the sample complexity of agnostic boosting up to logarithmic factors.
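Margin-based filtering operates on voting classifiers, that is, weighted majority votes over base hypotheses. As a hedged illustration, the sketch below computes the standard normalized margin y * sum_t alpha_t h_t(x) / sum_t |alpha_t| of such a vote and keeps only the votes whose smallest margin on a sample reaches a threshold `theta`. The minimum-margin rule, the parameter `theta`, and the function names `normalized_margin` and `filter_by_margin` are assumptions for illustration; the abstract does not specify the paper's filtering criterion.

```python
from typing import Callable, List, Sequence, Tuple

Hypothesis = Callable[[Sequence[float]], int]  # outputs a label in {-1, +1}
Vote = List[Tuple[float, Hypothesis]]  # pairs (weight alpha_t, base hypothesis h_t)
Example = Tuple[Sequence[float], int]


def normalized_margin(vote: Vote, x: Sequence[float], y: int) -> float:
    """Return y * sum_t alpha_t h_t(x) / sum_t |alpha_t|, a value in [-1, 1]."""
    total = sum(abs(alpha) for alpha, _ in vote)
    if total == 0:
        raise ValueError("vote must have at least one nonzero weight")
    return y * sum(alpha * h(x) for alpha, h in vote) / total


def filter_by_margin(
    votes: List[Vote], sample: Sequence[Example], theta: float
) -> List[Vote]:
    """Keep votes whose smallest margin on the sample is at least theta (hypothetical rule)."""
    if not sample:
        raise ValueError("sample must be non-empty")
    return [
        vote
        for vote in votes
        if min(normalized_margin(vote, x, y) for x, y in sample) >= theta
    ]


# Usage: illustrative base hypotheses on a 1-D feature.
def h_pos(x: Sequence[float]) -> int:
    return 1 if x[0] > 0 else -1


def h_neg(x: Sequence[float]) -> int:
    return -h_pos(x)


def always_pos(x: Sequence[float]) -> int:
    return 1


sample: List[Example] = [([-1.0], -1), ([2.0], 1)]
confident: Vote = [(1.0, h_pos), (1.0, always_pos)]
wrong: Vote = [(1.0, h_neg)]
kept = filter_by_margin([confident, wrong], sample, theta=0.0)
```

The vote `confident` has margins 0 and 1 on the two examples and survives a threshold of 0, while `wrong` has margin -1 on both and is discarded.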
