On Agnostic PAC Learning in the Small Error Regime

Julian Asilis; Mikael Møller Høgsgaard; Grigoris Velegkas

TL;DR

The paper studies agnostic PAC learning in the small-error regime, working in the fine-grained model of Hanneke, Larsen, and Zhivotovskiy, in which the error bound depends explicitly on $\tau$, the error of the best hypothesis in the class. It designs a computationally efficient learner with error at most $2.1 \cdot \tau + O\left(\sqrt{\frac{\tau(d + \log(1/\delta))}{m}} + \frac{d + \log(1/\delta)}{m}\right)$, which matches the known lower bound when $\tau \approx d/m$. The learner aggregates ERM classifiers through careful subsampling and a voting scheme; the method is then refined with a 27-way sample split and a region-of-disagreement tie-breaker to reduce the multiplicative dependence on $\tau$. A key contribution is showing how to combine this $\tau$-based learner with the prior work of Hanneke to obtain a best-of-both-worlds guarantee. The results resolve the $\tau \approx d/m$ regime and make progress on related questions about majority-voting schemes and computationally efficient agnostic learners. Whether the constant 2.1 can be lowered to 1 remains open.
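To make the phrase "aggregations of ERM classifiers through subsampling and a voting scheme" concrete, here is a minimal sketch of a plain majority vote over ERM classifiers trained on random subsamples. It is not the paper's construction: it omits the specific subsampling scheme, the 27-way split, and the tie-breaker. The hypothesis class (one-dimensional thresholds) and the names `erm_threshold` and `majority_of_erms` are assumptions chosen only for illustration.

```python
import random

def erm_threshold(sample):
    """ERM over thresholds h_t(x) = 1[x >= t]: return a t with minimal empirical error."""
    # Candidate thresholds: every data point, plus -inf (always 1) and +inf (always 0).
    candidates = [float("-inf")] + sorted(x for x, _ in sample) + [float("inf")]

    def errors(t):
        return sum((x >= t) != bool(y) for x, y in sample)

    return min(candidates, key=errors)

def majority_of_erms(sample, n_voters=9, subsample_size=None, seed=0):
    """Train one ERM per random subsample and predict by majority vote (illustrative only)."""
    rng = random.Random(seed)
    k = subsample_size or max(1, len(sample) // 2)
    thresholds = [erm_threshold(rng.sample(sample, k)) for _ in range(n_voters)]

    def predict(x):
        votes = sum(x >= t for t in thresholds)
        return int(2 * votes > len(thresholds))

    return predict

# Toy agnostic data: true threshold at 0.5, with 5% of labels flipped.
rng = random.Random(1)
data = []
for _ in range(200):
    x = rng.random()
    y = int(x >= 0.5)
    if rng.random() < 0.05:
        y = 1 - y
    data.append((x, y))

predict = majority_of_erms(data)
empirical_error = sum(predict(x) != y for x, y in data) / len(data)
print(f"empirical error of the voted classifier: {empirical_error:.3f}")
```

With an odd number of voters the vote never ties in this sketch. The paper's refined learner, by contrast, is described as needing a dedicated tie-breaker on the region of disagreement.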

Abstract

Binary classification in the classic PAC model exhibits a curious phenomenon: Empirical Risk Minimization (ERM) learners are suboptimal in the realizable case yet optimal in the agnostic case. Roughly speaking, this owes itself to the fact that non-realizable distributions $\mathcal{D}$ are simply more difficult to learn than realizable distributions, even when one discounts a learner's error by $\mathrm{err}(h^*_{\mathcal{D}})$, the error of the best hypothesis in $\mathcal{H}$ for $\mathcal{D}$. Thus, optimal agnostic learners are permitted to incur excess error on (easier-to-learn) distributions $\mathcal{D}$ for which $\tau = \mathrm{err}(h^*_{\mathcal{D}})$ is small. Recent work of Hanneke, Larsen, and Zhivotovskiy (FOCS '24) addresses this shortcoming by including $\tau$ itself as a parameter in the agnostic error term. In this more fine-grained model, they demonstrate tightness of the error lower bound $\tau + \Omega\left(\sqrt{\frac{\tau(d + \log(1 / \delta))}{m}} + \frac{d + \log(1 / \delta)}{m} \right)$ in a regime where $\tau > d/m$, and leave open the question of whether there may be a higher lower bound when $\tau \approx d/m$, with $d$ denoting $\mathrm{VC}(\mathcal{H})$. In this work, we resolve this question by exhibiting a learner which achieves error $c \cdot \tau + O\left(\sqrt{\frac{\tau(d + \log(1 / \delta))}{m}} + \frac{d + \log(1 / \delta)}{m} \right)$ for a constant $c \leq 2.1$, thus matching the lower bound when $\tau \approx d/m$. Further, our learner is computationally efficient and is based upon careful aggregations of ERM classifiers, making progress on two other questions of Hanneke, Larsen, and Zhivotovskiy (FOCS '24). We leave open the interesting question of whether our approach can be refined to lower the constant from 2.1 to 1, which would completely settle the complexity of agnostic learning.
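A short numerical sketch may help show why a constant $c > 1$ costs nothing asymptotically when $\tau \approx d/m$: in that regime all three terms of the bound are of order $(d + \log(1/\delta))/m$. The function name `bound_terms` is hypothetical, and the constants hidden inside $O(\cdot)$ are set to 1, an assumption made purely for illustration.

```python
import math

def bound_terms(tau, d, m, delta, c=2.1):
    """Terms of c*tau + sqrt(tau*(d + log(1/delta))/m) + (d + log(1/delta))/m.

    Hidden O(.) constants are taken to be 1 (an illustrative assumption).
    """
    complexity = d + math.log(1.0 / delta)
    linear = c * tau                        # multiplicative term in tau
    cross = math.sqrt(tau * complexity / m)  # square-root interpolation term
    fast = complexity / m                   # realizable-style fast-rate term
    return linear, cross, fast

d, m, delta = 10, 10_000, 0.05
for tau in (d / m, 10 * d / m, 0.1):
    linear, cross, fast = bound_terms(tau, d, m, delta)
    print(f"tau={tau:.4f}  c*tau={linear:.5f}  sqrt-term={cross:.5f}  fast-term={fast:.5f}")
```

At $\tau = d/m$ the three printed terms are within a small constant factor of one another. For larger $\tau$ the term $c \cdot \tau$ dominates, which is where the gap between $c = 2.1$ and $c = 1$ matters.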
