A Near-optimal Algorithm for Learning Margin Halfspaces with Massart Noise
Ilias Diakonikolas, Nikos Zarifis
TL;DR
This work tackles PAC learning of $γ$-margin halfspaces under $η$-Massart noise, giving a computationally efficient learner with near-optimal sample complexity. The authors run online SGD with clipping on a sequence of convex surrogate losses, achieving $\mathrm{err}_D(\hat{w})\le η+ε$ using $n=\tilde{O}\left(1/(ε^2 γ^2)\right)$ samples in $\tilde{O}(d n/ε)$ time. This nearly matches the lower bound that recent evidence suggests for computationally efficient algorithms, and it improves on previous efficient algorithms, which required $\tilde{O}\left(1/(γ^4 ε^3)\right)$ samples. The approach is simple and practical, and it sheds light on information-computation tradeoffs in Massart-noise settings, with potential extensions to general halfspaces and dimension-efficient implementations.
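To make the setting concrete, the following is a minimal sketch of online SGD on a convex surrogate for $γ$-margin data with Massart noise. It is not the authors' algorithm: it uses a single fixed leaky-ReLU loss instead of the paper's sequence of losses, and projection onto the unit ball stands in for the clipping step, whose exact form is not given in this summary. The function names, the constant flip rate, the step size, and the choice `lam = eta` are all assumptions made for illustration.

```python
import math
import random


def sample_massart(w_star, gamma, eta, rng):
    """Draw one (x, y): x on the unit sphere with |<w_star, x>| >= gamma."""
    d = len(w_star)
    while True:
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
        margin = sum(a * b for a, b in zip(w_star, x))
        if abs(margin) >= gamma:  # rejection sampling enforces the margin
            break
    y = 1 if margin > 0 else -1
    # Massart noise allows any flip probability eta(x) <= eta;
    # this sketch assumes the constant rate eta for every x.
    if rng.random() < eta:
        y = -y
    return x, y


def online_sgd_leaky_relu(stream, d, lam, step):
    """One pass of projected SGD on the loss LeakyReLU_lam(-y * <w, x>).

    LeakyReLU_lam(z) has slope (1 - lam) for z >= 0 and slope lam for z < 0.
    This single loss is a hypothetical stand-in for the paper's loss sequence.
    """
    w = [0.0] * d
    for x, y in stream:
        z = -y * sum(a * b for a, b in zip(w, x))
        slope = (1.0 - lam) if z >= 0 else lam
        # Subgradient w.r.t. w is slope * (-y) * x, so the descent step adds y * x.
        w = [wi + step * slope * y * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm > 1.0:  # project back onto the unit ball
            w = [v / norm for v in w]
    return w


rng = random.Random(0)
d, gamma, eta = 5, 0.1, 0.2
w_star = [1.0] + [0.0] * (d - 1)  # hypothetical target halfspace
train = [sample_massart(w_star, gamma, eta, rng) for _ in range(2000)]
w_hat = online_sgd_leaky_relu(train, d, lam=eta, step=0.05)
print("learned direction:", [round(v, 3) for v in w_hat])
```

Each update touches one sample and costs $O(d)$ time, which is what makes an online scheme of this kind attractive; the guarantees quoted above apply to the authors' method, not to this simplified variant.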
Abstract
We study the problem of PAC learning $γ$-margin halfspaces in the presence of Massart noise. Without computational considerations, the sample complexity of this learning problem is known to be $\widetilde{Θ}(1/(γ^2 ε))$. Prior computationally efficient algorithms for the problem incur sample complexity $\tilde{O}(1/(γ^4 ε^3))$ and achieve 0-1 error of $η+ε$, where $η<1/2$ is the upper bound on the noise rate. Recent work gave evidence of an information-computation tradeoff, suggesting that a quadratic dependence on $1/ε$ is required for computationally efficient algorithms. Our main result is a computationally efficient learner with sample complexity $\widetilde{Θ}(1/(γ^2 ε^2))$, nearly matching this lower bound. In addition, our algorithm is simple and practical, relying on online SGD on a carefully selected sequence of convex losses.
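To get a feel for the gap between the three rates quoted in the abstract, the short calculation below evaluates their leading-order terms for one illustrative setting. The values $γ=0.1$ and $ε=0.01$ and the helper name `sample_bounds` are assumptions chosen for the example, and the constants and polylogarithmic factors hidden by the tilde notation are dropped, so the numbers indicate orders of magnitude only.

```python
def sample_bounds(gamma, eps):
    """Leading-order sample-complexity terms; constants and log factors are ignored."""
    return {
        "information_theoretic": 1.0 / (gamma**2 * eps),   # 1/(gamma^2 eps)
        "prior_efficient": 1.0 / (gamma**4 * eps**3),      # 1/(gamma^4 eps^3)
        "this_work": 1.0 / (gamma**2 * eps**2),            # 1/(gamma^2 eps^2)
    }


bounds = sample_bounds(gamma=0.1, eps=0.01)
for name, value in bounds.items():
    print(f"{name}: about {value:.0e} samples")
```

For these values the three terms are roughly $10^4$, $10^{10}$, and $10^6$. The improvement over prior efficient algorithms is a factor of $1/(γ^2 ε)$, while the remaining gap to the information-theoretic rate is a factor of $1/ε$, which is the gap the conjectured information-computation tradeoff attributes to efficiency.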
