A Near-optimal Algorithm for Learning Margin Halfspaces with Massart Noise

Ilias Diakonikolas; Nikos Zarifis

TL;DR

This work tackles PAC learning of $\gamma$-margin halfspaces under $\eta$-Massart noise and gives a computationally efficient learner with near-optimal sample complexity. The authors introduce a sequence of convex surrogate losses and an online SGD scheme with clipping, achieving $\mathrm{err}_D(\hat{w})\le \eta+\epsilon$ using $n=\tilde{O}\left(1/(\epsilon^2 \gamma^2)\right)$ samples and running in $\tilde{O}(d n/\epsilon)$ time. This nearly matches the lower bound for computationally efficient algorithms suggested by recent evidence of an information-computation tradeoff, and improves on previous efficient algorithms, which required $\tilde{O}\left(1/(\gamma^4 \epsilon^3)\right)$ samples. The approach is simple and practical, and it sheds light on information-computation tradeoffs in Massart-noise settings, with potential extensions to general halfspaces and dimension-efficient implementations.
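To make the gap between the quoted bounds concrete, the short sketch below evaluates their leading terms for given $\epsilon$ and $\gamma$. It drops all constants and logarithmic factors hidden by the tilde notation, so the numbers indicate scaling only; the function name `sample_bounds` and the dictionary keys are illustrative choices, not taken from the paper.

```python
def sample_bounds(eps: float, gamma: float) -> dict:
    """Leading-order sample counts, ignoring constants and log factors."""
    if not (0.0 < eps < 1.0 and 0.0 < gamma <= 1.0):
        raise ValueError("expected 0 < eps < 1 and 0 < gamma <= 1")
    # Without computational considerations: ~ 1 / (gamma^2 * eps)
    info_theoretic = 1.0 / (gamma ** 2 * eps)
    # Efficient learner summarized here: ~ 1 / (eps^2 * gamma^2)
    this_work = 1.0 / (eps ** 2 * gamma ** 2)
    # Prior efficient algorithms: ~ 1 / (gamma^4 * eps^3)
    prior_efficient = 1.0 / (gamma ** 4 * eps ** 3)
    return {
        "info_theoretic": info_theoretic,
        "this_work": this_work,
        "prior_efficient": prior_efficient,
        # Ratio of the two efficient bounds simplifies to 1 / (gamma^2 * eps).
        "improvement": prior_efficient / this_work,
    }


bounds = sample_bounds(eps=0.1, gamma=0.1)
for name, value in bounds.items():
    print(f"{name}: {value:.3g}")
```

Under these simplifications the saving over prior efficient algorithms is a factor of $1/(\gamma^2 \epsilon)$, which grows as either the margin or the target excess error shrinks.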

Abstract

We study the problem of PAC learning $\gamma$-margin halfspaces in the presence of Massart noise. Without computational considerations, the sample complexity of this learning problem is known to be $\widetilde{\Theta}(1/(\gamma^2 \epsilon))$. Prior computationally efficient algorithms for the problem incur sample complexity $\tilde{O}(1/(\gamma^4 \epsilon^3))$ and achieve 0-1 error of $\eta+\epsilon$, where $\eta<1/2$ is the upper bound on the noise rate. Recent work gave evidence of an information-computation tradeoff, suggesting that a quadratic dependence on $1/\epsilon$ is required for computationally efficient algorithms. Our main result is a computationally efficient learner with sample complexity $\widetilde{\Theta}(1/(\gamma^2 \epsilon^2))$, nearly matching this lower bound. In addition, our algorithm is simple and practical, relying on online SGD on a carefully selected sequence of convex losses.
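As a rough illustration of what "online SGD on a convex loss" looks like in this setting, the sketch below runs projected online SGD on a single fixed LeakyReLU-style surrogate, $\mathrm{LeakyReLU}_\lambda(-y\,\mathbf{w}\cdot\mathbf{x})$. This is a swapped-in, simplified stand-in: it is not the paper's sequence of losses, and it uses projection onto the unit ball where the summary mentions clipping. The toy data generator, the choice $\lambda=\eta$, the step size $1/\sqrt{t}$, iterate averaging, and all function names are assumptions made for this example.

```python
import math
import random


def project_unit_ball(w):
    """Scale w back onto the unit ball if its norm exceeds 1."""
    norm = math.sqrt(sum(v * v for v in w))
    return w if norm <= 1.0 else [v / norm for v in w]


def online_sgd(stream, d, lam):
    """Projected online SGD on LeakyReLU_lam(-y * <w, x>); returns the averaged iterate."""
    w = [0.0] * d
    avg = [0.0] * d
    for t, (x, y) in enumerate(stream, start=1):
        z = -y * sum(a * b for a, b in zip(w, x))
        # Slope of LeakyReLU_lam at z: (1 - lam) for z >= 0, lam for z < 0.
        c = (1.0 - lam) if z >= 0 else lam
        lr = 1.0 / math.sqrt(t)
        # The loss gradient w.r.t. w is -c * y * x, so descend along +c * y * x.
        w = project_unit_ball([wi + lr * c * y * xi for wi, xi in zip(w, x)])
        avg = [ai + (wi - ai) / t for ai, wi in zip(avg, w)]  # running mean of iterates
    return avg


def toy_stream(n, d, gamma, eta, rng):
    """Toy Massart data (assumed setup): w* = e_1, margin >= gamma, flip prob eta*|x_2| <= eta."""
    count = 0
    while count < n:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        x = [v / norm for v in g]
        if abs(x[0]) < gamma:
            continue  # rejection step enforces the margin condition
        clean = 1 if x[0] > 0 else -1
        y = -clean if rng.random() < eta * abs(x[1]) else clean
        count += 1
        yield x, y


def clean_error(w, data):
    """Fraction of points where sign(<w, x>) disagrees with the noiseless label sign(x_1)."""
    wrong = sum(
        1 for x, _ in data
        if (sum(a * b for a, b in zip(w, x)) > 0) != (x[0] > 0)
    )
    return wrong / len(data)


rng = random.Random(0)
train = list(toy_stream(5000, 3, 0.3, 0.2, rng))
w_hat = online_sgd(train, d=3, lam=0.2)
test = list(toy_stream(1000, 3, 0.3, 0.0, rng))
print("clean 0-1 error of averaged iterate:", clean_error(w_hat, test))
```

A single fixed surrogate like this comes with no claim of the near-optimal sample complexity discussed above; the sketch only shows the shape of one online update (sample, gradient step, projection, averaging).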

Paper Structure (16 sections, 7 theorems, 35 equations, 1 algorithm)

Introduction
Our Result and Techniques
Independent Work
Brief Overview of Techniques
Notation
Our Algorithm and its Analysis: Proof of Theorem 1.3
Conclusions and Open Problems
Organization
Related and Prior Work
Additional Related Work
Comparison with DKTZonline
Learning Margin Massart Halfspaces via Cutting Planes
Omitted Proofs from Section 2
Proof of \ref{clm:dgt-claim}
Proof of \ref{clm:first-clm}
...and 1 more sections

Key Result

Theorem 1.3

Let $D$ be a distribution on $\mathbb{S}^{d-1} \times \{\pm 1\}$ that satisfies the $\eta$-Massart noise condition with respect to an unknown $\gamma$-margin halfspace $f(\mathbf{x}) = \mathrm{sign}(\mathbf{w}^{\ast}\cdot\mathbf{x})$. There is an algorithm that draws $n = \tilde{O}(1/(\epsilon^2 \gamma^2))$ samples …
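For reference, the two assumptions named in the statement and the error guarantee quoted in the TL;DR are commonly formalized as follows. This is a sketch in standard notation: the symbols $\eta(\mathbf{x})$ for the pointwise flip probability and $D_{\mathbf{x}}$ for the marginal of $D$ on examples are assumptions introduced here, and the paper's Definitions 1.1 and 1.2 may phrase the conditions differently.

$$
\Pr_{(\mathbf{x},y)\sim D}\left[\, y \neq f(\mathbf{x}) \mid \mathbf{x} \,\right] = \eta(\mathbf{x}) \le \eta < 1/2,
\qquad
\|\mathbf{w}^{\ast}\|_2 = 1, \quad |\mathbf{w}^{\ast}\cdot\mathbf{x}| \ge \gamma \ \text{ for all } \mathbf{x} \text{ in the support of } D_{\mathbf{x}},
$$

$$
\mathrm{err}_D(\hat{\mathbf{w}}) = \Pr_{(\mathbf{x},y)\sim D}\left[\, \mathrm{sign}(\hat{\mathbf{w}}\cdot\mathbf{x}) \neq y \,\right] \le \eta + \epsilon .
$$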

Theorems & Definitions (26)

Definition 1.1: PAC Learning with Massart Noise
Definition 1.2: $\gamma$-Margin Halfspaces
Theorem 1.3: Main Result, Informal
Theorem 2.1: Main Result
Lemma 2.2: Structural Lemma
proof
Claim 2.3
Claim 2.4
proof: Proof of Theorem 2.1
Claim 2.5
...and 16 more
