Testable Learning of General Halfspaces under Massart Noise

Ilias Diakonikolas; Giannis Iakovidis; Daniel M. Kane; Sihan Liu

TL;DR

The main result is the first testable learning algorithm for general halfspaces with Massart noise and Gaussian marginals, which qualitatively matches the known quasi-polynomial Statistical Query lower bound for the non-testable setting.

Abstract

We study the algorithmic task of testably learning general Massart halfspaces under the Gaussian distribution. In the testable learning setting, the aim is the design of a tester-learner pair satisfying the following properties: (1) if the tester accepts, the learner outputs a hypothesis and a certificate that it achieves near-optimal error, and (2) it is highly unlikely that the tester rejects if the data satisfies the underlying assumptions. Our main result is the first testable learning algorithm for general halfspaces with Massart noise and Gaussian marginals. The complexity of our algorithm is $d^{\mathrm{polylog}(\min\{1/\gamma, 1/\epsilon\})}$, where $\epsilon$ is the excess error and $\gamma$ is the bias of the target halfspace, which qualitatively matches the known quasi-polynomial Statistical Query lower bound for the non-testable setting. The analysis of our algorithm hinges on a novel sandwiching polynomial approximation to the sign function with multiplicative error that may be of broader interest.

Paper Structure (23 sections, 18 theorems, 150 equations, 2 figures, 1 algorithm)

Introduction
Halfspaces and Their Efficient Learnability
Testable Learning
Our Results
Implication for Non-testable Learning of Massart Halfspaces
Technical Overview
Comparison to Prior Techniques
Preliminaries
Algorithm Description and Analysis
Parameter Description
Proof of Correctness
Proof Sketch of Proposition 3.2
Conclusions
Omitted Facts and Preliminaries
Hermite Polynomials
...and 8 more sections

Key Result

Theorem 1.4

Fix parameters $\eta\in[0,1/2), \beta \coloneqq 1-2\eta$ and $\gamma\in(0,1/2]$. Let $\mathcal{D}_\gamma$ be the class of distributions over $\mathbb{R}^d\times\{\pm1\}$ whose $\mathbf{x}$-marginal is the standard Gaussian $\mathcal{N}^d$ and satisfy the $\eta$-Massart noise condition with respect to [...] samples, runs in $\mathrm{poly}(N,d)$ time, and testably learns the class $\mathcal{H}_{d,\gamma}$
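
To make the data model in the theorem concrete, the following is a minimal sketch of drawing samples from one distribution of this kind. It rests on assumptions not spelled out in the excerpt above: a $\gamma$-biased halfspace is taken to be $\mathrm{sign}(\langle \mathbf{w}, \mathbf{x}\rangle - t)$ with the threshold $t$ chosen so that the positive class has Gaussian mass exactly $\gamma$, and the Massart adversary is replaced by one arbitrary flip-probability function bounded by $\eta$. The function name, the choice of $\mathbf{w}$, and the flip rule are hypothetical and for illustration only.

```python
import numpy as np
from statistics import NormalDist


def sample_massart_halfspace(n, d, gamma, eta, rng):
    """Draw n labeled points (x, clean label, noisy label) under the assumed model."""
    # Hypothetical target direction: the first coordinate axis.
    w = np.zeros(d)
    w[0] = 1.0
    # Assumption: threshold chosen so that Pr[<w, x> >= t] = gamma under N(0, I_d).
    t = NormalDist().inv_cdf(1.0 - gamma)

    x = rng.standard_normal((n, d))          # standard Gaussian marginal
    margin = x @ w - t
    clean = np.where(margin >= 0, 1, -1)      # noiseless halfspace labels

    # Massart noise: each label is flipped with some probability eta(x) <= eta.
    # This particular eta(x) is an illustrative choice, not taken from the paper.
    flip_prob = eta * np.exp(-np.abs(margin))
    flips = rng.uniform(size=n) < flip_prob
    y = np.where(flips, -clean, clean)
    return x, clean, y


rng = np.random.default_rng(0)
x, clean, y = sample_massart_halfspace(n=20000, d=5, gamma=0.1, eta=0.3, rng=rng)
print("fraction of positive clean labels:", np.mean(clean == 1))
print("empirical flip rate:", np.mean(y != clean))
```

Under this model the conditional correlation between the noisy and clean labels is $1 - 2\eta(\mathbf{x}) \ge \beta$ at every point, which is the role of $\beta = 1 - 2\eta$ in the theorem.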

Figures (2)

Figure 1: Illustration of a learned halfspace $h$, a competing halfspace $f$, the slices on which the tests are performed, and their disagreement region (shaded).
Figure 2: Illustration of our approximation procedure. Top left: $\dfrac{T_m(x)}{mx}$, a polynomial with a bump-function shape. Top right: $f(x)=\left(\dfrac{T_m(x)}{mx}\right)^k$ (\ref{eq:f-definition}); as the power $k$ increases, $f$ becomes increasingly concentrated and approximates the Dirac delta function $\delta$. Bottom: the integral $p$ of $f$ over a sliding window (\ref{eq:p-definition}), which approximates a step function. The sliding-window length is chosen according to the desired accuracy; consequently, our approximation is forced to drop to $0$ outside the window. In all examples, the plotted curves grow polynomially to infinity beyond the figure, but this effect is controlled because the Gaussian tails dominate.
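
The construction described in the Figure 2 caption can be sketched numerically. The code below is an illustration under several assumptions that the caption does not state: $T_m$ is read as the degree-$m$ Chebyshev polynomial of the first kind, $m$ is taken odd (so that $T_m(x)/(mx)$ is a polynomial) and $k$ even, the window is the interval $[x - w, x]$, and $p$ is normalized by the integral of $f$ over $[-1, 1]$. The helper names `bump_polynomial` and `sliding_window_integral` are hypothetical; the paper's exact definitions of $f$ and $p$ may differ.

```python
import numpy as np
from numpy.polynomial import Chebyshev, Polynomial


def bump_polynomial(m: int, k: int) -> Polynomial:
    """Return f(x) = (T_m(x) / (m x))^k in the power basis."""
    if m % 2 == 0 or k % 2 == 1:
        raise ValueError("this sketch assumes odd m and even k")
    # Power-basis coefficients of the Chebyshev polynomial T_m.
    t_m = Chebyshev.basis(m).convert(kind=Polynomial)
    # For odd m the constant term of T_m is zero, so dividing by x
    # amounts to dropping that coefficient and shifting the rest down.
    quotient = Polynomial(t_m.coef[1:] / m)
    return quotient ** k


def sliding_window_integral(f: Polynomial, window: float):
    """Return p(x) = (integral of f over [x - window, x]) / (integral of f over [-1, 1])."""
    antiderivative = f.integ()
    total = antiderivative(1.0) - antiderivative(-1.0)  # assumed normalization

    def p(x):
        x = np.asarray(x, dtype=float)
        return (antiderivative(x) - antiderivative(x - window)) / total

    return p


f = bump_polynomial(m=7, k=4)
p = sliding_window_integral(f, window=0.8)
for point in (-0.2, 0.4, 1.0):
    print(f"p({point:+.1f}) = {float(p(point)):.3f}")
```

With these small parameters, $p$ is close to $0$ to the left of the origin, close to $1$ inside the window, and falls back toward $0$ past the window's right end, which mirrors the behavior the caption describes. The evaluation points are kept so that the whole window stays inside $[-1, 1]$, since $f$ grows polynomially outside that interval.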

Theorems & Definitions (59)

Definition 1.1: Testable Learning, see [RV23, GKSV25]
Definition 1.2: Massart Noise
Definition 1.3: $\gamma$-Biased halfspaces
Theorem 1.4: Testably Learning $\gamma$-Biased Massart Halfspaces
Theorem 1.5: Multiplicative Sandwiching Polynomial Approximation to the Sign Function
Proposition 3.1: Completeness
Proposition 3.2: Soundness against $\gamma$-Biased halfspaces
Lemma 3.3: $\gamma$-biased slices lead to $\Omega(\gamma)$ advantage
proof
Definition A.1: Normalized Hermite Polynomial
...and 49 more
