Testable Learning of General Halfspaces under Massart Noise

Ilias Diakonikolas, Giannis Iakovidis, Daniel M. Kane, Sihan Liu

TL;DR

The main result is the first testable learning algorithm for general halfspaces with Massart noise and Gaussian marginals, which qualitatively matches the known quasi-polynomial Statistical Query lower bound for the non-testable setting.

Abstract

We study the algorithmic task of testably learning general Massart halfspaces under the Gaussian distribution. In the testable learning setting, the aim is to design a tester-learner pair satisfying the following properties: (1) if the tester accepts, the learner outputs a hypothesis and a certificate that it achieves near-optimal error, and (2) if the data satisfies the underlying assumptions, it is highly unlikely that the tester rejects. Our main result is the first testable learning algorithm for general halfspaces with Massart noise and Gaussian marginals. The complexity of our algorithm is $d^{\mathrm{polylog}(\min\{1/\gamma, 1/\varepsilon\})}$, where $\varepsilon$ is the excess error and $\gamma$ is the bias of the target halfspace; this qualitatively matches the known quasi-polynomial Statistical Query lower bound for the non-testable setting. The analysis of our algorithm hinges on a novel sandwiching polynomial approximation to the sign function with multiplicative error, which may be of broader interest.
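To make the tester side of this definition concrete, here is a minimal sketch of a generic moment-matching tester, a common building block in testable learning under Gaussian marginals. It is not the tester used in this paper (whose tests are performed on slices, per Figure 1); the function names, the degree $k$, and the tolerance are hypothetical choices. The tester accepts only if every empirical moment of degree at most $k$ is close to the corresponding moment of $\mathcal{N}(0, I_d)$.

```python
import itertools

import numpy as np


def gaussian_moment(alpha):
    """E[prod_i z_i^alpha_i] for z ~ N(0, I_d): a product of 1-D moments, (a-1)!! for even a, 0 for odd a."""
    result = 1.0
    for a in alpha:
        if a % 2 == 1:
            return 0.0
        # (a-1)!! = 1 * 3 * ... * (a-1); np.prod of an empty range is 1.0, which covers a == 0.
        result *= float(np.prod(np.arange(1, a, 2)))
    return result


def multi_indices(d, k):
    """All multi-indices alpha in {0, ..., k}^d with total degree 1 <= |alpha| <= k."""
    for alpha in itertools.product(range(k + 1), repeat=d):
        if 1 <= sum(alpha) <= k:
            yield alpha


def moment_tester(X, k, tol):
    """Hypothetical tester: accept iff all empirical moments of degree <= k are within tol of Gaussian ones."""
    for alpha in multi_indices(X.shape[1], k):
        empirical = np.mean(np.prod(X ** np.array(alpha), axis=1))
        if abs(empirical - gaussian_moment(alpha)) > tol:
            return False
    return True


rng = np.random.default_rng(0)
n = 200_000
X_gauss = rng.standard_normal((n, 2))
# Uniform on [-sqrt(3), sqrt(3)]^2: mean 0 and variance 1 per coordinate, but not Gaussian.
X_unif = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, 2))
print(moment_tester(X_gauss, k=4, tol=0.2))
print(moment_tester(X_unif, k=4, tol=0.2))
```

With these settings the Gaussian sample should be accepted, while the uniform sample (variance $1$, but fourth moment $9/5$ instead of $3$) should be rejected. Enumerating all multi-indices costs $(k+1)^d$ iterations, so this brute-force form is only practical for small $d$.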

Paper Structure (23 sections, 18 theorems, 150 equations, 2 figures, 1 algorithm)

Key Result

Theorem 1.4

Fix parameters $\eta\in[0,1/2)$, $\beta \coloneqq 1-2\eta$, and $\gamma\in(0,1/2]$. Let $\mathcal{D}_\gamma$ be the class of distributions over $\mathbb{R}^d\times\{\pm1\}$ whose $\mathbf{x}$-marginal is the standard Gaussian $\mathcal{N}^d$ and that satisfy the $\eta$-Massart noise condition with respect to … samples, runs in $\mathrm{poly}(N,d)$ time, and testably learns the class $\mathcal{H}_{d,\gamma}$.
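The data model in Theorem 1.4 can be simulated directly. The sketch below draws Gaussian examples, labels them by a halfspace $\mathrm{sign}(\langle \mathbf{w}, \mathbf{x}\rangle - t)$, and flips each label independently with a point-dependent probability $\eta(\mathbf{x}) \le \eta < 1/2$, which is the usual form of the Massart condition. Definition 1.3 is not reproduced here, so reading $\gamma$ as the probability mass of the smaller side of the halfspace is an assumption, as are the helper names and the particular choice of $\eta(\mathbf{x})$.

```python
from statistics import NormalDist

import numpy as np


def gamma_biased_halfspace(d, gamma, rng):
    """Hypothetical helper: random unit w and threshold t with Pr_{x ~ N(0, I_d)}[w.x >= t] = gamma.

    Assumes gamma is the probability mass of the smaller side of the halfspace.
    """
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    # w.x ~ N(0, 1) for unit w, so Pr[w.x >= t] = 1 - Phi(t); solve 1 - Phi(t) = gamma.
    t = NormalDist().inv_cdf(1.0 - gamma)
    return w, t


def sample_massart(n, w, t, eta, rng):
    """Draw n examples: x ~ N(0, I_d), clean label sign(w.x - t), flipped with probability eta(x) <= eta."""
    X = rng.standard_normal((n, w.shape[0]))
    margin = X @ w - t
    clean = np.where(margin >= 0, 1, -1)
    # Illustrative (hypothetical) Massart adversary: rate eta within distance 1 of the boundary, 0 elsewhere.
    eta_x = np.where(np.abs(margin) < 1.0, eta, 0.0)
    flip = rng.random(n) < eta_x
    y = np.where(flip, -clean, clean)
    return X, y, clean, eta_x


rng = np.random.default_rng(1)
w, t = gamma_biased_halfspace(d=5, gamma=0.1, rng=rng)
X, y, clean, eta_x = sample_massart(100_000, w, t, eta=0.3, rng=rng)
print(np.mean(clean == 1))  # fraction on the minority side, close to gamma
print(np.mean(y != clean))  # overall flip rate, at most eta up to sampling error
```

Any function $\eta(\mathbf{x})$ bounded by $\eta$ is allowed under the Massart condition; the threshold rule above is just one easy-to-inspect choice.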

Figures (2)

  • Figure 1: Illustration of a learned halfspace $h$, a competing halfspace $f$, the slices on which the tests are performed, and their disagreement region (shaded).
  • Figure 2: This figure shows our approximation procedure. Top left: $\dfrac{T_m(x)}{mx}$; we see that it is a polynomial with a bump-function shape. Top right: $f(x)=\left(\dfrac{T_m(x)}{mx}\right)^k$; as the power $k$ increases, $f$ becomes increasingly concentrated and approximates $\delta$. Bottom: the integral $p$ of $f$ over a sliding window, which approximates a step function. The sliding-window length is chosen according to the desired accuracy; consequently, our approximation is forced to drop to $0$ outside the window. In all examples, the plotted curves grow polynomially to infinity beyond the figure, but this effect is controlled because the Gaussian tails dominate.
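The construction in the Figure 2 caption can be reproduced numerically. This is a minimal illustration under stated assumptions: we take $T_m$ to be the degree-$m$ Chebyshev polynomial of the first kind (the caption does not define it), choose $m$ odd so that $T_m(x)/(mx)$ is a polynomial, and define the sliding-window integral as $p(x) = \frac{1}{Z}\int_{x-L}^{x} f(s)\,ds$ with $Z = \int_{-1}^{1} f(s)\,ds$. The paper's equations for $f$ and $p$ are not reproduced in this summary, so the window convention, the normalization, and the parameter values are hypothetical.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def bump(m, k):
    """f(x) = (T_m(x) / (m x))^k as a Chebyshev series; m odd so that x divides T_m, k even so f >= 0."""
    if m % 2 == 0 or k % 2 == 1:
        raise ValueError("need odd m and even k")
    x_poly = Chebyshev([0, 1])                  # T_1(x) = x
    q, _ = divmod(Chebyshev.basis(m), x_poly)   # exact division: T_m is odd, so the remainder is 0
    return (q / m) ** k                         # (q / m)(0) = T_m'(0) / m = +-1, so f(0) = 1


def sliding_window_step(m, k, L):
    """p(x) = (1/Z) * integral of f over [x - L, x], with Z the mass of f on [-1, 1] (assumed normalization)."""
    f = bump(m, k)
    F = f.integ()                               # antiderivative of f, still a Chebyshev series
    Z = F(1.0) - F(-1.0)
    return lambda x: (F(x) - F(x - L)) / Z


f = bump(21, 4)
p = sliding_window_step(21, 4, L=0.5)
# Keep the whole window inside [-1, 1]: outside it, T_m (and hence f) grows rapidly.
for x in (-0.3, 0.25, 0.7):
    print(x, p(x))
```

Because $|T_m(x)| \le 1$ on $[-1,1]$, we have $f(x) \le (m|x|)^{-k}$ there, so $p$ is close to $1$ when the window $[x-L, x]$ covers the bump at $0$ and close to $0$ otherwise; in particular it returns to $0$ once $x > L$, in line with the caption's remark that the approximation drops to $0$ outside the window. Outside $[-1,1]$ the polynomial grows quickly, which is the growth the caption says is controlled by the Gaussian tails.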

Theorems & Definitions (59)

  • Definition 1.1: Testable Learning (see [RV23, GKSV25])
  • Definition 1.2: Massart Noise
  • Definition 1.3: $\gamma$-Biased halfspaces
  • Theorem 1.4: Testably Learning $\gamma$-Biased Massart Halfspaces
  • Theorem 1.5: Multiplicative Sandwiching Polynomial Approximation to the Sign Function
  • Proposition 3.1: Completeness
  • Proposition 3.2: Soundness against $\gamma$-Biased halfspaces
  • Lemma 3.3: $\gamma$-biased slices lead to $\Omega(\gamma)$ advantage
  • Proof
  • Definition A.1: Normalized Hermite Polynomial
  • ...and 49 more
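Definition A.1 in the list above concerns normalized Hermite polynomials. Assuming the standard normalization $h_n(x) = \mathrm{He}_n(x)/\sqrt{n!}$, where $\mathrm{He}_n$ is the probabilists' Hermite polynomial (the definition itself is not shown in this summary), the sketch below evaluates $h_n$ with NumPy and checks that $h_0,\dots,h_6$ are orthonormal under $\mathcal{N}(0,1)$, computing the expectations exactly with Gauss-Hermite quadrature. The function names are illustrative.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as H


def normalized_hermite(n, x):
    """h_n(x) = He_n(x) / sqrt(n!), using probabilists' Hermite polynomials He_n."""
    coef = np.zeros(n + 1)
    coef[n] = 1.0
    return H.hermeval(x, coef) / math.sqrt(math.factorial(n))


def gram_matrix(max_deg):
    """G[i, j] = E_{z ~ N(0,1)}[h_i(z) h_j(z)], via Gauss-Hermite quadrature."""
    nodes, weights = H.hermegauss(max_deg + 1)  # exact for polynomials of degree <= 2 * max_deg + 1
    weights = weights / math.sqrt(2 * math.pi)  # the weight exp(-x^2/2) integrates to sqrt(2*pi)
    vals = np.array([normalized_hermite(n, nodes) for n in range(max_deg + 1)])
    return (vals * weights) @ vals.T


G = gram_matrix(6)
print(np.allclose(G, np.eye(7)))
```

Since each product $h_i h_j$ has degree at most $12$, seven quadrature nodes integrate it exactly, so any deviation of `G` from the identity is floating-point rounding only.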