Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models

Puqian Wang, Nikos Zarifis, Ilias Diakonikolas, Jelena Diakonikolas

TL;DR

A sample and computationally efficient agnostic proper learner that attains $L^2_2$-error of $O(\mathrm{OPT})+\epsilon$, where $\mathrm{OPT}$ is the optimal loss.

Abstract

A single-index model (SIM) is a function of the form $\sigma(\mathbf{w}^{\ast} \cdot \mathbf{x})$, where $\sigma: \mathbb{R} \to \mathbb{R}$ is a known link function and $\mathbf{w}^{\ast}$ is a hidden unit vector. We study the task of learning SIMs in the agnostic (a.k.a. adversarial label noise) model with respect to the $L^2_2$-loss under the Gaussian distribution. Our main result is a sample and computationally efficient agnostic proper learner that attains $L^2_2$-error of $O(\mathrm{OPT})+\epsilon$, where $\mathrm{OPT}$ is the optimal loss. The sample complexity of our algorithm is $\tilde{O}(d^{\lceil k^{\ast}/2\rceil}+d/\epsilon)$, where $k^{\ast}$ is the information-exponent of $\sigma$ corresponding to the degree of its first non-zero Hermite coefficient. This sample bound nearly matches known CSQ lower bounds, even in the realizable setting. Prior algorithmic work in this setting had focused on learning in the realizable case or in the presence of semi-random noise. Prior computationally efficient robust learners required significantly stronger assumptions on the link function.
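The information exponent mentioned above can be estimated numerically for a given link function. The following is a minimal sketch, not the paper's procedure: it assumes the coefficients are taken against normalized probabilists' Hermite polynomials, $c_k = \mathbb{E}_{z \sim \mathcal{N}(0,1)}[\sigma(z)\,\mathrm{He}_k(z)]/\sqrt{k!}$, that the degree-0 coefficient is skipped, and that a coefficient counts as non-zero when it exceeds a small tolerance. The function names, `max_degree`, `quad_points`, and `tol` are all hypothetical choices made for illustration.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def hermite_coefficients(sigma, max_degree=8, quad_points=64):
    """Approximate c_k = E[sigma(z) He_k(z)] / sqrt(k!) for z ~ N(0, 1).

    Uses Gauss-Hermite quadrature for the weight exp(-z^2 / 2).
    """
    nodes, weights = hermegauss(quad_points)
    # hermegauss weights sum to sqrt(2*pi); rescale to the N(0, 1) density.
    weights = weights / math.sqrt(2.0 * math.pi)
    values = sigma(nodes)
    coeffs = []
    for k in range(max_degree + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0  # coefficient vector selecting He_k
        he_k = hermeval(nodes, basis)
        c_k = float(np.sum(weights * values * he_k)) / math.sqrt(math.factorial(k))
        coeffs.append(c_k)
    return coeffs


def information_exponent(sigma, max_degree=8, tol=1e-8):
    """Smallest k >= 1 with |c_k| > tol, or None if none is found (assumed convention)."""
    coeffs = hermite_coefficients(sigma, max_degree)
    for k in range(1, max_degree + 1):
        if abs(coeffs[k]) > tol:
            return k
    return None
```

For instance, `information_exponent(lambda z: z**3 - 3*z)` returns 3, since that link is exactly the third probabilists' Hermite polynomial and is orthogonal to all lower degrees. Quadrature is exact only for polynomial links of moderate degree; for non-smooth links the result depends on the tolerance.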

Paper Structure

This paper contains 32 sections, 23 theorems, 198 equations, 4 algorithms.

Key Result

Theorem 1.2

There exists an algorithm that draws $n = \tilde{\Theta}_{k^{\ast}}(d^{\lceil k^{\ast}/2 \rceil} + d/\epsilon)$ labeled samples, runs in $\mathrm{poly}(n, d)$ time, and outputs a weight vector $\widehat{\mathbf{w}} \in \mathbb{S}^{d-1}$ that with high probability satisfies ${\cal L}_2^{\sigma}(\wide
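To get a feel for the sample bound $d^{\lceil k^{\ast}/2 \rceil} + d/\epsilon$, the sketch below evaluates only its leading terms. It deliberately drops the constants depending on $k^{\ast}$ and the polylogarithmic factors hidden by the $\tilde{\Theta}$ notation, so the numbers indicate scaling rather than actual sample counts. The function name `sample_scale` is hypothetical.

```python
import math


def sample_scale(d, k_star, eps):
    """Leading terms d^ceil(k*/2) + d/eps, ignoring constants and log factors."""
    initialization_term = d ** math.ceil(k_star / 2)  # grows with the information exponent
    accuracy_term = d / eps                           # grows as the target error shrinks
    return initialization_term + accuracy_term
```

With `d = 100` and `eps = 0.01`, an information exponent of 3 gives two terms of equal size (100^2 and 100/0.01), while an information exponent of 5 makes the first term 100^3, which then dominates.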

Theorems & Definitions (62)

  • Theorem 1.2: Main Result, Informal
  • Proposition 2.1: Initialization
  • Lemma 2.2
  • Proof: Proof Sketch of \ref{lem:range-singular-value-of-mtrizel(C_k)}
  • Corollary 2.3
  • Lemma 2.4: Sample Complexity for Estimating the Unfolded Tensor Matrix
  • Lemma 2.5
  • Claim 3.1
  • Lemma 3.2
  • Lemma 3.3: Sharpness
  • ...and 52 more