Robustly Learning Single-Index Models via Alignment Sharpness

Nikos Zarifis; Puqian Wang; Ilias Diakonikolas; Jelena Diakonikolas

TL;DR

The paper tackles the problem of learning single-index models under the squared loss in the agnostic setting with unknown link functions. It introduces alignment sharpness, a local error bound for a convex surrogate loss, and develops a computationally efficient algorithm that achieves a universal constant-factor approximation to the best possible $L_2^2$ loss. The key ideas are to select best-fit activations along a projected direction and to leverage a gradient-alignment guarantee that contracts the misalignment between the estimated and true directions, yielding linear-rate convergence. The results hold under mild distributional assumptions (the well-behaved class) and for the broad family of $(a,b)$-unbounded activations, which includes ReLU-like functions, providing the first polynomial-time constant-factor agnostic learner for unknown link functions, even for Gaussian marginals. The work thus advances practical agnostic learning for SIMs and suggests broader applicability of alignment-based analysis in optimization.
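The two key ideas above (fit the best activation along the current direction, then take a gradient step on the surrogate loss) can be illustrated with a small sketch. This is not the paper's algorithm: it swaps in plain isotonic regression (pool-adjacent-violators) for the paper's constrained activation fit, which makes it closer to a classical Isotron-style update. The function names `pav`, `fit_activation`, `surrogate_gradient`, and `learn_direction`, the step size, the iteration count, and the noiseless Gaussian/ReLU data in the demo are all assumptions made for illustration.

```python
import numpy as np


def pav(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y,
    where y is assumed to be ordered by the projection value."""
    blocks = []  # each block is [mean, count]
    for v in y:
        blocks.append([float(v), 1])
        # Merge neighbouring blocks while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return np.concatenate([np.full(c, m) for m, c in blocks])


def fit_activation(X, y, w):
    """Values of a best-fit monotone activation at the points w . x_i.
    Assumption: unconstrained isotonic regression stands in for the
    paper's fit over a restricted activation class."""
    order = np.argsort(X @ w)
    fitted = np.empty_like(y, dtype=float)
    fitted[order] = pav(y[order])
    return fitted


def surrogate_gradient(X, y, w):
    """Empirical gradient of the convex surrogate: mean of (u(w.x) - y) x."""
    residual = fit_activation(X, y, w) - y
    return X.T @ residual / len(y)


def learn_direction(X, y, w0, step=0.5, iters=100):
    """Alternate activation fitting and gradient steps, keeping w on the
    unit sphere (step size and iteration count are illustrative)."""
    w = w0 / np.linalg.norm(w0)
    for _ in range(iters):
        w = w - step * surrogate_gradient(X, y, w)
        w = w / np.linalg.norm(w)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 5, 2000
    w_star = np.zeros(d)
    w_star[0] = 1.0
    X = rng.standard_normal((n, d))       # Gaussian marginals (assumption)
    y = np.maximum(0.0, X @ w_star)       # noiseless ReLU labels (assumption)
    w_hat = learn_direction(X, y, np.ones(d) / np.sqrt(d))
    print("cosine similarity with w*:", float(w_hat @ w_star))
```

The sketch only tracks the direction of `w`, since an isotonic fit depends on the ordering of the projections and not on their scale. The robustness guarantees discussed in the paper rely on its specific activation class and analysis, which this simplified version does not reproduce.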

Abstract

We study the problem of learning Single-Index Models under the $L_2^2$ loss in the agnostic model. We give an efficient learning algorithm, achieving a constant factor approximation to the optimal loss, that succeeds under a range of distributions (including log-concave distributions) and a broad class of monotone and Lipschitz link functions. This is the first efficient constant factor approximate agnostic learner, even for Gaussian data and for any nontrivial class of link functions. Prior work for the case of unknown link function either works in the realizable setting or does not attain constant factor approximation. The main technical ingredient enabling our algorithm and analysis is a novel notion of a local error bound in optimization that we term alignment sharpness and that may be of broader interest.
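One way to read the "constant factor approximation" guarantee is as follows. The notation here is an assumption chosen to match the symbols used elsewhere on this page ($\mathcal{F}$ for the link-function class, $W$ for a norm bound on the direction, $\epsilon$ for the accuracy parameter); $C$ denotes an unspecified universal constant and $(\hat{u}, \hat{\mathbf{w}})$ the learner's output.

```latex
\[
\mathrm{OPT} \;=\; \min_{u \in \mathcal{F},\ \|\mathbf{w}\|_2 \le W}
  \mathbb{E}_{(\mathbf{x},y)\sim\mathcal{D}}\bigl[(u(\mathbf{w}\cdot\mathbf{x}) - y)^2\bigr],
\qquad
\mathbb{E}_{(\mathbf{x},y)\sim\mathcal{D}}\bigl[(\hat{u}(\hat{\mathbf{w}}\cdot\mathbf{x}) - y)^2\bigr]
  \;\le\; C \cdot \mathrm{OPT} + \epsilon .
\]
```

In the agnostic model no assumption is placed on how $y$ depends on $\mathbf{x}$, so $\mathrm{OPT}$ can be strictly positive, and the learner is only required to compete with the best single-index model up to the factor $C$.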

Paper Structure (38 sections, 18 theorems, 251 equations, 3 figures, 4 algorithms)

Introduction
Overview of Results
Distributional Assumptions
Unbounded Activations
Technical Overview
Technical Comparison to GGKS23
Preliminaries
Basic Notation
Asymptotic Notation
Probability Notation
Organization
Main Structural Result: Alignment Sharpness of Surrogate Loss
$L_2^2$ Error and Misalignment
Closeness of Idealized and Attainable Activations
Proof of \ref{main:thm:sharpness}
...and 23 more sections

Key Result

Theorem 1.4

Given the agnostic learning setting (def:agnostic-learning), where $\mathcal{G}$ is the class of $(L, R)$-well-behaved distributions with $L, R = O(1)$ and $\mathcal{F} = \mathcal{U}_{(a,b)}$ such that $(1/a), b = O(1)$, there is an algorithm that draws $N = \mathrm{poly}(W) \cdot \tilde{O}(d/\epsilon^{2})$ samples from $\mathcal{D}$, r

Figures (3)

Figure 1: Under the assumption that $\tilde{\mathbf{v}}\cdot\mathbf{x}\in(R/16,R/8)$, and $I_1(\mathbf{x})\geq 0, I_2(\mathbf{x})\geq 0$, the distance between $f(\mathbf{w}\cdot\mathbf{x})$ and $u^*(\mathbf{w}^*\cdot\mathbf{x})$ is at least $|u^*(\alpha \mathbf{w}\cdot \mathbf{x}+\|\mathbf{v}\|_2R/4)-u^*(\mathbf{w}^{\ast}\cdot\mathbf{x})|\geq a\|\mathbf{v}\|_2R/8$.
Figure 2: On the 2-dimensional space $V$ spanned by $(\mathbf{x}_{\mathbf{v}},\mathbf{x}_{\mathbf{w}})$, at each point $\mathbf{x}\in B\cup B'$, it must be that $I_1(\mathbf{x})I_2(\mathbf{x})\geq 0$ or $I_2(\mathbf{x})I_3(\mathbf{x})\geq 0$. $\Gamma_1$ denotes the interval of $\mathbf{x}_\mathbf{w} = \mathbf{w}\cdot\mathbf{x}$ such that $f(\mathbf{w}\cdot\mathbf{x})\geq u^*(\alpha\mathbf{w}\cdot\mathbf{x} + \|\mathbf{v}\|_2R)$, hence both $I_1(\mathbf{x})I_2(\mathbf{x})\geq 0$ and $I_2(\mathbf{x})I_3(\mathbf{x})\geq 0$; $\Gamma_2$ denotes the interval of $\mathbf{x}_\mathbf{w}$ such that $f(\mathbf{w}\cdot\mathbf{x})\in (u^*(\alpha\mathbf{w}\cdot\mathbf{x} + \|\mathbf{v}\|_2R/32), u^*(\alpha\mathbf{w}\cdot\mathbf{x} + \|\mathbf{v}\|_2R/4))$, hence $I_2(\mathbf{x})I_3(\mathbf{x})\geq 0$; finally, $\Gamma_3$ denotes the interval of $\mathbf{x}_\mathbf{w}$ such that $f(\mathbf{w}\cdot\mathbf{x})\in (u^*(\alpha\mathbf{w}\cdot\mathbf{x} + \|\mathbf{v}\|_2R/4), u^*(\alpha\mathbf{w}\cdot\mathbf{x} + \|\mathbf{v}\|_2R/))$, hence $I_1(\mathbf{x})I_2(\mathbf{x})\geq 0$. The area of the union of the red and blue regions is the lower bound on the probability in \ref{ineq:lower-bound-prob-mass}. As displayed in the figure, the total area of the blue and red regions is lower bounded by $\mathds{1}\{\mathbf{x}\in B\} + (\mathds{1}\{\mathbf{x}\in B'\} - \mathds{1}\{\mathbf{x}\in B\}) \mathds{1}\{I_2(\mathbf{x})I_3(\mathbf{x})\geq 0\}$.
Figure 3: An illustration of $\hat{f}$ for $u^*(z) = \max\{0,z\}$ and a dataset $S^* =\{(\mathbf{x}^{(1)},u^*(\mathbf{w}^*\cdot\mathbf{x}^{(1)})),\dots,(\mathbf{x}^{(6)},u^*(\mathbf{w}^*\cdot\mathbf{x}^{(6)}))\}$ where $\mathbf{w}^*\cdot\mathbf{x}^{(1)}<\mathbf{w}^*\cdot\mathbf{x}^{(2)}<\mathbf{w}^*\cdot\mathbf{x}^{(3)}<0$.

Theorems & Definitions (56)

Definition 1.2: Well-Behaved Distributions
Definition 1.3: Unbounded Activations
Theorem 1.4: Main Algorithmic Result, Informal
Example 1.5
Proposition 3.0: Alignment Sharpness of the Convex Surrogate
Lemma 3.0: Lower Bound on $L_2^2$ Error by Misalignment
proof
Lemma 3.0: Closeness of Population-Optimal Activations
Corollary 3.0: Closeness of Idealized and Attainable Activations
Proof of \ref{main:thm:sharpness}
...and 46 more
