Learning a Single Neuron Robustly to Distributional Shifts and Adversarial Label Noise

Shuyao Li, Sushrut Karmalkar, Ilias Diakonikolas, Jelena Diakonikolas

TL;DR

The algorithm follows a primal-dual framework and is designed by directly bounding the risk with respect to the original, nonconvex $L_2^2$ loss; the work opens new avenues for the design of primal-dual algorithms under structured nonconvexity.

Abstract

We study the problem of learning a single neuron with respect to the $L_2^2$-loss in the presence of adversarial distribution shifts, where the labels can be arbitrary, and the goal is to find a "best-fit" function. More precisely, given training samples from a reference distribution $\mathcal{p}_0$, the goal is to approximate the vector $\mathbf{w}^*$ which minimizes the squared loss with respect to the worst-case distribution that is close in $\chi^2$-divergence to $\mathcal{p}_{0}$. We design a computationally efficient algorithm that recovers a vector $\hat{\mathbf{w}}$ satisfying $\mathbb{E}_{\mathcal{p}^*} (\sigma(\hat{\mathbf{w}} \cdot \mathbf{x}) - y)^2 \leq C \, \mathbb{E}_{\mathcal{p}^*} (\sigma(\mathbf{w}^* \cdot \mathbf{x}) - y)^2 + \epsilon$, where $C>1$ is a dimension-independent constant and $(\mathbf{w}^*, \mathcal{p}^*)$ is the witness attaining the min-max risk $\min_{\mathbf{w}~:~\|\mathbf{w}\| \leq W} \max_{\mathcal{p}} \mathbb{E}_{(\mathbf{x}, y) \sim \mathcal{p}} (\sigma(\mathbf{w} \cdot \mathbf{x}) - y)^2 - \nu\chi^2(\mathcal{p}, \mathcal{p}_0)$. Our algorithm follows a primal-dual framework and is designed by directly bounding the risk with respect to the original, nonconvex $L_2^2$ loss. From an optimization standpoint, our work opens new avenues for the design of primal-dual algorithms under structured nonconvexity.
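
To make the min-max risk in the abstract concrete, the sketch below evaluates the inner maximization for a fixed $\mathbf{w}$ on a finite sample. It is an illustration, not the paper's algorithm, and rests on several assumptions: the reference distribution is taken to be uniform over the $N$ samples, the divergence is taken as $\chi^2(\mathcal{p}, \mathcal{p}_0) = \sum_i (p_i - p_{0,i})^2 / p_{0,i}$, and the activation is ReLU. The function names (`squared_losses`, `worst_case_weights`, `penalized_risk`) are hypothetical. Under these assumptions the maximizer has the form $p_i = p_{0,i} \max(0, 1 + (\ell_i - \eta)/(2\nu))$, with $\eta$ chosen so that the weights sum to one; the code finds $\eta$ by bisection.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def squared_losses(w, X, y, sigma=relu):
    """Per-sample losses (sigma(w . x_i) - y_i)^2."""
    return (sigma(X @ w) - y) ** 2


def worst_case_weights(losses, nu, iters=100):
    """Maximize sum_i p_i * loss_i - nu * chi2(p, p0) over the simplex.

    Assumes p0 is uniform. The maximizer is
    p_i = p0_i * max(0, 1 + (loss_i - eta) / (2 * nu)),
    where eta is the multiplier that makes the weights sum to one.
    """
    n = losses.shape[0]
    q = 1.0 / n

    def weights(eta):
        return q * np.maximum(0.0, 1.0 + (losses - eta) / (2.0 * nu))

    # The total mass is nonincreasing in eta, at least 1 at min(losses)
    # and at most 1 at max(losses), so bisection on this interval works.
    lo, hi = losses.min(), losses.max()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if weights(mid).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    p = weights(0.5 * (lo + hi))
    return p / p.sum()  # remove residual rounding error


def penalized_risk(w, X, y, nu):
    """Value of max_p E_p[loss] - nu * chi2(p, p0) for fixed w, plus the maximizer."""
    losses = squared_losses(w, X, y)
    n = losses.shape[0]
    p = worst_case_weights(losses, nu)
    chi2 = np.sum((p - 1.0 / n) ** 2 * n)
    return p @ losses - nu * chi2, p
```

Since the uniform weights are always feasible and incur zero penalty, the returned value is never below the average loss. As $\nu$ grows, the penalty dominates and the value approaches the average loss under the reference sample.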

Paper Structure

This paper contains 33 sections, 26 theorems, 125 equations, 1 algorithm.

Key Result

Theorem 1.4

Suppose that the learner has access to $N = \tilde{\Omega}(d/\epsilon^2)$ samples drawn from the reference distribution $\mathcal{p}_0$. If all samples are bounded and the distribution $\mathcal{p}^*$ satisfies the "margin-like" condition and concentration (assump:margin, assump:concentration) ...

Theorems & Definitions (62)

  • Definition 1.1: Unbounded DKTZ22 + Convex Activation
  • Definition 1.2: Loss, Risk, and $\mathrm{OPT}$
  • Theorem 1.4: Main Theorem --- Informal
  • Lemma 2.5: Empirical Sharpness; Informal. See \ref{lemma:sharpness-empirical-appendix}
  • Theorem 3.1: Main Theorem
  • Lemma 3.2: Gap Lower Bound
  • proof
  • Lemma 3.2: Gap Upper Bound
  • Lemma 3.3
  • proof: Proof of \ref{thm:main-formal}
  • ...and 52 more