Learning a Single Neuron Robustly to Distributional Shifts and Adversarial Label Noise

Shuyao Li; Sushrut Karmalkar; Ilias Diakonikolas; Jelena Diakonikolas

TL;DR

The algorithm follows a primal-dual framework and is designed by directly bounding the risk with respect to the original, nonconvex $L_2^2$ loss. The work opens new avenues for the design of primal-dual algorithms under structured nonconvexity.

Abstract

We study the problem of learning a single neuron with respect to the $L_2^2$-loss in the presence of adversarial distribution shifts, where the labels can be arbitrary, and the goal is to find a "best-fit" function. More precisely, given training samples from a reference distribution $\mathcal{p}_0$, the goal is to approximate the vector $\mathbf{w}^*$ which minimizes the squared loss with respect to the worst-case distribution that is close in $\chi^2$-divergence to $\mathcal{p}_0$. We design a computationally efficient algorithm that recovers a vector $\hat{\mathbf{w}}$ satisfying $\mathbb{E}_{\mathcal{p}^*} (\sigma(\hat{\mathbf{w}} \cdot \mathbf{x}) - y)^2 \leq C \, \mathbb{E}_{\mathcal{p}^*} (\sigma(\mathbf{w}^* \cdot \mathbf{x}) - y)^2 + \epsilon$, where $C>1$ is a dimension-independent constant and $(\mathbf{w}^*, \mathcal{p}^*)$ is the witness attaining the min-max risk $\min_{\mathbf{w}~:~\|\mathbf{w}\| \leq W} \max_{\mathcal{p}} \mathbb{E}_{(\mathbf{x}, y) \sim \mathcal{p}} (\sigma(\mathbf{w} \cdot \mathbf{x}) - y)^2 - \nu \chi^2(\mathcal{p}, \mathcal{p}_0)$. Our algorithm follows a primal-dual framework and is designed by directly bounding the risk with respect to the original, nonconvex $L_2^2$ loss. From an optimization standpoint, our work opens new avenues for the design of primal-dual algorithms under structured nonconvexity.
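To make the min-max objective concrete, the following is a minimal sketch of how the inner maximization over $\mathcal{p}$ could be evaluated for a fixed $\mathbf{w}$ on a finite sample. It is not the primal-dual algorithm described in the abstract. It assumes (none of this is stated in the source) that $\mathcal{p}_0$ is the uniform distribution over $n$ samples, that $\sigma$ is the ReLU, and that $\chi^2(\mathcal{p}, \mathcal{p}_0) = \sum_i (p_i - 1/n)^2 / (1/n)$. Under these assumptions the penalized objective is a concave quadratic in $\mathcal{p}$, so its maximizer is the Euclidean projection of $\mathcal{p}_0 + \ell / (2 \nu n)$ onto the probability simplex, where $\ell$ is the vector of per-sample squared losses. The function names are hypothetical.

```python
def relu(t):
    # Illustrative activation; assumed here, not fixed by the abstract.
    return max(0.0, t)


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumsum += u_k
        t = (cumsum - 1.0) / k
        if u_k - t > 0:
            theta = t  # threshold from the largest k that passes the check
    return [max(0.0, x - theta) for x in v]


def penalized_worst_case_risk(w, xs, ys, nu):
    """Evaluate max_p  sum_i p_i * loss_i - nu * chi2(p, uniform) for fixed w.

    Returns the optimal value and the maximizing weights p.
    """
    n = len(xs)
    losses = [
        (relu(sum(wi * xi for wi, xi in zip(w, x))) - y) ** 2
        for x, y in zip(xs, ys)
    ]
    # With uniform p0, chi2(p, p0) = n * sum_i (p_i - 1/n)^2, so the maximizer
    # is the projection of p0 + losses / (2 * nu * n) onto the simplex.
    shifted = [1.0 / n + loss / (2.0 * nu * n) for loss in losses]
    p = project_to_simplex(shifted)
    chi2 = n * sum((pi - 1.0 / n) ** 2 for pi in p)
    value = sum(pi * loss for pi, loss in zip(p, losses)) - nu * chi2
    return value, p


if __name__ == "__main__":
    xs = [[1.0], [2.0], [-1.0]]
    ys = [0.0, 0.0, 1.0]
    value, p = penalized_worst_case_risk([1.0], xs, ys, nu=1.0)
    print(value, p)
```

In this toy example the per-sample losses are 1, 4 and 1, so the adversarial reweighting shifts mass toward the second sample. A larger $\nu$ penalizes deviation from $\mathcal{p}_0$ more heavily and pulls the value back toward the average loss.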
