On the Hardness of Learning One Hidden Layer Neural Networks

Shuchen Li; Ilias Zadik; Manolis Zampetakis

TL;DR

This work proves that learning polynomial-width one-hidden-layer ReLU networks under Gaussian inputs and polynomially small Gaussian noise is computationally hard under standard cryptographic assumptions. The argument is a chain of reductions. First, hardness of CLWE implies hardness of learning Lipschitz periodic neurons under Gaussian noise. Second, learning such neurons is shown to be equivalent to learning certain one-hidden-layer networks on a capped interval. Third, the known quantum reduction from GapSVP to CLWE ties the learning problem to worst-case lattice hardness. The main result shows that any polynomial-time $\varepsilon$-weak learner with $\varepsilon=1/\mathrm{poly}(d)$ for networks of width $k=\omega(\sqrt{d\log d})$ would imply a polynomial-time quantum algorithm for GapSVP within polynomial factors, which rules out efficient learning in this regime unless these lattice problems are tractable for quantum algorithms. The paper also extends the hardness to super-polynomially small noise levels using reductions in the LWE/CLWE framework.
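The chain of implications described above can be written compactly as follows. This is only a schematic reading of the summary, not a statement taken from the paper: an arrow $A \to B$ is used here to mean "an efficient algorithm for $B$ would yield an efficient algorithm for $A$", and the labels on the arrows are our shorthand. The `\xrightarrow` command assumes the `amsmath` package.

```latex
\[
\underbrace{\text{GapSVP}}_{\text{worst case, poly.\ factor}}
\;\xrightarrow{\ \text{quantum}\ }\;
\text{CLWE}
\;\xrightarrow{\ \text{classical}\ }\;
\underbrace{\text{Lipschitz periodic neuron}}_{\text{Gaussian input and noise}}
\;\xrightarrow{\ \text{classical}\ }\;
\underbrace{\text{one-hidden-layer ReLU network}}_{\text{width } k=\omega(\sqrt{d\log d})}
\]
```

Read right to left, a weak learner for the network class gives an algorithm for each problem to its left, ending in a quantum algorithm for GapSVP.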

Abstract

In this work, we consider the problem of learning one hidden layer ReLU neural networks with inputs from $\mathbb{R}^d$. We show that this learning problem is hard under standard cryptographic assumptions even when: (1) the size of the neural network is polynomial in $d$, (2) its input distribution is a standard Gaussian, and (3) the noise is Gaussian and polynomially small in $d$. Our hardness result is based on the hardness of the Continuous Learning with Errors (CLWE) problem, and in particular, is based on the largely believed worst-case hardness of approximately solving the shortest vector problem up to a multiplicative polynomial factor.
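To make the setting concrete, here is a minimal sketch of how labelled samples for this learning problem could be generated: standard Gaussian inputs, a width-$k$ one-hidden-layer ReLU network, and additive Gaussian label noise. The function name `sample_relu_network_data`, the random choice of weights, and the parameter `noise_std` (a standard deviation, not a variance) are assumptions made for illustration. The hard instances in the paper use specially structured weights, not random ones.

```python
import numpy as np

def sample_relu_network_data(n, d, k, noise_std, seed=0):
    """Draw n noisy samples (x, y) from a random width-k ReLU network.

    Illustrative only: the weights below are random placeholders,
    not the structured weights behind the hardness result.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((k, d)) / np.sqrt(d)  # hidden-layer weights (hypothetical)
    b = rng.standard_normal(k)                    # hidden-layer biases (hypothetical)
    a = rng.standard_normal(k) / k                # output weights (hypothetical)

    X = rng.standard_normal((n, d))               # inputs x ~ N(0, I_d)
    clean = np.maximum(X @ W.T + b, 0.0) @ a      # f(x) = sum_i a_i ReLU(<w_i, x> + b_i)
    y = clean + noise_std * rng.standard_normal(n)  # Gaussian label noise
    return X, y, clean

# Example: d = 20, width k = 30, polynomially small noise 1/d.
X, y, clean = sample_relu_network_data(n=2000, d=20, k=30, noise_std=1 / 20)
```

A learner in this setting sees only `(X, y)`, and the hardness result says that, for suitably chosen networks, no polynomial-time algorithm can predict noticeably better than a trivial baseline unless the underlying lattice problems are easy.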

Paper Structure (16 sections, 16 theorems, 28 equations, 1 figure)

Introduction
Prior work
Contribution
Organization
Preliminaries
Notations
PAC-learning with Gaussian input distribution.
Worst-Case Lattice Problems
Continuous Learning with Errors (CLWE) [bruna2020continuous].
Main Result
Proof Sketch and Comparison with [SZB21-cosine-learning]
CLWE reduction to Lipschitz Periodic Neurons under Gaussian noise
Proof of Theorem \ref{thm:CLWE-to-phi}
The Cryptographic Hardness of Learning One Hidden Layer Neural Networks
Super-Polynomially Small Noise
...and 1 more section

Key Result

theorem 1

Let $\mathcal{F}_k$ be the class of width-$k$ one-hidden-layer neural networks, and let the noise variance be an arbitrary $\sigma=1/\mathrm{poly}(d)$. For any $k=\omega(\sqrt{d \log d})$, if there exists a polynomial-time algorithm that can weakly learn $\mathcal{F}_k$ under Gaussian noise of variance $\sigma$, then

Figures (1)

Figure 1: $\phi(x)$ and $\nn(x)$ for $R=3$
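The idea behind the figure, a periodic Lipschitz function matched by a ReLU network on a bounded interval, can be sketched in a few lines. The code below is an illustration under assumptions: it takes $\phi$ to be the 1-periodic triangle wave (distance to the nearest integer), which may differ from the paper's exact $\phi$, and builds a sum of $4R+1$ ReLUs that equals $\phi$ on $[-R, R]$ and is zero outside. The names `phi`, `capped_relu_net`, and `nn` are hypothetical.

```python
import numpy as np

def phi(x):
    """Assumed periodic neuron: 1-periodic, 1-Lipschitz triangle wave."""
    x = np.asarray(x, dtype=float)
    return np.abs(x - np.round(x))

def capped_relu_net(R):
    """Thresholds t_j and coefficients c_j with
    sum_j c_j * ReLU(x - t_j) == phi(x) on [-R, R] and == 0 outside.
    R must be a positive integer."""
    thresholds = -R + 0.5 * np.arange(4 * R + 1)   # -R, -R + 0.5, ..., R
    coeffs = np.empty(4 * R + 1)
    coeffs[0] = 1.0                                # slope +1 starting at x = -R
    inner = np.arange(1, 4 * R)
    # Slope flips +1 -> -1 at half-integers and -1 -> +1 at integers.
    coeffs[1:-1] = np.where(inner % 2 == 1, -2.0, 2.0)
    coeffs[-1] = 1.0                               # cancel slope -1 at x = R
    return thresholds, coeffs

def nn(x, thresholds, coeffs):
    """Evaluate the one-hidden-layer ReLU network at scalar or array x."""
    x = np.asarray(x, dtype=float)
    return np.maximum(x[..., None] - thresholds, 0.0) @ coeffs

thresholds, coeffs = capped_relu_net(R=3)          # 13 hidden units for R = 3
```

The width grows linearly in $R$ because each linear piece of $\phi$ inside $[-R, R]$ needs its own breakpoint, so the size of the capped interval translates directly into the number of hidden units.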

Theorems & Definitions (29)

theorem 1: Informal; see Theorem \ref{thm:mainTheorem}
definition 1: Weak learning
definition 2
theorem 2: [bruna2020continuous]
theorem 3
lemma 1
proof
theorem 4
corollary 1
proof
...and 19 more
