
On the Hardness of Learning One Hidden Layer Neural Networks

Shuchen Li, Ilias Zadik, Manolis Zampetakis

TL;DR

This work proves that learning polynomial-width one-hidden-layer ReLU networks under Gaussian inputs and polynomially small Gaussian noise is computationally hard under standard cryptographic assumptions. The argument is a chain of reductions: hardness of CLWE implies hardness of learning Lipschitz periodic neurons under Gaussian noise, which in turn reduces to learning one-hidden-layer ReLU networks that match the periodic neuron on a capped interval. Combined with polynomial-time quantum reductions from GapSVP to CLWE, this bases the hardness of the learning problem on the worst-case hardness of GapSVP. The main result shows that any polynomial-time $\varepsilon$-weak learner with $\varepsilon=1/\mathrm{poly}(d)$ for networks of width $k=\omega(\sqrt{d\log d})$ would imply a polynomial-time quantum algorithm for GapSVP within polynomial factors, ruling out efficient learning in this regime unless such lattice problems become tractable. The paper also extends the hardness to super-polynomially small noise levels via the LWE/CLWE frameworks.

Abstract

In this work, we consider the problem of learning one hidden layer ReLU neural networks with inputs from $\mathbb{R}^d$. We show that this learning problem is hard under standard cryptographic assumptions even when: (1) the size of the neural network is polynomial in $d$, (2) its input distribution is a standard Gaussian, and (3) the noise is Gaussian and polynomially small in $d$. Our hardness result is based on the hardness of the Continuous Learning with Errors (CLWE) problem, and in particular, is based on the largely believed worst-case hardness of approximately solving the shortest vector problem up to a multiplicative polynomial factor.
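The data model in the abstract can be made concrete with a minimal sketch, using only the Python standard library. The function names (`make_network`, `network`, `sample`), the dimensions, and the choice `noise_std = 1/d` are illustrative assumptions, not taken from the paper. In particular, the randomly drawn parameters are placeholders: the hard instances in the paper come from the CLWE-based reduction, not from random weights.

```python
import math
import random

def relu(t):
    return max(0.0, t)

def make_network(d, k, rng):
    """Draw placeholder parameters for a width-k one-hidden-layer ReLU network.

    Assumption: random weights are used only for illustration; they are
    not the hard instances constructed in the paper.
    """
    weights = [[rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(d)]
               for _ in range(k)]
    biases = [rng.gauss(0.0, 1.0) for _ in range(k)]
    outer = [rng.gauss(0.0, 1.0) / k for _ in range(k)]
    return weights, biases, outer

def network(x, params):
    """Evaluate f(x) = sum_i a_i * ReLU(<w_i, x> + b_i)."""
    weights, biases, outer = params
    return sum(a * relu(sum(wi * xi for wi, xi in zip(w, x)) + b)
               for w, b, a in zip(weights, biases, outer))

def sample(n, d, params, noise_std, rng):
    """Draw n labelled examples: x is standard Gaussian, y = f(x) + Gaussian noise."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = network(x, params) + rng.gauss(0.0, noise_std)
        data.append((x, y))
    return data

rng = random.Random(0)
d, k = 16, 12
params = make_network(d, k, rng)
# noise_std = 1/d is one example of a polynomially small noise level.
data = sample(5, d, params, noise_std=1.0 / d, rng=rng)
```

A learner in this setting sees only pairs like those in `data` and must output a predictor for `y`; the hardness result concerns what any polynomial-time learner can achieve on worst-case choices of the parameters.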

Paper Structure (16 sections, 16 theorems, 28 equations, 1 figure)

This paper contains 16 sections, 16 theorems, 28 equations, 1 figure.

Key Result

theorem 1

Let $\mathcal{F}_k$ be the class of width-$k$ one-hidden-layer ReLU neural networks, and let the noise variance be any $\sigma=1/\mathrm{poly}(d)$. For any $k=\omega(\sqrt{d \log d})$, if there exists a polynomial-time algorithm that weakly learns $\mathcal{F}_k$ under Gaussian noise of variance $\sigma$, then there exists a polynomial-time quantum algorithm that approximates GapSVP within polynomial factors.
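For reference, the setting of this theorem can be written out in symbols. This is a sketch in notation chosen here: the symbols $a_i$, $w_i$, $b_i$, $\xi$, and $h$ are assumptions rather than the paper's own notation. The second display is one common way to formalize weak learning under squared loss (beating the best constant predictor by $\varepsilon$); the paper's Definition 1 may differ in its details.

```latex
% Data model: a width-k one-hidden-layer ReLU network, Gaussian input, Gaussian noise.
\[
f(x) = \sum_{i=1}^{k} a_i \,\mathrm{ReLU}\bigl(\langle w_i, x\rangle + b_i\bigr),
\qquad x \sim \mathcal{N}(0, I_d),
\qquad y = f(x) + \xi, \quad \xi \sim \mathcal{N}(0, \sigma).
\]
% Assumed weak-learning criterion for a hypothesis h output by the learner.
\[
\mathbb{E}\bigl[(h(x) - y)^2\bigr]
\;\le\; \min_{c \in \mathbb{R}} \mathbb{E}\bigl[(c - y)^2\bigr] - \varepsilon,
\qquad \varepsilon = 1/\mathrm{poly}(d).
\]
```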

Figures (1)

  • Figure 1: $\phi(x)$ and $\nn(x)$ for $R=3$
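The idea behind a figure of this kind, a periodic Lipschitz function agreeing with a ReLU network on an interval $[-R, R]$, can be sketched as follows. The specific choice of `phi` (a triangle wave with period 1) and the helper names `build_units` and `nn` are assumptions made for illustration; the paper's exact $\phi$ and network construction are not given in this summary and may differ.

```python
def relu(t):
    return max(0.0, t)

def phi(x):
    """Assumed periodic neuron: triangle wave with period 1
    (distance from x to the nearest integer), 1-Lipschitz."""
    return abs(x - round(x))

def build_units(R):
    """Return (coefficient, breakpoint) pairs of a one-hidden-layer ReLU
    network that equals phi on [-R, R] and is 0 outside, for integer R."""
    units = [(1.0, -float(R))]           # slope +1 starting at x = -R
    for j in range(1, 4 * R):
        # Each kink of the triangle wave flips the slope between +1 and -1,
        # which takes a coefficient of -2 or +2 at a half-integer breakpoint.
        coeff = -2.0 if j % 2 == 1 else 2.0
        units.append((coeff, -R + j / 2))
    units.append((1.0, float(R)))        # cancel the slope for x >= R
    return units

def nn(x, units):
    """Evaluate the network sum_j c_j * ReLU(x - t_j)."""
    return sum(c * relu(x - t) for c, t in units)

units = build_units(3)                   # 4 * 3 + 1 = 13 hidden units
```

In this sketch the number of hidden units grows linearly with `R`, and the network matches `phi` only inside `[-R, R]`: for example, `nn(4.3, units)` is 0 while `phi(4.3)` is about 0.3.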

Theorems & Definitions (29)

  • theorem 1: Informal; see Theorem \ref{thm:mainTheorem}
  • definition 1: Weak learning
  • definition 2
  • theorem 2: bruna2020continuous
  • theorem 3
  • lemma 1
  • proof
  • theorem 4
  • corollary 1
  • proof
  • ...and 19 more