The Computational Complexity of Training ReLU(s)

Pasin Manurangsi, Daniel Reichman

TL;DR

This paper proves NP-hardness results for training depth-2 ReLU networks: minimizing the squared error is NP-hard even for a single ReLU (and even approximately), and for a network of two ReLUs it is NP-hard even in the realizable case. It also gives positive results: when the weights and samples lie in the unit ball, depth-2 networks with k ReLUs can be properly learned in both the agnostic and the reliable settings, in time exponential in a polynomial of k/ε but polynomial in n and 1/δ. The authors connect the training problem with generalization theory, using Arora et al.'s exponential-time training algorithm to obtain proper agnostic and reliable learning guarantees and bounding the generalization error via Rademacher complexity. Together, the results delineate the boundary between computational hardness and learnability for shallow ReLU architectures under norm-bounded regimes.

Abstract

We consider the computational complexity of training depth-2 neural networks composed of rectified linear units (ReLUs). We show that, even for the case of a single ReLU, finding a set of weights that minimizes the squared error (even approximately) for a given training set is NP-hard. We also show that for a simple network consisting of two ReLUs, the error minimization problem is NP-hard, even in the realizable case. We complement these hardness results by showing that, when the weights and samples belong to the unit ball, one can (agnostically) properly and reliably learn depth-2 ReLUs with $k$ units and error at most $ε$ in time $2^{(k/ε)^{O(1)}}n^{O(1)}$; this extends previous work of Goel, Kanade, Klivans and Thaler (2017), which provided efficient improper learning algorithms for ReLUs.
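The objective the abstract refers to can be sketched in a few lines. This is only an illustration under stated assumptions: the network is taken to compute the sum of $k$ ReLUs with output weights fixed to 1 and no bias terms, the error is averaged over the samples, and the names `squared_error`, `in_unit_ball`, `W_true`, `X` and `y` are hypothetical, not taken from the paper.

```python
import numpy as np

def relu(z):
    # Rectified linear unit, applied elementwise.
    return np.maximum(z, 0.0)

def squared_error(W, X, y):
    """Average squared error of the network x -> sum_j relu(w_j . x).

    W: (k, n) array, one weight vector per ReLU (assumed: no bias,
       output weights fixed to 1).
    X: (m, n) array of samples, y: (m,) array of labels.
    """
    preds = relu(X @ W.T).sum(axis=1)
    return float(np.mean((preds - y) ** 2))

def in_unit_ball(V, tol=1e-12):
    # True if every row of V has Euclidean norm at most 1 (up to tol).
    return bool(np.all(np.linalg.norm(np.atleast_2d(V), axis=1) <= 1.0 + tol))

# Toy realizable instance: labels come from a hidden two-ReLU network,
# so some weight setting achieves zero error.
W_true = np.array([[0.6, 0.0], [0.0, -0.8]])
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [-0.6, 0.8]])
y = relu(X @ W_true.T).sum(axis=1)

print(squared_error(W_true, X, y))            # error of the hidden weights
print(squared_error(np.zeros((2, 2)), X, y))  # error of the all-zero network
print(in_unit_ball(X), in_unit_ball(W_true))  # norm-bounded regime check
```

Evaluating the objective for given weights is cheap; the hardness results concern finding weights that minimize it, which this sketch does not attempt.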

Paper Structure

This paper contains 19 sections, 14 theorems, 39 equations, 1 figure.

Key Result

Theorem 1

The ReLU training problem for a neural network consisting of a single ReLU is NP-hard.
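One plausible way to write the decision problem behind this statement is given here. It assumes the single-ReLU network computes $x \mapsto [w \cdot x]_+$ with no bias term; the sample count $m$, the threshold $\tau$ and the $1/m$ normalization are introduced only for illustration, and the paper's exact formulation may differ.

```latex
\text{Given } (x_1, y_1), \dots, (x_m, y_m) \in \mathbb{R}^n \times \mathbb{R}
\text{ and } \tau \ge 0, \text{ decide whether }
\exists\, w \in \mathbb{R}^n :\quad
\frac{1}{m} \sum_{i=1}^{m} \bigl( [w \cdot x_i]_+ - y_i \bigr)^2 \le \tau,
\qquad [z]_+ = \max(z, 0).
```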

Figures (1)

  • Figure 1: Diagrams of networks considered in this work and previous works. The single-ReLU and two-ReLU networks are the depth-2 networks we consider. For the single-ReLU network, we show that the training problem is NP-hard (Theorem 1) and that even approximating the minimum squared error to within an almost polynomial factor is NP-hard. For the two-ReLU network, we show that the training problem is hard, even in the realizable case. The ReLU-of-ReLU and convolutional architectures are considered in BDL18 and brutzkus2017globally respectively; the authors show that the training problem for their respective networks is NP-hard even in the realizable case.

Theorems & Definitions (26)

  • Theorem 1
  • proof
  • proof
  • Theorem 2
  • Definition 1
  • Definition 2
  • Theorem 3: DHK15
  • Theorem 4
  • proof
  • Proposition 5
  • ...and 16 more