The Computational Complexity of Training ReLU(s)
Pasin Manurangsi, Daniel Reichman
TL;DR
This paper proves that training ReLU networks is NP-hard, both for a single ReLU and for a depth-2 network with two ReLUs, highlighting fundamental optimization barriers. It also gives positive results: when the weights and samples lie in the unit ball, depth-2 ReLU networks with k units can be properly learned in the agnostic and reliable settings, with running time exponential in a polynomial of k/ε but polynomial in n and 1/δ. The positive results combine the exponential-time training algorithm of Arora et al. with generalization bounds based on Rademacher complexity to obtain proper agnostic and reliable learning guarantees. Together, the results delineate the boundary between computational hardness and learnability for shallow ReLU architectures and offer concrete learning algorithms in norm-bounded regimes.
Abstract
We consider the computational complexity of training depth-2 neural networks composed of rectified linear units (ReLUs). We show that, even for the case of a single ReLU, finding a set of weights that minimizes the squared error (even approximately) for a given training set is NP-hard. We also show that for a simple network consisting of two ReLUs, the error minimization problem is NP-hard, even in the realizable case. We complement these hardness results by showing that, when the weights and samples belong to the unit ball, one can (agnostically) properly and reliably learn depth-2 ReLUs with $k$ units and error at most $ε$ in time $2^{(k/ε)^{O(1)}}n^{O(1)}$; this extends previous work of Goel, Kanade, Klivans and Thaler (2017), which provided efficient improper learning algorithms for ReLUs.
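
To make the error minimization problem concrete, the following is a minimal sketch of the objective rather than code from the paper. It assumes the depth-2 network is a plain sum of $k$ ReLU units with no biases or output weights, and that the error is averaged over the training set; the paper's exact parametrization and normalization may differ. The function names `relu`, `depth2_relu` and `squared_error` are hypothetical.

```python
def relu(z):
    # Rectified linear unit: max(z, 0).
    return max(z, 0.0)

def depth2_relu(weights, x):
    # Assumed network form: sum over units j of relu(<w_j, x>).
    # `weights` is a list of k weight vectors, `x` is one sample.
    return sum(relu(sum(wi * xi for wi, xi in zip(w, x))) for w in weights)

def squared_error(weights, samples, labels):
    # Average squared error over the training set (assumed normalization).
    errs = [(depth2_relu(weights, x) - y) ** 2 for x, y in zip(samples, labels)]
    return sum(errs) / len(errs)

# Toy training set labeled by a single ReLU with weight vector (1, 0),
# so this instance is realizable with k = 1.
samples = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
true_weights = [(1.0, 0.0)]
labels = [depth2_relu(true_weights, x) for x in samples]

print(squared_error(true_weights, samples, labels))   # zero error at the labeling weights
print(squared_error([(0.0, 1.0)], samples, labels))   # a different weight vector incurs error
```

Evaluating the objective for given weights is cheap; the hardness results concern finding weights that minimize it, even approximately, over all possible choices.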
