On the Hardness of Training Deep Neural Networks Discretely
Ilan Doron-Arad
TL;DR
This work proves that training neural networks with discrete parameters becomes dramatically harder as network depth increases. Under standard complexity assumptions, D-NNT instances with deep architectures are not in NP, and a polynomial reduction carries this hardness over to the continuous variant (C-NNT), albeit on more structured instances. The paper also gives strong NP-hardness and ETH-based lower bounds for two-layer networks that hold even when the number of dimensions, the dataset size, or the number of hidden neurons is fixed. It complements them with a pseudo-polynomial DP algorithm for a fixed dataset size, which marks the boundary of tractability. The results reveal a depth-driven separation between discrete and continuous training, with implications for quantizing and deploying neural networks in the worst case. Overall, the paper connects NNT hardness to classic computational problems through explicit reductions, clarifying how depth and discretization jointly shape the algorithmic landscape of neural-network training.
Abstract
We study neural network training (NNT): optimizing a neural network's parameters to minimize the training loss over a given dataset. NNT has been studied extensively through a theoretical lens, mainly for two-layer networks with linear or ReLU activation functions in which the parameters can take any real value (here referred to as continuous NNT, or C-NNT). However, less is known about deeper neural networks, which exhibit substantially stronger capabilities in practice. In addition, the complexity of the discrete variant of the problem (D-NNT for short), in which the parameters are taken from a given finite set of options, remains less explored despite its theoretical and practical significance. In this work, we show that the hardness of NNT is dramatically affected by the network depth. Specifically, we show that, under standard complexity assumptions, D-NNT is not in the complexity class NP, even for instances with a deep architecture and a fixed number of dimensions and dataset size. This separates D-NNT from every NP-complete problem. Furthermore, using a polynomial reduction, we show that the same result also holds for C-NNT, albeit on more structured instances. We complement these results with a comprehensive list of NP-hardness lower bounds for D-NNT on two-layer networks, showing that the problem remains NP-hard even when the number of dimensions, the dataset size, or the number of neurons in the hidden layer is fixed. Finally, we obtain a pseudo-polynomial algorithm for D-NNT on a two-layer network with a fixed dataset size.
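To make the D-NNT problem concrete, the following is a deliberately naive sketch; it is not the paper's pseudo-polynomial algorithm or any construction from its reductions. It assumes details the abstract does not fix: a two-layer network without biases, ReLU hidden units, a linear output neuron, and squared training loss. The helper names `forward`, `squared_loss`, and `brute_force_dnnt` are hypothetical. The search enumerates every assignment of the k·d + k weights from the finite option set S, so it inspects |S|^(k·d + k) candidates. This exponential search space is what makes the discrete problem combinatorial.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# Assumed setting (not specified in the abstract): two-layer network, no biases,
# ReLU hidden layer, linear output, squared loss.
Point = Tuple[Sequence[int], int]  # (input vector x, target label y)


def relu(z: float) -> float:
    return max(0, z)


def forward(x: Sequence[int], W1: List[List[int]], w2: List[int]) -> float:
    # Hidden unit j computes ReLU(<W1[j], x>); the output is the w2-weighted sum.
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return sum(a * h for a, h in zip(w2, hidden))


def squared_loss(data: Sequence[Point], W1: List[List[int]], w2: List[int]) -> float:
    return sum((forward(x, W1, w2) - y) ** 2 for x, y in data)


def brute_force_dnnt(
    data: Sequence[Point], options: Sequence[int], d: int, k: int
) -> Optional[Tuple[float, List[List[int]], List[int]]]:
    """Exhaustively search all |options|**(k*d + k) discrete weight assignments.

    d: input dimension, k: number of hidden neurons. Returns (loss, W1, w2)
    for the first assignment attaining the minimum training loss.
    """
    best = None
    for params in product(options, repeat=k * d + k):
        W1 = [list(params[j * d:(j + 1) * d]) for j in range(k)]  # k x d hidden weights
        w2 = list(params[k * d:])  # k output weights
        loss = squared_loss(data, W1, w2)
        if best is None or loss < best[0]:
            best = (loss, W1, w2)
    return best


if __name__ == "__main__":
    data = [((1, 0), 2), ((0, 1), 0), ((1, 1), 2)]
    print(brute_force_dnnt(data, options=(-1, 0, 1, 2), d=2, k=1))
```

On the toy dataset above, the 4³ = 64 candidates include an exact fit (for example hidden weights (1, 0) and output weight 2), so the search should report zero training loss. Even at this tiny scale, each additional weight multiplies the work by |S|, which is why any efficient algorithm must exploit more structure than plain enumeration.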
