Training Neural Networks is NP-Hard in Fixed Dimension

Vincent Froese, Christoph Hertrich

TL;DR

This work analyzes the parameterized complexity of training two-layer neural networks with ReLU and linear-threshold activations, focusing on input dimension $d$, hidden width $k$, and target error $\gamma$. It proves NP-hardness at fixed dimension ($d=2$) and W[1]-hardness for four ReLUs with zero training error, extending to linear-threshold activations, while also presenting an FPT algorithm for the convex-ReLU case under $\gamma=0$ with running time $2^{O(k^2 d)}\mathrm{poly}(k,L)$. The results delineate clear boundaries between intractable and tractable regimes across $(d,k)$ and activation type, employing geometric constructs like levees and selection gadgets to encode combinatorial constraints. Collectively, they settle much of the complexity landscape for exact training in these two-layer networks and motivate future work on approximate training and broader architectures.

Abstract

We study the parameterized complexity of training two-layer neural networks with respect to the dimension of the input data and the number of hidden neurons, considering ReLU and linear threshold activation functions. Although the computational complexity of these problems has been studied numerous times in recent years, several questions remain open. We answer questions by Arora et al. [ICLR '18] and Khalife and Basu [IPCO '22] by showing that both problems are NP-hard in two dimensions, which excludes any polynomial-time algorithm for constant dimension (unless P = NP). We also answer a question by Froese et al. [JAIR '22] by proving W[1]-hardness for four ReLUs (or two linear threshold neurons) with zero training error. Finally, in the ReLU case, we show fixed-parameter tractability for the combined parameter consisting of the number of dimensions and the number of ReLUs, provided the network is assumed to compute a convex map. Our results settle the complexity status regarding these parameters almost completely.

Paper Structure

This paper contains 13 sections, 9 theorems, 25 equations, 6 figures.

Key Result

Theorem 1

2L-ReLU-NN-Train($\mathcal{L}$) is NP-hard even for $d=2$ and $\gamma=0$.
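To make the object of Theorem 1 concrete, the following is a minimal sketch of the network from Figure 1 ($d$ inputs, $k$ hidden ReLU neurons, one output neuron without activation) together with a training-error function. The names `two_layer_relu` and `training_error`, the use of hidden biases and real-valued output weights, and the sum of $p$-th powers of absolute residuals standing in for the loss $\mathcal{L}$ are all assumptions made for illustration; the paper's exact problem definition may differ. For $\gamma=0$ the choice of loss hardly matters, since zero error means every data point is fitted exactly.

```python
import numpy as np

def two_layer_relu(X, W, b, a):
    """Evaluate f(x) = sum_j a_j * max(0, w_j . x + b_j) on each row of X.

    X: (n, d) data points, W: (k, d) hidden weights,
    b: (k,) hidden biases, a: (k,) output weights.
    (Parameter layout is an assumption for this sketch.)
    """
    hidden = np.maximum(X @ W.T + b, 0.0)  # (n, k) ReLU activations
    return hidden @ a                      # (n,) network outputs

def training_error(X, y, W, b, a, p=2):
    """Sum of |f(x_i) - y_i|^p, a stand-in for the loss L in the theorem."""
    residuals = np.abs(two_layer_relu(X, W, b, a) - y)
    return float(np.sum(residuals ** p))

# Toy instance with d = 2 and k = 2: the labels follow y = |x_1|,
# which two ReLUs represent exactly as max(0, x_1) + max(0, -x_1).
X = np.array([[-1.0, 0.0], [0.0, 3.0], [2.0, -1.0]])
y = np.array([1.0, 0.0, 2.0])
W = np.array([[1.0, 0.0], [-1.0, 0.0]])
b = np.zeros(2)
a = np.array([1.0, 1.0])

err = training_error(X, y, W, b, a)
print(err)  # zero training error for this hand-picked weight setting
```

Checking a given weight setting, as above, is easy. The decision problem instead asks whether any weights with error at most $\gamma$ exist, and Theorem 1 says this is NP-hard already for $d=2$ and $\gamma=0$.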

Figures (6)

  • Figure 1: Neural network architecture we study in this paper: After the input layer (left) with $d$ input neurons, we have one hidden layer with $k$ ReLU neurons and a single output neuron without additional activation function.
  • Figure 2: Illustration of the selection gadget with $\ell=3$ and $s_1=-1$, $s_2=0$, $s_3=1$. Both figures show the $x_1$-$x_2$-plane while the $y$-coordinate is indicated via the darkness of the gray color. The left picture shows all data points belonging to the gadget as well as the breaklines of the three possible levees fitting the data points. In addition to these features, the right picture shows a levee with slope $s_2=0$ as one of three possibilities to fit the data points of the gadget.
  • Figure 3: Cross section of the selection gadget through one of the three lines $h_{-\epsilon}$, $h_0$, or $h_\epsilon$. The nine data points (labeled $\mathbf p_1$ to $\mathbf p_9$) on each of these lines force the function $f$ to attain a "levee-shape" with the exact position and slope of the ascending and descending sections as the only degrees of freedom (left). The four additional data points on $h_0$ even fix these properties and thus exactly determine $f$ on that line (right).
  • Figure 4: Illustration of the segments $I_1$ to $I_4$ used in the proof of \ref{lem:selectiongadget}. The figure also highlights (in black) the data points at $(0,-2)$ and $(0,2)$, each of which lies on a convex breakline, as well as the data points at $(0,-1)$ and $(0,1)$, each of which lies on a concave breakline.
  • Figure 5: Global construction layout for the reduction from POITS to 2L-ReLU-NN-Train($\mathcal{L}$). The figure shows the construction for the instance $(v_5\vee v_4 \vee v_3)\wedge (v_4\vee v_3 \vee v_2)\wedge (v_5\vee v_2 \vee v_1)$. The vertical dotted line is the $x_2$-axis along which we place all the selection gadgets. Each gadget is depicted with a black square. Each solid gray line depicts one possible levee. Each gray circle depicts a data point $\mathbf p_{j,r}$ with label one. The picture on the right additionally shows one possible solution to the given instance. Indeed, choosing levees corresponding to the solid black lines selects exactly one levee per selection gadget and exactly one levee passing through each of the nine additional data points. This corresponds to the truth assignment $v_1=v_3=\text{true}$ and $v_2=v_4=v_5=\text{false}$.
  • ...and 1 more figures
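
The truth assignment in the Figure 5 caption can be checked mechanically. The sketch below assumes that POITS denotes positive one-in-three SAT, that is, every clause must contain exactly one true variable and no variable is negated; this reading matches the caption's "exactly one levee passing through each" data point but is not spelled out in the text above. The helper names `is_one_in_three` and `all_solutions` are hypothetical.

```python
from itertools import product

# Clauses of the Figure 5 instance, written as tuples of variable indices:
# (v5 or v4 or v3) and (v4 or v3 or v2) and (v5 or v2 or v1)
CLAUSES = [(5, 4, 3), (4, 3, 2), (5, 2, 1)]
NUM_VARS = 5

def is_one_in_three(assignment, clauses):
    """True if every clause contains exactly one variable set to true."""
    return all(sum(assignment[v] for v in clause) == 1 for clause in clauses)

def all_solutions(clauses, num_vars):
    """Brute-force enumeration over all 2^num_vars truth assignments."""
    solutions = []
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: bits[i] for i in range(num_vars)}
        if is_one_in_three(assignment, clauses):
            solutions.append(assignment)
    return solutions

# Assignment stated in the caption: v1 = v3 = true, v2 = v4 = v5 = false.
caption_assignment = {1: True, 2: False, 3: True, 4: False, 5: False}
solutions = all_solutions(CLAUSES, NUM_VARS)

print(is_one_in_three(caption_assignment, CLAUSES))
print(len(solutions))
```

The brute-force search is exponential in the number of variables and is only meant for this five-variable example; the reduction itself maps such an instance to a training instance in the plane.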

Theorems & Definitions (19)

  • Theorem 1
  • Definition 2
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • Lemma 6
  • proof
  • proof: Proof of Theorem 1
  • Theorem 7
  • ...and 9 more