Provably Good Solutions to the Knapsack Problem via Neural Networks of Bounded Size

Christoph Hertrich, Martin Skutella

TL;DR

The paper investigates how large a ReLU neural network must be to compute provably good solutions for the NP-hard Knapsack Problem. It develops two DP-inspired recurrent networks that are applied once per item. The exact DP-NN has depth four and width $O((p^*)^2)$ per application (unfolded depth $O(n)$ over $n$ items), where $p^*$ is an upper bound on the optimal profit, and it exactly implements the dynamic programming recursion. The FPTAS-NN has depth five and width $w$ per application and computes a solution of value at least $1 - O\left(\frac{n^2}{\sqrt{w}}\right)$ times the optimum, so a width of $w=O(n^4/\varepsilon^2)$ suffices for a $(1-\varepsilon)$-approximation. A computational study qualitatively supports the theoretical width bounds, and the authors point out that the approach generalizes to other combinatorial optimization problems that admit dynamic programming, such as the Longest Common Subsequence Problem, various Shortest Path Problems, and the Traveling Salesperson Problem. The results establish a concrete trade-off between network size and worst-case solution quality within a DP-based paradigm.

Abstract

The development of a satisfying and rigorous mathematical understanding of the performance of neural networks is a major challenge in artificial intelligence. Against this background, we study the expressive power of neural networks through the example of the classical NP-hard Knapsack Problem. Our main contribution is a class of recurrent neural networks (RNNs) with rectified linear units that are iteratively applied to each item of a Knapsack instance and thereby compute optimal or provably good solution values. We show that an RNN of depth four and width depending quadratically on the profit of an optimum Knapsack solution is sufficient to find optimum Knapsack solutions. We also prove the following tradeoff between the size of an RNN and the quality of the computed Knapsack solution: for Knapsack instances consisting of $n$ items, an RNN of depth five and width $w$ computes a solution of value at least $1-\mathcal{O}(n^2/\sqrt{w})$ times the optimum solution value. Our results build upon a classical dynamic programming formulation of the Knapsack Problem as well as a careful rounding of profit values that are also at the core of the well-known fully polynomial-time approximation scheme for the Knapsack Problem. A carefully conducted computational study qualitatively supports our theoretical size bounds. Finally, we point out that our results can be generalized to many other combinatorial optimization problems that admit dynamic programming solution methods, such as various Shortest Path Problems, the Longest Common Subsequence Problem, and the Traveling Salesperson Problem.
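The "classical dynamic programming formulation" mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the common profit-indexed recursion, in which `f[p]` is the smallest total size of a subset of the items processed so far whose profit is at least `p`. The function names, the use of `math.inf` for unreachable profits, and the clamping of negative indices to 0 are illustrative choices and are not taken from the paper.

```python
import math

def min_size_table(profits, sizes, p_star):
    """Return f, where f[p] is the minimum total size needed to reach
    profit at least p (assumed convention, not quoted from the paper)."""
    # Profit 0 needs no items; every positive profit is initially unreachable.
    f = [0.0] + [math.inf] * p_star
    for p_i, s_i in zip(profits, sizes):
        prev = f[:]  # table before item i is considered
        for p in range(1, p_star + 1):
            # Either skip item i, or take it and cover the remaining profit.
            f[p] = min(prev[p], prev[max(p - p_i, 0)] + s_i)
    return f

def best_profit(profits, sizes, p_star, capacity=1.0):
    """Largest profit whose minimum size fits into the knapsack."""
    f = min_size_table(profits, sizes, p_star)
    return max(p for p in range(p_star + 1) if f[p] <= capacity)
```

For example, `best_profit([3, 4, 5], [0.25, 0.5, 0.5], p_star=12)` selects the second and third item. Each pass of the outer loop plays the role of one application of the recurrent network to an item.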

Paper Structure

This paper contains 8 sections, 12 theorems, 20 equations, 10 figures.

Key Result

Theorem 1

For a Knapsack instance with capacity $S=1$, $s_i\in\left]0,1\right]$, and $p_i\in\mathbb{N}$, for $i\in[n]$, with an upper bound $p^*$ on the optimal solution value, the corresponding dynamic programming values $f(p,i)$, $i\in[n]$, $p\in[p^*]$, can be exactly computed by iteratively applying the DP-

Figures (10)

  • Figure 1: An NN with two input neurons, labeled $x_1$ and $x_2$, one hidden neuron, labeled with the shape of the rectifier function, and one output neuron, labeled $y$. The arcs are labeled with their weights and all biases are zero. The network has depth 2, width 1, and size 1. It computes the function $x \mapsto y= x_2-\max\{0,x_2-x_1\} =\min\{x_1,x_2\}.$
  • Figure 2: Basic structure of an (unfolded) RNN.
  • Figure 3: Recurrent structure of the DP-NN to solve the Knapsack Problem.
  • Figure 4: Desirable architecture for computing $\mathbf{f}_\mathrm{out}(p)$, $p\in[p^*]$, from the inputs. However, the existence of an edge (nonzero weight) depends critically on the input value $\mathbf{p}_\mathrm{in}$, which is not allowed.
  • Figure 5: High-level idea how the DP-NN computes $\mathbf{f}_\mathrm{out}(p)$ for $p\in[p^*]$ from the inputs.
  • ...and 5 more figures
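The identity in the caption of Figure 1 can be checked directly. The sketch below is a plain-Python rendering of that single-ReLU network; the function names `relu` and `relu_min` are chosen here for illustration.

```python
def relu(z):
    """Rectifier: max{0, z}."""
    return max(0.0, z)

def relu_min(x1, x2):
    """One hidden ReLU neuron computing x2 - max{0, x2 - x1} = min{x1, x2}."""
    return x2 - relu(x2 - x1)
```

If $x_1 \le x_2$, the hidden neuron outputs $x_2 - x_1$ and the result is $x_1$; otherwise it outputs 0 and the result is $x_2$. Minima of this kind are what a ReLU network needs in order to evaluate a DP recursion that takes the minimum of two candidate values.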

Theorems & Definitions (23)

  • Theorem 1
  • Lemma 1
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • proof
  • Theorem 2
  • Theorem 3
  • Lemma 4
  • ...and 13 more