Optimization Over Trained Neural Networks: Taking a Relaxing Walk

Jiatai Tong, Junyang Cai, Thiago Serra

TL;DR

This paper tackles optimization over trained neural surrogates, notably ReLU networks, where exact MILP formulations struggle with scalability due to weak relaxations and dense constraint matrices. It introduces Relax-and-Walk (RW), an LP-based local search that explores linear regions by solving LP relaxations within a fixed activation pattern and then stepping into neighboring regions along improvement directions. A relaxation-based initializer LR seeds diverse regions by solving progressive relaxations and randomly fixing activations to reach new regions. Empirical evaluations on random networks and an MNIST-based adversarial task show that RW scales to larger architectures and yields competitive or superior solutions compared with Sample-and-MIP (SM) and in some cases outperforms Gurobi on deeper networks.
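The phrase "fixed activation pattern" can be made concrete with a small example. The sketch below is not taken from the paper: the one-hidden-layer network, its weights, and the helper names `forward`, `activation_pattern`, and `affine_piece` are all hypothetical. It only illustrates the standard fact that a ReLU network coincides with a single affine function inside each linear region, which is what makes an LP over one region possible.

```python
def forward(W1, b1, w2, b2, x):
    """Evaluate a one-hidden-layer ReLU network with a scalar output.

    Returns the output and the hidden-layer pre-activations.
    """
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    value = b2 + sum(w * max(0.0, p) for w, p in zip(w2, pre))
    return value, pre


def activation_pattern(pre):
    """True for each hidden neuron whose ReLU is active (positive input)."""
    return [p > 0 for p in pre]


def affine_piece(W1, b1, w2, b2, pattern):
    """Return (c, d) such that f(x) = c . x + d on the region with `pattern`."""
    c = [0.0] * len(W1[0])
    d = b2
    for row, b, w, active in zip(W1, b1, w2, pattern):
        if active:  # inactive neurons output 0 and drop out of the sum
            for j, wj in enumerate(row):
                c[j] += w * wj
            d += w * b
    return c, d


# Hypothetical weights, chosen only for illustration.
W1 = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.5]]
b1 = [0.0, -1.0, 0.5]
w2 = [1.0, -2.0, 3.0]
b2 = 0.25

x = [1.0, 0.5]
value, pre = forward(W1, b1, w2, b2, x)
pattern = activation_pattern(pre)
c, d = affine_piece(W1, b1, w2, b2, pattern)
affine_value = sum(ci * xi for ci, xi in zip(c, x)) + d
print(pattern, value, affine_value)
```

With the pattern held fixed, the objective is linear in `x` and the region itself is described by linear inequalities on the pre-activations, so optimizing within one region reduces to a linear program.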

Abstract

Besides training, mathematical optimization is also used in deep learning to model and solve formulations over trained neural networks for purposes such as verification, compression, and optimization with learned constraints. However, solving these formulations soon becomes difficult as the network size grows due to the weak linear relaxation and dense constraint matrix. We have seen improvements in recent years with cutting plane algorithms, reformulations, and a heuristic based on Mixed-Integer Linear Programming (MILP). In this work, we propose a more scalable heuristic based on exploring global and local linear relaxations of the neural network model. Our heuristic is competitive with a state-of-the-art MILP solver and the prior heuristic while producing better solutions with increases in input, depth, and number of neurons.

Paper Structure (9 sections, 3 equations, 4 figures, 1 table, 2 algorithms)

Figures (4)

  • Figure 1: From a starting point, our local search algorithm moves in a certain direction indicated by the blue arrow, and then takes a small step into the next linear region before moving again. We stop when the next linear region has no better solution.
  • Figure 2: Comparison of best objective values obtained by RW and SM in random networks. Points above the line $Y=X$ are favorable to RW, and points below it are favorable to SM. RW is at least 1% better in 64.4% of the cases, while SM is at least 1% better in 12.2%.
  • Figure 3: Comparison of best objective values obtained by RW and Gurobi in random networks. RW is at least 1% better in 55.1% of the cases, while Gurobi is at least 1% better in 12.4%.
  • Figure 4: Comparison of best objective values obtained by RW and Gurobi in optimal adversary models. RW is at least 1% better in 68% of the cases, while Gurobi is at least 1% better in 30%.
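
The walk described in the Figure 1 caption can be illustrated with a deliberately simplified sketch. It is not the authors' procedure: RW solves an LP within each linear region, whereas this version just follows the gradient of the current affine piece until a pre-activation changes sign or the input box is reached. The network, the box bounds, and the parameters `eps`, `tol`, and `max_moves` are assumptions made for illustration.

```python
def preactivations(W1, b1, x):
    """Pre-activation values of the hidden layer at input x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]


def value_and_gradient(W1, b1, w2, b2, x):
    """Network output and its gradient inside the linear region containing x."""
    pre = preactivations(W1, b1, x)
    value = b2 + sum(w * max(0.0, p) for w, p in zip(w2, pre))
    grad = [0.0] * len(x)
    for row, w, p in zip(W1, w2, pre):
        if p > 0:  # only active neurons contribute to the slope
            for j, wj in enumerate(row):
                grad[j] += w * wj
    return value, grad


def step_to_boundary(W1, b1, x, direction, lo, hi):
    """Largest step t along `direction` before leaving the region or the box."""
    t_max, hit_box = float("inf"), True
    for xi, di, l, h in zip(x, direction, lo, hi):
        if di > 0:
            t_max = min(t_max, (h - xi) / di)
        elif di < 0:
            t_max = min(t_max, (l - xi) / di)
    for row, p in zip(W1, preactivations(W1, b1, x)):
        slope = sum(w * di for w, di in zip(row, direction))
        if slope != 0.0:
            t = -p / slope  # step at which this pre-activation reaches zero
            if 0.0 < t < t_max:
                t_max, hit_box = t, False
    return t_max, hit_box


def walk(W1, b1, w2, b2, x, lo, hi, eps=1e-6, tol=1e-9, max_moves=50):
    """Maximize the network output by walking across linear regions."""
    x = list(x)
    value, grad = value_and_gradient(W1, b1, w2, b2, x)
    for _ in range(max_moves):
        if all(g == 0.0 for g in grad):
            break  # flat region: no improving direction
        t, hit_box = step_to_boundary(W1, b1, x, grad, lo, hi)
        step = t if hit_box else t + eps  # eps crosses into the next region
        candidate = [min(h, max(l, xi + step * g))
                     for xi, g, l, h in zip(x, grad, lo, hi)]
        new_value, new_grad = value_and_gradient(W1, b1, w2, b2, candidate)
        if new_value <= value + tol:
            break  # the move does not improve the objective
        x, value, grad = candidate, new_value, new_grad
        if hit_box:
            break  # simplification: stop at the box instead of sliding along it
    return x, value


# Hypothetical weights and input box, chosen only for illustration.
W1 = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.5]]
b1 = [0.0, -1.0, 0.5]
w2 = [1.0, -2.0, 3.0]
b2 = 0.25
lo, hi = [-2.0, -2.0], [2.0, 2.0]

start = [1.0, 0.5]
start_value, _ = value_and_gradient(W1, b1, w2, b2, start)
end, end_value = walk(W1, b1, w2, b2, start, lo, hi)
print(start_value, end_value, end)
```

The small `eps` overshoot plays the role of the "small step into the next linear region" in the caption. Stopping when a move fails to improve the objective is a crude stand-in for checking that the next region has no better solution.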