An analysis of optimization problems involving ReLU neural networks

Christoph Plate, Mirko Hahn, Alexander Klimek, Caroline Ganzer, Kai Sundmacher, Sebastian Sager

TL;DR

Embedding ReLU networks into MINLPs introduces large big-$M$ constants that hinder solver performance. The paper surveys and quantifies strategies including bound tightening (interval arithmetic (IA) and LP-based), a posteriori ReLU scaling, training-time regularization, clipped ReLU, and dropout. It demonstrates that LP-based tightening and scaling can substantially reduce the big-$M$ constants and improve runtimes, with regularization during training providing the strongest overall gains by reducing the number of linear regions and increasing the share of stable neurons. The results show a practical trade-off between neural redundancy and optimization cost, and offer actionable guidance for designing surrogates and preprocessing steps to accelerate solving embedded optimization problems. Collectively, the work informs better integration of ReLU surrogates in engineering MINLPs and suggests directions for extending these insights to more complex models and problem classes.

Abstract

Solving mixed-integer optimization problems with embedded neural networks with ReLU activation functions is challenging. Big-M coefficients that arise in relaxing binary decisions related to these functions grow exponentially with the number of layers. We survey and propose different approaches to analyze and improve the run time behavior of mixed-integer programming solvers in this context. Among them are clipped variants and regularization techniques applied during training as well as optimization-based bound tightening and a novel scaling for given ReLU networks. We numerically compare these approaches for three benchmark problems from the literature. We use the number of linear regions, the percentage of stable neurons, and overall computational effort as indicators. As a major takeaway we observe and quantify a trade-off between the often desired redundancy of neural network models versus the computational costs for solving related optimization problems.
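
For orientation, the big-M coefficients mentioned in the abstract can be illustrated with the standard textbook encoding of a single ReLU neuron. This is a sketch under assumed notation, and the paper's own formulation and symbols may differ. Here $a = w^\top x + b$ is the pre-activation, $y = \max(0, a)$ the output, $L \le a \le U$ are known pre-activation bounds with $L < 0 < U$, and $z$ is the binary variable indicating whether the neuron is active:

$$
\begin{aligned}
y &\ge 0, & y &\ge a, \\
y &\le a - L\,(1 - z), & y &\le U z, \\
z &\in \{0, 1\}.
\end{aligned}
$$

With $z = 1$ the constraints force $y = a \ge 0$, and with $z = 0$ they force $y = 0$ and $a \le 0$. The bounds $L$ and $U$ act as the big-M coefficients, so tighter bounds give a tighter continuous relaxation. If $L \ge 0$ or $U \le 0$, the neuron is stable and $z$ can be fixed in advance.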

Paper Structure

This paper contains 20 sections, 18 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Equivalent scaling of ReLU ANNs. A scalar factor $c$ is multiplied row-wise into the weight matrix and the corresponding bias of the current layer, which scales the output of the neuron by a factor of $c$. To compensate for this, the weight matrix in the subsequent layer needs to be multiplied column-wise by the reciprocal of $c$.
  • Figure 2: Surface plots of the benchmark functions for surrogate model training and optimization.
  • Figure 3: Comparison of pre-activation bounds $U^{(k)}$ for functionally equivalent neural networks with ten hidden layers. The original bounds derived via interval arithmetic shown in \ref{fig:IA_bounds} are characterized by the typical exponential increase due to forward propagation of the input bounds. Solving auxiliary LPs yields tighter bounds, although the exponential increase is still present, as shown in \ref{fig:LR_bounds}. Comparable bounds can be computed via solving the scaling problem \ref{prob:scaling}, with the distinction that the bounds on the output of the network are equivalent to those derived from interval arithmetic. For the scaled neural network, solving the bound tightening problem \ref{prob:obbt} in addition yields even tighter bounds on the big-M coefficients in the hidden layers with ReLU activation, as can be seen in \ref{fig:scaled_LR_bounds}, while the output bounds are equivalent to those in \ref{fig:LR_bounds}.
  • Figure 4: Parity plots comparing percentage of stable neurons and computational times of optimally solved instances of \ref{prob:minANN} for bounds derived from IA and LP-based OBBT. Solving \ref{prob:obbt} leads to an increase of 5.5 percentage points in stable neurons on average. This carries over to a reduction in computational time shown in \ref{subfig:time_IA_vs_OBBT}. The ratios of times with OBBT and IA bounds have a geometric mean of 0.57.
  • Figure 5: Parity plots comparing computational times for optimally solved instances of \ref{prob:minANN} in different versions: \ref{subfig:IA_vs_scaler}: IA bounds for baseline vs. scaled ReLU network with a geometric mean ratio of 0.936; \ref{subfig:IA_vs_scaler_and_OBBT}: IA bounds for baseline network vs. OBBT bounds for scaled network with a geometric mean ratio of 0.467.
  • ...and 4 more figures
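
The equivalent scaling described in the Figure 1 caption, and the interval-arithmetic (IA) bound propagation referred to in the Figure 3 caption, can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the helper names (`forward`, `scale_neuron`, `ia_bounds`), the random toy network, and the fixed factor `c = 0.25` are all assumptions made for the example. In particular, the factor is chosen by hand here, whereas the paper determines scaling factors by solving a dedicated scaling problem.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def affine(W, b, x):
    # Computes W @ x + b for list-of-rows W.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(layers, x):
    # layers is a list of (W, b); ReLU is applied on every layer but the last.
    for k, (W, b) in enumerate(layers):
        x = affine(W, b, x)
        if k < len(layers) - 1:
            x = relu(x)
    return x

def scale_neuron(layers, k, j, c):
    # Scale neuron j of hidden layer k by c > 0 (row of W and bias),
    # then compensate with 1/c in column j of the next layer's weights.
    assert c > 0 and k < len(layers) - 1
    new = [([row[:] for row in W], b[:]) for W, b in layers]
    W, b = new[k]
    W[j] = [c * w for w in W[j]]
    b[j] = c * b[j]
    W_next, _ = new[k + 1]
    for row in W_next:
        row[j] /= c
    return new

def ia_bounds(layers, lo, hi):
    # Forward interval arithmetic: pre-activation bounds (L, U) per layer.
    bounds = []
    for W, b in layers:
        L = [bi + sum(w * (l if w >= 0 else h) for w, l, h in zip(row, lo, hi))
             for row, bi in zip(W, b)]
        U = [bi + sum(w * (h if w >= 0 else l) for w, l, h in zip(row, lo, hi))
             for row, bi in zip(W, b)]
        bounds.append((L, U))
        lo, hi = relu(L), relu(U)  # post-activation interval for the next layer
    return bounds

# Toy network with two inputs, two hidden layers of three neurons, one output.
random.seed(0)
sizes = [2, 3, 3, 1]
layers = [
    ([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
     [random.uniform(-1, 1) for _ in range(n_out)])
    for n_in, n_out in zip(sizes, sizes[1:])
]

c = 0.25
scaled = scale_neuron(layers, k=0, j=1, c=c)

x = [0.3, -0.7]
y_orig = forward(layers, x)
y_scaled = forward(scaled, x)

b_orig = ia_bounds(layers, [-1.0, -1.0], [1.0, 1.0])
b_scaled = ia_bounds(scaled, [-1.0, -1.0], [1.0, 1.0])
```

Because ReLU is positively homogeneous ($\max(0, c\,a) = c \max(0, a)$ for $c > 0$), `y_orig` and `y_scaled` agree up to floating-point error. The IA bound of the scaled neuron shrinks by the factor `c`, while the IA bounds of the following layers are unchanged, which is consistent with the Figure 3 remark that output bounds after scaling match those from interval arithmetic.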