Tightening convex relaxations of trained neural networks: a unified approach for convex and S-shaped activations

Pablo Carrasco, Gonzalo Muñoz

TL;DR

A recursive formula is developed that yields a tight convexification for the composition of an activation with an affine function, covering a wide scope of activation functions, namely those that are convex or "S-shaped".

Abstract

The non-convex nature of trained neural networks has created significant obstacles in their incorporation into optimization models. Considering the wide array of applications that this embedding has, the optimization and deep learning communities have dedicated significant efforts to the convexification of trained neural networks. Many approaches to date have considered obtaining convex relaxations for each non-linear activation in isolation, which poses limitations in the tightness of the relaxations. Anderson et al. (2020) strengthened these relaxations and provided a framework to obtain the convex hull of the graph of a piecewise linear convex activation composed with an affine function; this effectively convexifies activations such as the ReLU together with the affine transformation that precedes it. In this article, we contribute to this line of work by developing a recursive formula that yields a tight convexification for the composition of an activation with an affine function for a wide scope of activation functions, namely, convex or "S-shaped". Our approach can be used to efficiently compute separating hyperplanes or determine that none exists in various settings, including non-polyhedral cases. We provide computational experiments to test the empirical benefits of these convex approximations.

Paper Structure

This paper contains 41 sections, 15 theorems, 75 equations, 6 figures, 2 tables.

Key Result

Theorem 1

Consider $w\in\mathbb{R}^n_{+}$, $b\in \mathbb{R}$ and let $f:[0,1]^n \to \mathbb{R}$ be a function of the form $f(x) = \sigma(w^\top x + b)$ where $\sigma$ satisfies the STFE property. Then, where $R_f,R_l$ and $R_i$ are defined using Definition \ref{def:regions} with $\hat{z}$ as the tie point of $\sigma$ in $[b,w^\top \mathbf{1} + b]$.
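
The role of the tie point $\hat{z}$ on the interval $[b, w^\top \mathbf{1} + b]$ can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the paper's algorithm. It uses the logistic sigmoid as a stand-in for $\sigma$. Because the STFE definition is not reproduced in this summary, it assumes that the tie point on $[l,u]$ is the smallest $\hat{z}$ for which the concave envelope is the secant from $(l,\sigma(l))$ to $(\hat{z},\sigma(\hat{z}))$ on $[l,\hat{z}]$ and $\sigma$ itself on $[\hat{z},u]$. The function names and the values of $w$ and $b$ are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, used here as an example S-shaped activation."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def preactivation_bounds(w, b):
    """Range of w^T x + b over x in [0,1]^n when w >= 0: [b, b + sum(w)]."""
    if any(wi < 0 for wi in w):
        raise ValueError("this sketch assumes nonnegative weights")
    return b, b + sum(w)


def tie_point(l, u, tol=1e-12):
    """Assumed tie point of the sigmoid on [l, u] (see the hedged lead-in)."""
    if l >= 0:
        return l  # sigmoid is concave on [l, u]: the envelope is sigmoid itself
    if u <= 0:
        return u  # sigmoid is convex on [l, u]: the envelope is the full secant

    # g(z) = 0 exactly when the secant from l to z is tangent to sigmoid at z.
    def g(z):
        return sigmoid_prime(z) * (z - l) - (sigmoid(z) - sigmoid(l))

    if g(u) >= 0:
        return u  # tangency would occur beyond u: the secant covers [l, u]

    # g is positive at 0 and decreasing on [0, u], so bisection applies.
    lo, hi = 0.0, u
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def concave_envelope(z, l, u):
    """Secant on [l, tie point), sigmoid on [tie point, u]."""
    zh = tie_point(l, u)
    if z >= zh:
        return sigmoid(z)
    slope = (sigmoid(zh) - sigmoid(l)) / (zh - l)
    return sigmoid(l) + slope * (z - l)


# Hypothetical parameters, chosen only for illustration.
l, u = preactivation_bounds([1.0, 2.0], -2.0)
z_hat = tie_point(l, u)
print(f"interval = [{l}, {u}], tie point = {z_hat:.6f}")
```

This covers only the one-dimensional envelope of $\sigma$ over the pre-activation interval. The theorem itself concerns the envelope of $f$ over the whole box $[0,1]^n$, which the sketch does not attempt to reproduce.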

Figures (6)

  • Figure 1: Plot of the SELU activation (blue) overlapped with its concave envelope (orange) in the interval $[-1.13, 0.5]$. See Table \ref{table:Sshaped} for the definition of this activation. In this interval, the concave envelope is a linear function, which is equal to $\sigma(x)$ for every $x\geq 0$; therefore there are multiple choices of $\hat{z}$ that satisfy property \ref{eq:stfe}. According to our definition, the tie point would be unambiguously defined as 0.
  • Figure 2: Plots of the function $f(x) = \sigma(w^\top x + b)$ defined with the parameters of Example \ref{ex:main}, along with the slices given by setting the variables to 1.
  • Figure 3: Plots of the concave envelope of the function $f(x) = \sigma(w^\top x + b)$ defined with the parameters of Example \ref{ex:main}, along with the slices given by setting the variables to 1.
  • Figure 4: Plots of the function $f(x) = \sigma(w^\top x + b)$ defined with the parameters of Example \ref{ex:main}, the concave overestimator $h(x)$ given in Lemma \ref{lemma:mccormick}, and the slices given by setting the variables to 1.
  • Figure 5: Illustration of why taking the perspective over a complete face of $\hbox{conc}(f,[0,1]^{n})$ does not yield a valid overestimator. The function $f(x)$ of Example \ref{ex:main} is plotted in orange. The black line is $w^\top x + b = \hat{z}$. The blue ray represents taking the perspective onto a point that lies on $R_f$, which goes "into" the graph of the function $f(x)$.
  • ...and 1 more figure

Theorems & Definitions (33)

  • Definition 1
  • Remark 1
  • Definition 2
  • Remark 2
  • Definition 3
  • Theorem 1
  • Example 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • ...and 23 more