
Provable Robustness of ReLU networks via Maximization of Linear Regions

Francesco Croce, Maksym Andriushchenko, Matthias Hein

TL;DR

The authors tackle provable robustness for ReLU networks by exploiting the piecewise affine structure to define linear regions and decision boundaries. They derive robustness guarantees using distances to region and decision boundaries, and introduce the Maximum Margin Regularizer (MMR) to systematically enlarge linear regions and margins during training. Empirically, MMR improves both lower and upper robustness bounds and enhances verifiability via faster MIP certification, often matching or surpassing adversarial training baselines. The work also enables obtaining guaranteed optimal adversarial perturbations for a substantial fraction of inputs, demonstrating practical impact for certifiable robustness in safety-critical settings.

Abstract

It has been shown that neural network classifiers are not robust. This raises concerns about their usage in safety-critical systems. In this paper we propose a regularization scheme for ReLU networks which provably improves the robustness of the classifier by maximizing the linear regions of the classifier as well as the distance to the decision boundary. Our techniques even allow us to find the minimal adversarial perturbation for a fraction of test points for large networks. In the experiments we show that our approach improves upon adversarial training both in terms of lower and upper bounds on the robustness and is comparable to or better than the state-of-the-art in terms of test error and robustness.

Paper Structure

This paper contains 15 sections, 3 theorems, 20 equations, 10 figures, 6 tables.

Key Result

Lemma 3.1

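The $l_p$-distance $d_B(x)=\mathop{\rm min}\nolimits_{z \in \partial Q(x)} \left\|z-x\right\|_p$ of $x$ to the boundary of the polytope $Q(x)$ is given by

$$d_B(x) = \mathop{\rm min}\nolimits_{l=1,\ldots,L} \; \mathop{\rm min}\nolimits_{j=1,\ldots,n_l} \frac{\left|\left\langle V_j^{(l)}, x\right\rangle + v_j^{(l)}\right|}{\left\|V_j^{(l)}\right\|_q},$$

where, on $Q(x)$, the pre-activation of the $l$-th hidden layer ($n_l$ units) is the affine function $V^{(l)}z + v^{(l)}$, $V_j^{(l)}$ is the $j$-th row of $V^{(l)}$, $v_j^{(l)}$ is the $j$-th entry of $v^{(l)}$, and $\left\|\cdot\right\|_q$ is the dual norm of $\left\|\cdot\right\|_p$ ($\frac{1}{p}+\frac{1}{q}=1$). Each ratio is the $l_p$-distance from $x$ to the hyperplane on which the $j$-th unit of layer $l$ switches its activation.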
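Lemma 3.1 can be evaluated directly once the affine maps of the hidden pre-activations on $Q(x)$ are known: each hidden unit contributes the hyperplane $\{z : \langle V_j^{(l)}, z\rangle + v_j^{(l)} = 0\}$ at which it switches, and $d_B(x)$ is the smallest $l_p$-distance from $x$ to these hyperplanes, measured with the dual norm $\|\cdot\|_q$ in the denominator. The NumPy sketch below illustrates this under stated assumptions: a fully connected ReLU network given as hypothetical lists `weights`/`biases` (last pair = output layer), with pre-activations exactly at zero treated as inactive. The function names are illustrative, not the authors' code.

```python
import numpy as np

def dual_exponent(p):
    """Return q with 1/p + 1/q = 1 (p = 1 -> q = inf, p = inf -> q = 1)."""
    if p == 1:
        return np.inf
    if np.isinf(p):
        return 1.0
    return p / (p - 1.0)

def region_affine_maps(weights, biases, x):
    """Affine maps of a ReLU MLP that are valid on the linear region Q(x).

    weights/biases list every layer; the last pair is the output layer.
    Returns [(V1, v1), ..., (VL, vL)] with hidden pre-activations
    V^(l) z + v^(l) on Q(x), plus the output map (V_out, v_out).
    """
    A, a = np.eye(x.shape[0]), np.zeros(x.shape[0])  # layer input = A z + a
    maps = []
    for W, b in zip(weights[:-1], biases[:-1]):
        V, v = W @ A, W @ a + b                  # pre-activation of this layer
        maps.append((V, v))
        active = (V @ x + v) > 0                 # activation pattern at x
        A, a = V * active[:, None], v * active   # ReLU keeps active rows only
    return maps, (weights[-1] @ A, weights[-1] @ a + biases[-1])

def dist_to_region_boundary(maps, x, p):
    """d_B(x) = min_l min_j |<V_j^(l), x> + v_j^(l)| / ||V_j^(l)||_q."""
    q = dual_exponent(p)
    d = np.inf
    for V, v in maps:
        norms = np.linalg.norm(V, ord=q, axis=1)
        nz = norms > 0                           # zero rows define no hyperplane
        if nz.any():
            d = min(d, float(np.min(np.abs(V[nz] @ x + v[nz]) / norms[nz])))
    return d

# Usage on a toy network with two hidden layers.
W1, b1 = np.array([[1.0, 0.0], [0.0, 2.0]]), np.array([0.0, -1.0])
W2, b2 = np.array([[1.0, -1.0]]), np.array([0.5])
W3, b3 = np.array([[1.0], [-1.0]]), np.array([0.0, 0.0])
x = np.array([3.0, 1.0])
maps, (V_out, v_out) = region_affine_maps([W1, W2, W3], [b1, b2, b3], x)
print(dist_to_region_boundary(maps, x, p=2))   # 0.5
```

Propagating the activation pattern keeps everything exact: no bound relaxation is involved, only the fixed affine pieces of the network around $x$.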

Figures (10)

  • Figure 1: Left: the input $x$ is closer to the boundary of the polytope $Q(x)$ (black) than to the decision boundary (red). In this case the smallest perturbation that leads to a change of the decision lies outside the linear region $Q(x)$. Right: the input $x$ is closer to the decision boundary than to the boundary of $Q(x)$, so that the projection of the point onto the decision hyperplane provides the adversarial example with the smallest norm.
  • Figure 2: The effects of MMR. Top row: we train two networks with one hidden layer of 100 units on 128 points belonging to two classes (red and blue), and show the points together with the division of the input space into regions on which the classifier is linear, for the plain model and for our MMR-regularized model. Bottom row: we show region boundaries (one hidden layer, 1024 units) on a 2D slice of $\mathbb{R}^{784}$ spanned by three random points from different classes of the MNIST training set. The linear regions are clearly larger in the MMR-regularized case than in the non-regularized case.
  • Figure 3: Verifiability of models. Left: the runtime in minutes that MIP [TjeTed2017] takes to verify 1000 points with a timeout of 120 s (i.e., the mixed-integer optimization is stopped once the time limit is reached), for models trained with different values of the MMR weight $\lambda$ in the training objective. Note that the $y$-axis uses a logarithmic scale. Right: lower (red) and upper (green) bounds on the robust test error. The plain model, trained without MMR ($\lambda=0$), takes 51 times longer to verify, and only 1% of its points are certified. Conversely, even with light MMR regularization the lower and upper bounds are tight.
  • Figure 4: We report the ratios $\left\|\delta_{CW}\right\|_2/\left\|\delta_{opt}\right\|_2$ (the norm of the CW-attack perturbation divided by the norm of the minimal adversarial perturbation), sorted in descending order, for an FC1 model trained with our regularizer on the GTS dataset.
  • Figure 5: Gradients of $l_\infty$-robust models. We visualize the gradients of the cross-entropy loss with respect to the input for different MNIST test images, for each model: plain training, adversarial training [MadEtAl2018], KW robust training [WonKol2018], MMR, and MMR + adversarial training. The gradients of the robust models are much sparser, and only for plain training do they fail to clearly highlight relevant features.
  • ...and 5 more figures
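The case distinction in Figure 1 can be made concrete. On $Q(x)$ the logits are an affine map $f(z) = V_{out} z + v_{out}$, so the distance $d_D(x)$ to the decision boundary of this map is the smallest $l_p$-distance from $x$ to a hyperplane $\{z : f_c(z) = f_s(z)\}$, where $c$ is the predicted class. A perturbation shorter than $\min(d_B(x), d_D(x))$ stays inside $Q(x)$ and cannot reach any of those hyperplanes, so the decision cannot change; if in addition $d_D(x) \le d_B(x)$ (right panel), the $l_2$ projection onto the closest hyperplane is a minimal adversarial example. The sketch below assumes the output map on $Q(x)$ and $d_B(x)$ are already known; the names `V_out`, `v_out`, `certificate` and the other functions are hypothetical.

```python
import numpy as np

def dual_exponent(p):
    """Return q with 1/p + 1/q = 1."""
    if p == 1:
        return np.inf
    if np.isinf(p):
        return 1.0
    return p / (p - 1.0)

def dist_to_decision_boundary(V_out, v_out, x, p):
    """Smallest l_p distance from x to {z : f_c(z) = f_s(z)}, s != c, where
    f(z) = V_out @ z + v_out on Q(x) and c is the predicted class.
    Returns (d_D, c, s) with s the closest competing class."""
    q = dual_exponent(p)
    logits = V_out @ x + v_out
    c = int(np.argmax(logits))
    best, best_s = np.inf, None
    for s in range(len(logits)):
        if s == c:
            continue
        norm = np.linalg.norm(V_out[c] - V_out[s], ord=q)
        if norm > 0:                      # otherwise f_c - f_s is constant
            dist = (logits[c] - logits[s]) / norm   # >= 0 since c is argmax
            if dist < best:
                best, best_s = float(dist), s
    return best, c, best_s

def l2_projection(V_out, v_out, x, c, s):
    """Orthogonal projection of x onto the hyperplane f_c(z) = f_s(z)."""
    w = V_out[c] - V_out[s]
    b = v_out[c] - v_out[s]
    return x - (w @ x + b) / (w @ w) * w

def certificate(d_B, d_D):
    """No perturbation with norm below the returned radius changes the
    decision; the flag says whether the radius is also the minimal one."""
    return min(d_B, d_D), d_D <= d_B

# Toy usage: identity logits, d_B assumed to be 1.0 (e.g. from Lemma 3.1).
V_out, v_out = np.eye(2), np.zeros(2)
x = np.array([2.0, 1.0])
d_D, c, s = dist_to_decision_boundary(V_out, v_out, x, p=2)
radius, exact = certificate(d_B=1.0, d_D=d_D)
if exact:
    print(l2_projection(V_out, v_out, x, c, s))   # [1.5 1.5]
```

When `exact` is false (left panel of Figure 1), the returned radius is still a valid lower bound, but the minimal perturbation lies outside $Q(x)$ and needs another method, such as the MIP verifier.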
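Region plots like those in Figure 2 can be approximated by sampling. For a one-hidden-layer ReLU network, each activation pattern defines one convex linear region, so counting the distinct patterns on a dense grid over a 2D slice gives a lower bound on how many regions the slice intersects (regions smaller than the grid spacing are missed). A minimal sketch with hypothetical function names; the slice is parameterized as $x_1 + \alpha(x_2 - x_1) + \beta(x_3 - x_1)$ through three anchor points, matching the "slice spanned by three points" described for the bottom row.

```python
import numpy as np

def grid(lo, hi, steps):
    """All (alpha, beta) pairs of a steps x steps grid on [lo, hi]^2."""
    axis = np.linspace(lo, hi, steps)
    ga, gb = np.meshgrid(axis, axis)
    return np.stack([ga.ravel(), gb.ravel()], axis=1)

def slice_points(x1, x2, x3, coords):
    """Map (alpha, beta) to x1 + alpha (x2 - x1) + beta (x3 - x1)."""
    return x1 + coords[:, :1] * (x2 - x1) + coords[:, 1:] * (x3 - x1)

def count_patterns(W, b, pts):
    """Number of distinct activation patterns of relu(W z + b) over pts;
    for one hidden layer each pattern is one convex linear region."""
    patterns = (pts @ W.T + b) > 0
    return len(np.unique(patterns, axis=0))

# Toy check: two units switching on the coordinate axes give 4 regions.
pts = slice_points(np.zeros(2), np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                   grid(-1.0, 1.0, 201))
print(count_patterns(np.eye(2), np.zeros(2), pts))   # 4
```

For the MNIST setting of the figure, `W` and `b` would be the first-layer parameters (shapes 1024 × 784 and 1024) and `x1`, `x2`, `x3` three flattened training images. On the same slice, fewer distinct patterns indicate larger linear regions.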
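The lower and upper bounds in the right panel of Figure 3 come from bookkeeping over verification outcomes. Assuming each test point ends in one of four hypothetical states (misclassified, adversarial example found, certified robust, or stopped at the timeout without a decision), the robust test error is bounded below by the points known to be non-robust and above by all points not certified robust. The two bounds coincide exactly when no point times out, which is what "tight" means in the caption. The labels and function name are illustrative, not the paper's evaluation code.

```python
from collections import Counter

def robust_error_bounds(outcomes):
    """Lower/upper bounds on the robust test error.

    outcomes: one label per test point, each of
      'misclassified' - wrong on the clean input (non-robust by definition)
      'attacked'      - an adversarial example was found
      'certified'     - the verifier proved robustness
      'timeout'       - the verifier stopped without a decision
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    counts = Counter(outcomes)
    n = len(outcomes)
    lower = (counts["misclassified"] + counts["attacked"]) / n
    upper = (n - counts["certified"]) / n   # timeouts count as non-robust
    return lower, upper

outcomes = (["certified"] * 90 + ["attacked"] * 5
            + ["misclassified"] * 3 + ["timeout"] * 2)
print(robust_error_bounds(outcomes))   # (0.08, 0.1)
```

The gap between the bounds equals the fraction of timeouts, so a model that is faster to verify also tends to get tighter bounds under a fixed time limit.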
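Because a ReLU network is affine on the linear region containing $x$, the input gradients visualized in Figure 5 have a closed form: with logits $Vx + v$ on that region, the gradient of the cross-entropy loss for label $y$ is $V^\top(\mathrm{softmax}(Vx+v) - e_y)$. A minimal NumPy sketch for a fully connected ReLU network, assuming hypothetical `weights`/`biases` lists with the output layer last; away from region boundaries it agrees with what an automatic-differentiation framework would return.

```python
import numpy as np

def input_gradient_ce(weights, biases, x, y):
    """Gradient of the cross-entropy loss w.r.t. the input x of a ReLU MLP.

    On the linear region of x the logits are V @ x + v, so the gradient is
    V.T @ (softmax(V @ x + v) - onehot(y)); V and v are built layer by layer.
    """
    V, v = np.eye(x.shape[0]), np.zeros(x.shape[0])
    last = len(weights) - 1
    for k, (W, b) in enumerate(zip(weights, biases)):
        V, v = W @ V, W @ v + b
        if k < last:                      # hidden layer: apply ReLU pattern at x
            active = (V @ x + v) > 0
            V, v = V * active[:, None], v * active
    logits = V @ x + v
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    probs[y] -= 1.0                        # softmax(logits) - onehot(y)
    return V.T @ probs

W1, b1 = np.array([[1.0, -1.0], [2.0, 1.0]]), np.array([0.5, -0.2])
W2 = np.array([[1.0, 0.0], [0.5, -1.0], [-1.0, 2.0]])
b2 = np.array([0.0, 0.1, 0.0])
print(input_gradient_ce([W1, W2], [b1, b2], np.array([0.3, 0.4]), y=1))
```

For image inputs, the returned vector would be reshaped to the image shape for display, as in the figure.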

Theorems & Definitions (8)

  • Definition 2.1
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • Theorem 3.1
  • proof
  • Definition 4.1