Make Interval Bound Propagation great again

Patryk Krukowski, Daniel Wilczak, Jacek Tabor, Anna Bielawska, Przemysław Spurek

TL;DR

This paper demonstrates that IBP is sub-optimal for computing the robustness of a pre-trained network because it is susceptible to the wrapping effect, and adapts two classical approaches from rigorous computation -- Doubleton Arithmetic and Affine Arithmetic -- to mitigate the wrapping effect in neural networks.

Abstract

In various real-life scenarios, such as medical data analysis, autonomous driving, and adversarial training, we are interested in robust deep networks. A network is robust when a relatively small perturbation of the input cannot lead to a drastic change in the output (such as a change of the predicted class). This falls under the broader field of Neural Network Certification (NNC). Two crucial problems in NNC are of profound interest to the scientific community: how to calculate the robustness of a given pre-trained network and how to construct robust networks. The common approach to constructing robust networks is Interval Bound Propagation (IBP). This paper demonstrates that IBP is sub-optimal for the first problem, calculating the robustness of a pre-trained network, because it is susceptible to the wrapping effect. Even for linear activations, IBP gives strongly sub-optimal bounds. Consequently, one should use strategies immune to the wrapping effect to obtain bounds close to the optimal ones. We adapt two classical approaches from rigorous computation -- Doubleton Arithmetic and Affine Arithmetic -- to mitigate the wrapping effect in neural networks. These techniques yield precise results for networks with linear activation functions and thus resist the wrapping effect. As a result, we achieve bounds significantly closer to the optimal level than those of IBP.
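The wrapping effect for linear activations can be illustrated with a small sketch. It is not the authors' implementation: the rotation layers, the input box $[-1,1]^2$, and all function names (`ibp_step`, `affine_step`, `compare`) are assumptions chosen for illustration. IBP re-wraps the set in an axis-aligned box after every layer, while the affine form keeps the composed linear map and takes the box only at the end.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * B[k][j] for k, a in enumerate(row)) for j in range(len(B[0]))]
            for row in A]

def rotation(theta):
    """2x2 rotation matrix, used here as a hypothetical linear layer."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def ibp_step(W, center, radius):
    """IBP through y = W x: new center W c, new radius |W| r (box re-wrapped)."""
    abs_W = [[abs(w) for w in row] for row in W]
    return matvec(W, center), matvec(abs_W, radius)

def affine_step(W, center, gens):
    """Affine form {c + G t : t in [-1,1]^k} through y = W x: exact, no wrapping."""
    return matvec(W, center), matmul(W, gens)

def compare(num_layers, theta=math.pi / 4):
    """Return per-coordinate output radii (ibp_radius, affine_radius)."""
    W = rotation(theta)
    ibp_c, ibp_r = [0.0, 0.0], [1.0, 1.0]              # box [-1,1]^2
    aa_c, aa_G = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]  # same box as an affine form
    for _ in range(num_layers):
        ibp_c, ibp_r = ibp_step(W, ibp_c, ibp_r)
        aa_c, aa_G = affine_step(W, aa_c, aa_G)
    # Tightest axis-aligned box around the affine form: row-wise sum of |G_ij|.
    aa_r = [sum(abs(g) for g in row) for row in aa_G]
    return ibp_r, aa_r

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, compare(n))
```

In this toy setting the IBP radius grows by a factor of $\sqrt{2}$ with each 45-degree rotation, whereas the radius computed from the affine form stays bounded because a composition of rotations is again a rotation.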

Paper Structure (22 sections, 5 theorems, 54 equations, 7 figures)

Key Result

Lemma 2.1

Let $V=(V_1,\ldots,V_n)$ be a random vector uniformly chosen from the unit sphere in $\mathbb{R}^n$. Let $R$ be a random variable given by $R=|V_1|+\ldots+|V_n|.$ Then

Figures (7)

  • Figure 1: How an interval is propagated through linear layers. The wrapping produced by IBP is marked in red and that produced by Affine Arithmetic in green. Affine Arithmetic produces a significantly smaller wrapping effect; for linear transformations it reproduces the image exactly. It can work with more complex objects than the hypercubes of IBP and obtains bounds close to the optimal ones. Fig. \ref{fig:ex_1} presents the procedure used in Affine Arithmetic to obtain $\mathrm{ReLU}(I^1)$.
  • Figure 2: Graphs of $\mathrm{ReLU}$ over a hyperplane crossing zero. (Left) first affine approximation of $\mathrm{ReLU}$ with $\tau=1$, that is $\widetilde{b}_0 + c\sum{a_it_i}$ and (right) its final affine approximation $b_0 + c\sum{a_it_i}$.
  • Figure 3: Affine Arithmetic works with more complicated shapes than the hypercubes of IBP. In this example, we take the interval $I=[-1,1]^2$ and show how AA approximates the output of a linear layer with ReLU activation. We use the affine transformation from Fig. \ref{fig:ex}. To enclose $\mathrm{ReLU}(A(I))$, we first approximate the nonlinear function $\mathrm{ReLU}(A(\cdot))$ by a linear map $B(\cdot)$. Then we propagate the input interval $I$ through $B(\cdot)$. Next, we add an interval correction, denoted by $b_{n+1}$, equal to the maximal error between $\mathrm{ReLU}(A(\cdot))$ and $B(\cdot)$ on $I$. Finally, we obtain the Affine Arithmetic bound for mapping an interval through a linear layer with linear activation.
  • Figure 4: The average maximal diameter of the NN output measured for points near the classification boundary. The X axis represents the perturbation size applied to the data points, while the Y axis shows the average maximal diameter of the NN output on a logarithmic scale. The AA and DA methods give tighter interval bounds than IBP. Note that DA cannot be computed for large CNN architectures due to CPU constraints. Compared with standard training, IBP training reduces the wrapping effect.
  • Figure 5: The average maximal diameter of the NN output measured for points near the classification boundary for the medium and large CNN architectures. The X axis represents the perturbation size applied to the data points, while the Y axis shows the average maximal diameter of the NN output in the logarithmic scale.
  • ...and 2 more figures
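
The three steps described in the Figure 3 caption (linearize, propagate, add an error term) can be sketched for a single neuron. This is a minimal sketch, not the paper's procedure: it assumes the common zonotope-style relaxation with slope $u/(u-l)$ on the pre-activation range $[l,u]$, which may differ from the slope the authors choose, and the names `relu_affine` and `max_violation` are hypothetical. The returned `err` plays the role of the coefficient of the fresh noise symbol, comparable in role to the correction $b_{n+1}$.

```python
import itertools

def relu_affine(center, coeffs):
    """Relax ReLU(x) for x = center + sum(coeffs[i] * t_i), t_i in [-1, 1].

    Returns (new_center, new_coeffs, err) such that, for every t,
    |ReLU(x) - (new_center + sum(new_coeffs[i] * t_i))| <= err.
    """
    radius = sum(abs(a) for a in coeffs)
    lo, hi = center - radius, center + radius
    if lo >= 0.0:                      # neuron always active: ReLU is the identity
        return center, list(coeffs), 0.0
    if hi <= 0.0:                      # neuron never active: ReLU is zero
        return 0.0, [0.0] * len(coeffs), 0.0
    slope = hi / (hi - lo)             # assumed slope choice (chord through (lo,0),(hi,hi))
    err = -slope * lo / 2.0            # half-width of the band around slope * x
    return slope * center + err, [slope * a for a in coeffs], err

def max_violation(center, coeffs, steps=20):
    """Grid check: largest amount by which the enclosure is violated (0 if sound)."""
    new_c, new_a, err = relu_affine(center, coeffs)
    grid = [-1.0 + 2.0 * k / steps for k in range(steps + 1)]
    worst = 0.0
    for t in itertools.product(grid, repeat=len(coeffs)):
        x = center + sum(a * ti for a, ti in zip(coeffs, t))
        lin = new_c + sum(a * ti for a, ti in zip(new_a, t))
        worst = max(worst, abs(max(x, 0.0) - lin) - err)
    return worst

if __name__ == "__main__":
    print(relu_affine(0.5, [1.0, 0.5]))
    print(max_violation(0.5, [1.0, 0.5]))
```

Because the linear part keeps its dependence on the original noise symbols, later layers can still cancel correlated terms; only the `err` term is treated as a new, independent interval.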

Theorems & Definitions (8)

  • Lemma 2.1
  • Proposition 2.1
  • proof
  • Theorem 2.1
  • proof
  • Theorem 3.1
  • Lemma A.1
  • proof