Comment on "Biologically inspired protection of deep networks from adversarial attacks"

Wieland Brendel, Matthias Bethge

TL;DR

This work analyzes saturated networks, shows that the attacks fail because of numerical limitations in the gradient computations, and shows that a simple stabilisation of the gradient estimates enables successful and efficient attacks.

Abstract

A recent paper suggests that Deep Neural Networks can be protected from gradient-based adversarial perturbations by driving the network activations into a highly saturated regime. Here we analyse such saturated networks and show that the attacks fail due to numerical limitations in the gradient computations. A simple stabilisation of the gradient estimates enables successful and efficient attacks. Thus, it has yet to be shown that the robustness observed in highly saturated networks is not simply due to numerical limitations.
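
The numerical failure described above can already be seen on a single sigmoid unit. The sketch below (Python with NumPy; the function names are hypothetical) computes the sigmoid derivative in float32 in two ways. The textbook form s(1 - s) becomes exactly zero once s rounds to 1.0. The algebraically equivalent form e / (1 + e)^2 with e = exp(-|z|) avoids subtracting two nearly equal numbers. This second form is one illustrative stabilisation and is not necessarily the one used in the comment.

```python
import numpy as np


def sigmoid_grad_naive(z):
    """Derivative s(z) * (1 - s(z)) computed from the rounded sigmoid.

    In float32, s(z) rounds to exactly 1.0 for z of about 17 or more,
    so 1 - s(z) is 0.0 and the derivative is exactly zero.
    """
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)


def sigmoid_grad_stable(z):
    """Same derivative written as e / (1 + e)**2 with e = exp(-|z|).

    The sigmoid derivative is symmetric in z, and this form never computes
    1 - s, so it stays non-zero until exp(-|z|) itself underflows
    (|z| on the order of 100 in float32).
    """
    e = np.exp(-np.abs(z))
    return e / (1.0 + e) ** 2


# Pre-activations ranging from unsaturated (0, 5) to saturated (20, 40).
z = np.array([0.0, 5.0, 20.0, 40.0], dtype=np.float32)
print(sigmoid_grad_naive(z))   # last two entries are exactly 0.0
print(sigmoid_grad_stable(z))  # all entries are positive
```

Both functions agree where the unit is not saturated; only the stable form keeps a usable, if tiny, gradient in the saturated regime.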

Paper Structure

This paper contains 3 figures and 1 table.

Figures (3)

  • Figure 1: Histogram of the elements of the gradient of the cross-entropy loss with respect to the input image (the direction of the adversarial perturbation) for both the vanilla sigmoid MLP (left) and the saturated sigmoid MLP (right). In the saturated network more than 98% of the gradient elements are exactly zero, while the remaining elements are sixteen orders of magnitude smaller than in the vanilla network.
  • Figure 2: The success of the FGSM attack clearly reflects the ratio of non-zero gradients. Networks with different gains are used only to generate the adversarial images with the FGSM method. The accuracy with which these adversarial images fool the saturated network (gain = 1) is plotted in red. The success of FGSM is highly correlated with the ratio of non-zero gradients (black).
  • Figure S1: (a) Sigmoid MLP: weight and activation distributions for both the vanilla (top) and the saturated (bottom) network. We observe a qualitatively similar increase in the kurtosis of the weights and in the bimodality of the activations as in 1703.09202. (b) ReLU MLP: same as (a) but for ReLU nonlinearities. As in 1703.09202, the activations are not bimodal as in the sigmoid MLP but have a high kurtosis.
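
The correlation in Figure 2 follows from the FGSM update itself, x_adv = x + eps * sign(gradient of the loss). Since sign(0) = 0, an input element whose gradient has underflowed to exactly zero is never perturbed. The sketch below illustrates this on a hypothetical toy score f(x) = sum_i v_i * sigmoid(gain * w_i * x_i), not on the MLPs of the paper. The weights, the gain values, and the treatment of the gain as a multiplier on the pre-activations (as the varying gains in Figure 2 suggest) are all assumptions made for illustration.

```python
import numpy as np


def toy_score_grad(x, w, v, gain):
    """Input gradient of a hypothetical toy score (not the paper's MLP):
    f(x) = sum_i v_i * sigmoid(gain * w_i * x_i), evaluated in float32.

    The textbook derivative s * (1 - s) is used, so the gradient is exactly
    zero wherever the sigmoid output has rounded to 1.0.
    """
    z = gain * w * x
    s = 1.0 / (1.0 + np.exp(-z))
    return v * gain * w * s * (1.0 - s)


def fgsm_step(x, grad, eps):
    """One FGSM step that lowers the score f (i.e. ascends the loss -f):
    clip(x - eps * sign(grad), 0, 1).

    np.sign(0.0) is 0.0, so elements whose gradient underflowed to zero
    are not perturbed at all.
    """
    return np.clip(x - eps * np.sign(grad), 0.0, 1.0)


rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(0.2, 1.0, size=n).astype(np.float32)
# Large positive weights push many units deep into saturation at gain = 1.
w = rng.uniform(0.0, 100.0, size=n).astype(np.float32)
v = np.ones(n, dtype=np.float32)

results = []
for gain in (1.0, 0.3, 0.1):
    g = toy_score_grad(x, w, v, gain)
    x_adv = fgsm_step(x, g, eps=0.05)
    nonzero_ratio = float(np.mean(g != 0))
    perturbed_ratio = float(np.mean(x_adv != x))
    results.append((gain, nonzero_ratio, perturbed_ratio))
    print(f"gain={gain}: non-zero gradients {nonzero_ratio:.2f}, "
          f"perturbed pixels {perturbed_ratio:.2f}")
```

In this toy setting the fraction of perturbed pixels equals the fraction of non-zero gradient elements, and both grow as the gain used for the gradient computation is lowered.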