On the Robustness of Neural Collapse and the Neural Collapse of Robustness

Jingtong Su, Ya Shi Zhang, Nikolaos Tsilivis, Julia Kempe

TL;DR

The simplex structure of Neural Collapse disappears under small adversarial attacks, and perturbed examples "leap" between simplex vertices.

Abstract

Neural Collapse refers to the curious phenomenon at the end of training of a neural network, where feature vectors and classification weights converge to a very simple geometrical arrangement (a simplex). While it has been observed empirically in various cases and has been theoretically motivated, its connection with crucial properties of neural networks, like their generalization and robustness, remains unclear. In this work, we study the stability properties of these simplices. We find that the simplex structure disappears under small adversarial attacks, and that perturbed examples "leap" between simplex vertices. We further analyze the geometry of networks that are optimized to be robust against adversarial perturbations of the input, and find that Neural Collapse is a pervasive phenomenon in these cases as well, with clean and perturbed representations forming aligned simplices, and giving rise to a robust simple nearest-neighbor classifier. By studying the propagation of the amount of collapse inside the network, we identify novel properties of both robust and non-robust machine learning models, and show that earlier layers, unlike later ones, maintain reliable simplices on perturbed data. Our code is available at https://github.com/JingtongSu/robust_neural_collapse .
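
The "simple nearest-neighbor classifier" mentioned in the abstract can be illustrated with a nearest class-mean rule: each feature vector is assigned to the class whose mean feature is closest. The following is only a minimal sketch under stated assumptions, not the authors' implementation. The function names `class_means` and `nearest_class_mean_predict` are hypothetical, the distance is assumed to be Euclidean, and the synthetic clusters stand in for penultimate-layer features of a trained network.

```python
import numpy as np

def class_means(features, labels, num_classes):
    """Return the mean feature vector of each class, shape (num_classes, dim)."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

def nearest_class_mean_predict(features, means):
    """Assign each feature vector to the class with the closest mean (Euclidean)."""
    # Pairwise squared distances, shape (n_samples, num_classes).
    dists = ((features[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# Synthetic stand-in for collapsed features: 3 tight clusters in 4 dimensions.
# (Assumption for illustration only; real features would come from a network.)
rng = np.random.default_rng(0)
centers = 5.0 * np.eye(3, 4)
labels = np.repeat(np.arange(3), 50)
features = centers[labels] + 0.1 * rng.standard_normal((150, 4))

means = class_means(features, labels, num_classes=3)
preds = nearest_class_mean_predict(features, means)
accuracy = (preds == labels).mean()
print(f"nearest class-mean accuracy: {accuracy:.2f}")
```

When features cluster tightly around well-separated class means, as in this toy setup, the rule recovers the labels; whether it stays reliable on perturbed inputs is the question the paper studies.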

Paper Structure (24 sections, 23 equations, 27 figures)

Figures (27)

  • Figure 1: Visualisation of our findings. Sticks represent clean class-means. Small dots correspond to the representation of an individual datum. The color represents the ground-truth label, and the dotted lines represent the predicted class-means. Left to Right: clean representations with standardly-trained (ST) networks; perturbed representations with standardly-trained networks; clean representations with adversarially-trained (AT) networks; perturbed representations with adversarially-trained networks. With ST nets, the adversarial perturbations push the representation to "leap" towards another cluster with slight angular deviation. AT makes the simplex resilient to such adversarial attacks, with higher intra-class variance.
  • Figure 2: Accuracy, Loss, and NC evolution for standardly (ST) and adversarially (AT) trained VGG and ResNet. For AT models, clean and Gaussian curves coincide. Setting: CIFAR-10, $\ell_\infty$ adversary.
  • Figure 3: Illustration of untargeted adversarial attacks on converged, standardly trained models for one random seed (CIFAR-10, $\ell_\infty$). Left: Number of examples with a certain predicted label. Inner Left: The norms of clean class-means. Inner Right: The norms of predicted class-means with perturbed data. Right: Angular distance between clean and predicted class-mean with perturbed data. Upper: ResNet18; Lower: VGG11. For 10 classes, the between-class angular distance is $\arccos{(-\frac{1}{9})}\approx 1.68$ rad $\approx 96.38$ degrees, while 0.2 rad is only about 11.5 degrees.
  • Figure 4: Angular distance. Left and Inner Left: Average between targeted attack class-means and clean class-means on ST network. Inner Right and Right: Average between perturbed class-means and clean class-means on AT network. Setting: CIFAR-10, $\ell_\infty$ adversary.
  • Figure 5: Accuracy, Loss and NC evolution with TRADES trained networks. Upper: ResNet18; Lower: VGG11. No simplices are formed with TRADES training. Setting: CIFAR-10, $\ell_\infty$ adversary. Note that we plot the KLD-loss here to showcase optimization convergence, to avoid the effect of the regularization constant $\beta$.
  • ...and 22 more figures
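
The angular reference values quoted in the Figure 3 caption follow from the geometry of a simplex with $K$ vertices, where any two distinct unit-norm vertices have cosine $-\frac{1}{K-1}$. The sketch below checks this numerically for $K=10$. It assumes the standard simplex equiangular tight frame construction $\sqrt{K/(K-1)}\,(I - \frac{1}{K}\mathbf{1}\mathbf{1}^\top)$; the helper names `simplex_etf` and `pairwise_angle` are hypothetical and not taken from the authors' code.

```python
import numpy as np

def simplex_etf(num_classes):
    """Rows are the K unit-norm vertices of a simplex ETF embedded in R^K."""
    k = num_classes
    return np.sqrt(k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)

def pairwise_angle(num_classes):
    """Angle in radians between two distinct vertices of the simplex ETF."""
    vertices = simplex_etf(num_classes)
    # Rows have unit norm, so the dot product is the cosine of the angle.
    cosine = vertices[0] @ vertices[1]
    return np.arccos(cosine)

angle_rad = pairwise_angle(10)       # equals arccos(-1/9)
angle_deg = np.degrees(angle_rad)    # the same angle in degrees
deviation_deg = np.degrees(0.2)      # the 0.2 rad deviation in degrees

print(f"between-class angle: {angle_rad:.2f} rad = {angle_deg:.2f} degrees")
print(f"0.2 rad = {deviation_deg:.2f} degrees")
```

The comparison shows the scale involved: a deviation of 0.2 rad is small relative to the angle separating two class-means of an ideal 10-class simplex.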