Going Further: Flatness at the Rescue of Early Stopping for Adversarial Example Transferability

Martin Gubri, Maxime Cordy, Yves Le Traon

TL;DR

The paper investigates why early stopping boosts adversarial transferability, challenging the view that robust vs non-robust feature evolution drives this effect. By analyzing loss-landscape sharpness and training dynamics, it demonstrates that transferability peaks after learning-rate decays when sharpness drops, and that minimizing sharpness with SAM and its large-neighborhood variants (notably l-SAM) yields consistently stronger surrogate models. The results show that strong regularization from large flat neighborhoods improves transferability and can outperform early stopping while often sacrificing some natural accuracy, indicating a transferability-specific mechanism. The findings offer a practical pathway to improve transfer-based attacks by combining sharpness minimization with existing transferability techniques and provide open-source resources for replication.

Abstract

Transferability is the property of adversarial examples to be misclassified by models other than the surrogate model for which they were crafted. Previous research has shown that early stopping the training of the surrogate model substantially increases transferability. A common hypothesis to explain this is that deep neural networks (DNNs) first learn robust features, which are more generic and thus make for a better surrogate. Then, at later epochs, DNNs learn non-robust features, which are more brittle and hence make for a worse surrogate. First, we tend to refute this hypothesis, using transferability as a proxy for representation similarity. We then establish links between transferability and the exploration of the loss landscape in parameter space, focusing on sharpness, which is affected by early stopping. This leads us to evaluate surrogate models trained with seven minimizers that minimize both loss value and loss sharpness. Among them, SAM consistently outperforms early stopping by up to 28.8 percentage points. We discover that the strong SAM regularization from large flat neighborhoods is tightly linked to transferability. Finally, the best sharpness-aware minimizers prove competitive with other training methods and complement existing transferability techniques.
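To make the idea of minimizing both loss value and loss sharpness concrete, here is a minimal sketch of a single SAM-style update on a toy quadratic loss. It is an illustration under stated assumptions, not the authors' training code: the loss, the `CURVATURES` values, the function names, and the `lr` and `rho` defaults are all hypothetical. The step first moves to the approximate worst point within an L2 ball of radius `rho`, then descends using the gradient taken there; a larger `rho` corresponds to the larger neighborhoods the abstract refers to.

```python
import math

# Hypothetical toy stand-in for a training loss: L(w) = 0.5 * sum(c_i * w_i^2).
# A large c_i is a sharp direction, a small c_i a flat one.
CURVATURES = [10.0, 1.0]


def loss(w):
    """Value of the toy quadratic loss at weights w."""
    return 0.5 * sum(c * x * x for c, x in zip(CURVATURES, w))


def grad(w):
    """Gradient of the toy quadratic loss at weights w."""
    return [c * x for c, x in zip(CURVATURES, w)]


def sam_step(w, lr=0.05, rho=0.1):
    """One SAM-style update (lr and rho are illustrative values)."""
    g = grad(w)
    norm = math.sqrt(sum(x * x for x in g)) + 1e-12
    # First-order approximation of the worst-case perturbation in the rho-ball.
    eps = [rho * x / norm for x in g]
    # Gradient evaluated at the perturbed weights, applied to the original ones.
    g_adv = grad([x + e for x, e in zip(w, eps)])
    return [x - lr * ga for x, ga in zip(w, g_adv)]


def run_sam(steps=200, lr=0.05, rho=0.1):
    """Run several SAM-style steps from a fixed start; return weights and loss."""
    w = [1.0, 1.0]
    for _ in range(steps):
        w = sam_step(w, lr=lr, rho=rho)
    return w, loss(w)
```

Calling `run_sam()` returns the final weights and their loss. With a fixed `rho`, the iterate is not expected to settle exactly at the minimum, since every step evaluates the gradient at a perturbed point.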

Paper Structure (46 sections, 17 figures, 16 tables)

Figures (17)

  • Figure 1: Illustration of the loss landscape, showing the training of surrogate models to craft transferable adversarial examples. Before the learning rate decays, training tends to "cross the valley" with plateauing transferability. A few iterations after the decay of the learning rate, early stopped SGD achieves its best transferability (gray). In the following epochs, SGD falls progressively into deep, sharp holes in the parameter space with poor transferability (red). l-SAM (blue) avoids these holes by minimizing the maximum loss around an unusually large neighborhood (thick blue arrow).
  • Figure 2: Early stopping improves the transferability from surrogate models trained on both robust and non-robust datasets. Average success rate evaluated over ten target models trained on the original CIFAR-10 dataset, from a ResNet-50 surrogate model trained for a number of epochs (x-axis) on the datasets $D_R$ (blue) and $D_\text{NR}$ (green) of Ilyas et al. (2019), modified from CIFAR-10 (red). We craft all adversarial examples from the same subset of the original CIFAR-10 test set. Average (line) and confidence interval of $\pm$ two standard deviations (colored area) of three training runs. The appendix contains the details per target.
  • Figure 3: Early stopping improves the transferability to target models trained on both robust and non-robust datasets. Success rate from a ResNet-50 trained for a number of epochs (x-axis) on the original CIFAR-10 dataset, to ResNet-50 targets trained on the robust dataset $D_R$ (red), and the three non-robust datasets $D_\text{NR}$ (green), $D_\text{rand}$ (blue) and $D_\text{det}$ (purple) of Ilyas et al. (2019), modified from CIFAR-10. The perturbation norm $\varepsilon$ is $\frac{16}{255}$ for the $D_R$ target, $\frac{2}{255}$ for the $D_\text{NR}$ target and $\frac{1}{255}$ for the $D_\text{rand}$ and $D_\text{det}$ targets, to adapt to the vulnerability of the target models (so the order of the lines cannot be compared). Average (line) and confidence interval of $\pm$ two standard deviations (colored area) of three training runs.
  • Figure 4: Transferability peaks when the learning rate decays, at whichever epoch the decay occurs. Average success rate evaluated over ten target models from a ResNet-50 surrogate model trained for a number of epochs (x-axis) on CIFAR-10. The learning rate is divided by 10 once during training at the epoch corresponding to the color. Red is our standard schedule, with two decays at epochs 50 and 100. Pink is the baseline of constant learning rate. Best viewed in color.
  • Figure 5: Sharpness drops when the learning rate decays. Largest eigenvalue of the Hessian (red) and trace of the Hessian (blue) for all training epochs (x-axis) on CIFAR-10. Average success rate on ten targets (orange, right axis). Average (line) and confidence interval of $\pm$ two standard deviations (colored area) of three training runs. Vertical bars indicate the learning rate step decays. Best viewed in color.
  • ...and 12 more figures
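The Figure 5 caption uses two sharpness measures, the largest eigenvalue and the trace of the Hessian. The caption does not say how they are computed, so the sketch below only shows two common estimators, both assumed here for illustration: power iteration for the largest eigenvalue and Hutchinson's estimator for the trace. Both rely only on Hessian-vector products. The toy gradient, the `CURVATURES` values, and all function names are hypothetical.

```python
import math
import random

# Hypothetical toy loss with Hessian diag(10, 1): one sharp, one flat direction.
CURVATURES = [10.0, 1.0]


def grad(w):
    """Gradient of the toy loss 0.5 * sum(c_i * w_i^2)."""
    return [c * x for c, x in zip(CURVATURES, w)]


def hvp(w, v, h=1e-3):
    """Hessian-vector product via central finite differences of the gradient."""
    g_plus = grad([x + h * d for x, d in zip(w, v)])
    g_minus = grad([x - h * d for x, d in zip(w, v)])
    return [(a - b) / (2.0 * h) for a, b in zip(g_plus, g_minus)]


def top_eigenvalue(w, iters=50):
    """Power iteration; assumes the dominant eigenvalue is positive."""
    v = [1.0 / math.sqrt(len(w))] * len(w)
    for _ in range(iters):
        hv = hvp(w, v)
        norm = math.sqrt(sum(x * x for x in hv))
        v = [x / norm for x in hv]
    # Rayleigh quotient of the converged unit vector.
    return sum(a * b for a, b in zip(v, hvp(w, v)))


def hutchinson_trace(w, samples=20, seed=0):
    """Hutchinson estimate of the trace: mean of z^T H z over Rademacher z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        z = [rng.choice((-1.0, 1.0)) for _ in w]
        total += sum(a * b for a, b in zip(z, hvp(w, z)))
    return total / samples
```

For a real network, the finite-difference `hvp` would typically be replaced by an automatic-differentiation Hessian-vector product. The two estimators themselves stay the same.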