Going Further: Flatness at the Rescue of Early Stopping for Adversarial Example Transferability
Martin Gubri, Maxime Cordy, Yves Le Traon
TL;DR
The paper investigates why early stopping boosts adversarial transferability, challenging the view that the evolution of robust versus non-robust features drives this effect. By analyzing loss-landscape sharpness and training dynamics, it shows that transferability peaks after learning-rate decays, when sharpness drops, and that minimizing sharpness with SAM and its large-neighborhood variants (notably l-SAM) yields consistently stronger surrogate models. The results show that the strong regularization from large flat neighborhoods improves transferability and can outperform early stopping, often at the cost of some natural accuracy, which points to a transferability-specific mechanism. The findings offer a practical way to improve transfer-based attacks by combining sharpness minimization with existing transferability techniques, and the authors provide open-source resources for replication.
Abstract
Transferability is the property of adversarial examples to be misclassified by models other than the surrogate model for which they were crafted. Previous research has shown that early stopping the training of the surrogate model substantially increases transferability. A common hypothesis to explain this is that deep neural networks (DNNs) first learn robust features, which are more generic and thus make a better surrogate. Then, at later epochs, DNNs learn non-robust features, which are more brittle and hence make a worse surrogate. First, we tend to refute this hypothesis, using transferability as a proxy for representation similarity. We then establish links between transferability and the exploration of the loss landscape in parameter space, focusing on sharpness, which is affected by early stopping. This leads us to evaluate surrogate models trained with seven minimizers that minimize both loss value and loss sharpness. Among them, SAM consistently outperforms early stopping by up to 28.8 percentage points. We discover that the strong SAM regularization from large flat neighborhoods tightly links to transferability. Finally, the best sharpness-aware minimizers prove competitive with other training methods and complement existing transferability techniques.
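
To make "minimizing both loss value and loss sharpness" concrete, the following is a minimal sketch of one sharpness-aware (SAM-style) update in plain Python. It is an illustration under stated assumptions, not the training code used in the paper: the toy quadratic loss, the helper names `toy_grad` and `sam_step`, and the values of `lr` and `rho` are all hypothetical. The radius `rho` plays the role of the neighborhood size; larger values correspond to the larger flat neighborhoods discussed in the abstract.

```python
import math


def toy_grad(w, curvatures=(10.0, 0.1)):
    """Gradient of the toy loss 0.5 * sum(c_i * w_i**2).

    Hypothetical example loss with one sharp and one flat direction.
    """
    return [c * wi for c, wi in zip(curvatures, w)]


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM-style update on a list of parameters.

    1. Move to the approximate worst-case point inside an L2 ball of
       radius rho, following the normalized gradient.
    2. Take a gradient descent step from the ORIGINAL point, using the
       gradient evaluated at that perturbed point.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # Zero gradient: no ascent direction, so the parameters stay put.
        return list(w)
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


if __name__ == "__main__":
    w = [1.0, 1.0]
    for _ in range(20):
        w = sam_step(w, toy_grad)
    print(w)
```

Setting `rho=0.0` reduces the update to ordinary gradient descent, which gives a simple baseline for comparing the two behaviors on the same loss.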
