Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective
Zhaoxin Wang, Handing Wang, Cong Tian, Yaochu Jin
TL;DR
This work targets catastrophic overfitting in fast adversarial training (FAT) by treating FAT as a bi-level optimization process whose inner problem can collapse, and introduces FGSM-PCO to prevent that collapse. FGSM-PCO adaptively fuses historical and current adversarial examples, using a fusion ratio driven by model confidence, and adds a tailored regularization term that enforces consistency between the fused and individual perturbations. Together these mitigate overfitting while preserving training efficiency. Empirical results on CIFAR-10, CIFAR-100, and Tiny-ImageNet across multiple architectures show that FGSM-PCO achieves superior robustness against a suite of attacks, reduces overfitting incidents to near zero, and improves clean accuracy relative to state-of-the-art FAT methods. The method offers a practical defense with broad applicability and points to further work on open-set scenarios and more efficient fusion strategies.
Abstract
Adversarial training (AT) has become an effective defense against adversarial examples (AEs), and it is typically framed as a bi-level optimization problem. Among AT methods, fast AT (FAT), which uses a single-step attack to guide training, can achieve good robustness against adversarial attacks at low cost. However, FAT methods suffer from catastrophic overfitting, especially on complex tasks or with large-parameter models. In this work, we propose a FAT method termed FGSM-PCO, which mitigates catastrophic overfitting by averting the collapse of the inner optimization problem in the bi-level optimization process. FGSM-PCO generates current-stage AEs from historical AEs and incorporates them into training through an adaptive mechanism. This mechanism determines an appropriate fusion ratio according to the performance of the AEs on the model being trained. Coupled with a loss function tailored to the training framework, FGSM-PCO can alleviate catastrophic overfitting and help an overfitted model recover to effective training. We evaluate our algorithm on three models and three datasets to validate its effectiveness. Comparative empirical studies against other FAT algorithms demonstrate that our method effectively addresses overfitting issues that existing algorithms leave unresolved.
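To make the "single-step attack" in the abstract concrete, the following is a minimal sketch of a generic FGSM step, the one-step approximation to the inner maximization that FAT methods rely on. It is not the FGSM-PCO algorithm itself: the fusion of historical AEs and the tailored loss are not shown. The toy logistic model and the names `loss_and_input_grad`, `fgsm`, `w`, `b`, and `epsilon` are assumptions made purely for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy of a toy logistic model (an assumption for this
    sketch) and the gradient of that loss with respect to the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    tiny = 1e-12  # guards against log(0)
    loss = -(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny))
    grad_x = [(p - y) * wi for wi in w]  # dL/dx = (p - y) * w for this model
    return loss, grad_x


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, epsilon):
    """Single-step attack: move each input coordinate by epsilon in the
    direction that increases the loss (L-infinity budget epsilon)."""
    _, grad_x = loss_and_input_grad(w, b, x, y)
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, grad_x)]


# Hypothetical toy values, chosen only to exercise the functions.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.5], 1
x_adv = fgsm(w, b, x, y, epsilon=0.1)
clean_loss, _ = loss_and_input_grad(w, b, x, y)
adv_loss, _ = loss_and_input_grad(w, b, x_adv, y)
```

In an adversarial training loop, the outer minimization would then update the model parameters on `x_adv` rather than on `x`. Because the attack takes only one gradient step, it is cheap, which is the efficiency advantage of FAT that the abstract describes.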
