Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective

Zhaoxin Wang; Handing Wang; Cong Tian; Yaochu Jin

TL;DR

This work targets catastrophic overfitting in fast adversarial training by reframing FAT as a stabilizable bi-level process and introducing FGSM-PCO, which adaptively fuses historical and current adversarial examples and adds a tailored regularization term to prevent inner optimization collapse. The approach combines an adaptive fusion ratio driven by model confidence with a loss that enforces consistency between fused and individual perturbations, mitigating overfitting while preserving training efficiency. Empirical results across CIFAR-10/100 and Tiny-ImageNet on multiple architectures show FGSM-PCO achieves superior robustness against a suite of attacks, reduces overfitting incidents to near-zero, and improves clean accuracy relative to state-of-the-art FAT methods. The method offers a practical defense with broad applicability and points to further work in open-set scenarios and more efficient fusion strategies.
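The fusion idea summarized above can be illustrated with a toy sketch. This is not the authors' implementation: the model is a hypothetical binary logistic classifier, the helper names (`fgsm_step`, `fuse`) are invented for illustration, and the fusion rule (weighting the current perturbation by how strongly it lowers the model's confidence in the true label) is an assumed stand-in for the paper's adaptive ratio, whose exact form is not given in this text.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sign(v):
    return (v > 0) - (v < 0)

def clip(delta, eps):
    # Project a perturbation back into the L-infinity ball of radius eps.
    return [max(-eps, min(eps, d)) for d in delta]

def true_label_prob(w, x, y):
    # Toy model (assumption): p(y | x) = sigmoid(y * w.x), with y in {-1, +1}.
    return sigmoid(y * sum(wi * xi for wi, xi in zip(w, x)))

def input_gradient(w, x, y):
    # Gradient of the loss -log p(y | x) with respect to the input x.
    p = true_label_prob(w, x, y)
    return [-y * (1.0 - p) * wi for wi in w]

def fgsm_step(w, x, y, eps, start):
    # Single-step attack started from an existing (historical) perturbation.
    x_start = [xi + si for xi, si in zip(x, start)]
    grad = input_gradient(w, x_start, y)
    return clip([si + eps * sign(g) for si, g in zip(start, grad)], eps)

def fuse(w, x, y, delta_hist, delta_cur, eps):
    # Assumed illustrative rule, NOT the paper's formula: the less confident
    # the model is on the current adversarial example, the more weight the
    # current perturbation receives.
    x_cur = [xi + di for xi, di in zip(x, delta_cur)]
    ratio = 1.0 - true_label_prob(w, x_cur, y)
    fused = [ratio * c + (1.0 - ratio) * h
             for c, h in zip(delta_cur, delta_hist)]
    return clip(fused, eps), ratio

# Hypothetical values chosen only to exercise the functions.
w, x, y, eps = [2.0, -1.0, 0.5], [0.3, 0.1, -0.2], 1, 0.1
delta_hist = [0.0, 0.0, 0.0]
delta_cur = fgsm_step(w, x, y, eps, delta_hist)
fused, ratio = fuse(w, x, y, delta_hist, delta_cur, eps)
print(delta_cur, fused, ratio)
```

In a real training loop the fused perturbation would be stored as the next epoch's historical perturbation for that sample, and the ratio would follow the rule defined in the paper rather than the placeholder used here.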

Abstract

Adversarial training (AT) is an effective defense against adversarial examples (AEs) and is typically framed as a bi-level optimization problem. Among AT methods, fast AT (FAT), which uses a single-step attack to guide training, achieves good robustness against adversarial attacks at low cost. However, FAT methods suffer from catastrophic overfitting, especially on complex tasks or with large-parameter models. In this work, we propose a FAT method termed FGSM-PCO, which mitigates catastrophic overfitting by averting the collapse of the inner optimization problem in the bi-level optimization process. FGSM-PCO generates current-stage AEs from historical AEs and incorporates them into training through an adaptive mechanism. This mechanism determines an appropriate fusion ratio according to the performance of the AEs on the model being trained. Coupled with a loss function tailored to the training framework, FGSM-PCO alleviates catastrophic overfitting and helps an overfitted model recover to effective training. We evaluate our algorithm on three models and three datasets to validate its effectiveness. Comparative empirical studies against other FAT algorithms demonstrate that the proposed method effectively addresses overfitting issues that existing algorithms leave unresolved.
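For orientation, the bi-level problem mentioned in the abstract is commonly written as below. This is the standard adversarial training formulation, not notation taken from the paper: the symbols $\theta$ (model parameters), $\delta$ (perturbation), $\epsilon$ (perturbation budget), $\mathcal{L}$ (loss), and the choice of the $\ell_\infty$ norm are assumptions.

```latex
% Outer problem: minimize the expected worst-case loss over the parameters.
% Inner problem: maximize the loss over perturbations within the budget.
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}}
  \left[ \max_{\|\delta\|_{\infty} \le \epsilon}
  \mathcal{L}\bigl(f_{\theta}(x + \delta),\, y\bigr) \right]

% Single-step (FGSM) approximation of the inner maximization used by FAT:
\delta = \epsilon \cdot \operatorname{sign}
  \bigl( \nabla_{x} \mathcal{L}(f_{\theta}(x),\, y) \bigr)
```

Catastrophic overfitting can be read in these terms as the point where the single-step $\delta$ stops being a useful approximation of the inner maximizer, so the outer minimization no longer trains against meaningful adversarial examples.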

Paper Structure (25 sections, 7 equations, 6 figures, 5 tables, 1 algorithm)

Introduction
Related work
Fast Adversarial Training
Dilemma in Bi-Level Optimization Problems
Prior-Guided Fast Adversarial Training
Proposed Method
Procedure of Proposed Method
Adaptive Fusion Ratio
Regularization Loss
Mitigating Catastrophic Overfitting in FAT
Experimental Results
Experimental Settings
Datasets and Models
Compared Methods
Training Details
...and 10 more sections

Figures (6)

Figure 1: The catastrophic overfitting phenomenon in FAT. (a) shows overfitting on the CIFAR-10 dataset with a multi-step learning rate. (b) shows overfitting on the Tiny-ImageNet dataset with a cyclic learning rate. Most FAT algorithms cannot prevent catastrophic overfitting.
Figure 2: Alternately optimizing the inner and outer problems can easily cause the bi-level optimization to collapse.
Figure 3: The performance of different FAT methods when FGSM-AT and FGSM-MEP undergo catastrophic overfitting.
Figure 4: The classification accuracy of PreActResNet18 on the Tiny-ImageNet dataset. The left figure shows the classification accuracy on AEs under the PGD10 attack, and the right figure shows the accuracy on clean examples.
Figure 5: Classification accuracy under different attack strengths, and the sensitivity of classification accuracy on adversarial and clean examples to $\beta$.
...and 1 more figures
