Efficient local linearity regularization to overcome catastrophic overfitting

Elias Abad Rocamora; Fanghui Liu; Grigorios G. Chrysos; Pablo M. Olmos; Volkan Cevher

TL;DR

This work tackles catastrophic overfitting (CO) in single-step adversarial training by introducing Efficient Local Linearity Enforcement (ELLE), a plug-in regularization that promotes local linearity of the loss with respect to input perturbations without Double Backpropagation. The authors establish a theoretical link between the local linear approximation error $E_{\text{Lin}}$ and loss curvature, enabling CO detection and control, and propose an adaptive variant, ELLE-A, that tunes the regularization strength during training. Empirical results across CIFAR-10/100, SVHN, and ImageNet show that ELLE(-A) achieves state-of-the-art robustness among single-step methods, with substantial speedups over prior local-linearity approaches. The method is compatible with existing single-step defenses (e.g., N-FGSM, GAT) and particularly improves performance at large perturbation budgets, making robust training more accessible and scalable.

Abstract

Catastrophic overfitting (CO) in single-step adversarial training (AT) results in abrupt drops in the adversarial test accuracy (even down to 0%). For models trained with multi-step AT, it has been observed that the loss function behaves locally linearly with respect to the input; this property is, however, lost in single-step AT. To address CO in single-step AT, several methods have been proposed to enforce local linearity of the loss via regularization. However, these regularization terms considerably slow down training due to Double Backpropagation. Instead, in this work, we introduce a regularization term, called ELLE, to mitigate CO effectively and efficiently in classical AT evaluations, as well as in some more difficult regimes, e.g., large adversarial perturbations and long training schedules. Our regularization term can be theoretically linked to the curvature of the loss function and is computationally cheaper than previous methods because it avoids Double Backpropagation. Our thorough experimental validation demonstrates that our method does not suffer from CO, even in challenging settings where previous works suffer from it. We also notice that adapting our regularization parameter during training (ELLE-A) greatly improves the performance, especially in large $\epsilon$ setups. Our implementation is available at https://github.com/LIONS-EPFL/ELLE.
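To make the idea of a gradient-free local-linearity penalty concrete, the following is a minimal sketch, not the authors' implementation. It assumes the penalty is a Monte Carlo estimate of the squared gap between the loss at an interpolated point $\bm{x}_c = (1-\alpha)\cdot\bm{x}_a + \alpha\cdot\bm{x}_b$ and the same interpolation of the loss values at $\bm{x}_a$ and $\bm{x}_b$. The uniform sampling in an $\ell_\infty$ ball, the squared form, the toy losses, and every function name below are illustrative assumptions.

```python
import random


def sample_in_linf_ball(x, eps, rng):
    """Draw a point uniformly from the l-infinity ball of radius eps around x."""
    return [xi + rng.uniform(-eps, eps) for xi in x]


def interpolation_error(loss, x, eps, rng, n_samples=256):
    """Monte Carlo estimate of the squared linear-interpolation gap of `loss` near x.

    Only forward evaluations of `loss` are used; no input gradients are computed.
    (Assumed form of the penalty; see the lead-in.)
    """
    total = 0.0
    for _ in range(n_samples):
        x_a = sample_in_linf_ball(x, eps, rng)
        x_b = sample_in_linf_ball(x, eps, rng)
        alpha = rng.random()
        x_c = [(1 - alpha) * a + alpha * b for a, b in zip(x_a, x_b)]
        gap = loss(x_c) - (1 - alpha) * loss(x_a) - alpha * loss(x_b)
        total += gap * gap
    return total / n_samples


def linear_loss(x):
    """Toy loss that is exactly linear in the input."""
    return 2.0 * x[0] - 3.0 * x[1] + 1.0


def curved_loss(x):
    """Toy loss with non-zero curvature."""
    return x[0] ** 2 + x[0] * x[1]


rng = random.Random(0)
x = [0.5, -0.25]
eps = 8 / 255

err_linear = interpolation_error(linear_loss, x, eps, rng)
err_curved = interpolation_error(curved_loss, x, eps, rng)
print("linear loss:", err_linear)
print("curved loss:", err_curved)
```

In a training loop, such a quantity would typically be scaled by a regularization parameter $\lambda$ and added to the adversarial loss; how $\lambda$ is chosen or adapted (as in ELLE-A) is not specified in this summary and is left out of the sketch.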

Paper Structure (33 sections, 3 theorems, 27 equations, 20 figures, 3 tables, 1 algorithm)

Introduction
Background and related work
Computationally Efficient Robust Training
Catastrophic Overfitting (CO)
Method
Algorithm description
Theoretical understanding
Comparison with explicit local-linearity enforcing algorithms
Experiments
Experimental setup
Local, linear approximation error detects and controls CO
Comparison against methods enforcing local linearity
Comparison against single-step methods
Regularizing other single-step methods
ImageNet results
...and 18 more sections

Key Result

Proposition 1

Let $\bm{h} \in C^{3}(\mathbb{R}^{d})$ be a three times differentiable mapping. Let $E_{\text{Lin}}(\bm{h},\bm{x},p,\epsilon)$ and $D_{\bm{v}}^2(h_i(\bm{x}))$ be defined as in Definitions 1 and 2, and let $\left[D_{\bm{v}}^2(h_i(\bm{x}))\right]_{i=1}^{o} \in \mathbb{R}^{o}$ [...] where $\bm{x}_c := (1-\alpha)\cdot\bm{x}_a + \alpha \cdot \bm{x}_b$.
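The statement above is truncated in this summary, so the following is only an illustration of the kind of link between interpolation error and curvature that it refers to, not the proposition itself. For a quadratic $h(\bm{x}) = \tfrac{1}{2}\bm{x}^\top \bm{A}\bm{x}$ with symmetric $\bm{A}$, the standard identity $h(\bm{x}_c) - (1-\alpha)h(\bm{x}_a) - \alpha h(\bm{x}_b) = -\tfrac{\alpha(1-\alpha)}{2}\,\bm{v}^\top \bm{A}\bm{v}$ holds exactly with $\bm{v} = \bm{x}_a - \bm{x}_b$, and $\bm{v}^\top \bm{A}\bm{v}$ is the second directional derivative of $h$ along $\bm{v}$. The matrix, the sampling range, and the helper names are assumptions chosen for the sketch.

```python
import random


def quad(A, x):
    """h(x) = 0.5 * x^T A x for a symmetric matrix A given as nested lists."""
    n = len(x)
    return 0.5 * sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def second_directional_derivative(A, v):
    """For the quadratic above the Hessian is A, so D_v^2 h = v^T A v everywhere."""
    n = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))


rng = random.Random(1)
A = [[2.0, 0.5, 0.0],
     [0.5, 1.0, -0.3],
     [0.0, -0.3, 3.0]]

max_dev = 0.0
for _ in range(1000):
    x_a = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    x_b = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    alpha = rng.random()
    x_c = [(1 - alpha) * a + alpha * b for a, b in zip(x_a, x_b)]

    # Gap between the function at the interpolated point and the interpolated values.
    gap = quad(A, x_c) - (1 - alpha) * quad(A, x_a) - alpha * quad(A, x_b)

    # Curvature-based prediction of the same gap.
    v = [a - b for a, b in zip(x_a, x_b)]
    predicted = -0.5 * alpha * (1 - alpha) * second_directional_derivative(A, v)

    max_dev = max(max_dev, abs(gap - predicted))

print("largest deviation between gap and curvature prediction:", max_dev)
```

For non-quadratic functions the identity only holds up to higher-order terms, which is why the proposition requires $\bm{h} \in C^{3}(\mathbb{R}^{d})$.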

Figures (20)

Figure 1: Comparison against single-step methods enforcing local linearity. We train with our method ELLE and its adaptive regularization variant ELLE-A. The multi-step AT PGD-10 results are included for comparison. We report (a) the average total runtime, with FGSM training included as the fast baseline that attains $0\%$ AA accuracy, and (b) the clean and AutoAttack (AA) accuracies. We mark the best method and the runner-up in bold and underlined respectively. Our methods, ELLE and ELLE-A, attain the best or comparable AA accuracy while using less than $50\%$ of the time of previous methods.
Figure 2: Effectiveness of our local-linearity metric for detecting and controlling CO. We train with AT PGD-10, single-step FGSM attacks, and our method without (ELLE) and with (ELLE-A) adapting $\lambda$ at $\epsilon = 8/255$ on CIFAR10. We track: (a) the clean and PGD-20 test accuracies, (b) the GradAlign regularization term and (c) our regularization term. AT PGD-10 is able to produce locally linear models, see (b) and (c). Our regularization term accurately detects when CO appears and, when used as a regularizer, avoids CO. ELLE-A attains higher robustness than ELLE.
Figure 3: Catastrophic Overfitting in the Short schedule: Comparison of our method against single-step methods and AT PGD-10 on (a) SVHN, (b) CIFAR10 and (c) CIFAR100. ELLE-A and N-FGSM+ELLE-A are the only single-step methods avoiding CO while attaining high performance for all $\epsilon$ and datasets.
Figure 4: Catastrophic Overfitting in the Long schedule: We report the PGD-20 adversarial accuracy for the PRN architecture on (a) SVHN and (b) CIFAR10. In (c) we report the PGD-20 adversarial accuracy for the WRN architecture trained on CIFAR10 with the Long schedule. N-FGSM suffers from CO on both the SVHN and CIFAR10 datasets for $\epsilon > 6/255$ and $\epsilon > 16/255$ respectively. ELLE remains resistant to CO in all setups.
Figure 5: Combining N-FGSM and GAT with ELLE-A: (a) AutoAttack (AA) and Clean accuracy for PRN and WRN trained with $\epsilon \in \{16,26\}/255$. (b) Evolution of PGD-20 test accuracy during training of WRN. ELLE-A helps GAT and N-FGSM overcome CO and improve their performance.
...and 15 more figures

Theorems & Definitions (11)

Definition 1: local, linear approximation error
Definition 2: Second order directional derivative $D_{\bm{v}}^2(h_i(\bm{x}))$
Proposition 1
Remark 1
Example 1: Non locally linear function with $\hat{E}_{\text{lin}} = 0$
Proof of Proposition 1
Definition 3: $5$-point local, linear approximation error
Proposition 2
Proof of Proposition 2
Proposition 3
...and 1 more
