Understanding Robust Overfitting from the Feature Generalization Perspective

Chaojian Yu, Xiaolong Shi, Jun Yu, Bo Han, Tongliang Liu

TL;DR

This work studies robust overfitting (RO) in adversarial training (AT) through a feature generalization lens. It formalizes AT as the minimax objective $\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \max_{\delta_i \in \Delta} \ell(f_{\theta}(x_i + \delta_i), y_i)$ with perturbation budget $\epsilon$, and uses factor ablation to show that RO is induced by natural data; adding adversarial perturbations then further degrades feature generalization. It proposes two mitigation methods, ROFG$_\mathrm{AS}$ and ROFG$_\mathrm{DA}$: ROFG$_\mathrm{AS}$ adjusts the attack strength on small-loss data using budgets $\epsilon_a$, while ROFG$_\mathrm{DA}$ uses iterative data augmentation (AugMix) to narrow the training-test robustness gap. Both mitigate RO and improve robustness across AT variants. Across CIFAR-10/100 and multiple architectures, these approaches support the feature generalization perspective and offer practical routes to reducing RO without extra data.
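The minimax objective above can be made concrete with a small sketch. This is not the paper's implementation: it assumes a linear logistic model on random data, an $L_\infty$ ball of radius `eps` as the perturbation set, and projected gradient descent (PGD) as the inner maximizer. The function names, the step size, and the value `eps = 8/255` are all illustrative assumptions.

```python
import numpy as np

def logistic_loss(w, b, x, y):
    """Per-example loss ell(f_theta(x), y) for labels y in {-1, +1}."""
    margin = y * (x @ w + b)
    return np.logaddexp(0.0, -margin)

def pgd_perturbation(w, b, x, y, eps, step_size, n_steps):
    """Approximate the inner max over the L_inf ball ||delta||_inf <= eps."""
    delta = np.zeros_like(x)
    for _ in range(n_steps):
        margin = y * ((x + delta) @ w + b)
        # Gradient of the loss w.r.t. the input: -y * sigmoid(-margin) * w
        grad = (-y / (1.0 + np.exp(margin)))[:, None] * w[None, :]
        # Signed ascent step, then projection back onto the budget
        delta = np.clip(delta + step_size * np.sign(grad), -eps, eps)
    return delta

def adversarial_objective(w, b, x, y, eps, step_size=None, n_steps=10):
    """Outer objective: mean over examples of the (approximate) worst-case loss."""
    step_size = eps / 4 if step_size is None else step_size
    delta = pgd_perturbation(w, b, x, y, eps, step_size, n_steps)
    return logistic_loss(w, b, x + delta, y).mean(), delta

# Toy data and parameters (assumed values, for illustration only)
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
y = np.where(rng.random(8) < 0.5, -1.0, 1.0)
w = rng.normal(size=5)
b = 0.0
eps = 8 / 255

adv_loss, delta = adversarial_objective(w, b, x, y, eps)
clean_loss = logistic_loss(w, b, x, y).mean()
```

For a linear model the inner maximization has a closed form (each margin shrinks by `eps * ||w||_1`), so PGD recovers the exact worst case here. For deep networks PGD yields only a lower bound on the inner maximum, and training minimizes `adv_loss` over the model parameters.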

Abstract

Adversarial training (AT) constructs robust neural networks by incorporating adversarial perturbations into natural data. However, it is plagued by the issue of robust overfitting (RO), which severely damages the model's robustness. In this paper, we investigate RO from a novel feature generalization perspective. Specifically, we design factor ablation experiments to assess the respective impacts of natural data and adversarial perturbations on RO, identifying that the inducing factor of RO stems from natural data. Given that the only difference between adversarial and natural training lies in the inclusion of adversarial perturbations, we further hypothesize that adversarial perturbations degrade the generalization of features in natural data and verify this hypothesis through extensive experiments. Based on these findings, we provide a holistic view of RO from the feature generalization perspective and explain various empirical behaviors associated with RO. To examine our feature generalization perspective, we devise two representative methods, attack strength and data augmentation, to prevent the feature generalization degradation during AT. Extensive experiments conducted on benchmark datasets demonstrate that the proposed methods can effectively mitigate RO and enhance adversarial robustness.
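As a rough illustration of the attack-strength idea, the sketch below assigns a separate budget `eps_a` to the lowest-loss training examples and keeps the default budget `eps` for the rest. It is a hypothetical selection rule, not the paper's algorithm: the function name, the fraction-based choice of small-loss examples, and all numeric values are assumptions, and the text here does not state whether `eps_a` should be larger or smaller than `eps`.

```python
import numpy as np

def per_example_budgets(losses, eps, eps_a, small_loss_fraction):
    """Give eps_a to the lowest-loss fraction of examples and eps to the rest.

    Hypothetical helper: the selection rule is an assumption for illustration.
    """
    losses = np.asarray(losses, dtype=float)
    n_small = int(round(small_loss_fraction * losses.size))
    budgets = np.full(losses.shape, eps, dtype=float)
    if n_small > 0:
        # Indices of the n_small smallest losses (stable sort for ties)
        small_idx = np.argsort(losses, kind="stable")[:n_small]
        budgets[small_idx] = eps_a
    return budgets

# Illustrative per-example adversarial losses for one mini-batch
losses = np.array([0.05, 1.20, 0.30, 2.10, 0.10, 0.90])
budgets = per_example_budgets(losses, eps=8 / 255, eps_a=12 / 255,
                              small_loss_fraction=0.5)
```

The resulting `budgets` array could then be passed to an attack that accepts a per-example perturbation radius, so that only the small-loss examples are attacked with the adjusted strength.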

Paper Structure

This paper contains 20 sections, 4 equations, 11 figures, 4 tables, 2 algorithms.

Figures (11)

  • Figure 1: (a) Test robustness of factor ablation AT; (b) verification experiments with varying budgets of additional adversarial perturbations; (c) inducing RO with linearly increasing budgets of additional adversarial perturbations; and (d) adversarial loss and robustness of standard AT.
  • Figure 2: Illustration of the analysis of RO from the feature generalization perspective. In standard AT, a robustness gap exists between the training and test data due to factors like the memorization effect of deep networks and the distribution deviation between the finite training and test data. Subsequently, the robustness gap between training and test data leads to distinct adversarial perturbations for the features in natural data. These distinct adversarial perturbations degrade the feature generalization. The degradation of feature generalization further widens the model’s robustness gap between training and test data, thus forming a vicious cycle.
  • Figure 3: (a) The learning curves of ROFG$_\mathrm{AS}$ with varying attack strengths, and (b) the learning curves of ROFG$_\mathrm{DA}$ with different proportions of small-loss training data.
  • Figure 4: Ablation analysis of ROFG$_\mathrm{AS}$ and ROFG$_\mathrm{DA}$.
  • Figure 5: Experimental results of factor ablation adversarial training (a) on the CIFAR-10 dataset using Wide ResNet-34-10 with AT, (b) on the CIFAR-100 dataset using PreAct ResNet-18 with AT, and (c) on the CIFAR-10 dataset using PreAct ResNet-18 with TRADES.
  • ...and 6 more figures