Boosting Adversarial Training via Fisher-Rao Norm-based Regularization

Xiangyu Yin; Wenjie Ruan

TL;DR

This work addresses the persistent degradation of standard generalization in adversarial training by reframing model complexity through the geometric Fisher-Rao norm and tying it to the Cross-Entropy loss–based Rademacher complexity for ReLU networks. It identifies a logit-centered complexity variable $\Gamma_{ce}$ that captures how width and training objectives affect the generalization gap between adversarially trained and standard models, with epoch-dependent behavior. Building on these insights, the authors propose LOAT, a lightweight, epoch-aware regularization framework that combines standard logit-oriented penalties with adaptive adversarial logit pairing to reduce the CE generalization gap while preserving robustness. Extensive experiments across PGD-AT, TRADES, MART, and DM-AT on CIFAR-10 (and augmented datasets) show LOAT consistently improves clean accuracy and adversarial robustness with minimal overhead, demonstrating practical impact for improving robustness-generalization trade-offs in modern adversarial training pipelines.

Abstract

Adversarial training is widely used to improve the adversarial robustness of deep neural networks. Yet mitigating the degradation of standard generalization performance in adversarially trained models remains an open problem. This paper attempts to resolve this issue through the lens of model complexity. First, we leverage the Fisher-Rao norm, a geometrically invariant metric for model complexity, to establish non-trivial bounds on the Cross-Entropy loss-based Rademacher complexity of a ReLU-activated Multi-Layer Perceptron. We then generalize a complexity-related variable that is sensitive to changes in model width and to the trade-off factors in adversarial training. Moreover, extensive empirical evidence validates that this variable correlates strongly with the generalization gap of the Cross-Entropy loss between adversarially trained and standard-trained models, especially during the initial and final phases of training. Building on this observation, we propose a novel regularization framework, called Logit-Oriented Adversarial Training (LOAT), which can mitigate the trade-off between robustness and accuracy while adding only negligible computational overhead. Our extensive experiments demonstrate that the proposed regularization strategy can boost the performance of prevalent adversarial training algorithms, including PGD-AT, TRADES, TRADES (LSE), MART, and DM-AT, across various network architectures. Our code will be available at https://github.com/TrustAI/LOAT.
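The logit pairing idea named in the TL;DR can be illustrated with a generic adversarial logit pairing loss: cross-entropy on the adversarial logits plus a penalty on the squared distance between clean and adversarial logits. This is a minimal sketch of that general technique, not the LOAT objective itself; the function names, the mean-squared form of the penalty, and the fixed `pairing_weight` are assumptions made for illustration, and the epoch-dependent behavior described above is not modeled.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy of one example, using a numerically stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]

def logit_pairing_penalty(clean_logits, adv_logits):
    """Mean squared difference between clean and adversarial logits."""
    diffs = [(c - a) ** 2 for c, a in zip(clean_logits, adv_logits)]
    return sum(diffs) / len(diffs)

def paired_loss(clean_logits, adv_logits, label, pairing_weight=0.5):
    """Adversarial cross-entropy plus a weighted logit pairing penalty.

    pairing_weight is a hypothetical constant chosen for this sketch.
    """
    ce = cross_entropy(adv_logits, label)
    return ce + pairing_weight * logit_pairing_penalty(clean_logits, adv_logits)

# Hypothetical logits for a 3-class example whose true label is class 0.
clean = [2.0, 0.5, -1.0]
adv = [1.0, 1.5, -1.0]
loss = paired_loss(clean, adv, label=0)
```

When the clean and adversarial logits coincide, the penalty vanishes and the loss reduces to plain cross-entropy, so the extra term only acts where the attack actually moves the logits.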

Paper Structure

This paper contains 21 sections, 3 theorems, 22 equations, 4 figures, 11 tables, and 1 algorithm.

Introduction
Related works
Trade-off Between Robustness and Accuracy
Fisher-Rao Norm
Preliminaries
Basic Notions
Generalization Gap between Algorithms
Proposed Methods
Rademacher Complexity via CE Loss
Bounds of complexity via Fisher-Rao Norm
Sensitivity to Complexity-Related Factors
Influence on Generalization Gap of CE Loss
Logit-Oriented Adversarial Training
Standard Logit-Oriented Regularization
Adaptive Adversarial Logit Pairing
...and 6 more sections

Key Result

Lemma 1

Given an $L$-layer MLP-approximated hypothesis $f_{\mathcal{W}}^{L}$ as defined in Definition 1, if $\mathcal{L}$ is smooth with respect to $f_{\mathcal{W}}^{L}$, the following identity holds:
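As background for the lemma, the sketch below numerically checks the homogeneity property that Fisher-Rao norm analyses of ReLU networks (Liang et al., 2019) build on: for a bias-free ReLU MLP with $K$ weight matrices, the inner product between the parameters and the gradient of each output equals $K$ times that output. The layer sizes, the random seed, and the helper names are assumptions chosen for illustration. The sketch counts weight matrices directly and takes no position on the layer-indexing convention of Definition 1.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def mlp(weights, x):
    """Bias-free ReLU MLP: ReLU after every weight matrix except the last."""
    h = x
    for W in weights[:-1]:
        h = relu(matvec(W, h))
    return matvec(weights[-1], h)

def scale(weights, c):
    """Multiply every parameter by the scalar c."""
    return [[[c * w for w in row] for row in W] for W in weights]

def param_dot_grad(weights, x, eps=1e-6):
    """Central difference of c -> f(c * weights) at c = 1.

    By the chain rule this equals <theta, grad_theta f> for each output.
    """
    up = mlp(scale(weights, 1.0 + eps), x)
    down = mlp(scale(weights, 1.0 - eps), x)
    return [(u - d) / (2.0 * eps) for u, d in zip(up, down)]

rng = random.Random(0)
sizes = [4, 8, 8, 3]  # hypothetical widths: 4 inputs, two hidden layers, 3 logits
weights = [
    [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]
x = [rng.gauss(0.0, 1.0) for _ in range(sizes[0])]

logits = mlp(weights, x)
n_mats = len(weights)
lhs = param_dot_grad(weights, x)   # <theta, grad_theta f>, one entry per logit
rhs = [n_mats * z for z in logits]  # K * f
max_err = max(abs(a - b) for a, b in zip(lhs, rhs))
```

The check works because scaling all weights by $c > 0$ scales the output by $c^{K}$; differentiating in $c$ at $c = 1$ gives the stated relation, so `max_err` should be at the level of finite-difference noise.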

Figures (4)

Figure 1: Standard generalization performance on CIFAR10.
Figure 2: The correlation between $\hat{\gamma}_{ce}^{N_{tr}^{C}}$ (x-axis) and $\hat{\gamma}_{ce}^{N_{tr}^{M}}$ (y-axis) in 1-layer MLPs. Each data point corresponds to a different epoch of the training process.
Figure 3: Depicting $\Gamma_{ce}$ and $G_{\mathcal{L}_{ce}}^{\left<\mathcal{F}_{at},\mathcal{F}_{std}\right>}$ in 1-layer MLPs with respect to various trade-off factors $\lambda$ ranging from 0.1 to 1.0 in $\mathcal{F}_{at}$. The x-axis represents the number of hidden units from 50 to 5000.
Figure 4: Assessment of $G_{\mathcal{L}_{ce}}^{\left<\mathcal{F}_{at}, \mathcal{F}_{std}\right>}$ across three distinct architecture-dataset combinations. The symbols $\bm{\times}$, $\blacktriangle$, and $\CIRCLE$ represent different numbers of hidden units, and varying shades indicate the trade-off factor $\lambda$ within $\mathcal{F}_{at}$, ranging from 0.1 to 1.0.

Theorems & Definitions (8)

Definition 1: $L$-layer MLP
Definition 2: Standard Risk
Definition 3
Lemma 1: Fisher-Rao Norm (Liang et al., 2019)
Lemma 2: $\mathcal{L}_{ce}$-based Fisher-Rao Norm Ball
Definition 4
Theorem 1
Remark 1
