Generating Less Certain Adversarial Examples Improves Robust Generalization

Minxing Zhang; Michael Backes; Xiao Zhang

TL;DR

This work addresses robust overfitting in adversarial training by linking model overconfidence on training-time adversarial inputs to degraded robustness. It introduces adversarial certainty (AC), defined as the logit-level variance of predictions on adversarial examples, and proposes Decreasing Adversarial Certainty (DAC) to explicitly minimize AC within a constrained search space while maintaining the model's ability to discriminate adversarial inputs. The authors provide theoretical insights from synthetic data showing that lower AC can lead to improved robust generalization after a gradient update, and develop a practical two-step optimization (DAC) plus a regularized variant (DAC_Reg). Across CIFAR-10/100 and SVHN with multiple architectures and attacks, DAC yields consistent gains in robust testing accuracy and mitigates robust overfitting, with DAC_Reg offering a more efficient alternative. The results suggest that generating less certain adversarial examples during training can meaningfully enhance robust generalization and can complement existing defenses, with open-source implementations available for replication.

Abstract

This paper revisits the robust overfitting phenomenon of adversarial training. Observing that models with better robust generalization performance are less certain in predicting adversarially generated training inputs, we argue that overconfidence in predicting adversarial examples is a potential cause. Therefore, we hypothesize that generating less certain adversarial examples improves robust generalization, and propose a formal definition of adversarial certainty that captures the variance of the model's predicted logits on adversarial examples. Our theoretical analysis of synthetic distributions characterizes the connection between adversarial certainty and robust generalization. Accordingly, built upon the notion of adversarial certainty, we develop a general method to search for models that can generate training-time adversarial inputs with reduced certainty, while maintaining the model's capability in distinguishing adversarial examples. Extensive experiments on image benchmarks demonstrate that our method effectively learns models with consistently improved robustness and mitigates robust overfitting, confirming the importance of generating less certain adversarial examples for robust generalization. Our implementations are available as open-source code at: https://github.com/TrustMLRG/AdvCertainty.
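The abstract describes adversarial certainty as a quantity capturing the variance of the model's predicted logits on adversarial examples. The following is a minimal sketch of one way to compute such a quantity, assuming it is the per-example variance across class logits, averaged over a batch. The function name `adversarial_certainty`, the use of population variance, and the toy logits are illustrative assumptions; they are not taken from the paper or its repository.

```python
from statistics import fmean, pvariance
from typing import Sequence


def adversarial_certainty(adv_logits: Sequence[Sequence[float]]) -> float:
    """Average, over examples, of the variance of each example's class logits.

    adv_logits[i][k] is the logit for class k on the i-th adversarial example.
    Assumption: population variance across classes, then a plain mean over
    the batch. The paper's formal definition may differ in its details.
    """
    if not adv_logits:
        raise ValueError("need at least one adversarial example")
    per_example_variance = [pvariance(row) for row in adv_logits]
    return fmean(per_example_variance)


# Hypothetical logits: one dominant class per row versus nearly flat rows.
confident_logits = [[9.0, 0.0, 0.0], [0.0, 8.0, 1.0]]
uncertain_logits = [[1.2, 1.0, 0.8], [0.9, 1.1, 1.0]]

ac_confident = adversarial_certainty(confident_logits)
ac_uncertain = adversarial_certainty(uncertain_logits)
print(ac_confident > ac_uncertain)
```

Under this reading, logits that are spread far apart (a confident prediction) give a large value, while nearly flat logits give a value close to zero, which matches the sense in which the abstract calls adversarial examples "less certain".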

Paper Structure (21 sections, 3 theorems, 62 equations, 6 figures, 11 tables)

Introduction
Related Work
Overconfidence Compromises Robustness
Introducing Adversarial Certainty
Decreasing Adversarial Certainty Helps Robust Generalization
Experiments
Main Results
Effect of Adversarial Certainty on Other Robustness-Enhancing Techniques
Further Discussion on DAC
Improvement on DAC Efficiency
Conclusion and Future Work
Complete Introduction of Preliminaries
More Details of Figures in Sections "Overconfidence Compromises Robustness" and "Introducing Adversarial Certainty"
Proofs of Theoretical Results in Section "Introducing Adversarial Certainty"
Proof of Theorem 1
...and 6 more sections

Key Result

Theorem 1

Consider the aforementioned data distribution $\mu$ and robust classification task. Let $\varepsilon_{te}\in(\eta, 2\eta)$ and $f_w$ be an arbitrary SVM classifier with $w>0$. For any $\varepsilon\in[\eta-\frac{w}{d},\eta]$, $\mathrm{AC}_\varepsilon(f_w; \mu, \mu_{\mathrm{adv}}(\varepsilon))$, the a

Figures (6)

Figure 1: Heatmaps of the label predictions of training- and testing-time generated adversarial examples with respect to models produced from the last and best epochs of adversarial training.
Figure 2: Model confidence in predicting training-time adversarial examples conditioned on the ground-truth class label using different metrics: (a) label-level variance, and (b) adversarial certainty.
Figure 3: Correlation between adversarial certainty and robust generalization under different configurations.
Figure 4: (a) Robust overfitting across different methods, where "-P" and "-W" represent PRN18 and WRN34 respectively. (b) Adversarial certainty gap with respect to AT and AT-DAC conditioned on different ground-truth classes. (c) Training curves of adversarial certainty with respect to different adversarial training algorithms.
Figure 5: Comparison results (%) of different metrics defining adversarial certainty on PRN18 and CIFAR-10 at the last and best epochs.
...and 1 more figure

Theorems & Definitions (8)

Definition 1: Adversarial Certainty
Theorem 1
Definition 2: Adversarial Robustness
Proof of Theorem 1
Lemma 2
Proof of Lemma 2
Theorem 3
Proof of Theorem 3