Annealing Self-Distillation Rectification Improves Adversarial Training

Yu-Yu Wu; Hung-Jui Wang; Shang-Tse Chen

TL;DR

Robust overfitting in adversarial training arises because hard one-hot targets ignore the distribution shift that adversarial perturbations induce. The authors analyze the output properties of robust models and introduce Annealing Self-Distillation Rectification (ADR), which uses a mean-teacher exponential moving average (EMA) of the model's weights with a cosine-annealed temperature to produce soft, noise-aware targets that reflect inter-class relations during training. ADR replaces the hard labels in adversarial training (AT) objectives, works plug-and-play with methods such as TRADES, WA, and AWP, and requires no pre-trained teacher. Across CIFAR-10, CIFAR-100, and TinyImageNet-200, ADR consistently improves robustness, narrows the robust overfitting gap, and in many settings also improves standard accuracy, while yielding flatter loss landscapes. The authors present ADR as a practical, data-driven way to strengthen adversarial defenses.
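To make the ingredients named above concrete, here is a minimal Python sketch of a cosine-annealed temperature, an EMA teacher update, and a soft target built from temperature-scaled teacher outputs. It is an illustration under stated assumptions, not the authors' implementation: the function names, the annealing direction and endpoints, the EMA decay, and the simple linear interpolation with factor `lam` are all hypothetical choices made for this example.

```python
import math

def cosine_anneal(step, total_steps, start, end):
    """Cosine interpolation from `start` at step 0 to `end` at `total_steps`.
    The endpoints passed in below are placeholders, not values from the paper."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * progress))

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ema_update(teacher_weights, student_weights, decay):
    """Mean-teacher style update: the teacher is a moving average of the student."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]

def rectified_target(teacher_logits, label, temperature, lam):
    """Assumed form of a soft label: blend the softened teacher distribution
    with the one-hot label using an interpolation factor `lam` in [0, 1]."""
    probs = softmax(teacher_logits, temperature)
    return [lam * p + (1.0 - lam) * (1.0 if k == label else 0.0)
            for k, p in enumerate(probs)]

# Usage: temperature annealed over 100 steps (hypothetical schedule 2.0 -> 1.0).
tau = cosine_anneal(step=50, total_steps=100, start=2.0, end=1.0)
target = rectified_target([2.0, 0.5, -1.0], label=0, temperature=tau, lam=0.8)
```

The resulting `target` is a valid probability vector that can stand in for a one-hot label in a cross-entropy style loss; setting `lam=0` recovers the hard label.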

Abstract

In standard adversarial training, models are optimized to fit one-hot labels within allowable adversarial perturbation budgets. However, ignoring the underlying distribution shift introduced by perturbations leads to robust overfitting. To address this issue and enhance adversarial robustness, we analyze the characteristics of robust models and find that they tend to produce smoother and well-calibrated outputs. Based on this observation, we propose a simple yet effective method, Annealing Self-Distillation Rectification (ADR), which generates soft labels as a better guidance mechanism that accurately reflects the distribution shift under attack during adversarial training. With ADR, we obtain rectified distributions that significantly improve model robustness without requiring pre-trained models or extensive extra computation. Moreover, our method integrates in a plug-and-play manner with other adversarial training techniques by replacing the hard labels in their objectives. We demonstrate the efficacy of ADR through extensive experiments and strong performance across datasets.
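As a reading aid, the label replacement described in the abstract can be written against the usual min-max form of adversarial training. The notation here is assumed for illustration and may differ from the paper's: $f_\theta$ is the model, $\ell$ the classification loss, $\epsilon$ the perturbation budget under an $\ell_p$ norm, $y$ the one-hot label, and $\tilde{y}$ a soft (rectified) label.

```latex
% Standard adversarial training with hard labels:
\min_{\theta} \; \mathbb{E}_{(x, y)} \Big[ \max_{\|\delta\|_p \le \epsilon} \ell\big(f_\theta(x + \delta),\, y\big) \Big]

% Same objective with the one-hot label y swapped for a soft label \tilde{y}:
\min_{\theta} \; \mathbb{E}_{(x, y)} \Big[ \max_{\|\delta\|_p \le \epsilon} \ell\big(f_\theta(x + \delta),\, \tilde{y}\big) \Big]
```

Only the target changes between the two lines, which is why such a scheme can be dropped into other objectives that consume a label.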

Paper Structure

This paper contains 36 sections, 5 equations, 10 figures, 9 tables, 1 algorithm.

Introduction
Related Work
Rectify labels in AT.
Preliminaries
Adversarial training (AT)
Distributional difference in the outputs of robust and non-robust model
Robust model generates a random output on OOD data
Robust models are uncertain on incorrectly classified examples
Output distribution of models on clean or adversarial examples are consistent
Methodology
Motivation: Rectify labels in a noise-aware manner
Annealing Self-Distillation Rectification
Experiments
Training and evaluation setup
Superior performance across robustified methods and datasets
...and 21 more sections

Figures (10)

Figure 1: Output distribution on OOD data. Both models are trained on CIFAR-10 and tested on CIFAR-100.
Figure 2: (a) and (b) are entropy distributions on the correctly classified and misclassified examples on the standard and robust model respectively. (c) and (d) are entropy distributions on the clean and adversarial examples on the standard and robust model respectively. (e) shows histograms of JS divergence for output distribution shift under the PGD-10 attack.
Figure 3: Overview of ADR.
Figure 4: Model weight loss landscape comparison for AT and ADR.
Figure 5: Effectiveness of different temperature $\tau$ and label interpolation factor $\lambda$ of ADR.
...and 5 more figures
