Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement

Daiwei Yu, Zhuorong Li, Lina Wei, Canghong Jin, Yun Zhang, Sixian Chan

TL;DR

This paper identifies a connection between robust overfitting and the excessive memorization of noisy labels in adversarial training (AT), viewed through the gradient norm, and proposes a label refinement approach for AT. The approach first self-refines a more accurate and informative label distribution from over-confident hard labels, and then calibrates training by dynamically incorporating knowledge from self-distilled models into the current model, thus requiring no external teacher.
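The gradient-norm view can be illustrated at the level of a single sample. For softmax cross-entropy, the gradient with respect to the logits equals softmax(z) minus the target distribution, so a confident prediction that disagrees with a hard (possibly noisy) label yields a large gradient, while a softened target yields a smaller one. The stdlib-only sketch below relies on that standard identity; the logits and the soft target are hypothetical values, and this per-logit quantity is not the same as the gradient norm plotted in the paper's Figure 1.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ce_logit_grad_norm(logits, target):
    """L2 norm of d(cross-entropy)/d(logits), which equals
    softmax(logits) - target for any target that sums to 1."""
    probs = softmax(logits)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(probs, target)))


# A confident prediction for class 0 on a sample whose hard label is class 1.
logits = [4.0, 0.0, 0.0]
hard = [0.0, 1.0, 0.0]
soft = [0.6, 0.3, 0.1]  # hypothetical softened target, for illustration only
print(ce_logit_grad_norm(logits, hard), ce_logit_grad_norm(logits, soft))
```

The hard-label gradient norm here exceeds 1, while the softened target roughly halves it or better, which is the intuition behind using soft labels to curb memorization-driven gradient growth.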

Abstract

Adversarial training (AT) is currently one of the most effective ways to make deep neural networks robust against adversarial attacks. However, most AT methods suffer from robust overfitting, i.e., a significant generalization gap in adversarial robustness between the training and testing curves. In this paper, we first identify a connection between robust overfitting and the excessive memorization of noisy labels in AT from the perspective of the gradient norm. As such label noise is mainly caused by a distribution mismatch and improper label assignments, we are motivated to propose a label refinement approach for AT. Specifically, our Self-Guided Label Refinement first self-refines a more accurate and informative label distribution from over-confident hard labels, and then calibrates the training by dynamically incorporating knowledge from self-distilled models into the current model, thus requiring no external teacher. Empirical results demonstrate that our method can simultaneously boost standard accuracy and robust performance across multiple benchmark datasets, attack types, and architectures. In addition, we provide a set of analyses from the perspective of information theory to examine our method and to highlight the importance of soft labels for robust generalization.
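The two steps described in the abstract (refining a soft label from an over-confident hard label, then distilling from the model's own earlier state) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the self-distilled model is an exponential moving average (EMA) of the student's weights and that the refined label is a linear blend of the one-hot label with the teacher's softmax output. The helper names `ema_update`, `refine_label`, and `soft_cross_entropy`, and the parameters `decay` and `alpha`, are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ema_update(teacher_params, student_params, decay=0.99):
    """Hypothetical self-distillation step: the teacher's weights track an
    exponential moving average of the student's weights, so no external
    teacher network is required."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher_params, student_params)]


def refine_label(hard_label, num_classes, teacher_logits, alpha=0.5):
    """Blend the one-hot label with the teacher's predicted distribution.
    `alpha` is a hypothetical mixing weight, not a value from the paper."""
    one_hot = [1.0 if k == hard_label else 0.0 for k in range(num_classes)]
    teacher_probs = softmax(teacher_logits)
    return [(1.0 - alpha) * o + alpha * q for o, q in zip(one_hot, teacher_probs)]


def soft_cross_entropy(student_logits, target):
    """Cross-entropy between a soft target and the student's softmax output.
    In adversarial training, `student_logits` would be computed on the
    adversarial example x' rather than on the clean input."""
    probs = softmax(student_logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# Usage with toy numbers (all values are illustrative).
teacher = ema_update([0.0, 0.0], [1.0, 2.0], decay=0.9)
target = refine_label(hard_label=2, num_classes=3, teacher_logits=[2.0, 0.5, 1.0], alpha=0.3)
loss = soft_cross_entropy([1.5, 0.2, 1.8], target)
print(teacher, target, loss)
```

A "dynamic" variant could schedule `alpha` over training epochs; the blend above keeps it fixed only for simplicity.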

Paper Structure

This paper contains 26 sections, 4 theorems, 33 equations, 12 figures, 8 tables, 1 algorithm.

Key Result

Theorem 1

Let $u$ be a uniform random variable with p.d.f. $p(u)$. Using the composition in Eq. (eq:de_xent), there exists an interpolation ratio $\lambda$ between the clean label distribution and the uniform distribution such that …, where $p(y^\ast \vert x^\prime, w) = \lambda \cdot p(y \vert x^\prime, w) + (1-\lambda) \cdot p(u)$, and the symbol $\lesssim$ means that the corresponding inequality holds up to a $c$-i…
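The mixed label in Theorem 1, $p(y^\ast \vert x^\prime, w) = \lambda \cdot p(y \vert x^\prime, w) + (1-\lambda) \cdot p(u)$, is a convex combination with the uniform distribution $p(u) = 1/K$ over $K$ classes. Because entropy is concave and the uniform distribution maximizes it, the mixed label is never less entropic (i.e., never more over-confident) than the original. The sketch below checks only this mixing formula with a hypothetical over-confident prediction; it does not reproduce the theorem's bound on the IIW.

```python
import math


def interpolate_with_uniform(probs, lam):
    """Theorem 1's mixed label: p(y*|x',w) = lam * p(y|x',w) + (1 - lam) * p(u),
    where p(u) = 1/K is the uniform distribution over K classes."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    k = len(probs)
    return [lam * p + (1.0 - lam) / k for p in probs]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


p = [0.9, 0.05, 0.05]  # an over-confident prediction (illustrative)
q = interpolate_with_uniform(p, lam=0.8)
print(q, entropy(p), entropy(q))
```

Setting `lam=1.0` recovers the original distribution, and `lam=0.0` gives the uniform distribution, matching the two endpoints of the interpolation.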

Figures (12)

  • Figure 1: (a) Gradient norm of a vanilla adversarially trained PreAct-ResNet-18 on CIFAR-10, trained for robustness against $\ell_\infty$ perturbations of radius $8/255$. (b) Robust accuracy under the PGD-20 attack with the same settings as (a). The gradient norm keeps rising, non-monotonically, once robust overfitting sets in.
  • Figure 2: Robust accuracy of models employing different label assignment methods in adversarial training.
  • Figure 3: Mean confidence of the model on correct and incorrect predictions over the clean and adversarial test sets.
  • Figure 4: Test accuracy (%) on the CIFAR-10 dataset with 40% label noise. The training set is split into 1) an untouched portion, whose labels are left unchanged, and 2) a corrupted portion, whose labels are reassigned uniformly at random.
  • Figure 5: Visualization of representations learned by standard training with hard/soft labels and by the proposed SGLR with the self-guided distribution on the CIFAR-10 dataset under various levels of symmetric label noise ($\eta \in \{0.0, 0.2, 0.4, 0.6\}$).
  • ...and 7 more figures
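The symmetric label noise used in Figures 4 and 5 can be sketched as follows: a fraction $\eta$ of training indices forms the corrupted portion, and each of those labels is redrawn uniformly at random over all classes. This is an illustrative sketch, not the paper's data pipeline; the function name `add_symmetric_noise` and the `seed` parameter are hypothetical, and it assumes a redrawn label may coincide with the original class, since "assigned uniformly at random" does not say otherwise.

```python
import random


def add_symmetric_noise(labels, num_classes, eta, seed=0):
    """Return (noisy_labels, corrupted_mask).

    A fraction `eta` of indices is chosen at random (the corrupted portion);
    each chosen label is replaced by a class drawn uniformly from all
    `num_classes` classes, so it may occasionally equal the original label.
    All other labels are left untouched."""
    rng = random.Random(seed)
    n = len(labels)
    num_corrupt = int(round(eta * n))
    corrupt_idx = set(rng.sample(range(n), num_corrupt))
    noisy = [rng.randrange(num_classes) if i in corrupt_idx else y
             for i, y in enumerate(labels)]
    mask = [i in corrupt_idx for i in range(n)]
    return noisy, mask


# Toy 10-class label set with 40% symmetric noise, as in the Figure 4 setting.
labels = [i % 10 for i in range(1000)]
noisy, mask = add_symmetric_noise(labels, num_classes=10, eta=0.4, seed=0)
print(sum(mask), sum(n != y for n, y in zip(noisy, labels)))
```

Keeping the `mask` makes it possible to report accuracy separately on the untouched and corrupted portions, as Figure 4 does.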

Theorems & Definitions (4)

  • Theorem 1: Soft label could reduce the IIW
  • Theorem 2
  • Theorem 1: Soft label could reduce the IIW
  • Theorem 2