Improving Adversarial Robust Fairness via Anti-Bias Soft Label Distillation

Shiji Zhao, Ranjie Duan, Xizhe Wang, Xingxing Wei

TL;DR

The paper analyzes the factors behind the robust fairness problem and argues, from both empirical observation and theoretical analysis, that the smoothness degree of samples' soft labels for different classes affects the robust fairness of DNNs.

Abstract

Adversarial Training (AT) has been widely shown to be an effective method for improving the robustness of Deep Neural Networks (DNNs) against adversarial examples. As a variant of AT, Adversarial Robustness Distillation (ARD) has demonstrated superior performance in improving the robustness of small student models under the guidance of large teacher models. However, both AT and ARD suffer from the robust fairness problem: the resulting models exhibit strong robustness on some classes (easy classes) but weak robustness on others (hard classes). In this paper, we analyze the potential factors in depth and argue, from both empirical observation and theoretical analysis, that the smoothness degree of samples' soft labels for different classes (i.e., hard or easy classes) affects the robust fairness of DNNs. Based on this finding, we propose Anti-Bias Soft Label Distillation (ABSLD), a method that mitigates the adversarial robust fairness problem within the framework of Knowledge Distillation (KD). Specifically, ABSLD adaptively reduces the student's error risk gap between classes by adjusting the class-wise smoothness degree of samples' soft labels during training; the smoothness degree is controlled by assigning different KD temperatures to different classes. Extensive experiments demonstrate that ABSLD outperforms state-of-the-art AT, ARD, and robust fairness methods on a comprehensive metric of robustness and fairness (Normalized Standard Deviation).
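
The link between KD temperature and soft-label smoothness can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the example logits, and the temperature values are hypothetical, and the sketch assumes that each sample uses the temperature of its ground-truth class. The adaptive rule that ABSLD uses to update the temperatures during training is not shown.

```python
import numpy as np

def class_wise_soft_labels(logits, labels, class_temps):
    """Temperature-scaled softmax with one temperature per class.

    Assumption (not from the source): each sample is scaled by the
    temperature of its ground-truth class.
    """
    logits = np.asarray(logits, dtype=float)
    temps = np.asarray(class_temps, dtype=float)[np.asarray(labels)]  # shape (N,)
    scaled = logits / temps[:, None]
    scaled -= scaled.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)

def entropy(probs):
    """Shannon entropy per row; higher entropy means a smoother soft label."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

# Hypothetical teacher logits for two samples over three classes.
teacher_logits = np.array([[4.0, 1.0, 0.5],   # sample of class 0 (treated as "hard")
                           [1.0, 4.0, 0.5]])  # sample of class 1 (treated as "easy")
labels = np.array([0, 1])

# Hypothetical temperatures: low for the hard class, high for the easy class.
class_temps = np.array([0.5, 2.0, 1.0])

soft = class_wise_soft_labels(teacher_logits, labels, class_temps)
print(soft)
print(entropy(soft))
```

A lower temperature yields a sharper (lower-entropy) soft label and a higher temperature yields a smoother one, so the sample from the class treated as hard receives the sharper label here.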

Paper Structure

This paper contains 25 sections, 3 theorems, 45 equations, 6 figures, 7 tables, 1 algorithm.

Key Result

Corollary 1

A dataset $(x,y)\sim\mathcal{D}$ contains $2$ classes (hard class $c_+$ and easy class $c_-$). Based on the label distribution $y$, the soft label distribution with the same smoothness degree $P_{\lambda1}=\{p_{c_+}^{\lambda1},p_{c_-}^{\lambda1}\}$ can be generated and satisfies: If a DNN model $f$ is optimized by minimizing the average optimization error risk in $\mathcal{D}$ with the guidance of th

Figures (6)

  • Figure 1: Comparison between sample-based fair adversarial training and our label-based fair adversarial training. In the former approach (a), the trained model's bias is avoided by re-weighting each sample's importance according to its contribution to fairness. In the latter approach (b), the trained model's bias is avoided by re-temperating the smoothness degree of soft labels for different classes.
  • Figure 2: Class-wise and average robustness of DNNs guided by soft labels with the same smoothness degree (SSD) and with different smoothness degrees (DSD) across classes. In the DSD setting, hard classes receive sharper soft labels and easy classes receive smoother ones. Two DNNs (ResNet-18 and MobileNet-v2) are trained with SAT (Madry et al., 2017) on CIFAR-10, and robust accuracy is evaluated under PGD. Following Wei et al. (2023), the selected checkpoint is the one with the highest mean of the all-class average robustness and the worst-class robustness. The blue and red lines have similar average robustness, but the worst-class robustness of the blue lines is markedly higher than that of the red lines.
  • Figure 3: Class-wise robustness (PGD) of models guided by RSLAD and ABSLD on CIFAR-10. The robustness of the harder classes (classes 3, 4, 5, and 6) improves to varying degrees under ABSLD (blue lines) compared with RSLAD (red lines).
  • Figure 4: Ablation study for Baseline, Baseline+ABSLD$_{adv}$, and ABSLD.
  • Figure 5: Standard deviation of class-wise clean optimization error risk.
  • ...and 1 more figure
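
The checkpoint-selection rule mentioned in the Figure 2 caption (the highest mean of the all-class average robustness and the worst-class robustness) can be sketched as follows. This is an illustrative reading of the caption, not code from the paper: the function names, checkpoint names, and accuracy values are all made up for the example.

```python
import numpy as np

def selection_score(class_robust_acc):
    """Mean of the all-class average robustness and the worst-class robustness."""
    acc = np.asarray(class_robust_acc, dtype=float)
    return 0.5 * (acc.mean() + acc.min())

def pick_checkpoint(history):
    """Return the checkpoint name with the highest selection score.

    history maps a checkpoint name to its per-class robust accuracy.
    """
    return max(history, key=lambda name: selection_score(history[name]))

# Made-up per-class robust accuracies for two hypothetical checkpoints.
history = {
    "epoch_100": [0.70, 0.60, 0.20, 0.50],  # higher on easy classes, weak worst class
    "epoch_110": [0.66, 0.58, 0.30, 0.48],  # similar average, better worst class
}

best = pick_checkpoint(history)
print(best, selection_score(history[best]))
```

Averaging the two quantities means that a checkpoint with a similar average but a stronger worst class is preferred, which matches the emphasis on worst-class robustness in the caption.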

Theorems & Definitions (3)

  • Corollary 1
  • Theorem 1
  • Theorem 2