DAFA: Distance-Aware Fair Adversarial Training

Hyungyu Lee, Saehyung Lee, Hyemi Jang, Junsung Park, Ho Bae, Sungroh Yoon

TL;DR

Distance-Aware Fair Adversarial training (DAFA) addresses robust fairness by taking the similarities between classes into account: it assigns distinct loss weights and adversarial margins to each class and adjusts them to encourage a trade-off in robustness among similar classes.

Abstract

The disparity in accuracy between classes in standard training is amplified during adversarial training, a phenomenon termed the robust fairness problem. Existing methodologies aimed to enhance robust fairness by sacrificing the model's performance on easier classes in order to improve its performance on harder ones. However, we observe that under adversarial attacks, the majority of the model's predictions for samples from the worst class are biased towards classes similar to the worst class, rather than towards the easy classes. Through theoretical and empirical analysis, we demonstrate that robust fairness deteriorates as the distance between classes decreases. Motivated by these insights, we introduce the Distance-Aware Fair Adversarial training (DAFA) methodology, which addresses robust fairness by taking into account the similarities between classes. Specifically, our method assigns distinct loss weights and adversarial margins to each class and adjusts them to encourage a trade-off in robustness among similar classes. Experimental results across various datasets demonstrate that our method not only maintains average robust accuracy but also significantly improves the worst robust accuracy, indicating a marked improvement in robust fairness compared to existing methods.
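The idea of shifting training emphasis between similar classes can be illustrated with a small sketch. It is not the paper's exact update rule: the function name `similarity_aware_weights`, the input matrix `class_probs` (the mean probability a model assigns to class `j` on adversarial examples of class `i`), and the scale `lam` are all assumptions made for illustration. A harder class takes weight from each easier class in proportion to how often it is confused with that class, so similar classes trade the most.

```python
def similarity_aware_weights(class_probs, lam=1.0):
    """Return one weight per class (hypothetical helper, not the paper's API).

    class_probs[i][j]: assumed mean probability assigned to class j on
    adversarial examples whose true class is i.
    """
    n = len(class_probs)
    weights = [1.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Class i counts as harder than class j when the model is less
            # confident on i's own examples than on j's own examples.
            if class_probs[i][i] < class_probs[j][j]:
                shift = lam * class_probs[i][j]
                weights[i] += shift  # harder class gains emphasis
                weights[j] -= shift  # the easier, confusable class gives it up
    return weights


# Toy 3-class example (made-up numbers): cat is hard and mostly confused with dog.
probs = [
    [0.50, 0.40, 0.10],  # cat
    [0.30, 0.60, 0.10],  # dog
    [0.05, 0.05, 0.90],  # truck
]
weights = similarity_aware_weights(probs, lam=1.0)
print(weights)
```

In this toy setup the cat class ends up with the largest weight, and most of it comes from the similar dog class rather than from the dissimilar truck class. The weights always sum to the number of classes, because every shift is taken from one class and given to another. Such weights could then scale each class's loss term or adversarial margin; the precise way DAFA does this is specified in the paper.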

Paper Structure

This paper contains 48 sections, 9 theorems, 38 equations, 8 figures, 14 tables, 2 algorithms.

Key Result

Theorem 1

Let $0<\epsilon<\eta$ and $A \equiv \frac{2\sqrt{d}\sigma}{\sigma^2-1} > 0$. Given the data distribution $\mathcal{D}$ described in the paper's Gaussian-distribution equation (eq:method_gaussian_distribution), $f_{nat}$ and $f_{rob}$ exhibit standard and robust errors, respectively, whose closed-form expressions are not reproduced here. Consequently, both $\mathcal{R}_{nat}(f_{nat}|+1)$ and $\mathcal{R}_{rob}(f_{rob}|+1)$ decrease monotonically with respect to $\alpha$.

Figures (8)

  • Figure 1: Performance improvement for the worst class (cat) when training the robust classifier TRADES [defense_trades] on the CIFAR-10 dataset [dataset_cifar] while varying the training intensity for different class sets. Whereas the previous approach [fairness_frl] limits the training of each class in proportion to class-wise accuracy, our method considers inter-class similarity. Compared to previous methods, our approach intensifies the training constraints on animal classes, which are neighbors of the cat class, while relaxing the constraints on non-animal classes.
  • Figure 2: (a) and (b) depict the results from two distinct 5-class classification tasks. (a) presents the robust accuracy of the robust classifier trained on the cat class alongside other animal classes from CIFAR-10, while (b) illustrates the robust accuracy of the robust classifier trained on the cat class in conjunction with non-animal classes in CIFAR-10.
  • Figure 3: (a) displays the class distances between classes in CIFAR-10 as represented in the TRADES model. (b) and (c) depict the results from binary classification tasks on CIFAR-10's bird class paired with three distinct classes: deer, horse, and truck. Specifically, (b) presents the performance for the bird class, while (c) displays the accuracy disparity for each respective classifier.
  • Figure 4: (a) illustrates the test robust accuracy and the average distance from each class to the other classes in the TRADES models for CIFAR-10. (b) displays the test robust accuracy and variance for each class in the TRADES models applied to CIFAR-10 (see the appendix on experimental details for further information).
  • Figure 5: Results of binary classification tasks between bird and cat.
  • ...and 3 more figures

Theorems & Definitions (15)

  • Theorem 1
  • Theorem 2
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • ...and 5 more