Towards Fairness-Aware Adversarial Learning

Yanghao Zhang, Tianle Zhang, Ronghui Mu, Xiaowei Huang, Wenjie Ruan

TL;DR

FAAL tackles the robust fairness gap in adversarial training by formulating a min-max-max objective that optimizes against the worst-case class distribution via Distributionally Robust Optimization (DRO). It introduces the Class-wise Distributionally Adversarial Weight (CDAW) and solves a KL-constrained DRO subproblem for each batch, biasing learning toward the most vulnerable classes. The method plugs into existing AT approaches and can make a robust model fair with only 2 epochs of fine-tuning while preserving clean and robust accuracy, outperforming FRL, CFA, and WAT on CIFAR-10/100. The results show practical gains in worst-class robustness and training efficiency, pointing to a viable path toward fairness-aware robustness in vision models.
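The per-batch KL-DRO subproblem can be sketched in Python. This is a minimal sketch, not the authors' implementation: it assumes the constraint has the form ${\rm KL}(\mathcal{U},\bm{w})\leq\tau$ stated in Theorem 1, that the per-class adversarial losses of the batch are already computed, and the function names (`class_weights_kl_dro`, `kl_uniform_to`) are hypothetical. Maximizing $\sum_c w_c \ell_c$ over the simplex under this constraint gives the KKT stationarity condition $\ell_c - \mu + \lambda/(C w_c) = 0$, so $w_c \propto 1/(\mu - \ell_c)$ for some $\mu > \max_c \ell_c$, and a one-dimensional bisection on $\mu$ is enough.

```python
import numpy as np


def kl_uniform_to(w):
    """KL(U || w), where U is the uniform distribution over len(w) classes."""
    w = np.asarray(w, dtype=float)
    C = w.size
    return float(np.sum(np.log((1.0 / C) / w)) / C)


def class_weights_kl_dro(class_losses, tau, max_iters=200):
    """Hypothetical CDAW-style solver: maximize sum_c w_c * loss_c
    over the simplex subject to KL(U || w) <= tau.

    KKT conditions give w_c proportional to 1 / (mu - loss_c) with
    mu > max(loss); KL(U || w(mu)) decreases as mu grows, so we bisect on mu.
    """
    ell = np.asarray(class_losses, dtype=float)
    C = ell.size
    # With no budget, or no differences between classes, uniform is optimal.
    if tau <= 0 or np.allclose(ell, ell[0]):
        return np.full(C, 1.0 / C)

    def weights(mu):
        inv = 1.0 / (mu - ell)
        return inv / inv.sum()

    top = ell.max()
    lo = top  # infeasible side: KL -> infinity as mu -> max(loss)
    hi = top + (ell.max() - ell.min())
    # Grow hi until w(hi) satisfies the KL constraint (KL -> 0 as mu -> inf).
    while kl_uniform_to(weights(hi)) > tau:
        hi = top + 2.0 * (hi - top)

    for _ in range(max_iters):
        mid = 0.5 * (lo + hi)
        if not (lo < mid < hi):  # reached floating-point resolution
            break
        if kl_uniform_to(weights(mid)) > tau:
            lo = mid
        else:
            hi = mid
    # hi always satisfies the constraint, so the returned weights are feasible.
    return weights(hi)


losses = [0.5, 1.0, 2.0]  # illustrative per-class adversarial losses
w = class_weights_kl_dro(losses, tau=0.1)
print(w, float(np.dot(w, losses)))
```

A larger `tau` lets more weight move onto the highest-loss classes, while `tau = 0` returns uniform weights, i.e. ordinary class-averaged AT. Bisecting on a single scalar keeps the per-batch cost negligible compared with a general-purpose convex solver.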

Abstract

Although adversarial training (AT) has proven effective in enhancing model robustness, the recently revealed issue of fairness in robustness, i.e., that robust accuracy varies significantly across classes, has not been well addressed. In this paper, instead of evaluating the model's average performance uniformly across classes, we study robust fairness by considering the worst-case distribution across classes. We propose a novel learning paradigm, named Fairness-Aware Adversarial Learning (FAAL). As a generalization of conventional AT, we reformulate adversarial training as a min-max-max problem to ensure both the robustness and the fairness of the trained model. Specifically, by leveraging distributionally robust optimization, our method seeks the worst-case distribution among categories, and the resulting objective is guaranteed, with high probability, to upper-bound the loss under an unknown shift in the class distribution. In particular, FAAL can fine-tune an unfair robust model into a fair one within only two epochs, without compromising overall clean and robust accuracy. Extensive experiments on various image datasets validate the superior performance and efficiency of FAAL compared with other state-of-the-art methods.
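One way to write the min-max-max objective, as a hedged sketch rather than the paper's exact formulation: it assumes an $\ell_\infty$ threat model with radius $\epsilon$, class-wise mean adversarial losses, and the KL ball around the uniform distribution that appears in Theorem 1. The symbols $\theta$, $f_\theta$, $n_c$ (number of samples of class $c$), $\delta_i$ and $\epsilon$ are our notation.

```latex
\min_{\theta}\;
\max_{\substack{\bm{w^{cda}} \in \Delta_C \\ {\rm KL}(\mathcal{U},\bm{w^{cda}}) \leq \tau}}\;
\sum_{c=1}^{C} w^{cda}_c \,\frac{1}{n_c} \sum_{i:\,y_i = c}\;
\max_{\|\delta_i\|_\infty \leq \epsilon}
\ell\big(f_\theta(x_i + \delta_i),\, y_i\big)
```

The innermost maximization is the usual adversarial attack, the middle one selects the worst-case class weighting, and setting $\tau = 0$ forces $\bm{w^{cda}} = \mathcal{U}$ (since ${\rm KL}(\mathcal{U},\bm{w}) = 0$ only when $\bm{w} = \mathcal{U}$), which recovers standard AT with class-balanced averaging.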

Paper Structure (14 sections, 1 theorem, 10 equations, 3 figures, 4 tables, 1 algorithm)

Key Result

Theorem 1

Let $\mathcal{L}_{\rm FAAL}$ be the FAAL loss defined in Eq. (faal_eq1), evaluated on the observed distribution, and let $\mathcal{L} := \frac{1}{C} \sum_{c=1}^{C}\ell'_c$ be the regular loss on the test distribution, which is subject to an unknown group distribution shift, with $\ell'_c$ the loss on class $c$. Then the following holds for all $\bm{w^{cda}} \in \Delta_C$ such that ${\rm KL}(\mathcal{U},\bm{w^{cda}})\leq \tau$, where $\mathcal{U}$ is the uniform distribution:
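The exact inequality of Theorem 1 is not given above. An elementary consequence of the KL constraint, which is not the theorem itself, already shows why the worst-case weighting acts as an upper bound. Write $\ell_c$ for fixed per-class losses and $\mathcal{W}_\tau = \{\bm{w}\in\Delta_C : {\rm KL}(\mathcal{U},\bm{w})\leq\tau\}$ (our notation). Because ${\rm KL}(\mathcal{U},\mathcal{U}) = 0 \leq \tau$, the uniform distribution lies in $\mathcal{W}_\tau$, so

```latex
\frac{1}{C}\sum_{c=1}^{C}\ell_c
\;\leq\;
\max_{\bm{w}\in\mathcal{W}_\tau}\sum_{c=1}^{C} w_c\,\ell_c,
\qquad
\sum_{c=1}^{C} q_c\,\ell_c
\;\leq\;
\max_{\bm{w}\in\mathcal{W}_\tau}\sum_{c=1}^{C} w_c\,\ell_c
\quad \text{for every } \bm{q}\in\mathcal{W}_\tau .
```

In words, the worst-case weighted loss dominates both the uniform (class-balanced) loss and the loss under any class-distribution shift inside the KL ball. The theorem's high-probability statement additionally accounts for the gap between observed and test losses, which this sketch does not cover.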

Figures (3)

  • Figure 1: Class-wise accuracy of the Wide-ResNet34-10 model on the CIFAR-10 dataset; "AA accuracy" denotes robust accuracy against AutoAttack (AA).
  • Figure 2: Class-wise robust accuracy against AutoAttack after fine-tuning the PGD-adversarially-trained WRN model.
  • Figure 3: Class-wise robust accuracy against AutoAttack of a PRN-18 model adversarially trained from scratch.
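The figures above report class-wise robust accuracy. The metric itself is simple to compute; the sketch below assumes that predictions on AutoAttack adversarial examples have already been obtained (generating the attack is outside its scope), and the helper names `classwise_accuracy` and `fairness_summary` are hypothetical. It reports the mean class accuracy, the worst-class accuracy, and the gap between them, which is the disparity the figures visualize.

```python
import numpy as np


def classwise_accuracy(preds, labels, num_classes):
    """Per-class accuracy; classes with no samples get NaN."""
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            acc[c] = float(np.mean(preds[mask] == c))
    return acc


def fairness_summary(preds, labels, num_classes):
    """Mean class accuracy, worst-class accuracy, and the gap between them."""
    acc = classwise_accuracy(preds, labels, num_classes)
    mean_acc = float(np.nanmean(acc))
    worst_acc = float(np.nanmin(acc))
    return {"mean": mean_acc, "worst": worst_acc, "gap": mean_acc - worst_acc}


# Toy predictions on (already attacked) inputs for a 3-class problem.
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 0, 1, 1, 2, 0]
print(classwise_accuracy(preds, labels, 3))
print(fairness_summary(preds, labels, 3))
```

On a class-balanced test set such as CIFAR-10, the mean class accuracy equals the overall accuracy, so a large `gap` signals exactly the robust-fairness problem FAAL targets.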

Theorems & Definitions (2)

  • Definition 1: CDAW: Class-wise Distributionally Adversarial Weight
  • Theorem 1