Towards Fairness-Aware Adversarial Learning

Yanghao Zhang; Tianle Zhang; Ronghui Mu; Xiaowei Huang; Wenjie Ruan

TL;DR

FAAL tackles the robust fairness gap in adversarial training (AT) by formulating a min-max-max objective that optimizes against the worst-case class distribution via Distributional Robust Optimization (DRO). It introduces the Class-wise Distributionally Adversarial Weight (CDAW) and solves a KL-constrained DRO subproblem per batch to bias learning toward the most vulnerable classes. The method integrates with existing AT approaches and can fine-tune an unfair robust model to be fair within only two epochs while preserving overall clean and robust accuracy, outperforming FRL, CFA, and WAT on CIFAR-10/100. The results show improved worst-class robustness at a lower training cost.
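The per-batch reweighting idea can be illustrated with a small sketch. It is not the authors' solver: it assumes the constraint is taken as KL(w || U) <= tau (the direction of the divergence is an assumption here), in which case the worst-case weights are a softmax of the class-wise losses whose temperature can be found by bisection. The function names and the example losses are hypothetical.

```python
import math

def kl_to_uniform(w):
    """KL(w || U) for a probability vector w over len(w) classes."""
    c = len(w)
    return sum(p * math.log(p * c) for p in w if p > 0.0)

def tilted_weights(losses, temperature):
    """Softmax of losses / temperature, shifted by the max for stability."""
    m = max(losses)
    exps = [math.exp((l - m) / temperature) for l in losses]
    z = sum(exps)
    return [e / z for e in exps]

def worst_case_class_weights(losses, tau, iters=100):
    """Maximize sum_c w_c * loss_c over the simplex s.t. KL(w || U) <= tau.

    Assumption: the maximizer is an exponential tilt of the uniform
    distribution, so only the temperature has to be searched.
    """
    n = len(losses)
    if tau <= 0.0 or max(losses) == min(losses):
        return [1.0 / n] * n
    lo, hi = 1e-6, 1e6  # temperature bracket: sharp tilt vs. near-uniform
    if kl_to_uniform(tilted_weights(losses, lo)) <= tau:
        return tilted_weights(losses, lo)  # constraint is not binding
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisection on a log scale
        if kl_to_uniform(tilted_weights(losses, mid)) > tau:
            lo = mid  # too concentrated: raise the temperature
        else:
            hi = mid
    return tilted_weights(losses, hi)

# Hypothetical class-wise adversarial losses for one batch.
batch_losses = [0.5, 1.0, 2.5, 0.8]
weights = worst_case_class_weights(batch_losses, tau=0.1)
weighted_loss = sum(w * l for w, l in zip(weights, batch_losses))
```

The class with the largest loss receives the largest weight, and `tau` controls how far the weights may move away from uniform; `tau = 0` recovers the ordinary class-averaged loss.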

Abstract

Although adversarial training (AT) has proven effective in enhancing the model's robustness, the recently revealed issue of fairness in robustness has not been well addressed, i.e. the robust accuracy varies significantly among different categories. In this paper, instead of uniformly evaluating the model's average class performance, we delve into the issue of robust fairness, by considering the worst-case distribution across various classes. We propose a novel learning paradigm, named Fairness-Aware Adversarial Learning (FAAL). As a generalization of conventional AT, we re-define the problem of adversarial training as a min-max-max framework, to ensure both robustness and fairness of the trained model. Specifically, by taking advantage of distributional robust optimization, our method aims to find the worst distribution among different categories, and the solution is guaranteed to obtain the upper bound performance with high probability. In particular, FAAL can fine-tune an unfair robust model to be fair within only two epochs, without compromising the overall clean and robust accuracies. Extensive experiments on various image datasets validate the superior performance and efficiency of the proposed FAAL compared to other state-of-the-art methods.
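To make the gap between average and worst-class performance concrete, the following minimal sketch computes class-wise accuracy and reports the weakest class. The helper names and the toy labels are hypothetical; in a robust-fairness evaluation the predictions would be those made on adversarial examples.

```python
from collections import defaultdict

def class_wise_accuracy(labels, predictions):
    """Return {class: accuracy} computed separately for each class."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for y, p in zip(labels, predictions):
        total[y] += 1
        correct[y] += int(y == p)
    return {c: correct[c] / total[c] for c in total}

def fairness_summary(labels, predictions):
    """Return (class-averaged accuracy, worst class, worst-class accuracy)."""
    per_class = class_wise_accuracy(labels, predictions)
    average = sum(per_class.values()) / len(per_class)
    worst_class = min(per_class, key=per_class.get)
    return average, worst_class, per_class[worst_class]

# Toy example: three classes with two samples each.
labels = [0, 0, 1, 1, 2, 2]
predictions = [0, 0, 1, 0, 0, 1]
average, worst_class, worst_accuracy = fairness_summary(labels, predictions)
```

Here the class-averaged accuracy is 0.5 while class 2 has accuracy 0.0, which is the kind of disparity that an average alone hides.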

Paper Structure (14 sections, 1 theorem, 10 equations, 3 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Robust Fairness
Distributional Robust Optimization
Methodology
Preliminaries
Problem Definition
Fairness-Aware Adversarial Learning
Experimental Results
Fine-tuning for Enhancing Robust Fairness
Training from Scratch for Enhancing Fairness
Additional Experiments on CIFAR-100 dataset
Essential Differences to SOTAs
Conclusion

Key Result

Theorem 1

Given the loss $\mathcal{L}_{\rm FAAL}$ in faal_eq1 on the observed distribution and the regular loss $\mathcal{L}=:\frac{1}{C} \sum_{c=1}^{C}\ell'_c$ on the test distribution with unknown group distribution shift, the following holds for all $\bm{w^{cda}} \in \Delta_C$: where ${\rm KL}(\mathcal{U},\bm{w^{cda}})\leq \tau$ and $\mathcal{U}$ is the uniform distribution.

Figures (3)

Figure 1: Class-wise accuracy of the Wide-ResNet34-10 model on the CIFAR-10 dataset, where AA accuracy denotes the robust accuracy against AutoAttack.
Figure 2: Class-wise robust accuracy against AutoAttack after fine-tuning the PGD adversarially trained WRN model.
Figure 3: Class-wise robust accuracy against AutoAttack after adversarially training the PRN-18 model from scratch.

Theorems & Definitions (2)

Definition 1: CDAW: Class-wise Distributionally Adversarial Weight
Theorem 1
