Towards Fairness-Aware Adversarial Learning
Yanghao Zhang, Tianle Zhang, Ronghui Mu, Xiaowei Huang, Wenjie Ruan
TL;DR
FAAL tackles the robust fairness gap in adversarial training (AT) by formulating a min-max-max objective that optimizes against the worst-case class distribution via Distributionally Robust Optimization (DRO). It introduces the Class-wise Distributionally Adversarial Weight (CDAW), obtained by solving a KL-constrained DRO subproblem per batch, to bias learning toward the most vulnerable classes. The method integrates with existing AT approaches and can fine-tune an unfair robust model to be fair in as few as two epochs while preserving average clean and robust accuracy. On CIFAR-10/100 it outperforms FRL, CFA, and WAT, improving both worst-class robustness and training efficiency, which points to a viable path toward fairness-aware robustness in vision models.
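To make the per-batch KL-DRO step concrete, here is a minimal sketch, not the paper's exact CDAW procedure. It assumes the uncertainty set is a KL ball of radius `eta` around the uniform class distribution, for which the worst-case weights take the standard exponentially tilted form q_i ∝ exp(loss_i / lam). The temperature `lam` is found by bisection so that the KL budget is met. The function names, `eta`, and the example losses are all hypothetical.

```python
import numpy as np

def kl_to_uniform(q):
    """KL(q || uniform) for a probability vector q."""
    k = q.size
    nz = q > 0  # 0 * log(0) is treated as 0
    return float(np.sum(q[nz] * np.log(q[nz] * k)))

def tilted(losses, lam):
    """Exponentially tilted weights q_i proportional to exp(loss_i / lam)."""
    z = (losses - losses.max()) / lam  # shift by the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def worst_case_class_weights(losses, eta, iters=100):
    """Approximate argmax_q sum_i q_i * loss_i  s.t.  KL(q || uniform) <= eta.

    Assumption: a KL ball around the uniform distribution; the paper's
    constraint set may be defined differently.
    """
    losses = np.asarray(losses, dtype=float)
    k = losses.size
    # No budget, or all classes equally hard: stay at uniform weights.
    if eta <= 0 or losses.max() == losses.min():
        return np.full(k, 1.0 / k)
    lo, hi = 1e-8, 1e8  # temperature bracket: lo is near one-hot, hi is near uniform
    if kl_to_uniform(tilted(losses, lo)) <= eta:
        return tilted(losses, lo)  # budget is loose enough for the sharpest weights
    for _ in range(iters):
        mid = np.sqrt(lo * hi)  # bisection in log space
        if kl_to_uniform(tilted(losses, mid)) > eta:
            lo = mid  # too concentrated: raise the temperature
        else:
            hi = mid  # within budget: try a lower temperature
    return tilted(losses, hi)

# Hypothetical per-class robust losses measured on one batch.
class_losses = np.array([0.5, 2.0, 1.0, 0.2])
weights = worst_case_class_weights(class_losses, eta=0.1)
weighted_loss = float(weights @ class_losses)  # objective a training step would minimize
```

In a training loop, the weights would be recomputed for every batch from the per-class adversarial losses and treated as constants when backpropagating the weighted loss. A larger `eta` concentrates more weight on the hardest classes, while `eta = 0` falls back to uniform weighting.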
Abstract
Although adversarial training (AT) has proven effective in enhancing a model's robustness, the recently revealed issue of fairness in robustness has not been well addressed, i.e., the robust accuracy varies significantly among different categories. In this paper, instead of uniformly evaluating the model's average class performance, we delve into the issue of robust fairness by considering the worst-case distribution across various classes. We propose a novel learning paradigm, named Fairness-Aware Adversarial Learning (FAAL). As a generalization of conventional AT, we redefine the problem of adversarial training as a min-max-max framework to ensure both the robustness and the fairness of the trained model. Specifically, by taking advantage of distributionally robust optimization, our method aims to find the worst-case distribution among different categories, and the solution is guaranteed, with high probability, to attain the upper-bound performance. In particular, FAAL can fine-tune an unfair robust model to be fair within only two epochs, without compromising the overall clean and robust accuracies. Extensive experiments on various image datasets validate the superior performance and efficiency of the proposed FAAL compared to other state-of-the-art methods.
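One way to write down the min-max-max structure described in the abstract is sketched below. The notation is an assumption for illustration and may differ from the paper's own: $f_\theta$ is the model, $\ell$ the loss, $\mathcal{D}_c$ the data distribution of class $c$, $K$ the number of classes, $\epsilon$ the perturbation budget, $u$ the uniform distribution over classes, and $\eta$ the radius of a KL uncertainty set.

```latex
\min_{\theta} \;
\max_{q \in \mathcal{Q}} \;
\sum_{c=1}^{K} q_c \,
\mathbb{E}_{(x,y) \sim \mathcal{D}_c}
\Big[ \max_{\|\delta\|_p \le \epsilon} \ell\big(f_\theta(x + \delta),\, y\big) \Big],
\qquad
\mathcal{Q} = \Big\{ q \in \Delta_K : D_{\mathrm{KL}}(q \,\|\, u) \le \eta \Big\}
```

The inner maximization over $\delta$ is the usual adversarial attack, the middle maximization over $q$ picks the worst-case class weighting, and the outer minimization trains the model. Under these assumptions, setting $\eta = 0$ forces $q = u$, which for class-balanced data recovers the conventional AT objective. This matches the abstract's description of FAAL as a generalization of AT.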
