Towards Fair Class-wise Robustness: Class Optimal Distribution Adversarial Training

Hongxin Zhi; Hongtao Yu; Shaome Li; Xiuming Zhao; Yiteng Wu

TL;DR

CODAT addresses robust fairness gaps in adversarial training by formulating a distributionally robust, class-aware min–max problem. It derives a closed-form solution for the inner maximization under a $\chi^2$-divergence ambiguity set, producing a deterministic equivalent objective that enables joint optimization of class weights and model parameters. A novel Fairness Elasticity Coefficient (FEC) is proposed to quantify the trade-off between worst-class robustness and average robustness. Empirical results across CIFAR variants, SVHN, and STL-10 show that CODAT improves worst-class robustness and fairness with competitive average performance, outperforming several state-of-the-art baselines and demonstrating scalability to larger models.

Abstract

Adversarial training is a highly effective method for improving the robustness of deep neural networks against adversarial attacks. However, it suffers from a robust fairness problem: robustness differs significantly across classes. Recent efforts to mitigate this problem have turned to class-wise reweighting methods. These methods lack rigorous theoretical analysis and explore the weight space only to a limited extent, because they compute weights mainly through existing heuristic algorithms or intuition. In addition, because they optimize the weights and the model parameters separately, they cannot guarantee a consistent optimization direction, which can lead to suboptimal weight assignments and, consequently, a suboptimal model. To address these problems, this paper proposes a novel min-max training framework, Class Optimal Distribution Adversarial Training (CODAT), which employs distributionally robust optimization to fully explore the class-wise weight space and thereby identify the optimal weights with theoretical guarantees. Furthermore, we derive a closed-form optimal solution to the inner maximization and obtain a deterministic equivalent objective function, which provides a theoretical basis for jointly optimizing the weights and the model parameters. We also propose a fairness elasticity coefficient that evaluates an algorithm with regard to both robustness and robust fairness. Experimental results on various datasets show that the proposed method effectively improves the robust fairness of the model and outperforms state-of-the-art approaches.
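To make the idea of a closed-form inner maximization concrete, the following is a minimal sketch of a generic textbook case, not the paper's Theorem 1. It assumes the ambiguity set is a $\chi^2$ ball of radius `rho` around the uniform class distribution, with divergence $K \sum_i (w_i - 1/K)^2$, and that the radius is small enough for all weights to stay nonnegative. The function names, the divergence normalization, and the example losses are assumptions made for illustration only. Under these assumptions the worst-case weighted loss reduces to the mean class loss plus $\sqrt{\rho}$ times the standard deviation of the class losses.

```python
import numpy as np


def chi2_worst_case_weights(class_losses, rho):
    """Worst-case class weights in a chi-square ball around the uniform distribution.

    Assumed setup (illustrative, not taken from the paper):
        maximize  sum_i w_i * loss_i
        s.t.      sum_i w_i = 1,  K * sum_i (w_i - 1/K)^2 <= rho,  w_i >= 0
    The closed form below is valid only when the nonnegativity constraint is inactive.
    """
    losses = np.asarray(class_losses, dtype=float)
    k = losses.size
    centered = losses - losses.mean()
    norm = np.linalg.norm(centered)
    if norm == 0.0:
        # All classes have the same loss, so every feasible weighting is optimal.
        return np.full(k, 1.0 / k)
    # Move from uniform in the direction of the centered losses until the ball boundary.
    weights = 1.0 / k + np.sqrt(rho / k) * centered / norm
    if np.any(weights < 0):
        raise ValueError("rho is too large: the nonnegativity constraint is active")
    return weights


def deterministic_objective(class_losses, rho):
    """Value of the inner maximization: mean loss + sqrt(rho) * population std."""
    losses = np.asarray(class_losses, dtype=float)
    return losses.mean() + np.sqrt(rho) * losses.std()


if __name__ == "__main__":
    losses = np.array([0.5, 1.0, 1.5, 3.0])  # hypothetical per-class adversarial losses
    w = chi2_worst_case_weights(losses, rho=0.1)
    print("weights:", w)
    print("weighted loss:", w @ losses)
    print("closed-form objective:", deterministic_objective(losses, rho=0.1))
```

In this sketch the hardest class receives the largest weight, and the weighted loss agrees with the closed-form value, which is what allows a single objective to be minimized over the model parameters without a separate weight-update step.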

Paper Structure (27 sections, 2 theorems, 41 equations, 8 figures, 5 tables, 1 algorithm)

Introduction
Related work
Adversarial training
Robust fairness
Distributionally robust optimization
Preliminary and Problem analysis
Notation
Standard adversarial training
The potential reason for robust fairness problem in standard AT
Class-wise weighted adversarial training
Worst-class adversarial training
Method
Class optimal distribution adversarial training
A closed-form solution for the inner maximization problem
The unnecessity of regularizers for degradation prevention in CODAT
...and 12 more sections

Key Result

Theorem 1

If Assumption 1 holds, the closed-form optimal solution to the inner maximization problem in Eq. (5) is

Figures (8)

Figure 1: Illustration of the robust fairness problem in a ResNet-18 adversarially trained on CIFAR-10 under the $\ell_\infty$ threat model (with the 10-step $\ell_\infty$ PGD attack). The model's accuracy is inconsistent across classes on both adversarial examples (generated by the 20-step $\ell_\infty$ PGD attack) and natural inputs.
Figure 2: A simple comparison of previous studies and our method. Previous studies, which rely on heuristic algorithms for weight computation, are constrained by two limitations: an inability to fully explore the weight space and a lack of capability to jointly optimize weights and model parameters. Our method, informed by Distributionally Robust Optimization principles and equipped with a closed-form optimal solution, effectively surmounts these limitations.
Figure 3: Non-uniform semantic distances among classes in the CIFAR-10 test set. (a) UMAP visualization of the distribution of natural examples with a naturally trained ResNet-18. (b) The confusion matrix of robustness under the PGD-20 attack with a robust ResNet-18. The red-highlighted areas of the matrix indicate classes that the model misclassifies more frequently. Classes that are semantically closer tend to exhibit higher misclassification rates.
Figure 4: Comparative analysis of class-wise robust accuracy for our method and baselines on CIFAR-10 using ResNet-18 under PGD-100 attack.
Figure 5: The variance of class-wise robust accuracy for all methods on CIFAR-10 using ResNet-18 under PGD-100, CW-30, and AA attacks, respectively. A lower variance indicates better robust fairness.
...and 3 more figures
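
The class-wise quantities that the captions above refer to (per-class robust accuracy, its variance, and the worst-class value) can be computed as in the following sketch. It assumes that predictions on adversarial examples are already available as integer labels; the function names and the toy labels are hypothetical, and the paper's Fairness Elasticity Coefficient is not reproduced here because its definition is not part of this page.

```python
import numpy as np


def classwise_accuracy(y_true, y_pred, num_classes):
    """Accuracy per class; classes with no samples are reported as NaN."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            acc[c] = np.mean(y_pred[mask] == c)
    return acc


def fairness_summary(acc):
    """Mean, worst-class value, and variance of the class-wise accuracies."""
    return {
        "average": float(np.nanmean(acc)),      # mean over classes (macro average)
        "worst_class": float(np.nanmin(acc)),   # accuracy of the weakest class
        "variance": float(np.nanvar(acc)),      # lower means more uniform robustness
    }


if __name__ == "__main__":
    # Hypothetical labels and predictions on adversarial examples for 3 classes.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 1, 0, 2, 1]
    acc = classwise_accuracy(y_true, y_pred, num_classes=3)
    print(acc)
    print(fairness_summary(acc))
```

Note that `average` here is the mean of the per-class accuracies, which equals overall accuracy only when the classes are balanced.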

Theorems & Definitions (5)

Remark 1
Theorem 1
Definition 1
Theorem 1
proof
