Boosting Fair Classifier Generalization through Adaptive Priority Reweighing

Zhihao Hu; Yiran Xu; Mengnan Du; Jindong Gu; Xinmei Tian; Fengxiang He

TL;DR

This work tackles fairness generalization under distribution shift by introducing Adaptive Priority Reweighing (APW), a margin-aware, subgroup-aware reweighing mechanism. APW is framed as a bilevel optimization that prioritizes samples near the decision boundary within each subgroup, with update rules for the subgroup weights $W_{y,a}^{(t)}$ and the sample weights $w_i^{(t)}$ based on distances to the decision boundary $d$ and margins $\phi_i^{(t)}$. The authors provide a Rademacher-complexity-based generalization bound and demonstrate APW's effectiveness across tabular, vision, and language benchmarks, showing improved fairness metrics such as $\Delta_{\mathrm{DP}}$, $\Delta_{\mathrm{EO}}$, and $\Delta_{\mathrm{EOP}}$ with modest accuracy trade-offs, as well as applicability to fine-tuning pretrained unfair models. The approach advances practical fairness by reducing the gap between training-time and test-time fairness, offering a principled, generalizable method for robust deployment in real-world decision systems.
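To make the three fairness gaps concrete, the following minimal sketch computes them for a binary classifier and a binary sensitive attribute. It uses the standard textbook definitions; the function names are hypothetical, and taking the maximum of the true-positive and false-positive gaps for $\Delta_{\mathrm{EO}}$ is an assumption, since conventions vary and the paper's exact definition is not reproduced here.

```python
import numpy as np

def rate(pred, mask):
    """Positive-prediction rate over the rows selected by mask (assumes mask is non-empty)."""
    return float(pred[mask].mean())

def fairness_gaps(y_true, y_pred, a):
    """Return (dp_gap, eop_gap, eo_gap) for binary labels, predictions, and attribute."""
    y_true, y_pred, a = map(np.asarray, (y_true, y_pred, a))
    g0, g1 = (a == 0), (a == 1)
    # Demographic parity: gap in positive-prediction rates between groups.
    dp_gap = abs(rate(y_pred, g0) - rate(y_pred, g1))
    # Equal opportunity: gap in true-positive rates.
    eop_gap = abs(rate(y_pred, g0 & (y_true == 1)) - rate(y_pred, g1 & (y_true == 1)))
    # Equalized odds (assumed convention): worst of the TPR and FPR gaps.
    fpr_gap = abs(rate(y_pred, g0 & (y_true == 0)) - rate(y_pred, g1 & (y_true == 0)))
    eo_gap = max(eop_gap, fpr_gap)
    return dp_gap, eop_gap, eo_gap

# Toy data, invented purely for illustration.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
a      = [0, 0, 0, 0, 1, 1, 1, 1]
dp_gap, eop_gap, eo_gap = fairness_gaps(y_true, y_pred, a)
```

Evaluating the same function on the training split and on the test split, then subtracting the two results, gives the train-to-test fairness gap that the summary refers to.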

Abstract

With the increasing penetration of machine learning applications in critical decision-making areas, calls for algorithmic fairness have become more prominent. Although various methods improve algorithmic fairness by learning with fairness constraints, their performance does not generalize well to the test set. A fair algorithm with strong performance and better generalizability is needed. This paper proposes a novel adaptive reweighing method to eliminate the impact of distribution shifts between training and test data on model generalizability. Most previous reweighing methods assign a single weight to each (sub)group. In contrast, our method models, at the level of individual samples, the distance from each sample's prediction to the decision boundary. Our adaptive reweighing method prioritizes samples closer to the decision boundary and assigns them higher weights to improve the generalizability of fair classifiers. Extensive experiments validate the generalizability of our adaptive priority reweighing method for accuracy and fairness measures (i.e., equal opportunity, equalized odds, and demographic parity) on tabular benchmarks. We also highlight the performance of our method in improving the fairness of language and vision models. The code is available at https://github.com/che2198/APW.
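The idea of prioritizing samples near the decision boundary inside each subgroup can be illustrated with a short sketch. This is not the paper's update rule, which is not reproduced in this summary: the exponential form, the parameter name `eta`, the 0.5 threshold, and the per-subgroup normalization are all assumptions chosen only to show the mechanism.

```python
import numpy as np

def priority_weights(scores, y, a, eta=1.0, threshold=0.5):
    """Hypothetical sketch: upweight samples near the boundary within each (y, a) subgroup.

    scores: predicted probabilities in [0, 1]; eta: assumed sharpness parameter.
    Weights are rescaled to average 1 inside every subgroup, so only the
    distribution of weight within a subgroup changes, not its total mass.
    """
    scores, y, a = map(np.asarray, (scores, y, a))
    dist = np.abs(scores - threshold)       # distance of each prediction to the boundary
    raw = np.exp(-eta * dist)               # closer samples get a larger raw weight
    w = np.empty(len(raw), dtype=float)
    for yv in np.unique(y):
        for av in np.unique(a):
            m = (y == yv) & (a == av)
            if m.any():
                w[m] = raw[m] / raw[m].mean()   # mean 1 within the subgroup
    return w

# Toy example: two subgroups, each with one near-boundary and one far sample.
scores = [0.55, 0.95, 0.45, 0.05]
y = [1, 1, 0, 0]
a = [0, 0, 0, 0]
w = priority_weights(scores, y, a, eta=2.0)
```

In this toy run the samples scored 0.55 and 0.45 receive more weight than the confidently classified ones in the same subgroup. The resulting weights could be passed as per-sample weights to any weighted loss.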

Paper Structure (23 sections, 4 theorems, 24 equations, 12 figures, 9 tables, 3 algorithms)


Introduction
Related Work
Preliminaries
Group fairness
Subgroup weights
Generalization error
Proposed Method
Updating Rules for Adaptive Priority Reweighing
Extensions to Other Fairness Notions
Theoretical Analysis
Experiment Details
Datasets
Fairness-aware Algorithms
Experiment Settings
Experiment Analysis
...and 8 more sections

Key Result

Proposition 1

Given the proportion $p_{y, a}$, and assuming that $wL\left(y, h\left(x, a\right)\right)$ is upper bounded by $b$, for any $\delta > 0$, with probability at least $1 - \delta$, a generalization bound holds in which the Rademacher complexity $\Re(L \circ H)$ is defined as in Bartlett and Mendelson (2002) and $\sigma_1, \ldots, \sigma_m$ are i.i.d. Rademacher variables.
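For readers unfamiliar with the quantity, one common way to write the Rademacher complexity of the loss class is given below. This is a sketch of a standard convention, not a quotation from the paper: conventions differ in constant factors and in whether an absolute value is taken inside the supremum, and the sample notation $S = \{(x_i, a_i, y_i)\}_{i=1}^{m}$ is assumed here.

```latex
\hat{\Re}_S(L \circ H)
  = \mathbb{E}_{\sigma}\left[ \sup_{h \in H} \frac{1}{m} \sum_{i=1}^{m}
      \sigma_i \, L\left(y_i, h\left(x_i, a_i\right)\right) \right],
\qquad
\Re(L \circ H) = \mathbb{E}_{S}\left[ \hat{\Re}_S(L \circ H) \right],
```

where each $\sigma_i$ takes the values $+1$ and $-1$ with probability $1/2$. Intuitively, the quantity measures how well the loss class can correlate with random sign noise on a sample of size $m$.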

Figures (12)

Figure 1: Illustration of our adaptive priority reweighing method. Shapes indicate labels, colors indicate sensitive attributes, and darker colors indicate higher weights. The dotted lines represent the decision boundaries for Empirical Risk Minimization (ERM) and Equal Reweighing, while the solid line represents the decision boundary for Adaptive Priority Reweighing. In ERM, fairness is disregarded altogether: all samples have the same weight, which can easily lead to unfairness. In equal reweighing, (sub)groups are created and all points within one (sub)group are assigned the same weight, which improves fairness on the training set. However, equal reweighing leaves many points clustered around the decision boundary, which limits the generalization of fair classifiers. In comparison, our method models the distance from each sample to the decision boundary, which improves the generalizability of fair classifiers.
Figure 2: Comparison of the training and test Equal Opportunity gap $\Delta_{\mathrm{EOP}}$ (Hardt et al., 2016). We first use a histogram to illustrate the performance of six methods on the equal opportunity metric on the training and test sets, with error bars showing the range of $\Delta_{\mathrm{EOP}}$. In the histogram on the right, the difference in model performance is computed by subtracting the metric on the training set from the metric on the test set. The generalizability issue for fairness measures becomes particularly salient for algorithms with commendable fairness performance, such as FairBatch (Roh et al., 2020) and Label Bias Correction (LBC) (Jiang and Nachum, 2020). These algorithms achieve considerable fairness performance on the training set, but the same level of fairness does not necessarily carry over to the test set. Our method ensures fairness performance on both the training and test sets and achieves the best generalizability among the compared methods.
Figure 3: The trade-offs between accuracy and fairness measures on Adult. The dashed line represents the accuracy and fairness metrics corresponding to the baseline (LR). The upper-right corner of each diagram indicates optimal performance for both accuracy and fairness.
Figure 4: Results obtained for different $\alpha$ and different $\eta$ on the Adult test set w.r.t. equal opportunity.
Figure 5: Results obtained for different $\alpha$ and different $\eta$ on the COMPAS test set w.r.t. equal opportunity.
...and 7 more figures
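The "equal reweighing" baseline described in the Figure 1 caption assigns one weight per (sub)group. One well-known instance is the reweighing scheme of Kamiran and Calders, which sets the weight of subgroup $(y, a)$ to $P(y)P(a)/P(y, a)$ so that label and attribute become independent under the weighted distribution. The sketch below assumes that scheme for illustration; the baselines actually used in the paper may differ, and the function name is hypothetical.

```python
import numpy as np

def equal_subgroup_weights(y, a):
    """One shared weight per (y, a) subgroup: P(y) * P(a) / P(y, a)."""
    y, a = np.asarray(y), np.asarray(a)
    w = np.empty(len(y), dtype=float)
    for yv in np.unique(y):
        for av in np.unique(a):
            m = (y == yv) & (a == av)
            if m.any():
                # Empirical marginals over the empirical joint frequency.
                w[m] = (y == yv).mean() * (a == av).mean() / m.mean()
    return w

# Toy data: positives are over-represented in group a=0.
y = [1, 1, 1, 0, 1, 0, 0, 0]
a = [0, 0, 0, 0, 1, 1, 1, 1]
w = equal_subgroup_weights(y, a)
```

Every point in a subgroup gets the same weight regardless of how close it lies to the decision boundary, which is the limitation the caption contrasts with adaptive priority reweighing.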

Theorems & Definitions (6)

Definition 1: Fairness Definitions
Definition 2: Fairness Measures
Proposition 1
Theorem 1
Theorem 2
Theorem 3
