Fairness Under Demographic Scarce Regime

Patrik Joslin Kenfack; Samira Ebrahimi Kahou; Ulrich Aïvodji

TL;DR

This work addresses fairness in the demographic scarce regime, where sensitive attributes are available for only part of the data, by learning proxy sensitive attributes for the samples that lack them. It introduces FairDSR, a two-phase framework: (i) an uncertainty-aware attribute predictor, trained with a self-ensembling, Monte Carlo dropout-based method, produces proxy attributes together with an uncertainty estimate for each prediction, and (ii) fairness constraints are enforced only on samples whose proxy attributes are reliable. Experiments on five real-world datasets show that applying fairness constraints to low-uncertainty samples yields significantly better fairness-accuracy tradeoffs than classic proxy methods and, in most benchmarks, can outperform models trained with the true sensitive attributes; the findings also hold for other uncertainty measures such as conformal prediction. The results highlight the role of uncertainty in the sensitive-attribute space when designing fair models from incomplete demographic information, with practical implications for privacy-preserving and bias-aware deployments.
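
The selection step in phase (ii) can be illustrated with a minimal sketch. It assumes a binary sensitive attribute, takes the Monte Carlo dropout outputs as already computed (a list of predicted probabilities per sample, one per stochastic forward pass), and uses the entropy of the mean prediction as the uncertainty score. The function names, the entropy-based score, and the "keep if uncertainty is at most the threshold" rule are illustrative assumptions, not the paper's exact procedure.

```python
import math

def predictive_entropy(mc_probs):
    """Entropy (in nats) of the mean predicted probability of attribute = 1.

    mc_probs: probabilities from several stochastic (dropout-enabled) forward
    passes for one sample. Assumed input format, not taken from the paper.
    """
    p = sum(mc_probs) / len(mc_probs)
    p = min(max(p, 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def select_reliable(mc_probs_per_sample, threshold):
    """Return (index, proxy_attribute, uncertainty) for low-uncertainty samples.

    A sample is kept when its uncertainty is <= threshold (hypothetical rule).
    """
    selected = []
    for idx, mc_probs in enumerate(mc_probs_per_sample):
        uncertainty = predictive_entropy(mc_probs)
        if uncertainty <= threshold:
            mean_p = sum(mc_probs) / len(mc_probs)
            proxy = int(mean_p >= 0.5)  # proxy sensitive attribute
            selected.append((idx, proxy, uncertainty))
    return selected

# Three samples, three dropout passes each (made-up numbers for illustration).
mc_outputs = [
    [0.99, 0.98, 0.97],  # confidently attribute 1
    [0.50, 0.60, 0.40],  # ambiguous, high entropy
    [0.02, 0.01, 0.03],  # confidently attribute 0
]
reliable = select_reliable(mc_outputs, threshold=0.3)
```

Only the samples returned by `select_reliable` would then be passed, with their proxy attributes, to the fairness-constrained training step.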

Abstract

Most existing work on fairness assumes the model has full access to demographic information. However, there are scenarios where demographic information is only partially available, either because records were not maintained throughout data collection or for privacy reasons. This setting is known as the demographic scarce regime. Prior research has shown that training an attribute classifier to replace the missing sensitive attributes (a proxy) can still improve fairness. However, using proxy sensitive attributes worsens fairness-accuracy tradeoffs compared to using the true sensitive attributes. To address this limitation, we propose a framework for building attribute classifiers that achieve better fairness-accuracy tradeoffs. Our method introduces uncertainty awareness into the attribute classifier and enforces fairness only on samples whose demographic information is inferred with the lowest uncertainty. We show empirically that enforcing fairness constraints on samples with uncertain sensitive attributes can negatively impact the fairness-accuracy tradeoff. Our experiments on five datasets show that the proposed framework yields models with significantly better fairness-accuracy tradeoffs than classic attribute classifiers. Surprisingly, our framework can outperform models trained with fairness constraints on the true sensitive attributes in most benchmarks. We also show that these findings are consistent with other uncertainty measures such as conformal prediction.
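
The fairness side of the tradeoff is measured with group metrics such as the demographic parity and equal opportunity gaps. The sketch below uses the standard textbook definitions for binary predictions; the function names are hypothetical, and the paper's exact formulation may differ in details such as how groups are encoded.

```python
def demographic_parity_gap(y_pred, groups):
    """Largest difference in positive-prediction rate between groups."""
    rates = []
    for g in set(groups):
        preds = [y for y, a in zip(y_pred, groups) if a == g]
        rates.append(sum(preds) / len(preds))
    return max(rates) - min(rates)

def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest difference in true-positive rate between groups.

    Groups without any positive ground-truth label are skipped.
    """
    rates = []
    for g in set(groups):
        preds = [p for t, p, a in zip(y_true, y_pred, groups) if a == g and t == 1]
        if preds:
            rates.append(sum(preds) / len(preds))
    return max(rates) - min(rates)

# Toy data: six samples, binary sensitive attribute (true or proxy).
y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
groups = [0, 0, 0, 1, 1, 1]

dp_gap = demographic_parity_gap(y_pred, groups)
eop_gap = equal_opportunity_gap(y_true, y_pred, groups)
```

When only proxy attributes are available, `groups` would hold the predicted attributes, which is why errors in uncertain proxies can distort the measured and enforced gap.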

Paper Structure

This paper contains 40 sections, 9 equations, 16 figures, and 8 tables.

Introduction
Related Work
Problem Setting and Preliminaries
Problem Formulation
Fairness Metrics
Fairness Mechanism
Method
Uncertainty-Aware Attribute Prediction
Student Model
Teacher Model
Uncertainty Estimation
Enforcing Fairness w.r.t. Reliable Proxy Sensitive Attributes
Experiments
Experimental Setup
Datasets
...and 25 more sections

Figures (16)

Figure 1: Overview of FairDSR. Our framework consists of two steps. In the first step (left), the dataset $\mathcal{D}_2$ is used to train the attribute classifier for the student-teacher framework. The first step produces proxy-sensitive attributes ($h(X)=\hat{A}$) and the uncertainty of their predictions ($U$). In the second step (right), the fair model is trained using only samples with reliable proxy-sensitive attributes. These samples are selected based on a defined threshold of their uncertainties.
Figure 2: Training Random Forest classifiers without fairness constraints using samples with high uncertainty of sensitive attribute predictions. For each uncertainty threshold $H$, the model is trained on samples with uncertainty $\geq H$. Training is repeated seven times, and the average fairness (first row) and accuracy (second row) are reported. The shaded region represents the standard deviation.
Figure 3: Accuracy-fairness tradeoffs for various fairness metrics ($\Delta_{\text{DP}}$, $\Delta_{\text{EOP}}$, $\Delta_{\text{EOD}}$) and proxy sensitive attributes. The top-left corner is best (highest accuracy with the lowest unfairness). Curves are created by sweeping a range of fairness coefficients $\lambda$, taking the median of 7 runs per $\lambda$, and computing the Pareto front. The fairness mechanism is exponentiated gradient, with Random Forests as the base classifier. Shaded regions show the standard deviations.
Figure 4: The impact of the uncertainty threshold $H$ on the fairness-accuracy tradeoff for (a) Adult and (b) Compas datasets.
Figure 5: Consistency loss study on the Adult dataset. The predicted sensitive attributes are obtained using our student model with and without consistency loss.
...and 11 more figures
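
The curve construction described in the Figure 3 caption (median over runs for each $\lambda$, then the Pareto front) can be sketched as follows. This is a minimal illustration, assuming each run is summarized as an (accuracy, unfairness) pair; the function names and the numbers are hypothetical and not taken from the paper.

```python
from statistics import median

def median_point(runs):
    """Per-coordinate median of (accuracy, unfairness) pairs for one lambda."""
    accs = [acc for acc, _ in runs]
    gaps = [gap for _, gap in runs]
    return (median(accs), median(gaps))

def pareto_front(points):
    """Keep points not dominated by any other point.

    A point dominates another if it has accuracy >= and unfairness <=,
    with at least one strict inequality. Result is sorted by unfairness.
    """
    front = []
    for i, (acc_i, gap_i) in enumerate(points):
        dominated = any(
            acc_j >= acc_i and gap_j <= gap_i and (acc_j > acc_i or gap_j < gap_i)
            for j, (acc_j, gap_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((acc_i, gap_i))
    return sorted(front, key=lambda p: p[1])

# One median point per fairness coefficient (made-up values).
points = [
    (0.80, 0.10),
    (0.85, 0.20),
    (0.78, 0.15),  # dominated by (0.80, 0.10)
    (0.85, 0.25),  # dominated by (0.85, 0.20)
    (0.70, 0.02),
]
front = pareto_front(points)
```

Plotting `front` with unfairness on the horizontal axis and accuracy on the vertical axis would give a curve in which the top-left corner is the most desirable region.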
