Fairness Risks for Group-conditionally Missing Demographics

Kaiqi Jiang, Wenzhe Fan, Mao Li, Xinhua Zhang

TL;DR

This work tackles fairness risks when demographic features are group-conditionally unavailable by introducing a probabilistic, semi-supervised approach that imputes missing demographics and labels within a differentiable fairness risk. It centers on Fair-SS-VAE, a variational autoencoder architecture that jointly models $A$ and $Y$ with group-conditioned missingness via $P(\tilde{A}|A)$ and a Monte Carlo estimator for the fairness risk $\mathcal{E}$. A key methodological contribution is stopping the gradient through the demographic imputation, which prevents the fairness objective from optimizing the imputed demographics in reverse, i.e. shifting the imputations themselves to make predictions look fairer. This is complemented by a separate semi-supervised (SSL) predictor for missing labels and a scalable $O(nN)$ Monte Carlo evaluation. Empirically, Fair-SS-VAE achieves a better balance between accuracy and fairness (measured by DEO and DEOPP) on CelebA and Adult across varying levels of missingness, demonstrating practical impact for privacy-conscious, fair ML deployments in vision and tabular domains.
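A minimal sketch of the Monte Carlo idea: keep the observed demographics, draw the missing ones from an imputation probability, compute a fairness gap per draw, and average over $N$ draws (each draw costs $O(n)$, so $O(nN)$ overall). It assumes binary $A$ and $Y$, hard predictions, and takes DEOPP to mean the absolute gap in true positive rates between the two groups. This is a non-differentiable stand-in, not the paper's fairness risk or its VAE imputer; the function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def sample_groups(a_obs: Sequence[Optional[int]], p_a1: Sequence[float],
                  rng: random.Random) -> List[int]:
    """Keep observed A; draw each missing A (None) from Bernoulli(p_a1[i])."""
    return [a if a is not None else int(rng.random() < p)
            for a, p in zip(a_obs, p_a1)]


def deopp_gap(y_true: Sequence[int], y_pred: Sequence[int],
              a: Sequence[int]) -> Optional[float]:
    """|TPR(A=0) - TPR(A=1)| over examples with Y=1; None if a group has no positives."""
    hits, counts = [0, 0], [0, 0]
    for y, yh, g in zip(y_true, y_pred, a):
        if y == 1:
            counts[g] += 1
            hits[g] += yh
    if counts[0] == 0 or counts[1] == 0:
        return None
    return abs(hits[0] / counts[0] - hits[1] / counts[1])


def mc_fairness_risk(y_true, y_pred, a_obs, p_a1, num_samples: int,
                     seed: int = 0) -> float:
    """Average the gap over num_samples imputed draws: O(n) per draw, O(nN) total."""
    rng = random.Random(seed)
    values = []
    for _ in range(num_samples):
        gap = deopp_gap(y_true, y_pred, sample_groups(a_obs, p_a1, rng))
        if gap is not None:  # skip draws that leave a group without positives
            values.append(gap)
    return sum(values) / len(values) if values else float("nan")


# Toy usage (hypothetical data): the second person withheld their group,
# and the imputer assigns P(A=1 | x) = 0.3 to them.
y_true, y_pred = [1, 1, 1, 1], [1, 0, 1, 1]
a_obs, p_a1 = [0, None, 1, 1], [0.0, 0.3, 0.0, 0.0]
print(mc_fairness_risk(y_true, y_pred, a_obs, p_a1, num_samples=1000))
```

Here the exact expectation is $0.7 \cdot 0.5 + 0.3 \cdot \frac{1}{3} = 0.45$, and the Monte Carlo average should land close to it. In a differentiable version, the stop-gradient described above would amount to treating the imputation probabilities (`p_a1`) as constants inside the fairness term, for example via `jax.lax.stop_gradient` in JAX.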

Abstract

Fairness-aware classification models have gained increasing attention in recent years as concerns grow about discrimination against certain demographic groups. Most existing models require full knowledge of the sensitive features, which can be impractical due to privacy, legal issues, and individuals' fear of discrimination. The key challenge we address is the group dependency of this unavailability; for example, people in some age ranges may be more reluctant to reveal their age. Our solution augments general fairness risks with probabilistic imputations of the sensitive features, while jointly learning the group-conditional missingness probabilities in a variational auto-encoder. Experiments on both image and tabular datasets show that our model achieves an improved balance between accuracy and fairness.
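To see why group-dependent unavailability matters for imputation, consider a minimal sketch assuming the observed feature is either the true group or a "withheld" token, with the withholding probability depending only on the true group. Bayes' rule then shifts the imputed group toward those who withhold more often; if every group withholds at the same rate, the withheld value carries no information and the imputation stays at the prior. The function name and the numbers are hypothetical, and the paper learns these probabilities jointly in a VAE rather than fixing them by hand.

```python
from typing import Dict


def posterior_if_missing(prior: Dict[str, float],
                         miss_prob: Dict[str, float]) -> Dict[str, float]:
    """P(A=a | x, A withheld) is proportional to q(a | x) * P(withheld | A=a)."""
    unnorm = {a: prior[a] * miss_prob[a] for a in prior}
    z = sum(unnorm.values())
    if z <= 0:
        raise ValueError("withholding has zero probability under this model")
    return {a: v / z for a, v in unnorm.items()}


# Hypothetical numbers: the predictor is undecided between two age bands,
# but people aged 40+ withhold their age four times as often.
prior = {"under_40": 0.5, "40_plus": 0.5}
miss_prob = {"under_40": 0.1, "40_plus": 0.4}
print(posterior_if_missing(prior, miss_prob))
```

With these numbers the posterior moves from 0.5/0.5 to roughly 0.2/0.8 in favour of the 40+ band, which is the kind of bias a group-agnostic imputer would miss.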

Paper Structure

This paper contains 40 sections, 2 theorems, 36 equations, 8 figures, and 1 table.

Key Result

Theorem 1

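Suppose $\mathcal{F} \in [0, C]$ where $C > 0$ is a constant, and let $\hat{\mathcal{F}}_N$ denote the Monte Carlo average of $N$ independent samples of $\mathcal{F}$. Then for all $\epsilon > 0$,

$$\Pr\left(\hat{\mathcal{F}}_N - \mathbb{E}[\mathcal{F}] \ge \epsilon\right) \le \exp\left(-\frac{2N\epsilon^2}{C^2}\right).$$

As a result, to guarantee an estimation error of $\epsilon$ with confidence $1-\delta$, it suffices to draw $N = \frac{C^2}{2 \epsilon^2} \log \frac{1}{\delta}$ samples. The proof is available in the appendix.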
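The sample-size rule can be checked numerically. The sketch below assumes the guarantee is a one-sided Hoeffding-type tail bound, $\exp(-2N\epsilon^2/C^2) \le \delta$, which is exactly what the stated $N$ solves for. The helper names are hypothetical, and the Bernoulli(0.5) test variable (values in $[0, 1]$, so $C = 1$) is an illustrative choice, not the paper's fairness risk.

```python
import math
import random


def hoeffding_sample_size(C: float, eps: float, delta: float) -> int:
    """Smallest integer N with N >= C^2 / (2 eps^2) * log(1 / delta)."""
    if C <= 0 or eps <= 0 or not 0 < delta < 1:
        raise ValueError("need C > 0, eps > 0 and 0 < delta < 1")
    return math.ceil(C ** 2 / (2 * eps ** 2) * math.log(1 / delta))


def overshoot_rate(N: int, eps: float, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which the mean of N Bernoulli(0.5) draws
    (values in [0, 1], so C = 1) exceeds its expectation 0.5 by at least eps."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        mean = sum(rng.random() < 0.5 for _ in range(N)) / N
        if mean - 0.5 >= eps - 1e-12:  # tolerance guards float rounding at the boundary
            bad += 1
    return bad / trials


N = hoeffding_sample_size(C=1.0, eps=0.1, delta=0.05)
print(N, overshoot_rate(N, eps=0.1))
```

With $C = 1$, $\epsilon = 0.1$ and $\delta = 0.05$ the rule gives $N = 150$, and the simulated overshoot rate should come out well below 0.05, since Hoeffding's inequality is conservative. A two-sided guarantee (bounding the absolute estimation error) would replace $\log\frac{1}{\delta}$ with $\log\frac{2}{\delta}$.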

Figures (8)

  • Figure 1: SS-VAE decoder and encoder with unavailable demographic/label conditioned on group/class
  • Figure 2: Pareto frontier of error versus DEO/DEOPP for Adult-Gender
  • Figure 3: Pareto frontier of error versus DEO/DEOPP for CelebA
  • Figure 4: Pareto frontier of error versus DEO/DEOPP for Adult-Race
  • Figure 5: Pareto frontier of error versus DEO/DEOPP for Adult-Gender
  • ...and 3 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2