Fairness Risks for Group-conditionally Missing Demographics

Kaiqi Jiang; Wenzhe Fan; Mao Li; Xinhua Zhang

TL;DR

This work tackles fairness risks when demographic features are group-conditionally unavailable by introducing a probabilistic, semi-supervised approach that imputes missing demographics and labels within a differentiable fairness risk. It centers on Fair-SS-VAE, a variational autoencoder architecture that jointly models $A$ and $Y$ with group-conditioned missingness via $P(\tilde{A}|A)$ and a Monte Carlo estimator for the fairness risk $\mathcal{E}$. A key methodological contribution is stopping the gradient through the demographic imputation to prevent reverse optimization of demographics for fairness, complemented by a separate SSL predictor for missing labels and a scalable $O(nN)$ Monte Carlo evaluation. Empirically, Fair-SS-VAE yields an improved balance between accuracy and fairness (measured by $DEO$ and $DEOPP$) on CelebA and Adult across varying levels of missingness, demonstrating practical impact for privacy-conscious, fair ML deployments in vision and tabular domains.

Abstract

Fairness-aware classification models have gained increasing attention in recent years as concerns grow over discrimination against some demographic groups. Most existing models require full knowledge of the sensitive features, which can be impractical due to privacy, legal issues, and an individual's fear of discrimination. The key challenge we address is the group dependency of the unavailability, e.g., people in some age ranges may be more reluctant to reveal their age. Our solution augments general fairness risks with probabilistic imputations of the sensitive features, while jointly learning the group-conditional missingness probabilities in a variational auto-encoder. Our model is shown to be effective on both image and tabular datasets, achieving an improved balance between accuracy and fairness.

Paper Structure (40 sections, 2 theorems, 36 equations, 8 figures, 1 table)


Introduction
Related Work
Preliminary
Classification risk.
Fairness with Group-Conditionally Unavailable Demographics
Rationalizing semi-supervised fairness risk
Imputation of unavailable training labels
Efficient evaluation of fairness risk $\mathcal{E}$
Differentiation of vanilla fairness risk
Integrating Fairness Risk with SSL
Why VAE?
Encoders and decoders
Instilling Fairness to SS-VAE
Classifying test data.
Experimental Results
...and 25 more sections

Key Result

Theorem 1

Suppose $\mathcal{F} \in [0, C]$ where $C > 0$ is a constant. Then for all $\epsilon > 0$, [inequality not recovered in this extract]. As a result, to guarantee an estimation error of at most $\epsilon$ with confidence $1-\delta$, it suffices to draw $N = \frac{C^2}{2 \epsilon^2} \log \frac{1}{\delta}$ samples. The proof is given in the paper's appendix.
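
To give a feel for the sample count in Theorem 1, the following minimal sketch evaluates $N = \frac{C^2}{2 \epsilon^2} \log \frac{1}{\delta}$ for given $C$, $\epsilon$, and $\delta$. The function name `samples_needed` and the rounding up to the next integer are assumptions made for illustration; they are not taken from the paper or its code.

```python
import math


def samples_needed(C: float, eps: float, delta: float) -> int:
    """Evaluate N = C^2 / (2 * eps^2) * log(1 / delta), rounded up.

    Hypothetical helper for illustration only:
      C     -- upper bound on the fairness measure (F takes values in [0, C])
      eps   -- target estimation error
      delta -- allowed failure probability (confidence is 1 - delta)
    """
    if C <= 0 or eps <= 0 or not (0 < delta < 1):
        raise ValueError("need C > 0, eps > 0, and 0 < delta < 1")
    # Rounding up is an assumption: the bound gives a real number,
    # while a sample count must be an integer.
    return math.ceil(C ** 2 / (2 * eps ** 2) * math.log(1 / delta))


if __name__ == "__main__":
    # With C = 1, eps = 0.1, delta = 0.05 the bound gives N = 150.
    print(samples_needed(1.0, 0.1, 0.05))
```

The count grows as $1/\epsilon^2$ but only logarithmically in $1/\delta$, so halving the target error roughly quadruples the number of samples, while tightening the confidence level is comparatively cheap.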

Figures (8)

Figure 1: SS-VAE decoder and encoder with unavailable demographic/label conditioned on group/class
Figure 2: Pareto frontier of error versus DEO/DEOPP for Adult-Gender
Figure 3: Pareto frontier of error versus DEO/DEOPP for CelebA
Figure 4: Pareto frontier of error versus DEO/DEOPP for Adult-Race
Figure 5: Pareto frontier of error versus DEO/DEOPP for Adult-Gender
...and 3 more figures

Theorems & Definitions (2)

Theorem 1
Theorem 2
