Fairness Risks for Group-conditionally Missing Demographics
Kaiqi Jiang, Wenzhe Fan, Mao Li, Xinhua Zhang
TL;DR
This work addresses fairness risks when demographic features are missing at group-dependent rates. It introduces a probabilistic, semi-supervised approach that imputes missing demographics and labels inside a differentiable fairness risk. The core model, Fair-SS-VAE, is a variational autoencoder that jointly models $A$ and $Y$, captures group-conditioned missingness through $P(\tilde{A}|A)$, and estimates the fairness risk $\mathcal{E}$ by Monte Carlo sampling. A key methodological choice is to stop the gradient through the demographic imputation, so that the imputed demographics are not themselves optimized to shrink the fairness risk. This is complemented by a separate SSL predictor for missing labels and a Monte Carlo evaluation that scales as $O(nN)$. Empirically, Fair-SS-VAE achieves a better balance between accuracy and fairness (measured by $DEO$ and $DEOPP$) on CelebA and Adult across varying levels of missingness, which makes it relevant to privacy-conscious fair ML deployments in vision and tabular domains.
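To make the Monte Carlo idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary demographic, uses the absolute gap in true positive rate between the two groups as a stand-in fairness risk, and takes the imputation probabilities `q_a1` as given. The names `tpr_gap`, `mc_fairness_risk`, `a_obs`, and `q_a1` are hypothetical. Because the sketch uses plain NumPy, there is no gradient to stop; in an autodiff framework the sampled demographics would be treated as constants.

```python
import numpy as np

def tpr_gap(y_true, y_pred, a):
    """Absolute gap in true positive rate between groups a=0 and a=1.

    Assumption: this gap is only an illustrative fairness risk; the
    paper's exact definitions of DEO and DEOPP are not reproduced here.
    """
    pos0 = (y_true == 1) & (a == 0)
    pos1 = (y_true == 1) & (a == 1)
    if pos0.sum() == 0 or pos1.sum() == 0:
        return 0.0  # gap is undefined when a group has no positives
    return float(abs(y_pred[pos0].mean() - y_pred[pos1].mean()))

def mc_fairness_risk(y_true, y_pred, a_obs, q_a1, n_samples=100, seed=0):
    """Monte Carlo estimate of the fairness risk under imputed demographics.

    a_obs: observed group in {0, 1}, or -1 where the demographic is missing.
    q_a1:  hypothetical imputed probability that A=1 for each example.
    Cost is one pass over n examples per sample, n_samples passes in total.
    """
    rng = np.random.default_rng(seed)
    missing = a_obs == -1
    total = 0.0
    for _ in range(n_samples):
        a = a_obs.copy()
        # Draw each missing demographic from its imputation distribution.
        draws = rng.random(int(missing.sum())) < q_a1[missing]
        a[missing] = draws.astype(a.dtype)
        total += tpr_gap(y_true, y_pred, a)
    return total / n_samples

# Toy data: the last example has a missing demographic.
y_true = np.array([1, 1, 1, 1])
y_pred = np.array([1.0, 0.0, 1.0, 1.0])
a_obs = np.array([0, 0, 1, -1])
q_a1 = np.array([0.0, 0.0, 1.0, 1.0])
risk = mc_fairness_risk(y_true, y_pred, a_obs, q_a1, n_samples=5)
```

In this toy call the imputation probability for the missing entry is 1, so every sample assigns it to group 1 and the estimate is deterministic. With probabilities strictly between 0 and 1, the estimate averages the gap over the sampled group assignments.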
Abstract
Fairness-aware classification models have gained increasing attention in recent years as concerns grow about discrimination against some demographic groups. Most existing models require full knowledge of the sensitive features, which can be impractical due to privacy, legal issues, and an individual's fear of discrimination. The key challenge we address is the group dependency of the unavailability, e.g., people in some age ranges may be more reluctant to reveal their age. Our solution augments general fairness risks with probabilistic imputations of the sensitive features, while jointly learning the group-conditional missingness probabilities in a variational auto-encoder. Our model is shown to be effective on both image and tabular datasets, achieving an improved balance between accuracy and fairness.
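The effect of group-dependent unavailability can be illustrated with Bayes' rule. The sketch below is an illustration under stated assumptions, not the paper's model: the group prior and the per-group missingness rates are hypothetical numbers supplied by hand, whereas the paper learns the missingness probabilities inside a variational auto-encoder. The function name `posterior_given_missing` is likewise hypothetical.

```python
import numpy as np

def posterior_given_missing(prior, miss_rate):
    """Posterior over the true group A given that it was not disclosed.

    prior:     P(A = k) for each group k (hypothetical values).
    miss_rate: P(demographic missing | A = k) for each group k.
    Bayes' rule: P(A = k | missing) is proportional to the product
    P(missing | A = k) * P(A = k).
    """
    prior = np.asarray(prior, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)
    joint = prior * miss_rate
    return joint / joint.sum()

# Two equally sized groups; group 1 withholds its demographic
# three times as often as group 0.
post = posterior_given_missing(prior=[0.5, 0.5], miss_rate=[0.1, 0.3])
```

With these numbers the posterior is 0.25 for group 0 and 0.75 for group 1, so treating missing entries as if they followed the prior would understate group 1 among the examples with missing demographics.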
