Understanding Fairness and Prediction Error through Subspace Decomposition and Influence Analysis
Enze Shi, Pankaj Bhagwat, Zhixian Yang, Linglong Kong, Bei Jiang
TL;DR
This work presents a representation-centered fairness framework built on sufficient dimension reduction (SDR) to balance predictive utility and bias mitigation. The feature space is decomposed into Y-relevant, Z-sensitive, and shared subspaces, where Y is the target and Z the sensitive attribute. Shared information is then removed progressively, which gives explicit control over the fairness–utility trade-off while largely preserving accuracy. The authors provide a utility–fairness decomposition, an influence-function-based analysis of estimator behavior, and an algorithmic pipeline for sequential post-SDR training, supported by simulations and real-data experiments. The results show improved fairness metrics with competitive predictive performance, which suggests practical applicability in high-stakes domains. A key contribution is the explicit link between SDR-based subspace manipulation, asymptotic guarantees, and fairness measured through distance covariance.
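Since the summary names distance covariance as the fairness measure, a minimal sketch of the standard sample (V-statistic) squared distance covariance may help make it concrete. The function names `_centered_distances` and `distance_covariance_sq` are illustrative assumptions and are not taken from the paper; the authors' exact estimator and normalization may differ.

```python
import numpy as np

def _centered_distances(x):
    """Pairwise Euclidean distance matrix, double-centered (rows, columns, grand mean)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n samples of a scalar variable
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_covariance_sq(x, z):
    """Sample squared distance covariance between paired samples x and z.

    Values near zero suggest x carries little information about z;
    this is a hypothetical helper, not the paper's code.
    """
    a = _centered_distances(x)
    b = _centered_distances(z)
    if a.shape != b.shape:
        raise ValueError("x and z must contain the same number of samples")
    return float((a * b).mean())

# Illustrative synthetic data (assumption): z depends on x nonlinearly.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
z_dependent = x ** 2 + 0.1 * rng.normal(size=200)
z_constant = np.ones(200)

dcov_dependent = distance_covariance_sq(x, z_dependent)
dcov_constant = distance_covariance_sq(x, z_constant)
```

In a fairness setting, `x` would be a learned representation or a model prediction and `z` the sensitive attribute, so a smaller value indicates weaker dependence on the sensitive attribute.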
Abstract
Machine learning models have achieved widespread success but often inherit and amplify historical biases, resulting in unfair outcomes. Traditional fairness methods typically impose constraints at the prediction level, without addressing underlying biases in data representations. In this work, we propose a principled framework that adjusts data representations to balance predictive utility and fairness. Using sufficient dimension reduction, we decompose the feature space into target-relevant, sensitive, and shared components, and control the fairness-utility trade-off by selectively removing sensitive information. We provide a theoretical analysis of how prediction error and fairness gaps evolve as shared subspaces are added, and employ influence functions to quantify their effects on the asymptotic behavior of parameter estimates. Experiments on both synthetic and real-world datasets validate our theoretical insights and show that the proposed method effectively improves fairness while preserving predictive performance.
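The idea of removing shared directions from a representation can be illustrated with a minimal sketch. It assumes a basis for the shared subspace is already available (in the paper this would come from sufficient dimension reduction, which is not implemented here). The function name `remove_directions`, the basis `shared`, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def remove_directions(X, B):
    """Project the rows of X onto the orthogonal complement of span(B).

    X: (n, p) feature matrix.
    B: (p,) or (p, k) basis of the directions to remove, assumed to have
       full column rank. Hypothetical helper, not the paper's algorithm.
    """
    X = np.asarray(X, dtype=float)
    B = np.asarray(B, dtype=float)
    if B.ndim == 1:
        B = B[:, None]
    Q, _ = np.linalg.qr(B)       # orthonormal basis of span(B)
    return X - (X @ Q) @ Q.T     # subtract the component lying in span(B)

# Synthetic example (assumption): one direction drives the sensitive attribute.
rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.normal(size=(n, p))
shared = np.array([1.0, 1.0, 0.0, 0.0, 0.0])   # hypothetical shared direction
z = X @ shared + 0.1 * rng.normal(size=n)      # sensitive attribute

X_fair = remove_directions(X, shared)

# Linear association between z and the removed direction, before and after.
corr_before = abs(np.corrcoef(X @ shared, z)[0, 1])
residual_norm = np.linalg.norm(X_fair @ shared)
```

Passing only the first few columns of a multi-column basis removes the shared directions one group at a time, which corresponds to moving along the fairness-utility trade-off in steps.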
