TowerDebias: A Novel Unfairness Removal Method Based on the Tower Property
Norman Matloff, Aditya Mittal
TL;DR
The paper addresses the challenge of removing sensitive-attribute influence from predictions produced by black-box models without retraining. It introduces towerDebias (tDB), a post-processing method that leverages the Tower Property to estimate $E(Y|X)$ by averaging $E(Y|X,S)$ over $S$, with a $k$-nearest-neighbors extension when exact matches on $X$ are unavailable. A formal fairness-improvement theorem and a closed-form expression for correlation reduction under a trivariate normal model are provided, along with an $L_2$-space interpretation of the method. Empirical results across regression and classification tasks demonstrate meaningful reductions in the Pearson correlation between predictions and sensitive attributes, with modest accuracy trade-offs and favorable comparisons to FairML variants, highlighting tDB’s broad applicability to real-world black-box systems.
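As an illustration of the $k$-nearest-neighbors idea described above, the following is a minimal sketch and not the authors' implementation. It assumes the black-box predictions (which may depend on $S$) have already been computed on a reference dataset. The function name `tower_debias`, its arguments, and the choice of Euclidean distance are all hypothetical choices made for this example.

```python
import numpy as np

def tower_debias(x_new, X_ref, preds_ref, k=5):
    """Sketch: average black-box predictions over the k reference rows
    closest to x_new in X-space (sensitive attribute S excluded from X).

    Assumptions (not from the paper): Euclidean distance, unweighted mean.
    """
    X_ref = np.asarray(X_ref, dtype=float)
    preds_ref = np.asarray(preds_ref, dtype=float)
    x_new = np.asarray(x_new, dtype=float)

    # Distance from x_new to every reference row, using non-sensitive features only.
    dists = np.linalg.norm(X_ref - x_new, axis=1)

    # Indices of the k nearest rows (stable sort keeps ties in original order).
    k = min(k, len(X_ref))
    nearest = np.argsort(dists, kind="stable")[:k]

    # Averaging over neighbors with similar X but possibly different S
    # approximates averaging E(Y|X,S) over S given X.
    return preds_ref[nearest].mean()

# Toy data: a hypothetical black box that predicts x + 2*s.
X_ref = [[0.0], [0.0], [1.0], [1.0]]   # non-sensitive feature
S_ref = [0, 1, 0, 1]                   # sensitive attribute (unused by tower_debias)
preds_ref = [0.0, 2.0, 1.0, 3.0]       # black-box outputs on (X_ref, S_ref)

debiased = tower_debias([0.0], X_ref, preds_ref, k=2)  # 1.0, the mean of 0.0 and 2.0
```

In this toy case the two reference rows with $X = 0$ differ only in $S$, so the averaged prediction no longer depends on which value of $S$ the new case has.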
Abstract
Decision-making processes have increasingly come to rely on sophisticated machine learning tools, raising critical concerns about the fairness of their predictions with respect to sensitive groups. The widespread adoption of commercial "black-box" models necessitates careful consideration of their legal and ethical implications for consumers. When users interact with such black-box models, a key challenge arises: how can the influence of sensitive attributes, such as race or gender, be mitigated or removed from their predictions? We propose towerDebias (tDB), a novel post-processing method designed to reduce the influence of sensitive attributes in predictions made by black-box models. Our tDB approach leverages the Tower Property from probability theory to improve prediction fairness without requiring retraining of the original model. This method is highly versatile, as it requires no prior knowledge of the original algorithm's internal structure and is adaptable to a diverse range of applications. We present a formal fairness improvement theorem for tDB and showcase its effectiveness in both regression and classification tasks using multiple real-world datasets.
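For reference, the Tower Property invoked above can be stated as follows, using the notation of the TL;DR: $Y$ is the outcome, $X$ the non-sensitive features, and $S$ the sensitive attribute. The second line is the special case of a discrete $S$, written out here for illustration only.

```latex
\begin{align}
E(Y \mid X) &= E\big[\, E(Y \mid X, S) \,\big|\, X \,\big] \\
            &= \sum_{s} P(S = s \mid X)\, E(Y \mid X, S = s)
\end{align}
```

That is, averaging the predictions that depend on $S$ over the conditional distribution of $S$ given $X$ yields a predictor that depends on $X$ alone.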
