FairReweighing: Density Estimation-Based Reweighing Framework for Improving Separation in Fair Regression
Xiaoyin Xi, Zhe Yu
TL;DR
This work addresses fairness in regression by focusing on the separation criterion: predictions should be conditionally independent of the sensitive attribute given the ground-truth outcome, $\hat{Y} \perp A \mid Y$. Violations of separation are quantified with the conditional mutual information $I(\hat{Y}; A \mid Y)$. The authors introduce FairReweighing, a density-estimation-based pre-processing method that reweights each training example by $W(a,y) = \frac{\rho(a)\rho(y)}{\rho(a,y)}$, with densities computed via kernel density estimation (KDE) or radius-neighborhood estimates, to enforce separation in the training data before model fitting. They also extend the mutual information-based separation metric to continuous sensitive attributes using a regression-based density estimation approach, which yields the estimator $\hat{C}_{sep} = \frac{1}{n} \sum_{i=1}^{n} \log \frac{\rho(a_i \mid y_i, \hat{y}_i)}{\rho(a_i \mid y_i)}$. The theoretical guarantees hold under a conditional independence assumption, and extensive experiments on synthetic and real-world datasets show that FairReweighing improves separation more effectively than state-of-the-art regression fairness methods while maintaining high predictive accuracy. The work also shows that FairReweighing reduces to the classical Reweighing method in binary classification, making it a generalized pre-processing approach for both regression and classification tasks. The framework enables fair regression with continuous sensitive attributes and provides practical tools for auditing and mitigating bias in data-driven decision-making.
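A minimal sketch of the weight computation follows, assuming one-dimensional $a$ and $y$ and radius-neighborhood densities with rectangular (box) neighborhoods. The function name `fair_reweighing_weights` and the radii `r_a`, `r_y` are hypothetical, and this is not the authors' implementation. With box neighborhoods the neighborhood volumes cancel, so each weight becomes $W_i = \frac{c_a c_y}{n\, c_{ay}}$, where the $c$ terms are neighbor counts. When $a$ and $y$ are binary and both radii are below 1, the counts become group counts and the weights coincide with classical Reweighing's $P(a)P(y)/P(a,y)$.

```python
import numpy as np


def fair_reweighing_weights(a, y, r_a=0.1, r_y=0.1):
    """Hypothetical sketch: W_i = rho(a_i) rho(y_i) / rho(a_i, y_i) using
    radius-neighborhood density estimates for 1-D a and y.

    r_a and r_y are assumed neighborhood radii (tuning parameters).
    """
    a = np.asarray(a, dtype=float)
    y = np.asarray(y, dtype=float)
    n = a.shape[0]

    # Pairwise neighborhood indicators; O(n^2) memory, fine for a sketch.
    near_a = np.abs(a[:, None] - a[None, :]) <= r_a
    near_y = np.abs(y[:, None] - y[None, :]) <= r_y

    # Neighbor counts; every point neighbors itself, so all counts are >= 1.
    c_a = near_a.sum(axis=1)
    c_y = near_y.sum(axis=1)
    c_ay = (near_a & near_y).sum(axis=1)

    # rho(a) ~ c_a / (n * 2 r_a), rho(y) ~ c_y / (n * 2 r_y), and
    # rho(a, y) ~ c_ay / (n * 4 r_a r_y) over a rectangular neighborhood,
    # so the neighborhood volumes cancel in the ratio.
    return c_a * c_y / (n * c_ay)


# Binary check: with radii below 1, the neighbors of a point are exactly its
# matching group, so the weights equal P(a) P(y) / P(a, y).
a = np.array([0, 0, 0, 1, 1, 1, 1, 1])
y = np.array([1, 1, 0, 0, 0, 0, 1, 0])
w = fair_reweighing_weights(a, y, r_a=0.5, r_y=0.5)
```

The resulting weights can then be passed to any learner that accepts per-sample weights, for example scikit-learn's `LinearRegression.fit(X, y, sample_weight=w)`. KDE (the other density option mentioned above) would replace the hard neighborhood counts with smooth kernel sums, and multi-dimensional attributes would need a vector norm in place of the absolute differences.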
Abstract
AI software is increasingly applied in both high-stakes public-sector and industrial contexts. However, its lack of transparency has raised concerns about whether these data-informed decisions are fair to people of all racial, gender, and age groups. Despite extensive research on fairness-aware AI software, most efforts to date have focused on binary classification tasks, and fairness in regression remains relatively underexplored. In this work, we adopt a mutual information-based metric to assess violations of separation. We extend the metric so that it applies directly to both classification and regression problems, with either binary or continuous sensitive attributes. Inspired by the Reweighing algorithm in fair classification, we propose FairReweighing, a pre-processing algorithm based on density estimation that ensures the learned model satisfies the separation criterion. Theoretically, we show that FairReweighing guarantees separation in the training data under a data independence assumption. Empirically, on both synthetic and real-world data, we show that FairReweighing outperforms existing state-of-the-art regression fairness solutions in improving separation while maintaining high accuracy.
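As a hedged illustration of the mutual information-based separation metric, the sketch below computes the plug-in estimate $\hat{C}_{sep} = \frac{1}{n} \sum_{i=1}^{n} \log \frac{\rho(a_i \mid y_i, \hat{y}_i)}{\rho(a_i \mid y_i)}$ for a continuous sensitive attribute. It assumes each conditional density is a linear regression with homoscedastic Gaussian residuals; that density model and the names `gaussian_regression_logpdf` and `separation_mi` are hypothetical stand-ins, and the paper's regression-based density estimator may differ.

```python
import numpy as np


def gaussian_regression_logpdf(target, features):
    """Per-sample log rho(target | features) under an assumed linear-Gaussian
    model: target = [1, features] @ beta + noise, noise ~ N(0, s^2)."""
    X = np.column_stack([np.ones(len(target))] + list(features))  # add intercept
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    s2 = np.mean(resid**2)  # maximum-likelihood residual variance
    return -0.5 * np.log(2 * np.pi * s2) - resid**2 / (2 * s2)


def separation_mi(a, y, y_hat):
    """Plug-in estimate of mean_i log[rho(a_i | y_i, yhat_i) / rho(a_i | y_i)]."""
    a, y, y_hat = (np.asarray(v, dtype=float) for v in (a, y, y_hat))
    log_num = gaussian_regression_logpdf(a, [y, y_hat])  # rho(a | y, y_hat)
    log_den = gaussian_regression_logpdf(a, [y])         # rho(a | y)
    return float(np.mean(log_num - log_den))


rng = np.random.default_rng(0)
n = 2000
a = rng.normal(size=n)                            # continuous sensitive attribute
y = a + rng.normal(size=n)                        # ground truth correlated with a
fair = y + 0.1 * rng.normal(size=n)               # errors independent of a
unfair = y + 0.5 * a + 0.1 * rng.normal(size=n)   # errors depend on a
print(separation_mi(a, y, fair), separation_mi(a, y, unfair))
```

Under this Gaussian assumption the estimate reduces to $\frac{1}{2}\log\left(s^2_{A \mid Y} / s^2_{A \mid Y, \hat{Y}}\right)$, which is nonnegative because the second regression nests the first. Predictions whose errors carry information about $a$ beyond $y$ score higher. Nonlinear or non-Gaussian dependence would call for a more flexible conditional density model.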
