Differentially Private Post-Processing for Fair Regression
Ruicheng Xian, Qiaobo Li, Gautam Kamath, Han Zhao
TL;DR
The paper tackles privacy-preserving fairness for regression by proposing a post-processing pipeline that remaps a regressor's outputs to satisfy statistical parity, using privately estimated output distributions, their Wasserstein barycenter, and group-specific optimal transports. Privacy comes from estimating the marginal PMFs with a histogram density estimator under the Laplace mechanism, followed by a re-normalization step; every later step operates only on these private estimates, so the whole pipeline remains differentially private by the post-processing property of DP. A key insight is the bias-variance trade-off governed by the histogram bin count $k$: fewer bins improve fairness (via smaller KS deviation) but increase discretization error, while more bins reduce discretization error at the cost of higher estimation variance and DP noise. The theory gives a scaling of $k$ with the sample size $n$ that is optimal for MSE, and experiments on two public datasets illustrate the trade-offs. The approach decouples training from fairness: the pre-trained regressor can be optimized for accuracy while the post-processing enforces fairness with principled guarantees. Extensions to approximate statistical parity and to attribute-blind settings are left to future work.
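The private estimation step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `private_pmf`, the output range `[lo, hi]`, the replace-one neighboring relation (which gives the count vector an L1 sensitivity of 2), and the clip-then-rescale re-normalization are all assumptions made for illustration, and it handles the scores of a single group only.

```python
import numpy as np

def private_pmf(scores, k, epsilon, lo=0.0, hi=1.0, rng=None):
    """Estimate a k-bin PMF of `scores` with the Laplace mechanism.

    Assumptions (not taken from the paper): scores lie in [lo, hi],
    neighboring datasets differ by replacing one record (L1 sensitivity 2),
    and re-normalization is done by clipping negatives and rescaling.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Histogram counts over k equal-width bins.
    counts, _ = np.histogram(np.clip(scores, lo, hi), bins=k, range=(lo, hi))
    # Laplace noise with scale = sensitivity / epsilon.
    noisy = counts + rng.laplace(loc=0.0, scale=2.0 / epsilon, size=k)
    # Re-normalize: noisy counts can be negative and need not sum to n.
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    if total <= 0:
        return np.full(k, 1.0 / k)  # fall back to uniform if all mass was clipped
    return noisy / total
```

Because the noise scale does not depend on $k$ while the expected count per bin shrinks as $k$ grows, the relative noise per bin increases with the number of bins, which is one side of the trade-off described above.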
Abstract
This paper describes a differentially private post-processing algorithm for learning fair regressors satisfying statistical parity, addressing privacy concerns of machine learning models trained on sensitive data, as well as fairness concerns of their potential to propagate historical biases. Our algorithm can be applied to post-process any given regressor to improve fairness by remapping its outputs. It consists of three steps: first, the output distributions are estimated privately via histogram density estimation and the Laplace mechanism; second, their Wasserstein barycenter is computed; and third, the optimal transports to the barycenter are used for post-processing to satisfy fairness. We analyze the sample complexity of our algorithm and provide a fairness guarantee, revealing a trade-off between the statistical bias and variance induced by the choice of the number of bins in the histogram, in which using fewer bins always favors fairness at the expense of error.
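The second and third steps can be sketched for one-dimensional outputs, where the Wasserstein-2 barycenter has a closed form: its quantile function is the weighted average of the groups' quantile functions. The sketch below is an illustration under assumptions, not the paper's algorithm. The helper names `quantile` and `barycenter_maps` are hypothetical, all groups are assumed to share one grid of bin midpoints, and each bin is mapped deterministically through its mid-CDF level, whereas an exact transport between discrete distributions may need to split a bin's mass at random.

```python
import numpy as np

def quantile(pmf, grid, u):
    """Generalized inverse CDF of a discrete distribution supported on `grid`."""
    cdf = np.cumsum(pmf)
    idx = np.searchsorted(cdf, u, side="left")
    return grid[np.minimum(idx, len(grid) - 1)]

def barycenter_maps(pmfs, weights, grid):
    """Map each grid point of each group toward the 1D W2 barycenter.

    pmfs:    dict group -> PMF over `grid` (e.g. private estimates)
    weights: dict group -> barycenter weight (assumed to sum to 1)
    Returns: dict group -> array of remapped output values, one per bin.
    """
    maps = {}
    for a, pmf in pmfs.items():
        # Mid-CDF level of each bin within group a (a deterministic simplification).
        level = np.cumsum(pmf) - 0.5 * pmf
        # Barycenter quantile = weighted average of the group quantiles.
        maps[a] = sum(weights[g] * quantile(pmfs[g], grid, level) for g in pmfs)
    return maps

grid = np.array([0.0, 0.5, 1.0])
pmfs = {"a": np.array([0.5, 0.5, 0.0]), "b": np.array([0.0, 0.5, 0.5])}
maps = barycenter_maps(pmfs, {"a": 0.5, "b": 0.5}, grid)
```

In this toy example both groups end up with half of their mass at 0.25 and half at 0.75, so the remapped outputs have the same distribution in each group. Since the maps depend on the data only through the estimated PMFs, computing them from private estimates incurs no additional privacy cost.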
