Fairness Meets Privacy: Integrating Differential Privacy and Demographic Parity in Multi-class Classification
Lilian Say, Christophe Denis, Rafael Pinot
TL;DR
The paper tackles the challenge of simultaneously preserving data privacy and ensuring fairness in multi-class classification. It proposes DP2DP, a two-phase post-processing pipeline: the first phase builds a differentially private probabilistic classifier on labeled data, and the second enforces $\rho$-demographic parity on unlabeled data through a privacy-preserving optimization. The authors establish a Rényi DP guarantee for DP2DP and prove a fairness bound showing that the unfairness gap to $\rho$ decays at rate $\mathcal{O}(\log(N)/\sqrt{N})$, up to constants and smoothing error, which matches non-private baselines up to a logarithmic factor. Empirically, DP2DP achieves state-of-the-art accuracy/fairness/privacy trade-offs on synthetic and real-world datasets (notably the Adult benchmark), indicating that privacy and fairness can be integrated with only mild performance overhead.
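To make the fairness target concrete, the sketch below measures how far a set of multi-class predictions is from demographic parity. It is an illustration only, not the DP2DP procedure, and it applies no privacy mechanism. The unfairness measure used here (the largest absolute difference, over classes and groups, between the group-conditional prediction rate and the overall prediction rate) is an assumption, since the summary above does not state the paper's exact definition. The function names `demographic_parity_gap` and `satisfies_rho_parity` are hypothetical.

```python
from collections import Counter


def demographic_parity_gap(predictions, groups):
    """Empirical demographic parity gap for multi-class predictions.

    Assumed measure (not taken from the paper): the maximum, over classes k
    and groups s, of |P(pred = k | group = s) - P(pred = k)|.
    """
    if len(predictions) != len(groups) or len(predictions) == 0:
        raise ValueError("predictions and groups must be non-empty and equal in length")

    n = len(predictions)
    overall = Counter(predictions)  # class -> count over all samples

    # group -> (class -> count within that group)
    by_group = {}
    for y, s in zip(predictions, groups):
        by_group.setdefault(s, Counter())[y] += 1

    gap = 0.0
    for counts in by_group.values():
        n_s = sum(counts.values())
        for k in overall:
            # Counter returns 0 for classes never predicted in this group.
            gap = max(gap, abs(counts[k] / n_s - overall[k] / n))
    return gap


def satisfies_rho_parity(predictions, groups, rho):
    """True if the empirical gap is at most the tolerance rho."""
    return demographic_parity_gap(predictions, groups) <= rho


# Usage: three classes, two groups, identical prediction rates per group.
balanced_preds = [0, 1, 2, 0, 1, 2]
balanced_groups = ["a", "a", "a", "b", "b", "b"]
balanced_gap = demographic_parity_gap(balanced_preds, balanced_groups)

# Usage: each group always receives a different class.
skewed_preds = [0, 0, 1, 1]
skewed_groups = ["a", "a", "b", "b"]
skewed_gap = demographic_parity_gap(skewed_preds, skewed_groups)
```

Under this assumed measure, a classifier would be called $\rho$-fair on a sample when `demographic_parity_gap` returns a value no larger than $\rho$; setting $\rho = 0$ asks for exact parity on that sample.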
Abstract
The increasing use of machine learning in sensitive applications demands algorithms that simultaneously preserve data privacy and ensure fairness across potentially sensitive sub-populations. While privacy and fairness have each been extensively studied, their joint treatment remains poorly understood. Existing research often frames them as conflicting objectives, with multiple studies suggesting that strong privacy notions such as differential privacy inevitably compromise fairness. In this work, we challenge that perspective by showing that differential privacy can be integrated into a fairness-enhancing pipeline with minimal impact on fairness guarantees. We design a post-processing algorithm, called DP2DP, that enforces both demographic parity and differential privacy. Our analysis reveals that our algorithm converges towards its demographic parity objective at essentially the same rate (up to a logarithmic factor) as the best non-private methods from the literature. Experiments on both synthetic and real datasets confirm our theoretical results, showing that the proposed algorithm achieves state-of-the-art accuracy/fairness/privacy trade-offs.
