On the Impact of Output Perturbation on Fairness in Binary Linear Classification
Vitalii Emelianov, Michaël Perrot
TL;DR
This work theoretically investigates how output-perturbation differential privacy affects fairness in binary linear classifiers. It derives high-probability bounds showing that privacy-induced changes in individual fairness grow with the model dimension as $O(\sigma\sqrt{p})$, while group fairness impacts are governed by the angular margin distribution and are, under certain conditions, dimension-free. The analysis centers on angular margins $\alpha(h,x,y)$ and uses Gaussian noise in the output perturbation to connect privacy randomness with fairness metrics, yielding bounds on expectation, variance, and high-probability deviations; the results extend to auditing settings and to Noisy-GD under plausible modeling assumptions. These findings offer principled guidance for evaluating and mitigating privacy–fairness trade-offs in practice, and the work outlines avenues for extending the analysis to non-linear and kernel-based regimes.
Abstract
We theoretically study how differential privacy interacts with both individual and group fairness in binary linear classification. More precisely, we focus on the output perturbation mechanism, a classic approach in privacy-preserving machine learning. We derive high-probability bounds on the level of individual and group fairness that the perturbed models can achieve compared to the original model. For individual fairness, we prove that the impact of output perturbation on the level of fairness is bounded but grows with the dimension of the model. For group fairness, we show that this impact is determined by the distribution of so-called angular margins, that is, the signed margins of the non-private model rescaled by the norm of each example.
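The two central objects of the abstract, output perturbation and angular margins, can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: labels live in $\{-1,+1\}$, the classifier is $x \mapsto \mathrm{sign}(\langle w, x\rangle)$, the angular margin is computed as $y\langle w, x\rangle / \|x\|$ following the verbal description above, and the noise scale `sigma` is an arbitrary value rather than one calibrated to a privacy budget. All function names are hypothetical.

```python
import numpy as np

def angular_margins(w, X, y):
    """Signed margins y * <w, x>, rescaled by the norm of each example (assumed form)."""
    return y * (X @ w) / np.linalg.norm(X, axis=1)

def output_perturbation(w, sigma, rng):
    """Add isotropic Gaussian noise to the trained weights.

    sigma is NOT calibrated to any (epsilon, delta) here; it is a free parameter.
    """
    return w + rng.normal(0.0, sigma, size=w.shape)

def flipped(w, w_priv, X):
    """Boolean mask of examples whose predicted label changes after perturbation."""
    return np.sign(X @ w) != np.sign(X @ w_priv)

# Toy data: labels are generated by the non-private model itself,
# so every angular margin is non-negative.
rng = np.random.default_rng(0)
n, p, sigma = 1000, 20, 0.1
X = rng.normal(size=(n, p))
w = rng.normal(size=p)
y = np.sign(X @ w)

w_priv = output_perturbation(w, sigma, rng)
alpha = angular_margins(w, X, y)
flips = flipped(w, w_priv, X)

# By Cauchy-Schwarz, |<w_priv - w, x>| <= ||w_priv - w|| * ||x||, so an example
# whose angular margin exceeds the noise norm cannot change its prediction.
noise_norm = np.linalg.norm(w_priv - w)
safe = alpha > noise_norm

print("fraction of flipped predictions:", flips.mean())
print("fraction of provably stable examples:", safe.mean())
```

In this sketch only examples with a small angular margin can change prediction, which is the intuition behind a group-level impact that depends on how angular margins are distributed. The norm of the Gaussian noise concentrates around $\sigma\sqrt{p}$, which is where a dimension dependence can enter.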
