On the Impact of Output Perturbation on Fairness in Binary Linear Classification

Vitalii Emelianov, Michaël Perrot

TL;DR

This work theoretically investigates how output-perturbation differential privacy affects fairness in binary linear classifiers. It derives high-probability bounds showing that privacy-induced changes in individual fairness grow with the model dimension as $O(\sigma\sqrt{p})$, while group fairness impacts are governed by the angular margin distribution and are, under certain conditions, dimension-free. The analysis centers on angular margins $\alpha(h,x,y)$ and uses Gaussian noise in the output perturbation to connect privacy randomness with fairness metrics, yielding bounds on expectation, variance, and high-probability deviations; the results extend to auditing settings and to Noisy-GD under plausible modeling assumptions. These findings offer principled guidance for evaluating and mitigating privacy–fairness trade-offs in practice, including applications to auditing private models and to optimization with noisy gradients, while outlining avenues for extending to non-linear and kernel-based regimes.
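To see where the $\sqrt{p}$ factor can come from, here is an informal sketch, not the paper's exact statement. It assumes the perturbed model is $h^\text{priv} = h + \eta$ with $\eta \sim \mathcal{N}(0, \sigma^2 I_p)$ (an assumed form of the noise model), and it compares the scores of two inputs $x, x'$; the symbols $\eta$, $x'$ and $\beta$ are introduced here for illustration only.

$$
\begin{aligned}
\bigl|\langle h^\text{priv}, x - x'\rangle - \langle h, x - x'\rangle\bigr|
  &= \bigl|\langle \eta, x - x'\rangle\bigr|
  \le \|\eta\|_2 \,\|x - x'\|_2
  && \text{(Cauchy-Schwarz)} \\
\mathbb{E}\,\|\eta\|_2
  &\le \sqrt{\mathbb{E}\,\|\eta\|_2^2}
  = \sigma\sqrt{p}
  && \text{(Jensen)} \\
\mathbb{P}\Bigl(\|\eta\|_2 \ge \sigma\sqrt{p} + \sigma\sqrt{2\ln(1/\beta)}\Bigr)
  &\le \beta
  && \text{(Gaussian concentration of the norm)}
\end{aligned}
$$

Under these assumptions the score gap between two nearby inputs changes by at most a term of order $\sigma\sqrt{p}\,\|x - x'\|_2$ with high probability. The constants in the paper's own bounds may differ.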

Abstract

We theoretically study how differential privacy interacts with both individual and group fairness in binary linear classification. More precisely, we focus on the output perturbation mechanism, a classic approach in privacy-preserving machine learning. We derive high-probability bounds on the level of individual and group fairness that the perturbed models can achieve compared to the original model. For individual fairness, we prove that the impact of output perturbation on the level of fairness is bounded but grows with the dimension of the model. For group fairness, we show that this impact is determined by the distribution of so-called angular margins, that is, the signed margins of the non-private model rescaled by the norm of each example.
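The role of angular margins can be illustrated numerically. The sketch below is a minimal example under stated assumptions, not the authors' code: it takes the angular margin to be $\alpha(h,x,y) = y\,\langle h, x\rangle / \|x\|_2$ (read off the description above), assumes labels in $\{-1,+1\}$ and noise $\eta \sim \mathcal{N}(0,\sigma^2 I_p)$ added to $h$, and uses synthetic data. Under those assumptions $\langle \eta, x\rangle / \|x\|_2 \sim \mathcal{N}(0,\sigma^2)$, so the prediction on $x$ flips with probability $\Phi(-|\alpha|/\sigma)$, which does not depend on $p$. The function names are hypothetical.

```python
import math
import numpy as np

def angular_margins(h, X, y):
    """Signed margins y * <h, x>, rescaled by ||x||_2 (assumed form of alpha(h, x, y))."""
    return y * (X @ h) / np.linalg.norm(X, axis=1)

def flip_probability(alpha, sigma):
    """P[sign(<h + eta, x>) != sign(<h, x>)] when eta ~ N(0, sigma^2 I_p)."""
    # <eta, x> / ||x||_2 ~ N(0, sigma^2), so the sign flips with
    # probability Phi(-|alpha| / sigma), where Phi(z) = 0.5 * erfc(-z / sqrt(2)).
    z = -np.abs(np.atleast_1d(np.asarray(alpha, dtype=float))) / sigma
    return 0.5 * np.array([math.erfc(-t / math.sqrt(2.0)) for t in z])

# Synthetic data (illustrative only; not the Adult dataset).
rng = np.random.default_rng(0)
p, n, sigma = 5, 50, 0.5
h = rng.normal(size=p)                 # stand-in for the non-private model
X = rng.normal(size=(n, p))
y = rng.choice([-1.0, 1.0], size=n)

alpha = angular_margins(h, X, y)
theory = flip_probability(alpha, sigma)

# Monte Carlo check: draw many perturbed models and count sign flips per point.
noise = rng.normal(scale=sigma, size=(20000, p))
flipped = np.sign((h + noise) @ X.T) != np.sign(X @ h)
empirical = flipped.mean(axis=0)
print("max |empirical - theory| =", np.max(np.abs(empirical - theory)))
```

Points with a small $|\alpha|$ are the ones most likely to change prediction, which is why the distribution of angular margins within each group matters for group fairness.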

Paper Structure (39 sections, 25 theorems, 63 equations, 4 figures)

Key Result

Lemma 2.1

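The output perturbation mechanism, which adds Gaussian noise with standard deviation $\sigma$ to the learned model, provides $(\varepsilon,\delta)$-differential privacy guarantees if and only if $$\Phi\left(\frac{\Delta}{2\sigma} - \frac{\varepsilon\sigma}{\Delta}\right) - e^{\varepsilon}\,\Phi\left(-\frac{\Delta}{2\sigma} - \frac{\varepsilon\sigma}{\Delta}\right) \le \delta,$$ where $\Phi$ denotes the CDF of the standard normal random variable, $\Delta$ is the sensitivity of the non-private learning mechanism $\mathcal{M}$ defined as $\Delta = \sup_{D,D'} \|\mathcal{M}(D) - \mathcal{M}(D')\|_2,$ and $D,D'\in(\mathcal{X}\times\mathcal{S}\times\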
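As an illustration of how such a condition can be used to pick the noise level, the sketch below assumes the lemma's condition is the analytic Gaussian mechanism condition of Balle and Wang (2018), $\Phi\left(\frac{\Delta}{2\sigma} - \frac{\varepsilon\sigma}{\Delta}\right) - e^{\varepsilon}\Phi\left(-\frac{\Delta}{2\sigma} - \frac{\varepsilon\sigma}{\Delta}\right) \le \delta$, and finds the smallest admissible $\sigma$ by bisection. The function names and the bisection strategy are illustrative choices, not taken from the paper.

```python
import math

def std_normal_cdf(t):
    """CDF of the standard normal, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def gaussian_delta(sigma, eps, sensitivity):
    """Left-hand side of the assumed condition: the smallest delta achieved by sigma."""
    a = sensitivity / (2.0 * sigma)
    b = eps * sigma / sensitivity
    return std_normal_cdf(a - b) - math.exp(eps) * std_normal_cdf(-a - b)

def calibrate_sigma(eps, delta, sensitivity, rel_tol=1e-10):
    """Smallest sigma (up to rel_tol) such that gaussian_delta(sigma) <= delta."""
    # gaussian_delta decreases from 1 to 0 as sigma grows, so bisection applies.
    lo, hi = 0.0, sensitivity
    while gaussian_delta(hi, eps, sensitivity) > delta:
        hi *= 2.0                      # grow until the condition holds
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gaussian_delta(mid, eps, sensitivity) > delta:
            lo = mid                   # too little noise
        else:
            hi = mid                   # enough noise, try less
        if hi - lo <= rel_tol * hi:
            break
    return hi                          # hi always satisfies the condition

sigma = calibrate_sigma(eps=1.0, delta=1e-5, sensitivity=1.0)
classical = math.sqrt(2.0 * math.log(1.25 / 1e-5)) * 1.0 / 1.0
print("calibrated sigma:", sigma, "classical Gaussian-mechanism sigma:", classical)
```

Returning `hi` rather than the midpoint keeps the result on the private side of the threshold, at the cost of a negligible amount of extra noise.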

Figures (4)

  • Figure 1: Individual fairness of perturbed models $h^\text{priv}$ on the Adult dataset. The $99\%$-confidence bounds are shown by dashed lines, and color-filled regions correspond to regions where $99\%$ of measurements lie.
  • Figure 2: (a) Disagreement probability of the perturbed model $h^\text{priv}$ with the unperturbed model $h$ for a data point; (b) Disagreement ratio of perturbed models $h^\text{priv}$ with the unperturbed model $h$ on the Adult dataset. In panel (b), the $99\%$-confidence bounds are shown by dashed lines, and color-filled regions correspond to regions where $99\%$ of measurements lie.
  • Figure 3: Accuracy $\mathcal{A}$ and group fairness $\mathcal{F}_{k}$ (accuracy parity) of private models $h^\text{priv}$ for different values of $\varepsilon$ on the Adult dataset. The $99\%$-confidence bounds are shown by dashed and crossed lines, and color-filled regions correspond to regions where $99\%$ of measurements lie.
  • Figure 4: Accuracy, accuracy parity fairness measure, disagreement ratio and individual fairness of private models $h^\text{priv}$ for different values of $\varepsilon$ on the Adult dataset. Different rows correspond to different random seeds ($1$, $2$, $3$, $4$). The $99\%$-confidence bounds are shown by dashed and crossed lines, and color-filled regions correspond to regions where $99\%$ of measurements lie.
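The color-filled regions in the captions above describe where $99\%$ of repeated measurements fall. A minimal sketch of how such an empirical band might be computed for the disagreement ratio is given below. It assumes Gaussian noise $\mathcal{N}(0,\sigma^2 I_p)$ added to a linear model, uses synthetic data in place of Adult, and takes the central $99\%$ quantile interval as the band; all names are hypothetical and this is not the authors' experimental code.

```python
import numpy as np

def disagreement_ratio(h, h_priv, X):
    """Fraction of rows of X on which the two linear classifiers predict different signs."""
    return float(np.mean(np.sign(X @ h) != np.sign(X @ h_priv)))

def empirical_band(h, X, sigma, n_draws=1000, level=0.99, seed=0):
    """Mean and central `level` quantile interval of the disagreement ratio over noise draws."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=sigma, size=(n_draws, h.shape[0]))
    ratios = np.array([disagreement_ratio(h, h + eta, X) for eta in noise])
    tail = (1.0 - level) / 2.0
    low, high = np.quantile(ratios, [tail, 1.0 - tail])
    return float(ratios.mean()), float(low), float(high)

# Synthetic stand-ins for the trained model and the evaluation set.
rng = np.random.default_rng(1)
h = rng.normal(size=10)
X = rng.normal(size=(500, 10))

for sigma in (0.01, 0.1, 1.0):
    mean, low, high = empirical_band(h, X, sigma)
    print(f"sigma={sigma}: mean={mean:.3f}, 99% band=[{low:.3f}, {high:.3f}]")
```

Larger $\sigma$ (stronger privacy) shifts the band upward, since more points have angular margins that are small relative to the noise.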

Theorems & Definitions (37)

  • Lemma 2.1 (Balle and Wang, 2018)
  • Theorem 3.1
  • Lemma 4.1
  • Theorem 4.2: Disagreement ratio bound
  • Lemma 5.1: Expected fairness of output perturbation
  • Lemma 5.2: Variance of fairness of private models $h^\text{priv}$
  • Theorem 5.3
  • Theorem 5.4 (Mangold et al., 2022)
  • Lemma 6.1
  • Lemma 6.2
  • ...and 27 more