A Systematic and Formal Study of the Impact of Local Differential Privacy on Fairness: Preliminary Results

Karima Makhlouf, Tamara Stefanovic, Heber H. Arcolezi, Catuscia Palamidessi

TL;DR

This work formalizes how local differential privacy via randomized response on the sensitive attribute affects fairness in ML. It derives bounds showing that conditional statistical disparity tends to shrink under LDP, and that statistical disparity often moves toward parity, especially when there are no proxies linking $X$ to the sensitive attribute, with potential reversals when proxies exist. Empirical validation on synthetic and real datasets corroborates the theoretical trends, while also highlighting scenarios (e.g., with proxies or non-ideal distributions) where privacy can shift or fail to reduce discrimination. The findings inform privacy‑aware model design by clarifying when LDP can improve fairness versus when it may introduce new biases, guiding future work on broader fairness metrics and multidimensional privacy.

Abstract

Machine learning (ML) algorithms rely primarily on the availability of training data, and, depending on the domain, these data may include sensitive information about the data providers, thus leading to significant privacy issues. Differential privacy (DP) is the predominant solution for privacy-preserving ML, and the local model of DP is the preferred choice when the server or the data collector is not trusted. Recent experimental studies have shown that local DP can impact ML prediction for different subgroups of individuals, thus affecting fair decision-making. However, the results are conflicting in the sense that some studies show a positive impact of privacy on fairness while others show a negative one. In this work, we conduct a systematic and formal study of the effect of local DP on fairness. Specifically, we perform a quantitative study of how the fairness of the decisions made by the ML model changes under local DP for different levels of privacy and data distributions. In particular, we provide bounds in terms of the joint distributions and the privacy level, delimiting the extent to which local DP can impact the fairness of the model. We characterize the cases in which privacy reduces discrimination and those with the opposite effect. We validate our theoretical findings on synthetic and real-world datasets. Our results are preliminary in the sense that, for now, we study only the case of one sensitive attribute, and only statistical disparity, conditional statistical disparity, and equal opportunity difference.
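
The mechanism named in the TL;DR, randomized response on a binary sensitive attribute, can be illustrated with a small simulation. The following is a minimal sketch and not the paper's experimental pipeline: it assumes a binary attribute $A$, a fixed binary outcome with hypothetical positive rates of 0.7 and 0.4 for the two groups, and the common definition of statistical disparity as $P(\hat{Y}=1 \mid A=1) - P(\hat{Y}=1 \mid A=0)$. It measures the disparity against the reported attribute rather than retraining a model on obfuscated data. The function names `randomized_response`, `statistical_disparity`, and `simulate` are illustrative only.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_keep else 1 - bit


def statistical_disparity(outcomes, groups):
    """Estimate P(outcome = 1 | group = 1) - P(outcome = 1 | group = 0)."""
    group1 = [y for y, a in zip(outcomes, groups) if a == 1]
    group0 = [y for y, a in zip(outcomes, groups) if a == 0]
    return sum(group1) / len(group1) - sum(group0) / len(group0)


def simulate(epsilon, n=20000, seed=0):
    """Return (disparity w.r.t. true A, disparity w.r.t. reported A')."""
    rng = random.Random(seed)
    # Assumption: the two groups are roughly balanced.
    groups = [rng.randint(0, 1) for _ in range(n)]
    # Assumption: hypothetical positive-outcome rates of 0.7 (A=1) and 0.4 (A=0).
    outcomes = [int(rng.random() < (0.7 if a == 1 else 0.4)) for a in groups]
    # Each individual obfuscates only the sensitive attribute.
    reported = [randomized_response(a, epsilon, rng) for a in groups]
    return (statistical_disparity(outcomes, groups),
            statistical_disparity(outcomes, reported))


if __name__ == "__main__":
    for eps in (0.5, 1.0, 2.0, 50.0):
        sd_true, sd_reported = simulate(eps)
        print(f"eps={eps}: SD on A = {sd_true:.3f}, SD on A' = {sd_reported:.3f}")
```

In this simplified setting, with balanced groups and no other features involved, the disparity measured on the reported attribute moves toward zero as $\varepsilon$ decreases. It says nothing about the proxy cases discussed above, where the paper reports that the direction can reverse.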

Paper Structure (23 sections, 3 theorems, 28 equations, 11 figures, 4 tables)

Key Result

Lemma 4.1

Figures (11)

  • Figure 1: Our framework to assess the impact of LDP on the fairness of an ML model.
  • Figure 2: Causal graphs of the Synthetic Datasets.
  • Figure 3: Results for the synthetic datasets S1-S4, illustrating the impact of LDP on fairness (y-axis) for privacy level $\varepsilon$ (x-axis). Note that in S3 we have $X \not\perp A$ and the fairness measure SD is inverted after obfuscation. EOD is also inverted after obfuscation; this is because Assumption 4 is not satisfied in this dataset. S4 illustrates Yule's Association Paradox, a variant of Simpson's paradox. The fairness values on the original data (no privacy) are the values for $\varepsilon=\infty$.
  • Figure 4: Results for the synthetic dataset S5. Note that EOD is also inverted here after obfuscation. Again, this is because Assumption 4 is not satisfied in this dataset.
  • Figure 5: Results for the synthetic dataset S6.
  • ...and 6 more figures

Theorems & Definitions (4)

  • Definition 1: $\varepsilon$-Local Differential Privacy (Kairouz et al., 2016)
  • Lemma 4.1
  • Lemma 4.2: Quantification of SD$'$ in terms of the distribution on the original data
  • Theorem 4.1: Impact of LDP on SD. Case $X \perp A$
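
For reference, Definition 1 in the list above is usually stated as follows. This is the standard form of $\varepsilon$-LDP together with binary randomized response, not a transcription of the paper's statement. The symbols $\mathcal{M}$, $v_1$, $v_2$, $y$, and $A'$ are notation assumed here and may differ from the paper's, and the block assumes the `amsmath` package.

```latex
% Standard epsilon-LDP condition for a randomized mechanism M (assumed notation).
\[
\Pr[\mathcal{M}(v_1) = y] \;\le\; e^{\varepsilon}\, \Pr[\mathcal{M}(v_2) = y]
\quad \text{for all inputs } v_1, v_2 \text{ and all outputs } y.
\]

% Binary randomized response on the sensitive attribute A, reporting A'.
\[
\Pr[A' = a' \mid A = a] =
\begin{cases}
  \dfrac{e^{\varepsilon}}{e^{\varepsilon} + 1} & \text{if } a' = a, \\[2ex]
  \dfrac{1}{e^{\varepsilon} + 1} & \text{if } a' \neq a.
\end{cases}
\]
```

The ratio of the two case probabilities is exactly $e^{\varepsilon}$, so this mechanism satisfies the condition with equality. As $\varepsilon \to \infty$ the reported value coincides with the true one, which matches the convention in Figure 3 of treating $\varepsilon = \infty$ as the no-privacy baseline.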