Private and Fair Machine Learning: Revisiting the Disparate Impact of Differentially Private SGD

Lea Demelius, Dominik Kowald, Simone Kopeinik, Roman Kern, Andreas Trügler

TL;DR

This paper investigates how differential privacy via DPSGD affects fairness across multiple metrics and hyperparameter settings. It critically assesses the claim that optimizing hyperparameters directly for DP models can match the fairness of non-private models, revealing strong metric- and dataset-dependent variation and that tuning cannot reliably close the fairness gap. The authors also evaluate DPSGD-Global-Adapt, finding it not robust across hyperparameters, and discuss privacy leakage implications of hyperparameter tuning. Overall, the work highlights the need for careful metric selection, dataset-aware analysis, and development of private hyperparameter tuning methods to achieve private and fair ML in practice, with practical guidance on when DP tuning is beneficial and when it is not.

Abstract

Differential privacy (DP) is a prominent method for protecting information about individuals during data analysis. Training neural networks with differentially private stochastic gradient descent (DPSGD) influences the model's learning dynamics and, consequently, its output. This can affect the model's performance and fairness. While the majority of studies on the topic report a negative impact on fairness, it has recently been suggested that fairness levels comparable to non-private models can be achieved by optimizing hyperparameters for performance directly on differentially private models (rather than re-using hyperparameters from non-private models, as is common practice). In this work, we analyze the generalizability of this claim by 1) comparing the disparate impact of DPSGD on different performance metrics, and 2) analyzing it over a wide range of hyperparameter settings. We highlight that a disparate impact on one metric does not necessarily imply a disparate impact on another. Most importantly, we show that while optimizing hyperparameters directly on differentially private models does not mitigate the disparate impact of DPSGD reliably, it can still lead to improved utility-fairness trade-offs compared to re-using hyperparameters from non-private models. We stress, however, that any form of hyperparameter tuning entails additional privacy leakage, calling for careful considerations of how to balance privacy, utility and fairness. Finally, we extend our analyses to DPSGD-Global-Adapt, a variant of DPSGD designed to mitigate the disparate impact on accuracy, and conclude that this alternative may not be a robust solution with respect to hyperparameter choice.
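To illustrate why DPSGD changes the learning dynamics, the following is a minimal sketch of a single update in the standard formulation (per-sample gradient clipping followed by Gaussian noise). It is not taken from the paper: the function name `dpsgd_step` and the parameter names `clip_norm` and `noise_multiplier` are assumptions chosen for illustration, and privacy accounting is omitted entirely.

```python
import numpy as np

def dpsgd_step(params, per_sample_grads, lr, clip_norm, noise_multiplier, rng):
    """One illustrative DPSGD update (hypothetical helper, not the paper's code).

    per_sample_grads has shape (batch_size, num_params).
    """
    # L2 norm of each individual example's gradient.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    # Scale down gradients whose norm exceeds clip_norm; leave the rest unchanged.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale
    # Gaussian noise with standard deviation noise_multiplier * clip_norm,
    # added once to the summed clipped gradients.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_sample_grads.shape[0]
    return params - lr * noisy_mean

rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0], [0.3, 0.4]])  # per-sample norms: 5.0 and 0.5
new_params = dpsgd_step(np.zeros(2), grads, lr=1.0, clip_norm=1.0,
                        noise_multiplier=0.0, rng=rng)
```

With the noise switched off, as in the usage lines, only the first gradient is shrunk by clipping. Examples with large gradients therefore contribute less to the update than they would under plain SGD, which is one commonly discussed route by which DPSGD can affect groups unequally.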

Paper Structure

This paper contains 18 sections, 4 equations, 15 figures, 15 tables.

Figures (15)

  • Figure 1: Results over all hyperparameter settings for the Adult dataset. A) shows accuracy and accuracy difference over all tested hyperparameter settings for the SGD and DPSGD models. Intervals shown correspond to ±1 standard deviation, reflecting variability across the 5 training runs. The results for the SGD model, represented by the solid blue line, are ordered by its accuracy. The dash-dot green line illustrates the DPSGD model with the same hyperparameters as the SGD model. The dashed orange line shows the results for the DPSGD model ordered by its own accuracy. Takeaway: As expected, hyperparameter settings that result in high accuracy for SGD do not necessarily do so for DPSGD. Interestingly, accuracy and accuracy difference are negatively correlated, i.e., hyperparameter settings that result in lower performance also result in lower fairness. B) summarizes how often DPSGD achieves better/similar/worse performance and is fairer/similarly fair/unfairer than the SGD model with the same hyperparameters. Takeaway: While for most hyperparameter settings DPSGD has a negative effect on both performance and fairness, there exist some settings for which DPSGD results in similar accuracy difference and similar or even better overall accuracy.
  • Figure 2: Results over all hyperparameter settings for the LSAC dataset (details explained in Fig. 1). Takeaway: For this dataset, DPSGD results in slightly worse accuracy than SGD but a similar accuracy difference for most hyperparameter settings.
  • Figure 3: Results over all hyperparameter settings for the Compas dataset (details explained in Fig. 1). Takeaway: Choosing hyperparameters for DPSGD based on SGD accuracy leads to unpredictable accuracy and accuracy difference: while some hyperparameters work well for both, others exhibit considerably worse performance and fairness for DPSGD. In general, higher accuracy difference coincides with lower overall accuracy.
  • Figure 4: Results over all hyperparameter settings for the ACSEmployment dataset (details explained in Fig. 1). Takeaway: Choosing hyperparameters for DPSGD based on SGD accuracy leads to a less predictable accuracy difference than tuning on DPSGD itself. For most hyperparameter settings, DPSGD results in worse performance and increased unfairness.
  • Figure 5: Results over all hyperparameter settings for the CelebA dataset (details explained in Fig. 1). Takeaway: Again, higher overall accuracy correlates with better fairness. The hyperparameters that work best for SGD and for DPSGD differ significantly. However, settings that yield high accuracy for SGD tend to be comparatively stable when applied to DPSGD, although they can still lead to considerably lower performance and fairness.
  • ...and 10 more figures
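
The captions above compare models by overall accuracy and by "accuracy difference", and sort DPSGD results into better/similar/worse relative to SGD. The sketch below shows one way such quantities could be computed. It rests on assumptions not stated in this excerpt: accuracy difference is taken as the gap between the best and worst per-group accuracy, and "similar" is decided by a hypothetical `tolerance` threshold. The function names are illustrative only.

```python
import numpy as np

def group_accuracies(y_true, y_pred, groups):
    """Accuracy computed separately for each group label."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {
        g.item(): float(np.mean(y_pred[groups == g] == y_true[groups == g]))
        for g in np.unique(groups)
    }

def accuracy_difference(y_true, y_pred, groups):
    """Assumed definition: gap between the best and worst group accuracy."""
    accs = group_accuracies(y_true, y_pred, groups)
    return max(accs.values()) - min(accs.values())

def compare(dp_value, sgd_value, tolerance, higher_is_better=True):
    """Label a DPSGD result as better/similar/worse than the SGD result.

    `tolerance` is a hypothetical threshold for calling two values similar.
    """
    if abs(dp_value - sgd_value) <= tolerance:
        return "similar"
    dp_is_higher = dp_value > sgd_value
    return "better" if dp_is_higher == higher_is_better else "worse"

y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
groups = ["a", "a", "b", "b"]
gap = accuracy_difference(y_true, y_pred, groups)
```

For accuracy, a higher value is better, so `compare(dp_acc, sgd_acc, tolerance)` applies directly. For accuracy difference, a lower value indicates a fairer model, so `higher_is_better=False` would be passed instead.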