Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

Da Yu, Gautam Kamath, Janardhan Kulkarni, Tie-Yan Liu, Jian Yin, Huishuai Zhang

TL;DR

DP-SGD provides a single worst-case privacy guarantee, which can mask significant variation in privacy risk across individual training examples. The paper introduces output-specific $(\varepsilon({\mathbb A},{\bm d}),\delta)$-DP to capture per-example privacy along the observed training trajectory and develops an efficient estimator that uses periodically updated gradient norms and gradient-norm rounding to keep computation tractable. Empirically, most examples exhibit stronger privacy than the worst-case bound, and per-example privacy correlates with final training loss, implying that groups with worse utility also bear higher privacy costs. The results reveal substantial disparities in privacy across data groups and connect privacy with empirical privacy risks, emphasizing the need for careful privacy accounting and fairness considerations in private deep learning.
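
The rounding idea mentioned above can be illustrated with a minimal sketch. It assumes that each per-example gradient norm is clipped to a threshold and then rounded up to a uniform grid, so that only a small set of distinct norm values ever needs a per-step privacy computation. The function name `round_up_norms`, the parameters `clip` and `num_bins`, and the choice of a uniform grid with upward rounding are illustrative assumptions, not details taken from the paper.

```python
import math

def round_up_norms(norms, clip, num_bins):
    """Clip each norm to `clip`, then round it up to the next multiple of clip / num_bins.

    Assumption: rounding up (rather than to the nearest value) is used here so that
    the rounded norm never underestimates the true clipped norm.
    """
    if clip <= 0 or num_bins < 1:
        raise ValueError("clip must be positive and num_bins must be at least 1")
    step = clip / num_bins
    rounded = []
    for z in norms:
        z = min(max(z, 0.0), clip)           # clipped norm lies in [0, clip]
        k = math.ceil(z / step - 1e-12)      # small tolerance for floating-point noise
        rounded.append(min(k * step, clip))  # never exceed the clipping threshold
    return rounded

# Hypothetical per-example gradient norms, clipping threshold 1.0, 10 grid cells
example_norms = [0.03, 0.5, 2.0]
print(round_up_norms(example_norms, clip=1.0, num_bins=10))
```

With this grid, at most `num_bins + 1` distinct values can occur, so a per-step privacy cost could be computed once per grid value and reused for every example that shares it.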

Abstract

Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning. It provides a single privacy guarantee to all datapoints in the dataset. We propose output-specific $(\varepsilon,\delta)$-DP to characterize privacy guarantees for individual examples when releasing models trained by DP-SGD. We also design an efficient algorithm to investigate individual privacy across a number of datasets. We find that most examples enjoy stronger privacy guarantees than the worst-case bound. We further discover that the training loss and the privacy parameter of an example are well-correlated. This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees. For example, on CIFAR-10, the average $\varepsilon$ of the class with the lowest test accuracy is 44.2% higher than that of the class with the highest accuracy.

Paper Structure

This paper contains 26 sections, 5 theorems, 12 equations, 14 figures, 3 tables, 3 algorithms.

Key Result

Theorem 3.1

Let $\{\theta_{1},\ldots,\theta_{t-1}\}$ be the observed models at step $t$. Suppose we run the main algorithm (label alg:main_algo in the source) with $K=1$ and without rounding. Then it satisfies $(o^{(i)}_{\alpha} + \frac{\log(1/\delta)}{\alpha-1}, \delta)$-output-specific individual DP for the $i$-th example ...
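
The privacy parameter in Theorem 3.1 has the form $o^{(i)}_{\alpha} + \frac{\log(1/\delta)}{\alpha-1}$. The sketch below evaluates that expression, reading $o^{(i)}_{\alpha}$ as a per-example accumulated Rényi DP value at order $\alpha$; this reading is an assumption, since the statement is truncated here. The helper names `rdp_to_epsilon` and `best_epsilon` and the numbers in the example are hypothetical. Taking the minimum over several orders is common practice in Rényi-based accounting and is not claimed to be the paper's exact procedure.

```python
import math

def rdp_to_epsilon(rdp, alpha, delta):
    """Evaluate rdp + log(1/delta) / (alpha - 1) for a single order alpha."""
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    return rdp + math.log(1.0 / delta) / (alpha - 1)

def best_epsilon(rdp_by_order, delta):
    """Return the smallest epsilon over all tracked orders (assumed practice)."""
    return min(
        rdp_to_epsilon(rdp, alpha, delta) for alpha, rdp in rdp_by_order.items()
    )

# Hypothetical accumulated values for one example, keyed by order alpha
accumulated = {2: 0.5, 8: 2.0, 32: 8.0}
print(best_epsilon(accumulated, delta=1e-5))
```

A small order pays a large $\log(1/\delta)/(\alpha-1)$ penalty, while a large order typically carries a larger accumulated value, which is why several orders are usually tracked and compared.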

Figures (14)

  • Figure 1: Individual privacy parameters of models trained by DP-SGD. The value of $\delta$ is $1\times 10^{-5}$. The dashed lines indicate $30\%$, $50\%$, and $70\%$ of datapoints. The black solid line shows the worst-case privacy parameter.
  • Figure 2: Accuracy and average $\varepsilon$ of different groups on CIFAR-10 and UTK-Face. Groups with worse accuracy also have worse privacy in general.
  • Figure 3: Median of gradient norms of different classes when training a ResNet-20 model on CIFAR-10.
  • Figure 4: Privacy parameters based on estimations of individual gradient norms ($\varepsilon$) versus those based on exact ones ($\acute{\varepsilon}$). The value of $\gamma$ denotes the number of updates of full gradient norms per epoch. The horizontal line shows the worst-case privacy guarantee.
  • Figure 5: Privacy parameters and final training losses. Each point shows the final training loss and privacy parameter of one example. Pearson's $r$ is computed between privacy parameters and log loss values.
  • ...and 9 more figures

Theorems & Definitions (12)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Theorem 3.1
  • Remark 1
  • Theorem 3.2
  • proof
  • Theorem A.1
  • Lemma A.1
  • ...and 2 more