Deep Learning with Differential Privacy

Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang

TL;DR

This work demonstrates that deep neural networks can be trained with differential privacy at a modest privacy budget, addressing the challenge of non-convex optimization in large models. It introduces DP-SGD, combined with a novel moments accountant, to tightly bound cumulative privacy loss under realistic training regimes, and validates the approach on MNIST and CIFAR-10. Key contributions include a practical DP training pipeline, a tight accounting method that improves over strong composition, and techniques such as differentially private PCA and selective use of pre-trained convolutional layers to maintain model utility. The results establish a concrete privacy-utility trade-off for large-scale deep learning and highlight the practical potential of on-device private learning with manageable computational overhead.

Abstract

Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive information. The models should not expose private information in these datasets. Addressing this goal, we develop new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy. Our implementation and experiments demonstrate that we can train deep neural networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.

Paper Structure

This paper contains 24 sections, 7 theorems, 33 equations, 6 figures, 1 algorithm.

Key Result

Theorem 1

There exist constants $c_1$ and $c_2$ so that given the sampling probability $q=L/N$ and the number of steps $T$, for any $\varepsilon < c_1 q^2T$, Algorithm 1 is $(\varepsilon,\delta)$-differentially private for any $\delta>0$ if we choose

$$\sigma \ge c_2 \frac{q\sqrt{T\log(1/\delta)}}{\varepsilon}.$$
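
The algorithm that Theorem 1 refers to is the paper's differentially private SGD: each per-example gradient is clipped to an L2 norm of at most $C$, the clipped gradients are summed, Gaussian noise with standard deviation $\sigma C$ is added to each coordinate, and the result is averaged over the lot. The following is a minimal sketch of one such step, not the authors' implementation. The function names `clip_gradient` and `dp_sgd_step`, the use of plain Python lists, and the choice to divide by the number of gradients passed in (standing in for the lot size $L$) are all assumptions made for illustration.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale grad so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = max(1.0, norm / clip_norm)
    return [g / scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm, sigma, rng):
    """One noisy SGD step: clip per example, sum, add Gaussian noise, average.

    Hypothetical helper: per_example_grads holds one gradient per example
    in the lot, and its length is used as the lot size.
    """
    lot_size = len(per_example_grads)
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        for i, g in enumerate(clip_gradient(grad, clip_norm)):
            summed[i] += g
    # Per-coordinate noise has standard deviation sigma * clip_norm.
    noisy = [s + rng.gauss(0.0, sigma * clip_norm) for s in summed]
    averaged = [n / lot_size for n in noisy]
    return [p - lr * a for p, a in zip(params, averaged)]


# Illustrative values only: two examples, clipping threshold 4, noise scale 2.
rng = random.Random(0)
new_params = dp_sgd_step(
    params=[0.0, 0.0],
    per_example_grads=[[3.0, 4.0], [30.0, 40.0]],
    lr=0.1,
    clip_norm=4.0,
    sigma=2.0,
    rng=rng,
)
```

Clipping bounds how much any single example can change the summed gradient, which is what lets the noise scale be expressed as a multiple of $C$. The privacy guarantee itself comes from the accounting over all $T$ steps, which this sketch does not perform.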

Figures (6)

  • Figure 1: Code snippet of DPSGD_Optimizer and DPTrain.
  • Figure 2: The $\varepsilon$ value as a function of epoch $E$ for $q=0.01$, $\sigma=4$, $\delta=10^{-5}$, using the strong composition theorem and the moments accountant respectively.
  • Figure 3: Results on the accuracy for different noise levels on the MNIST dataset. In all the experiments, the network uses $60$ dimension PCA projection, $1{,}000$ hidden units, and is trained using lot size $600$ and clipping threshold $4$. The noise levels $(\sigma, \sigma_p)$ for training the neural network and for PCA projection are set at ($8$, $16$), ($4$, $7$), and ($2$, $4$), respectively, for the three experiments.
  • Figure 4: Accuracy of various $(\varepsilon, \delta)$ privacy values on the MNIST dataset. Each curve corresponds to a different $\delta$ value.
  • Figure 5: MNIST accuracy when one parameter varies, and the others are fixed at reference values.
  • ...and 1 more figure
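
The figure captions above quote a sampling probability $q=0.01$ and a lot size of $600$, and plot $\varepsilon$ against the epoch count $E$. The short sketch below shows how these quantities relate through $q=L/N$, assuming one epoch corresponds to $N/L$ lots. The training-set size of 60,000 (the standard MNIST training split) and the choice of 100 epochs are assumptions for illustration and are not stated in this summary. The helper name `sampling_schedule` is hypothetical.

```python
def sampling_schedule(num_examples, lot_size, epochs):
    """Return (q, steps per epoch, total steps T) for a given lot size."""
    q = lot_size / num_examples                 # sampling probability q = L / N
    steps_per_epoch = num_examples // lot_size  # one epoch is N / L lots
    total_steps = steps_per_epoch * epochs      # number of steps T
    return q, steps_per_epoch, total_steps


# Assumed values: N = 60,000 examples, L = 600, E = 100 epochs.
q, steps_per_epoch, total_steps = sampling_schedule(60_000, 600, 100)
print(q, steps_per_epoch, total_steps)
```

Under these assumptions the lot size of $600$ reproduces the $q=0.01$ used in Figure 2, and each epoch contributes $1/q$ steps to the total $T$ that the privacy accounting has to cover.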

Theorems & Definitions (12)

  • Definition
  • Theorem 1
  • Theorem 2
  • Theorem 2
  • proof
  • Lemma 3
  • proof
  • Theorem 1
  • proof
  • Lemma 3.1
  • ...and 2 more