Differentially Private Learning Needs Better Features (or Much More Data)

Florian Tramèr, Dan Boneh

TL;DR

This paper studies differentially private learning on vision tasks and finds that handcrafted priors, in the form of ScatterNet features, yield a substantially better privacy-utility trade-off than end-to-end private CNNs at moderate privacy budgets. Linear models on ScatterNet features often outperform private deep models, and deeper private models help but remain limited without stronger priors. The authors demonstrate two practical routes to narrowing the privacy-utility gap: collecting more private data, or using transfer learning from public data to obtain better features. They provide strong baselines, analyze convergence dynamics, and outline open problems, including faster-converging training methods and federated DP, to guide future progress in private deep learning.

Abstract

We demonstrate that differentially private machine learning has not yet reached its "AlexNet moment" on many canonical vision tasks: linear models trained on handcrafted features significantly outperform end-to-end deep neural networks for moderate privacy budgets. To exceed the performance of handcrafted features, we show that private learning requires either much more private data, or access to features learned on public data from a similar domain. Our work introduces simple yet strong baselines for differentially private learning that can inform the evaluation of future progress in this area.

Paper Structure

This paper contains 40 sections, 2 theorems, 6 equations, 12 figures, 20 tables, 2 algorithms.

Key Result

Lemma B.3

Let $f : \mathcal{D} \to \mathcal{R}_1$ be $(\alpha, \varepsilon_1)$-RDP and $g : \mathcal{R}_1 \times \mathcal{D} \to \mathcal{R}_2$ be $(\alpha, \varepsilon_2)$-RDP. Then the mechanism defined as $(X, Y)$, where $X \sim f(D)$ and $Y \sim g(X, D)$, satisfies $(\alpha, \varepsilon_1 + \varepsilon_2)$-RDP.
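
As a rough illustration of how this lemma is used in privacy accounting, the sketch below adds up per-step RDP costs at a fixed order $\alpha$ and then converts the total to an $(\varepsilon, \delta)$ guarantee. Several things here are assumptions rather than the paper's procedure: the function names are hypothetical, each step is taken to be a plain Gaussian mechanism with $L_2$ sensitivity 1 (no subsampling, so this is looser than DP-SGD accounting), and the conversion is the standard bound from Mironov (2017), which may differ from the statement of Lemma B.4.

```python
import math

def gaussian_rdp(alpha: float, sigma: float) -> float:
    """RDP cost at order alpha of a Gaussian mechanism with
    L2 sensitivity 1 and noise standard deviation sigma (assumed setting)."""
    return alpha / (2.0 * sigma ** 2)

def compose_rdp(epsilons) -> float:
    """Adaptive composition at a fixed order alpha: the RDP costs add up."""
    return sum(epsilons)

def rdp_to_dp(alpha: float, rdp_eps: float, delta: float) -> float:
    """Standard conversion: (alpha, eps)-RDP implies
    (eps + log(1/delta) / (alpha - 1), delta)-DP."""
    return rdp_eps + math.log(1.0 / delta) / (alpha - 1.0)

# Illustrative values only; none of these come from the paper.
alpha, sigma, steps, delta = 10.0, 4.0, 20, 1e-5
per_step = gaussian_rdp(alpha, sigma)
total_rdp = compose_rdp([per_step] * steps)
eps = rdp_to_dp(alpha, total_rdp, delta)
print(f"total RDP at alpha={alpha}: {total_rdp:.4f}, epsilon: {eps:.4f}")
```

In practice an accountant evaluates this for many orders $\alpha$ and reports the smallest resulting $\varepsilon$; a single order is used here to keep the composition step visible.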

Figures (12)

  • Figure 1: Highest test accuracy achieved for each DP budget $(\varepsilon, \delta=10^{-5})$ for ScatterNet classifiers and the end-to-end CNNs of Papernot et al. (2020). We plot the mean and standard deviation across five runs.
  • Figure 2: Highest test accuracy achieved for each DP budget $(\varepsilon, \delta=10^{-5})$ for linear ScatterNet classifiers, CNNs on top of ScatterNet features, and end-to-end CNNs. We plot the mean and standard deviation across five runs.
  • Figure 3: Convergence of DP-SGD with and without noise on CIFAR-10, for ScatterNet classifiers and end-to-end CNNs. (Left): low learning rate. (Right): high learning rate.
  • Figure 4: Number of trainable parameters of our models. For CIFAR-10, we consider two different end-to-end CNN architectures (see Appendix \ref{apx:models}), the smaller of which has approximately as many parameters as the linear ScatterNet model.
  • Figure 4: CIFAR-10 test accuracy for a training set of size $N$ and a DP budget of $(\varepsilon=3, \delta=1/(2N))$. For $N > 50$K, we augment CIFAR-10 with pseudo-labeled Tiny Images collected by Carmon et al. (2019).
  • ...and 7 more figures
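
Figure 3 compares DP-SGD with and without noise. For readers unfamiliar with the algorithm, one update can be sketched as follows: compute per-example gradients, clip each to a fixed $L_2$ norm, sum them, add Gaussian noise scaled to the clipping norm, and take an averaged step. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a linear model with squared loss on fixed feature vectors for brevity, and the names `clip`, `dp_sgd_step`, `max_norm`, and `sigma` are hypothetical.

```python
import math
import random

def clip(grad, max_norm):
    """Scale grad down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(w, batch, lr, max_norm, sigma, rng):
    """One DP-SGD update for a linear model w on (features, target) pairs.

    sigma is the noise multiplier; sigma = 0 gives clipped SGD without noise.
    """
    total = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        grad = [err * xi for xi in x]  # gradient of 0.5 * err**2 w.r.t. w
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    if sigma > 0:
        # Gaussian noise with std sigma * max_norm on each coordinate of the sum.
        total = [t + rng.gauss(0.0, sigma * max_norm) for t in total]
    return [wi - lr * t / len(batch) for wi, t in zip(w, total)]

# Illustrative usage with made-up data.
rng = random.Random(0)
batch = [([3.0, 4.0], -1.0)]
w_no_noise = dp_sgd_step([0.0, 0.0], batch, lr=0.1, max_norm=1.0, sigma=0.0, rng=rng)
w_noisy = dp_sgd_step([0.0, 0.0], batch, lr=0.1, max_norm=1.0, sigma=1.0, rng=rng)
```

Clipping bounds each example's influence on the summed gradient by `max_norm`, which is what lets the added noise be calibrated to a known sensitivity.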

Theorems & Definitions (6)

  • Definition B.1: Rényi Divergence
  • Definition B.2: $(\alpha, \varepsilon)$-RDP (Mironov, 2017)
  • Lemma B.3: Adaptive composition of RDP (Mironov et al., 2019)
  • Lemma B.4: From RDP to $(\varepsilon, \delta)$-DP (Mironov et al., 2019)
  • Claim B.5
  • Claim D.1