
Deep Learning with Data Privacy via Residual Perturbation

Wenqi Tao, Huaming Ling, Zuoqiang Shi, Bao Wang

TL;DR

The paper tackles data privacy in deep learning by introducing residual perturbation, a scheme grounded in stochastic differential equation (SDE) theory that injects Gaussian noise into every residual mapping of a ResNet. Analyzing two SDE-based strategies, the authors prove differential privacy guarantees and a reduction in the generalization gap. Empirically, residual perturbation trains efficiently and matches or exceeds the utility of DPSGD. Extensive experiments on IDC, MNIST, CIFAR10, and CIFAR100 show that residual perturbation improves membership privacy (membership inference attacks approach random guessing) and, combined with model ensembles, can also boost accuracy, with skip connections playing a crucial role. The work offers both theoretical DP/RDP results and practical insight into privacy-utility tradeoffs, positioning residual perturbation as a feasible path to private, accurate deep learning while leaving room for tighter DP bounds in future work.
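The claim that membership inference attacks "approach random guessing" can be made concrete with a toy threshold attack. This is a minimal sketch, assuming the attacker scores each example by the model's confidence and predicts "member" whenever the score reaches a threshold; the paper's exact attack score may differ, and the confidence values below are made up for illustration. On a balanced member/non-member set, an attack accuracy of 0.5 is random guessing.

```python
def threshold_attack_accuracy(member_conf, nonmember_conf, threshold):
    """Accuracy of a toy membership inference attack.

    Predicts 'member' iff the confidence is >= threshold, and is scored on the
    union of both lists. With equally sized lists, 0.5 means random guessing.
    """
    correct = sum(c >= threshold for c in member_conf)      # true positives
    correct += sum(c < threshold for c in nonmember_conf)   # true negatives
    return correct / (len(member_conf) + len(nonmember_conf))


# Hypothetical confidences: an overfit model is more confident on training data.
member = [0.99, 0.97, 0.95, 0.90]
nonmember = [0.70, 0.85, 0.60, 0.92]

# Sweep the threshold, as Figures 3 to 5 do.
for t in (0.5, 0.8, 0.9, 0.99):
    print(t, threshold_attack_accuracy(member, nonmember, t))
```

A noise mechanism that shrinks the gap between member and non-member confidences pushes every threshold's accuracy toward 0.5.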

Abstract

Protecting data privacy in deep learning (DL) is of crucial importance. Several celebrated privacy notions have been established and used for privacy-preserving DL. However, many existing mechanisms achieve privacy at the cost of significant utility degradation and computational overhead. In this paper, we propose a stochastic differential equation-based residual perturbation for privacy-preserving DL, which injects Gaussian noise into each residual mapping of ResNets. Theoretically, we prove that residual perturbation guarantees differential privacy (DP) and reduces the generalization gap of DL. Empirically, we show that residual perturbation is computationally efficient and outperforms the state-of-the-art differentially private stochastic gradient descent (DPSGD) in utility maintenance without sacrificing membership privacy.
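The core idea, Gaussian noise injected into each residual mapping, can be sketched as a noisy ResNet forward pass. This is a toy illustration, not the authors' implementation: the residual mapping `residual_mapping` is a hypothetical stand-in for real layers, and the noise is assumed to be additive and isotropic with coefficient `gamma`; the exact noise forms of the paper's Strategies I and II are defined in the paper itself.

```python
import math
import random


def residual_mapping(x):
    # Hypothetical residual mapping F(x): a stand-in for the conv/BN/ReLU
    # layers of a real residual block (no trainable weights here).
    return [math.tanh(0.5 * v) for v in x]


def perturbed_forward(x, num_blocks, gamma, rng):
    """Forward pass through num_blocks residual blocks with Gaussian noise.

    Each block computes x <- x + F(x) + gamma * xi with xi ~ N(0, I), i.e. one
    Euler-Maruyama step (step size 1) of dx = F(x) dt + gamma dB.
    gamma = 0 recovers the deterministic ResNet (a forward Euler step).
    """
    for _ in range(num_blocks):
        fx = residual_mapping(x)
        x = [xi + fi + gamma * rng.gauss(0.0, 1.0) for xi, fi in zip(x, fx)]
    return x


x0 = [1.0, -2.0]
print(perturbed_forward(x0, num_blocks=3, gamma=0.0, rng=random.Random(0)))
print(perturbed_forward(x0, num_blocks=3, gamma=0.1, rng=random.Random(0)))
```

With `gamma = 0` the output no longer depends on the random seed, which is exactly the deterministic baseline ResNet; any `gamma > 0` makes each forward pass stochastic.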

Paper Structure

This paper contains 46 sections, 10 theorems, 39 equations, 12 figures, 5 tables, and 1 algorithm.

Key Result

Theorem 1

Assume the input to the ResNet lies in the ball $B(\mathbf{0}, R)$, and that the expectation of the output of every residual mapping is normally distributed and bounded in $\ell_2$ norm by a constant $G$. Let $T$ be the total number of iterations used to train the ResNet. For any $\epsilon>0$ and $\delta,\lambda\in(0,$ …

Figures (12)

  • Figure 1: Illustrations of the forward and backward propagation of the training data using the 2D ODE and SDE models defined in the paper. (a) The original image; (b) and (d) the features of the original image produced by forward propagation through the ODE and SDE, respectively; (c) and (e) the images recovered by reverse-engineering the features in (b) and (d), respectively. The privacy of the ODE model is easy to break, whereas the SDE model is harder to reverse-engineer.
  • Figure 2: Visualization of a few selected images from the IDC dataset.
  • Figure 3: Performance of residual perturbation (Strategy I) for En$_5$ResNet8 with different noise coefficients ($\gamma$) and membership inference attack thresholds on the IDC dataset. Residual perturbation significantly improves membership privacy and reduces the generalization gap. $\gamma=0$ corresponds to the baseline ResNet8. (Unit: %)
  • Figure 4: Performance of En$_5$ResNet8 with residual perturbation (Strategy I) using different noise coefficients ($\gamma$) and membership inference attack thresholds on CIFAR10. Residual perturbation not only enhances membership privacy but also improves classification accuracy. $\gamma=0$ corresponds to the baseline ResNet8 without residual perturbation or model ensemble. (Unit: %)
  • Figure 5: Performance of En$_5$ResNet8 with residual perturbation (Strategy I) using different noise coefficients ($\gamma$) and membership inference attack thresholds on CIFAR100. Again, residual perturbation not only enhances membership privacy but also improves classification accuracy. $\gamma=0$ corresponds to the baseline ResNet8 without residual perturbation or model ensemble. (Unit: %)
  • ...and 7 more figures
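The intuition behind Figure 1 can be reproduced in a toy setting: a deterministic ODE forward pass is invertible, so an attacker can undo it step by step, whereas noise injected along an SDE path is unknown to the attacker and corrupts the reconstruction. This sketch assumes a hypothetical linear 2D drift `A`, step size `h = 0.1`, and additive noise `gamma dB`; none of these are the paper's actual equations.

```python
import math
import random

# Hypothetical 2D linear drift dx/dt = A x (a slowly contracting rotation).
A = ((-0.1, -1.0), (1.0, -0.1))


def step_matrix(h):
    """Forward Euler step matrix M = I + h*A."""
    return ((1 + h * A[0][0], h * A[0][1]), (h * A[1][0], 1 + h * A[1][1]))


def matvec(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def inverse(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((m[1][1] / det, -m[0][1] / det), (-m[1][0] / det, m[0][0] / det))


def forward(x, steps, h, gamma, rng):
    """Euler-Maruyama for dx = A x dt + gamma dB; gamma = 0 gives the ODE."""
    m = step_matrix(h)
    for _ in range(steps):
        x = matvec(m, x)
        x = (x[0] + gamma * math.sqrt(h) * rng.gauss(0.0, 1.0),
             x[1] + gamma * math.sqrt(h) * rng.gauss(0.0, 1.0))
    return x


def reverse(feature, steps, h):
    """Attacker's reconstruction: invert the deterministic steps, ignoring noise."""
    m_inv = inverse(step_matrix(h))
    for _ in range(steps):
        feature = matvec(m_inv, feature)
    return feature


def recovery_error(x0, steps=50, h=0.1, gamma=0.0, seed=0):
    """Euclidean distance between the input and the attacker's reconstruction."""
    rng = random.Random(seed)
    rec = reverse(forward(x0, steps, h, gamma, rng), steps, h)
    return math.dist(x0, rec)


print(recovery_error((1.0, 0.5), gamma=0.0))  # ODE: error at floating-point level
print(recovery_error((1.0, 0.5), gamma=0.5))  # SDE: reconstruction is off
```

With `gamma = 0` the input is recovered essentially exactly, mirroring panel (c); with `gamma > 0` the accumulated noise leaves a visible reconstruction error, mirroring panel (e).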

Theorems & Definitions (17)

  • Definition 1: $(\epsilon,\delta)$-DP
  • Theorem 1
  • Theorem 2
  • Definition 2
  • Theorem 3
  • Definition 3
  • Definition 4
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • ...and 7 more
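Definition 1 is listed above by name only. For reference, the standard form of $(\epsilon,\delta)$-DP, together with Rényi DP (RDP), which underlies the "DP/RDP results" mentioned in the TL;DR, and its standard conversion to $(\epsilon,\delta)$-DP, reads as follows. The paper's own statements (for instance, its notion of neighboring datasets) may differ in detail.

```latex
% (epsilon, delta)-DP: a randomized mechanism M satisfies it if, for all
% neighboring datasets D, D' (differing in one record) and every measurable
% set S of outputs,
\[
  \Pr[\mathcal{M}(D) \in S] \le e^{\epsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta .
\]

% (alpha, epsilon)-RDP, for an order alpha > 1 and Renyi divergence D_alpha:
\[
  D_{\alpha}\big(\mathcal{M}(D) \,\|\, \mathcal{M}(D')\big) \le \epsilon .
\]

% Conversion: (alpha, epsilon)-RDP implies, for any delta in (0, 1),
\[
  \Big(\epsilon + \frac{\log(1/\delta)}{\alpha - 1},\ \delta\Big)\text{-DP}.
\]
```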