Enhancing DPSGD via Per-Sample Momentum and Low-Pass Filtering

Xincheng Xu, Thilina Ranbaduge, Qing Wang, Thierry Rakotoarivelo, David Smith

TL;DR

This paper addresses the accuracy degradation of differentially private training with DPSGD by jointly reducing DP noise and clipping bias. It introduces DP-PMLF, which combines per-sample momentum for variance reduction with a post-processing linear low-pass filter that suppresses high-frequency DP noise at no extra privacy cost. Theoretical results establish improved convergence rates under DP and show that the per-sample momentum and low-pass filter terms reduce the effects of clipping bias and DP noise, respectively. Empirical evaluations on image and language tasks show a consistent privacy-utility improvement over state-of-the-art DPSGD variants.

Abstract

Differentially Private Stochastic Gradient Descent (DPSGD) is widely used to train deep neural networks with formal privacy guarantees. However, enforcing differential privacy (DP) often degrades model accuracy by introducing both noise and bias. Existing techniques typically address only one of these issues, as reducing DP noise can exacerbate clipping bias and vice versa. In this paper, we propose a novel method, DP-PMLF, which integrates per-sample momentum with a low-pass filtering strategy to simultaneously mitigate DP noise and clipping bias. Our approach uses per-sample momentum to smooth gradient estimates prior to clipping, thereby reducing sampling variance. It further employs a post-processing low-pass filter to attenuate high-frequency DP noise without consuming additional privacy budget. We provide a theoretical analysis demonstrating an improved convergence rate under rigorous DP guarantees, and our empirical evaluations reveal that DP-PMLF significantly enhances the privacy-utility trade-off compared to several state-of-the-art DPSGD variants.
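
The order of operations described in the abstract (per-sample momentum, then clipping, then Gaussian noise, then a low-pass filter on the noisy output) can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the exponential-moving-average form of both the momentum and the filter, the function names `clip_rows` and `private_step`, and the hyperparameters `beta`, `alpha`, `clip_norm` and `noise_multiplier` are all assumptions chosen for illustration, and no privacy accounting is performed.

```python
import numpy as np


def clip_rows(vectors, clip_norm):
    """Scale each row so that its L2 norm is at most clip_norm."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return vectors * scale


def private_step(per_sample_grads, momenta, filtered, rng,
                 beta=0.9, clip_norm=1.0, noise_multiplier=1.0, alpha=0.5):
    """One illustrative step; every hyperparameter here is an assumption."""
    # 1) Per-sample momentum: smooth each example's gradient before clipping.
    momenta = beta * momenta + (1.0 - beta) * per_sample_grads
    # 2) Clip each per-sample momentum vector to bound its contribution.
    clipped = clip_rows(momenta, clip_norm)
    # 3) Gaussian noise is added to the sum, which is then averaged.
    batch_size, dim = clipped.shape
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=dim)
    noisy_mean = (clipped.sum(axis=0) + noise) / batch_size
    # 4) Low-pass filter (assumed first-order EMA) applied to the noisy
    #    output only; it never reads the raw per-sample gradients.
    filtered = alpha * noisy_mean + (1.0 - alpha) * filtered
    return momenta, filtered


rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 4))      # toy per-sample gradients: 8 examples, 4 parameters
momenta = np.zeros_like(grads)
filtered = np.zeros(4)
momenta, filtered = private_step(grads, momenta, filtered, rng)
```

In this sketch, step 4 takes only the already-noised average as input, which is the property that lets a filter of this kind be treated as post-processing.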

Paper Structure

This paper contains 24 sections, 9 theorems, 100 equations, 7 figures, 5 tables, 1 algorithm.

Key Result

Lemma 1

Let $\mathcal{M}: \mathcal{D} \to \mathcal{R}^d$ be an $(\epsilon,\delta)$-DP mechanism and let $\mathcal{H}: \mathcal{R}^d \to \mathcal{R}^d$ be any deterministic or randomized function. Then the composition $\mathcal{H} \circ \mathcal{M}$ satisfies $(\epsilon,\delta)$-DP.
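
As a rough illustration of how this lemma applies to filtering, the sketch below releases a sequence through a Gaussian mechanism and then smooths it with a function that sees only the released values. The names `gaussian_mechanism` and `low_pass`, the exponential-moving-average filter, and all parameter values are assumptions for this example, not details taken from the paper.

```python
import numpy as np


def gaussian_mechanism(values, sensitivity, sigma, rng):
    """M: add Gaussian noise scaled to the sensitivity (no accounting here)."""
    return values + rng.normal(0.0, sigma * sensitivity, size=np.shape(values))


def low_pass(noisy_sequence, alpha=0.2):
    """H: first-order EMA filter; its only input is the output of M."""
    out = np.empty_like(noisy_sequence)
    state = noisy_sequence[0]
    for t, y in enumerate(noisy_sequence):
        state = alpha * y + (1.0 - alpha) * state
        out[t] = state
    return out


rng = np.random.default_rng(0)
signal = np.ones(2000)                      # toy constant "true" signal
noisy = gaussian_mechanism(signal, sensitivity=1.0, sigma=1.0, rng=rng)
smoothed = low_pass(noisy)                  # H(M(D)): touches no private data
print(noisy.var(), smoothed.var())          # smoothed variance is smaller
```

Because `low_pass` is a function of `noisy` alone, the composition falls under the lemma: the filter can reduce the variance of the released sequence while the privacy guarantee of the mechanism is unchanged.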

Figures (7)

  • Figure 1: Test accuracy (%) comparison of DPSGD and two existing methods on CIFAR-10 with a 5-Layer CNN over 100 epochs under different privacy budgets ($\epsilon$).
  • Figure 2: Test accuracy (%) comparison across different models on CIFAR-10 with fixed privacy budget $\epsilon = 1$.
  • Figure 3: Test accuracy (%) for DP-PMLF and its two variants, which are DP-PMLF without Per-sample Momentum (DP-PMLF (w/o PM)) and DP-PMLF without Low-pass Filter (DP-PMLF (w/o LF)).
  • Figure 4: Test accuracy (%) comparison across datasets on CNN-5 with fixed epochs (25 for MNIST and Fashion-MNIST, 50 for CIFAR-10 and CIFAR-100) and different privacy budgets $\epsilon \in [1, 8]$.
  • Figure 5: Test accuracy (%) comparison across different datasets on ResNet-18 with fixed privacy budget $\epsilon = 1$.
  • ...and 2 more figures

Theorems & Definitions (19)

  • Definition 1: Differential Privacy (DP) [dwork2006]
  • Definition 2: Global Sensitivity
  • Definition 3: Gaussian Mechanism [dwork2014]
  • Lemma 1: Post-Processing [dwork2006]
  • Lemma 2: Effectiveness of Low-pass Filter
  • Lemma 3: Bounded Momentum Variance
  • Remark
  • Theorem 1: Convergence Bound
  • Remark
  • Corollary 1
  • ...and 9 more