DPDR: Gradient Decomposition and Reconstruction for Differentially Private Deep Learning

Yixuan Liu, Li Xiong, Yuhan Liu, Yujie Gu, Ruixuan Liu, Hong Chen

TL;DR

DPDR tackles the privacy-utility trade-off in DP-SGD by separating gradients into a reusable common-knowledge direction and an incremental component, and then privatizing mainly the incremental part. By applying a Gradient Decomposition and Reconstruction (GDR) at early training steps and leveraging a mixed strategy to switch to DP-SGD later, DPDR achieves higher information gain per unit privacy budget and faster convergence. The authors provide formal privacy guarantees, non-convex convergence analysis, and extensive experiments showing DPDR improves accuracy and convergence over DP-SGD and related methods across multiple datasets and models. This approach offers a principled way to exploit gradient coherence in private deep learning, with practical implications for privacy-preserving training in real-world systems.

Abstract

Differentially Private Stochastic Gradient Descent (DP-SGD) is a prominent paradigm for preserving privacy in deep learning. It ensures privacy by perturbing gradients with random noise calibrated to their entire norm at each training step. However, this perturbation leads to sub-optimal performance: it repeatedly spends privacy budget on the general converging direction shared among gradients from different batches, which we refer to as common knowledge, yet this yields little information gain. Motivated by this, we propose a differentially private training framework with early gradient decomposition and reconstruction (DPDR), which enables more efficient use of the privacy budget. In essence, it boosts model utility by focusing protection on incremental information and recycling the privatized common knowledge learned from previous gradients at early training steps. Concretely, DPDR incorporates three steps. First, it disentangles common knowledge and incremental information in current gradients by decomposing them based on previous noisy gradients. Second, most of the privacy budget is spent on protecting incremental information for higher information gain. Third, the model is updated with the gradient reconstructed from recycled common knowledge and noisy incremental information. Theoretical analysis and extensive experiments show that DPDR outperforms state-of-the-art baselines on both convergence rate and accuracy.
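The three steps in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the function name `gdr_step`, the separate clipping thresholds `clip_alpha` and `clip_perp`, and the noise multipliers `sigma_alpha` and `sigma_perp` are all hypothetical, and how the paper actually calibrates noise and splits the budget is not given in this summary.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def clip_by_norm(v, threshold):
    """Scale v down so that its L2 norm is at most threshold."""
    norm = math.sqrt(dot(v, v))
    scale = min(1.0, threshold / norm) if norm > 0 else 1.0
    return [scale * x for x in v]


def gdr_step(per_sample_grads, prev_noisy_grad, clip_alpha, clip_perp,
             sigma_alpha, sigma_perp, rng):
    """One decomposition/perturbation/reconstruction step (illustrative only)."""
    n = len(per_sample_grads)
    dim = len(prev_noisy_grad)

    # Common-knowledge direction b: the previous noisy gradient, normalized.
    prev_norm = math.sqrt(dot(prev_noisy_grad, prev_noisy_grad))
    b = [x / prev_norm for x in prev_noisy_grad]

    alpha_sum = 0.0
    perp_sum = [0.0] * dim
    for g in per_sample_grads:
        # Step 1: decompose g into alpha * b (parallel) and g_perp (orthogonal).
        alpha = dot(g, b)
        g_perp = [gi - alpha * bi for gi, bi in zip(g, b)]
        # Bound each sample's contribution (assumed clipping scheme).
        alpha_sum += max(-clip_alpha, min(clip_alpha, alpha))
        clipped = clip_by_norm(g_perp, clip_perp)
        perp_sum = [s + c for s, c in zip(perp_sum, clipped)]

    # Step 2: perturb the coefficient and the orthogonal part with Gaussian noise.
    alpha_noisy = (alpha_sum + rng.gauss(0.0, sigma_alpha * clip_alpha)) / n
    perp_noisy = [(s + rng.gauss(0.0, sigma_perp * clip_perp)) / n
                  for s in perp_sum]

    # Step 3: reconstruct the update from recycled b and the noisy pieces.
    return [alpha_noisy * bi + pi for bi, pi in zip(b, perp_noisy)]
```

With both noise multipliers set to zero and clipping thresholds large enough to be inactive, the reconstructed vector equals the plain mean gradient, which is a quick sanity check that decomposition followed by reconstruction loses nothing by itself. In a private run, a caller would choose the two noise multipliers to reflect how the budget is divided between the parallel coefficient and the orthogonal component.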

Paper Structure (20 sections, 4 theorems, 23 equations, 6 figures, 2 tables, 1 algorithm)

This paper contains 20 sections, 4 theorems, 23 equations, 6 figures, 2 tables, 1 algorithm.

Key Result

Lemma 1

Let $M: \mathcal{X}^n \rightarrow \mathcal{R}$ be a randomized algorithm that satisfies $(\epsilon, \delta)$-DP, and let $f: \mathcal{R} \rightarrow \mathbb{R}'$ be an arbitrary function. Then $f \circ M : \mathcal{X}^n \rightarrow \mathbb{R}'$ is also $(\epsilon, \delta)$-DP.
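
As a hedged illustration of how this lemma would apply to gradient reconstruction: suppose a mechanism $M$ releases a noisy parallel coefficient, a noisy orthogonal component, and a direction $b$ derived from an already privatized gradient, and that this release is $(\epsilon, \delta)$-DP. The symbols $\tilde{\alpha}$, $\tilde{g}_\perp$, and $\tilde{g}_t$ are notation assumed for this note, not taken from the paper's statement. Reconstruction is then a deterministic function of the released values and consumes no further budget:

```latex
\begin{align*}
M(D) &= (\tilde{\alpha},\ \tilde{g}_\perp,\ b)
  && \text{assumed to be } (\epsilon, \delta)\text{-DP} \\
f(\tilde{\alpha}, \tilde{g}_\perp, b) &= \tilde{\alpha}\, b + \tilde{g}_\perp = \tilde{g}_t
  && \text{deterministic post-processing} \\
f \circ M &\ \text{is } (\epsilon, \delta)\text{-DP}
  && \text{by Lemma 1}
\end{align*}
```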

Figures (6)

  • Figure 1: Left: SGD visualization on a linear regression model. Gradient directions are similar (coherent) at the early training stage and fluctuate (stale) later. Middle: In subtraction, the incremental information is the gradient difference $\Delta g=g_t-\tilde{g}_{t-1}$. In decomposition, the incremental information is the orthogonal gradient $g_\perp= g_t-\alpha\cdot b$, where $b$ is the normalized $g_{t-1}$ and the parallel coefficient is $\alpha=\langle g_t, b \rangle/\Vert b \Vert^2$. By the Pythagorean theorem, $\Vert g_\perp \Vert \leq \Vert \Delta g \Vert$. Right: Norm of gradients on CIFAR10. At early stages, the gradient norm fluctuates while the orthogonal norm stays small and stable, which indicates that the portion of common knowledge (green slash) is high compared to incremental information (orange range).
  • Figure 2: Framework of DPDR. First, it decomposes the current gradient $g_t$ into $g_\perp$ (incremental information) and $\alpha \cdot b$ by directional decomposition based on the previous normalized noisy gradient $b$ (common knowledge). The parallel coefficient $\alpha$ and $g_\perp$ are perturbed and then used for reconstruction with $b$. The model is updated based on the reconstructed gradient.
  • Figure 3: The norm of the difference may be larger than that of the original gradient. CIFAR10.
  • Figure 4: The distribution of the orthogonal components of the gradient is more concentrated than that of the difference. CIFAR10.
  • Figure 5: Convergence Evaluation on CIFAR-10 with 5-layer CNN.
  • ...and 1 more figure
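
The relation between the two notions of incremental information in Figure 1 can be worked out directly. The short derivation below assumes that $b$ is the unit vector along the previous noisy gradient, $b = \tilde{g}_{t-1}/\Vert \tilde{g}_{t-1} \Vert$, so that $\tilde{g}_{t-1}$ lies on the line spanned by $b$; this reading combines the captions of Figures 1 and 2 and is an assumption of this note.

```latex
\begin{align*}
\Delta g &= g_t - \tilde{g}_{t-1}
          = (g_t - \alpha b) + (\alpha b - \tilde{g}_{t-1})
          = g_\perp + \bigl(\alpha - \Vert \tilde{g}_{t-1} \Vert\bigr)\, b, \\
\langle g_\perp, b \rangle &= \langle g_t, b \rangle - \alpha \Vert b \Vert^2 = 0, \\
\Vert \Delta g \Vert^2 &= \Vert g_\perp \Vert^2
          + \bigl(\alpha - \Vert \tilde{g}_{t-1} \Vert\bigr)^2
          \;\geq\; \Vert g_\perp \Vert^2 .
\end{align*}
```

Under this assumption the orthogonal component is never longer than the difference, which is consistent with the observation in Figures 3 and 4 that the difference can be large while the orthogonal components stay concentrated.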

Theorems & Definitions (7)

  • Definition 1: Differential Privacy
  • Definition 2: Sensitivity
  • Lemma 1: Post-processing
  • Theorem 1: Privacy Guarantee of Algorithm 1
  • Lemma 2: Convergence without clipping bias
  • Theorem 2: Convergence with clipping threshold
  • Proof 1