Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight

Tao Huang, Qingyu Huang, Xin Shi, Jiayang Meng, Guolong Zheng, Xu Yang, Xun Yi

TL;DR

This paper introduces Differentially Private Per-sample Adaptive Scaling Clipping (DP-PSASC), an enhanced version of DP-SGD that replaces traditional clipping with non-monotonous adaptive gradient scaling. The change alleviates the need for intensive threshold setting and rectifies the disproportionate weighting of smaller gradients.

Abstract

In the domain of deep learning, the challenge of protecting sensitive data while maintaining model utility is significant. Traditional Differential Privacy (DP) techniques such as Differentially Private Stochastic Gradient Descent (DP-SGD) typically employ strategies like direct or per-sample adaptive gradient clipping. These methods, however, compromise model accuracy due to their critical influence on gradient handling, particularly neglecting the significant contribution of small gradients during later training stages. In this paper, we introduce an enhanced version of DP-SGD, named Differentially Private Per-sample Adaptive Scaling Clipping (DP-PSASC). This approach replaces traditional clipping with non-monotonous adaptive gradient scaling, which alleviates the need for intensive threshold setting and rectifies the disproportionate weighting of smaller gradients. Our contribution is twofold. First, we develop a novel gradient scaling technique that effectively assigns proper weights to gradients, particularly small ones, thus improving learning under differential privacy. Second, we integrate a momentum-based method into DP-PSASC to reduce bias from stochastic sampling, enhancing convergence rates. Our theoretical and empirical analyses confirm that DP-PSASC preserves privacy and delivers superior performance across diverse datasets, setting new standards for privacy-sensitive applications.
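For orientation, the direct per-sample clipping that the abstract describes as the traditional DP-SGD strategy can be sketched as below. This is a minimal illustration of the baseline only, not of DP-PSASC's scaling function, which this summary does not specify. The function names, the toy gradients, and the parameters `clip_norm` and `noise_multiplier` are assumptions made for the example.

```python
import math
import random


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def clip_per_sample(grad, clip_norm):
    """Scale grad so its L2 norm is at most clip_norm (direct clipping)."""
    factor = min(1.0, clip_norm / (l2_norm(grad) + 1e-12))
    return [factor * x for x in grad]


def dp_sgd_step(params, per_sample_grads, lr, clip_norm, noise_multiplier, rng):
    """One noisy update: clip each gradient, sum, add Gaussian noise, average."""
    batch_size = len(per_sample_grads)
    clipped = [clip_per_sample(g, clip_norm) for g in per_sample_grads]
    summed = [sum(col) for col in zip(*clipped)]
    # Noise standard deviation is proportional to the clipping threshold,
    # which bounds each sample's contribution to the sum.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


# Hypothetical toy batch: one large gradient (norm 5.0), one small (norm 0.05).
rng = random.Random(0)
grads = [[3.0, 4.0], [0.03, 0.04]]
clipped = [clip_per_sample(g, clip_norm=1.0) for g in grads]
new_params = dp_sgd_step(
    [0.0, 0.0], grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=rng
)
```

In this toy batch the large gradient is rescaled to the threshold while the small one passes through unchanged, so the result depends directly on the choice of `clip_norm`. That dependence is the threshold-setting burden the abstract refers to.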

Paper Structure

This paper contains 14 sections, 4 theorems, 10 equations, 9 figures, 3 tables.

Key Result

Lemma 1

Let $\mathcal{M}: \mathcal{D} \rightarrow \mathbb{R}^k$ be a function with $\ell_2$-sensitivity $\Delta_2 \mathcal{M} = \| \mathcal{M}(D) - \mathcal{M}(D^{\prime}) \|$ which measures the maximum change in the Euclidean norm of $\mathcal{M}$ for any two adjacent datasets $D$ and $D^{\prime}$ that dif
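The lemma text above is cut off, so the following is only a sketch of the textbook Gaussian mechanism, which may differ from the paper's exact statement. It assumes the classical calibration $\sigma \ge \sqrt{2 \ln(1.25/\delta)}\, \Delta_2 / \varepsilon$ for $\varepsilon \in (0, 1)$, which yields $(\varepsilon, \delta)$-differential privacy. The function names are hypothetical.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale under the classical Gaussian-mechanism calibration.

    Assumes 0 < epsilon < 1 and 0 < delta < 1 (textbook conditions).
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("classical calibration requires 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def gaussian_mechanism(value, sensitivity, epsilon, delta, rng):
    """Add independent Gaussian noise to each coordinate of value."""
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return [v + rng.gauss(0.0, sigma) for v in value]


# Hypothetical query output with l2-sensitivity 1.
sigma = gaussian_sigma(sensitivity=1.0, epsilon=0.5, delta=1e-5)
released = gaussian_mechanism([0.2, -0.1, 0.7], 1.0, 0.5, 1e-5, random.Random(0))
```

The noise scale grows linearly with the sensitivity, which is why bounding each per-sample gradient's norm (by clipping or scaling) is needed before noise is added.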

Figures (9)

  • Figure 1: Differentially Private Stochastic Gradient Descent (DP-SGD)
  • Figure 2: Gradient Norm Distribution In The Last 10 Epochs
  • Figure 3: Gradient Similarities (FashionMNIST)
  • Figure 4: Gradient Similarities (MNIST)
  • Figure 5: Gradient Similarities (CIFAR10)
  • ...and 4 more figures

Theorems & Definitions (7)

  • Definition 1: Lipschitz continuity
  • Definition 2: Smoothness
  • Definition 3: Differential privacy
  • Lemma 1: Gaussian Mechanism for Differential Privacy
  • Theorem 1: Privacy Guarantee of DP-PSASC
  • Theorem 2: Convergence Guarantee of DP-PSASC
  • Theorem 3: Convergence Guarantee of DP-PSASC with Momentum