DP-SGD with weight clipping

Antoine Barczewski, Jan Ramon

TL;DR

This paper addresses private training with differential privacy by moving beyond gradient clipping to a Lipschitz-based sensitivity bound. It introduces Lip-DP-SGD, which enforces Lipschitz constraints via ClipWeights and per-layer sensitivity estimates, enabling noise to be scaled without clipping biases. The approach delivers state-of-the-art accuracy under DP on image and tabular datasets and provides an open-source Lip-DP-SGD toolkit built on PyTorch/Opacus. The work demonstrates that weight clipping and Lipschitz control can substantially improve the privacy-utility trade-off and offers practical guidance for private deep learning. Overall, Lip-DP-SGD advances private training by reducing bias and leveraging per-layer Lipschitz bounds to set noise levels more precisely.

Abstract

Deep neural networks and other methods whose training relies on optimizing an objective function have become popular, and concerns about data privacy have grown with them; as a result, there is considerable interest in differentially private gradient descent methods. To achieve differential privacy guarantees with a minimum amount of noise, it is important to bound precisely the sensitivity of the information that the participants will observe. In this study, we present a novel approach that mitigates the bias arising from traditional gradient clipping. By leveraging a public upper bound on the Lipschitz value of the current model and its current location within the search domain, we can adjust the noise level more finely. We present a new algorithm with improved differential privacy guarantees and a systematic empirical evaluation showing that our approach also outperforms existing approaches in practice.
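As a rough illustration of the weight-clipping idea described above, the following Python sketch projects each layer's weight matrix onto a Frobenius-norm ball and multiplies the per-layer norms to obtain a public upper bound on the network's Lipschitz constant with respect to its input. It assumes linear layers with 1-Lipschitz activations (such as ReLU). The function names and the choice of the Frobenius norm are illustrative assumptions, not the paper's ClipWeights routine.

```python
import math


def frobenius_norm(weights):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(w * w for row in weights for w in row))


def clip_weights(weights, max_norm):
    """Project a weight matrix onto the Frobenius ball of radius max_norm.

    Hypothetical helper: the paper's ClipWeights may use a different norm.
    """
    norm = frobenius_norm(weights)
    if norm <= max_norm:
        return [row[:] for row in weights]
    scale = max_norm / norm
    return [[w * scale for w in row] for row in weights]


def lipschitz_upper_bound(layers):
    """Product of per-layer Frobenius norms.

    The Frobenius norm upper-bounds the spectral norm, so for linear layers
    with 1-Lipschitz activations this product upper-bounds the Lipschitz
    constant of the whole network with respect to its input.
    """
    bound = 1.0
    for weights in layers:
        bound *= frobenius_norm(weights)
    return bound


# Two toy layers (assumed values, for illustration only).
layers = [[[3.0, 0.0], [0.0, 4.0]], [[0.6, 0.8]]]
clipped = [clip_weights(w, max_norm=1.0) for w in layers]

print("bound before clipping:", lipschitz_upper_bound(layers))
print("bound after clipping: ", lipschitz_upper_bound(clipped))
```

Because the bound depends only on the weights and not on any training example, it can be treated as public information when setting the noise scale, which is the property the abstract relies on.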

Paper Structure

This paper contains 46 sections, 4 theorems, 40 equations, 4 figures, 5 tables, 2 algorithms.

Key Result

Lemma (Gaussian mechanism)

Let $f:\mathcal{Z}\to\mathbb{R}^m$ be a function. The Gaussian mechanism transforms $f$ into $\hat{f}$ with $\hat{f}(Z) = f(Z) + b$, where $b\sim \mathcal{N}(0,\sigma^2 I_m)$ is Gaussian noise in $\mathbb{R}^m$. If the variance satisfies $\sigma^2 \ge 2\ln(1.25/\delta)(s_2(f))^2/\epsilon^2$, then $\hat{f}$ is $(\epsilon,\delta)$-differentially private.
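To make the noise calibration concrete, here is a minimal Python sketch of the classical Gaussian mechanism, using the standard calibration $\sigma = \sqrt{2\ln(1.25/\delta)}\, s_2(f)/\epsilon$. The function names and the example values of $\epsilon$, $\delta$, and the sensitivity are assumptions chosen for illustration; the classical analysis of this bound is usually stated for $\epsilon < 1$.

```python
import math
import random


def gaussian_sigma(epsilon, delta, l2_sensitivity):
    """Smallest sigma allowed by the classical bound
    sigma^2 >= 2 ln(1.25/delta) * s_2(f)^2 / epsilon^2."""
    if epsilon <= 0 or not (0 < delta < 1) or l2_sensitivity < 0:
        raise ValueError("need epsilon > 0, 0 < delta < 1, sensitivity >= 0")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * l2_sensitivity / epsilon


def gaussian_mechanism(value, epsilon, delta, l2_sensitivity, rng):
    """Add i.i.d. N(0, sigma^2) noise to each coordinate of `value`."""
    sigma = gaussian_sigma(epsilon, delta, l2_sensitivity)
    return [v + rng.gauss(0.0, sigma) for v in value]


# Illustrative values only (not taken from the paper's experiments).
rng = random.Random(0)
sigma = gaussian_sigma(epsilon=0.5, delta=1e-5, l2_sensitivity=1.0)
noisy = gaussian_mechanism([0.2, -0.1, 0.4], 0.5, 1e-5, 1.0, rng)

print("sigma:", sigma)
print("noisy output:", noisy)
```

Since sigma scales linearly with the sensitivity, any tighter bound on $s_2(f)$ translates directly into less noise for the same $(\epsilon,\delta)$.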

Figures (4)

  • Figure 1: Accuracy results, with a fixed $\delta = 10^{-5}$, for the MNIST, Fashion-MNIST, and CIFAR-10 test datasets. The plots show the median accuracy over 5 runs, with vertical lines indicating the standard error of the mean. See the paper's appendix on experimental hyperparameters for details on model specifications and hyperparameters.
  • Figure 2: Median runtime in seconds per batch size for one epoch on the MNIST and CIFAR-10 datasets, comparing DP-SGD (in orange) and Lip-DP-SGD (in blue).
  • Figure 3: Norm of the average error $g - \text{clip}(g)$ (in blue) and norm of the average of $\text{clip}(g)$ (in red) across training iterations on the Dropout and Adult Income datasets (each averaged over 500 instances).
  • Figure 4: An example of gradient clipping causing bias. The average gradient is zero at $(0,0)$, while the average clipped gradient is zero at a different point, so DP-SGD converges to that point rather than the correct one.
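The clipping bias described for Figure 4 can be reproduced in a small one-dimensional analogue. The sketch below is a hypothetical toy example, not the paper's setup: each example $x_i$ contributes the loss $\tfrac{1}{2}(\theta - x_i)^2$, so its gradient is $\theta - x_i$, and the data and clipping threshold are chosen by hand so that the effect is easy to see.

```python
def clip(g, c):
    """Clip a scalar gradient to the interval [-c, c]."""
    return max(-c, min(c, g))


def mean_gradient(theta, data):
    """Average gradient of 0.5 * (theta - x)^2 over the data."""
    return sum(theta - x for x in data) / len(data)


def mean_clipped_gradient(theta, data, c):
    """Average of per-example gradients after clipping each one."""
    return sum(clip(theta - x, c) for x in data) / len(data)


# Toy data (assumed): three points at -1 and one outlier at 3, so the mean is 0.
data = [-1.0, -1.0, -1.0, 3.0]
c = 1.0

# At the true minimizer theta = 0 the average gradient vanishes,
# but the average clipped gradient does not.
print(mean_gradient(0.0, data), mean_clipped_gradient(0.0, data, c))

# The average clipped gradient vanishes instead near theta = -2/3,
# which is where clipped gradient descent would settle.
print(mean_clipped_gradient(-2.0 / 3.0, data, c))
```

The outlier's large gradient is the only one that gets clipped, so its pull toward the true minimizer is weakened and the stationary point of the clipped dynamics shifts toward the majority of the data.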

Theorems & Definitions (11)

  • Definition: adjacent datasets
  • Definition: differential privacy [dwork_algorithmic_2013]
  • Definition: sensitivity
  • Lemma: Gaussian mechanism
  • Definition
  • Definition: Lipschitz function
  • Theorem
  • Theorem
  • Theorem
  • Proof
  • ...and 1 more