DP-SGD with weight clipping

Antoine Barczewski; Jan Ramon

TL;DR

This paper addresses differentially private training by replacing gradient clipping with a Lipschitz-based sensitivity bound. It introduces Lip-DP-SGD, which enforces Lipschitz constraints via ClipWeights and per-layer sensitivity estimates, so that noise can be calibrated without the bias that gradient clipping introduces. The authors report state-of-the-art accuracy under DP on image and tabular datasets and provide an open-source Lip-DP-SGD toolkit built on PyTorch/Opacus. The work argues that weight clipping and Lipschitz control can substantially improve the privacy-utility trade-off, and it offers practical guidance for private deep learning.
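To make the idea of weight clipping concrete, the following is a minimal sketch, not the paper's ClipWeights routine. It assumes a hypothetical projection of a weight matrix onto a Frobenius-norm ball of radius `max_norm`, and it assumes a plain feed-forward chain of linear layers with 1-Lipschitz activations, for which the product of the per-layer norm bounds is an upper bound on the Lipschitz constant with respect to the input. The function names `clip_weights` and `lipschitz_upper_bound` are illustrative only.

```python
import math


def clip_weights(weights, max_norm):
    """Rescale a weight matrix (list of rows) so its Frobenius norm is <= max_norm.

    Hypothetical helper: the spectral norm of a matrix never exceeds its
    Frobenius norm, so the linear map x -> Wx is then max_norm-Lipschitz.
    """
    norm = math.sqrt(sum(w * w for row in weights for w in row))
    # Leave the weights unchanged when they already lie inside the ball.
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [[w * scale for w in row] for row in weights]


def lipschitz_upper_bound(layer_norm_bounds):
    """Product of per-layer bounds.

    Assumption: a feed-forward chain of linear layers with 1-Lipschitz
    activations, so Lipschitz constants compose multiplicatively.
    """
    bound = 1.0
    for c in layer_norm_bounds:
        bound *= c
    return bound


if __name__ == "__main__":
    w = clip_weights([[3.0, 4.0]], max_norm=1.0)  # norm 5 is scaled down to 1
    print(w)
    print(lipschitz_upper_bound([1.0, 1.0, 1.0]))
```

Because the bound depends only on the clipping radii and not on the training data, it can be treated as public information when calibrating noise, which is the property the summary above relies on.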

Abstract

Deep neural networks and other methods trained by optimizing an objective function have become widespread, and concerns about data privacy have grown with them, so there is considerable interest in differentially private gradient descent methods. To achieve differential privacy guarantees with a minimum amount of noise, it is important to bound precisely the sensitivity of the information that participants will observe. In this study, we present a novel approach that mitigates the bias arising from traditional gradient clipping. By leveraging a public upper bound on the Lipschitz value of the current model and its current location within the search domain, we can adjust the noise level more finely. We present a new algorithm with improved differential privacy guarantees and a systematic empirical evaluation showing that the approach also outperforms existing approaches in practice.

Paper Structure (46 sections, 4 theorems, 40 equations, 4 figures, 5 tables, 2 algorithms)

Introduction
Preliminaries and background
Differential Privacy
Empirical risk minimization
Stochastic gradient descent
Regularization
Our approach
Estimating Lipschitz values
Loss function and activation layer
Normalization layer
Linear layers
Convolutional layers
Residual connections
Backpropagation
Lip-DP-SGD
...and 31 more sections

Key Result

Lemma (Gaussian mechanism)

Let $f:\mathcal{Z}\to\mathbb{R}^m$ be a function. The Gaussian mechanism transforms $f$ into $\hat{f}$ with $\hat{f}(Z) = f(Z) + b$, where $b\sim \mathcal{N}(0,\sigma^2 I_m)\in\mathbb{R}^m$ is Gaussian distributed noise. If the variance satisfies $\sigma^2 \ge 2\ln(1.25/\delta)(s_2(f))^2/\epsilon^2$, then $\hat{f}$ is $(\epsilon,\delta)$-differentially private.
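As a rough illustration of how this calibration is used, the sketch below computes the smallest $\sigma$ allowed by the standard bound $\sigma^2 \ge 2\ln(1.25/\delta)\,(s_2(f))^2/\epsilon^2$ and adds Gaussian noise to a vector. The function names `gaussian_sigma` and `gaussian_mechanism` are hypothetical, and the classical analysis of this bound assumes $\epsilon < 1$.

```python
import math
import random


def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Smallest sigma with sigma^2 >= 2 ln(1.25/delta) * s^2 / epsilon^2."""
    if epsilon <= 0 or not (0 < delta < 1):
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * l2_sensitivity / epsilon


def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng=None):
    """Return value + b with b ~ N(0, sigma^2 I), sigma set by gaussian_sigma."""
    if rng is None:
        rng = random.Random()
    sigma = gaussian_sigma(l2_sensitivity, epsilon, delta)
    # Independent noise per coordinate corresponds to covariance sigma^2 * I_m.
    return [v + rng.gauss(0.0, sigma) for v in value]


if __name__ == "__main__":
    # Illustrative values only: sensitivity 1, epsilon 0.5, delta 1e-5.
    print(gaussian_sigma(1.0, 0.5, 1e-5))
    print(gaussian_mechanism([0.0, 1.0, 2.0], 1.0, 0.5, 1e-5, rng=random.Random(0)))
```

The noise scale grows linearly with the sensitivity $s_2(f)$, which is why a tighter sensitivity bound translates directly into less noise for the same $(\epsilon,\delta)$.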

Figures (4)

Figure 1: Accuracy results, with a fixed $\delta = 10^{-5}$, for the MNIST, Fashion-MNIST, and CIFAR-10 test datasets. The plots show the median accuracy over 5 runs, with vertical lines indicating the standard error of the mean. See the hyperparameters appendix for details on model specifications and hyperparameters.
Figure 2: Median runtime in seconds per batch size for one epoch on the MNIST and CIFAR-10 datasets, comparing DP-SGD (in orange) and Lip-DP-SGD (in blue).
Figure 3: Norm of the average error $g - \text{clip}(g)$ (in blue) and norm of the average of $\text{clip}(g)$ (in red) across training iterations on the Dropout dataset and the Adult Income dataset (each averaged over 500 instances).
Figure 4: An example of gradient clipping causing bias. The average gradient becomes zero at $(0,0)$, while the average clipped gradient is zero at another point, so DP-SGD converges to that point rather than the correct one.
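The bias that Figure 4 describes can be reproduced in one dimension with a toy example. The sketch below does not use the figure's data: it assumes a hypothetical dataset `[-1, -1, -1, 3]`, per-example losses $\frac{1}{2}(\theta - x_i)^2$ (so the per-example gradient is $\theta - x_i$), and a clipping threshold of 1.

```python
def clip(g, c):
    """Clip a scalar gradient so that |g| <= c (norm clipping in one dimension)."""
    return max(-c, min(c, g))


def mean_gradients(theta, data, c):
    """Return (mean raw gradient, mean clipped gradient) at theta.

    Assumes the per-example loss 0.5 * (theta - x)^2, whose gradient is theta - x.
    """
    grads = [theta - x for x in data]
    mean_raw = sum(grads) / len(grads)
    mean_clipped = sum(clip(g, c) for g in grads) / len(grads)
    return mean_raw, mean_clipped


if __name__ == "__main__":
    data = [-1.0, -1.0, -1.0, 3.0]  # hypothetical data; its mean (the true minimizer) is 0

    # At the true minimizer the raw gradients cancel, but the clipped ones do not.
    print(mean_gradients(0.0, data, c=1.0))  # (0.0, 0.5)

    # The clipped gradients cancel (up to rounding) near theta = -2/3 instead.
    print(mean_gradients(-2.0 / 3.0, data, c=1.0))
```

In this toy setting, gradient descent on the clipped gradients settles near $\theta = -2/3$ rather than at the true minimizer $\theta = 0$, because the single large gradient from the outlier is truncated while the three small ones are not.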

Theorems & Definitions (11)

Definition: adjacent datasets
Definition: differential privacy (dwork_algorithmic_2013)
Definition: sensitivity
Lemma: Gaussian mechanism
Definition
Definition: Lipschitz function
Theorem
Theorem
Theorem
Proof
...and 1 more