DP-SGD Without Clipping: The Lipschitz Neural Network Way

Louis Bethune; Thomas Massena; Thibaut Boissin; Yannick Prudent; Corentin Friedrich; Franck Mamalet; Aurelien Bellet; Mathieu Serrurier; David Vigouroux

TL;DR

By bounding the Lipschitz constant of each layer with respect to its parameters, the authors prove that Lipschitz-constrained networks can be trained with differential privacy guarantees, without per-sample gradient clipping.

Abstract

State-of-the-art approaches for training Differentially Private (DP) Deep Neural Networks (DNNs) struggle to estimate tight bounds on the sensitivity of the network's layers, and instead rely on per-sample gradient clipping. Clipping not only biases the direction of the gradients but is also costly in both memory and computation. To provide sensitivity bounds and bypass the drawbacks of clipping, we propose to rely on Lipschitz-constrained networks. Our theoretical analysis reveals an unexplored link between the Lipschitz constant of these networks with respect to their input and the one with respect to their parameters. By bounding the Lipschitz constant of each layer with respect to its parameters, we prove that we can train these networks with privacy guarantees. Our analysis not only allows the computation of the aforementioned sensitivities at scale, but also provides guidance on how to maximize the gradient-to-noise ratio for fixed privacy guarantees. The code has been released as a Python package available at https://github.com/Algue-Rythme/lip-dp
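
The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the lip-dp implementation: the function names (`clip`, `noisy_mean`, `dp_sgd_step`, `clipless_step`) are hypothetical, gradients are plain Python lists, and the sensitivity convention (removing one example changes the summed gradient by at most the norm bound) is an assumption. The clipped variant enforces the bound by rescaling each per-sample gradient, whereas the clipless variant assumes the bound is already known analytically, as it would be for a Lipschitz-constrained network.

```python
import math
import random


def l2_norm(v):
    """Euclidean norm of a vector stored as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def clip(v, c):
    """Rescale v so that its L2 norm is at most c (standard DP-SGD clipping)."""
    n = l2_norm(v)
    scale = min(1.0, c / n) if n > 0 else 1.0
    return [x * scale for x in v]


def noisy_mean(grads, sensitivity, noise_multiplier, rng):
    """Sum the gradients, add Gaussian noise scaled to the sensitivity, then average."""
    batch_size = len(grads)
    dim = len(grads[0])
    summed = [sum(g[i] for g in grads) for i in range(dim)]
    std = noise_multiplier * sensitivity
    return [(s + rng.gauss(0.0, std)) / batch_size for s in summed]


def dp_sgd_step(grads, clip_norm, noise_multiplier, rng):
    """Clipped variant: the norm bound is enforced per sample by rescaling."""
    clipped = [clip(g, clip_norm) for g in grads]
    return noisy_mean(clipped, clip_norm, noise_multiplier, rng)


def clipless_step(grads, norm_bound, noise_multiplier, rng):
    """Clipless variant: the norm bound is assumed to hold by construction."""
    # Sanity check for this sketch only; a real implementation would rely on
    # the analytical bound instead of inspecting per-sample gradients.
    assert all(l2_norm(g) <= norm_bound + 1e-12 for g in grads)
    return noisy_mean(grads, norm_bound, noise_multiplier, rng)
```

In the clipless variant the gradients are left untouched, so their direction is not biased and no per-sample rescaling is required; the noise level is driven entirely by how tight the analytical bound is.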

Paper Structure (70 sections, 11 theorems, 83 equations, 19 figures, 8 tables, 4 algorithms)

Introduction
Clipless DP-SGD with $\ell$-Lipschitz networks
Backpropagation for bounds
Backpropagate cotangent vector bounds (line \ref{alg:gnbc:lip}).
Signal-to-noise ratio analysis
Theoretical analysis of Clipless DP-SGD
Controlling $K$ with Gradient Norm Preserving (GNP) networks.
Lip-dp library
Experimental results
Evaluation of privacy, accuracy and robustness
Speed and memory consumption
Limitations, future work and broader impact
Definitions and methods
Additional background
Lipschitz neural networks background
...and 55 more sections

Key Result

Proposition 1

Assume that the loss fulfills $\|\nabla_{\theta}\mathcal{L}(\hat{y},y)\|_2\leq l$, and assume that the network is trained on a dataset of size $N$ with the SGD algorithm for $T$ steps with Gaussian noise $\mathcal{N}(\mathbf{0}, \sigma^2)$ such that: Then the SGD training of the network is $(\epsilon,\delta)$-DP.

Figures (19)

Figure 1: An example of usage of our framework, illustrating how to create a small Lipschitz VGG and how to train it under $(\epsilon,\delta)$-DP guarantees while reporting $(\epsilon, \delta)$ values.
Figure 2: Backpropagation for bounds (Algorithm \ref{alg:gnbc}) computes the per-layer sensitivity $\Delta_d$.
Figure 3: Comparison of the utility of DP-SGD and Clipless DP-SGD on tabular and image data.
Figure 4: Privacy/accuracy/robustness trade-off on CIFAR-10: We report the Pareto front of robustness certificates at different radii $r$ for Lipschitz-constrained networks, while unconstrained networks cannot produce robustness certificates. Models are trained in an "out-of-the-box" setting: no pre-training, no data augmentation and no handcrafted features. To ensure a fair comparison between algorithms, we perform 30 repetitions with a Bayesian optimizer to select the best hyper-parameters.
Figure 5: Our approach outperforms competing frameworks in terms of runtime and memory: we trained CNNs (ranging from 130K to 2M parameters) on CIFAR-10, and report the median batch processing time (including noise addition and either the application of constraints $\Pi$ or gradient clipping).
...and 14 more figures
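
The robustness certificates mentioned in the caption of Figure 4 can be illustrated with the standard margin-based bound for Lipschitz classifiers. The sketch below is a generic illustration and not code from the paper or from lip-dp: the function name `certified_radius` is hypothetical, and it assumes the network is Lipschitz with respect to the $\ell_2$ norm with a known constant. Under that assumption, the predicted class cannot change for any input perturbation smaller than the top-two logit margin divided by $\sqrt{2}$ times the Lipschitz constant.

```python
import math


def certified_radius(logits, lipschitz_constant):
    """L2 radius within which the predicted class provably cannot change.

    Assumes the logits come from a network that is `lipschitz_constant`-Lipschitz
    with respect to the L2 norm of its input (an assumption of this sketch).
    """
    if len(logits) < 2:
        raise ValueError("at least two classes are required")
    if lipschitz_constant <= 0:
        raise ValueError("the Lipschitz constant must be positive")
    ranked = sorted(logits, reverse=True)
    margin = ranked[0] - ranked[1]  # gap between the two largest logits
    return margin / (math.sqrt(2.0) * lipschitz_constant)
```

An unconstrained network has no usable Lipschitz constant to plug into this formula, which is consistent with the caption's remark that such networks cannot produce certificates.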

Theorems & Definitions (32)

Definition 1: $(\epsilon,\delta)$-Approximate Differential Privacy
Definition 2: $l_2$-sensitivity
Definition 3: Lipschitz feed-forward neural network
Remark 1: Tighter bounds in literature
Remark 2: GNP networks limitations
Definition 4: Feedforward neural network
Definition 5: Lipschitz constant
Definition 6: Lipschitz neural network
Definition 7: Gradient Norm Preserving Networks
Definition 8: Neighboring datasets
...and 22 more
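
For orientation, the first definitions listed above can be written in their standard textbook form. The notation below ($\mathcal{A}$ for a randomized mechanism, $D \sim D'$ for neighboring datasets, $g$ for a query, $f$ for a function) is assumed for illustration and may differ from the notation used in the paper.

```latex
% (epsilon, delta)-approximate differential privacy:
% for all neighboring datasets D ~ D' and all measurable sets S,
\mathbb{P}\left[\mathcal{A}(D) \in S\right]
  \leq e^{\epsilon}\, \mathbb{P}\left[\mathcal{A}(D') \in S\right] + \delta .

% l_2-sensitivity of a query g:
\Delta(g) = \sup_{D \sim D'} \left\| g(D) - g(D') \right\|_2 .

% f is l-Lipschitz if, for all inputs x and x',
\left\| f(x) - f(x') \right\|_2 \leq l \, \left\| x - x' \right\|_2 .
```

The Gaussian mechanism connects the first two: adding noise with standard deviation proportional to $\Delta(g)$ makes the release of $g(D)$ differentially private, which is why a bound on the gradient norm translates into a privacy guarantee.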
