
Differentially Private Learning with Adaptive Clipping

Galen Andrew, Om Thakkar, H. Brendan McMahan, Swaroop Ramaswamy

TL;DR

The paper tackles the difficulty of selecting clipping norms under user-level differential privacy in federated learning. It introduces a privately trackable adaptive clipping mechanism that targets a chosen quantile of the update-norm distribution. The quantile is estimated online via a convex loss and a fast geometric update, and the mechanism is integrated with DP-FedAvg. Empirical results across multiple realistic FL tasks show that adaptive clipping often matches or outperforms the best fixed clip without hyperparameter tuning, while spending only a negligible share of the privacy budget on the quantile estimate. The work offers practical guidance for deploying DP-FedAvg. It reduces the need to tune the clipping norm and remains compatible with common FL techniques such as compression and secure aggregation.
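The convex loss and geometric update can be written out explicitly. The derivation below is a sketch that assumes the standard quantile (pinball) loss for a target quantile $\gamma$. In it, $X$ is one client's update norm $\|\Delta_i\|$, $C$ is the current clip estimate, $m$ is the number of sampled clients, and $\sigma_b$ is the noise standard deviation on the count.

```latex
\ell_\gamma(C; X) =
\begin{cases}
  (1-\gamma)(C - X) & \text{if } X \le C,\\
  \gamma\,(X - C)   & \text{if } X > C,
\end{cases}
\qquad
\frac{\partial \ell_\gamma}{\partial C} = \mathbb{1}[X \le C] - \gamma .

% The expected (sub)gradient F(C) - \gamma vanishes exactly when F(C) = \gamma,
% i.e. when C is the \gamma-quantile of the norm distribution F.

\bar b = \frac{1}{m}\Big(\sum_{i=1}^{m} \mathbb{1}\big[\|\Delta_i\| \le C\big] + \mathcal{N}(0, \sigma_b^2)\Big),
\qquad
C \leftarrow C \cdot \exp\!\big(-\eta_C\,(\bar b - \gamma)\big).
```

The update is multiplicative, so the estimate moves by a bounded factor per round regardless of the scale of $C$. This is what lets it climb quickly from a small initial value $C^0$.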

Abstract

Existing approaches for training neural networks with user-level differential privacy (e.g., DP Federated Averaging) in federated learning (FL) settings involve bounding the contribution of each user's model update by clipping it to some constant value. However, there is no good a priori setting of the clipping norm across tasks and learning settings: the update norm distribution depends on the model architecture and loss, the amount of data on each device, the client learning rate, and possibly various other parameters. We propose a method wherein instead of a fixed clipping norm, one clips to a value at a specified quantile of the update norm distribution, where the value at the quantile is itself estimated online, with differential privacy. The method tracks the quantile closely, uses a negligible amount of privacy budget, is compatible with other federated learning technologies such as compression and secure aggregation, and has a straightforward joint DP analysis with DP-FedAvg. Experiments demonstrate that adaptive clipping to the median update norm works well across a range of realistic federated learning tasks, sometimes outperforming even the best fixed clip chosen in hindsight, and without the need to tune any clipping hyperparameter.
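The clip-then-count step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The function names `clip_to_norm` and `adaptive_clip_round` are hypothetical, and updates are plain lists of floats. The sketch omits the Gaussian noise that DP-FedAvg adds to the sum of clipped updates (noise multiplier $z_\Delta$), so only the quantile-tracking part is shown. The clip norm follows the geometric rule $C \leftarrow C \cdot \exp(-\eta_C(\bar b - \gamma))$, where $\bar b$ is the noised fraction of updates whose norm did not exceed $C$.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of a flat update vector."""
    return math.sqrt(sum(x * x for x in v))


def clip_to_norm(v: Sequence[float], clip_norm: float) -> List[float]:
    """Scale v down so its L2 norm is at most clip_norm (no-op if already smaller)."""
    norm = l2_norm(v)
    scale = min(1.0, clip_norm / norm) if norm > 0.0 else 1.0
    return [x * scale for x in v]


def adaptive_clip_round(
    updates: Sequence[Sequence[float]],
    clip_norm: float,
    gamma: float = 0.5,     # target quantile (0.5 = median)
    eta_c: float = 0.2,     # learning rate of the geometric update
    sigma_b: float = 0.0,   # std of Gaussian noise on the unclipped count
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[float]], float]:
    """Clip each update to clip_norm; return (clipped updates, next clip norm).

    Hypothetical helper: the noise on the summed updates is NOT added here.
    """
    m = len(updates)
    clipped = [clip_to_norm(u, clip_norm) for u in updates]
    # b_i = 1 if update i was not clipped, i.e. its norm is <= the current clip norm.
    count = sum(1 for u in updates if l2_norm(u) <= clip_norm)
    noise = 0.0
    if sigma_b > 0.0:
        noise = (rng if rng is not None else random.Random()).gauss(0.0, sigma_b)
    b_bar = (count + noise) / m
    # Shrink C when more than a gamma fraction fit under it; grow it otherwise.
    next_clip = clip_norm * math.exp(-eta_c * (b_bar - gamma))
    return clipped, next_clip


updates = [[3.0, 4.0], [0.6, 0.8], [0.0, 0.0]]  # norms 5.0, 1.0, 0.0
clipped, next_c = adaptive_clip_round(updates, clip_norm=2.0)
print(clipped, next_c)
```

In this example two of the three updates already fit under $C = 2$. That gives $\bar b = 2/3 > \gamma = 0.5$, so the next clip norm shrinks slightly to $2\exp(-0.2/6) \approx 1.93$, while the norm-5 update is scaled down to norm 2.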

Paper Structure

This paper contains 11 sections, 1 theorem, 2 equations, 7 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

One step of DP-FedAvg with adaptive clipping using $\sigma_b$ noise standard deviation on the clipped counts $\sum b_i^t$ and $z_\Delta$ noise multiplier on the vector sums $\sum \Delta_i^t$ is equivalent (so far as privacy accounting is concerned) to one step of non-adaptive DP-FedAvg with noise mu

Figures (7)

  • Figure 1:
  • Figure 2: Evolution of the quantile estimate on data drawn from log-normal distributions. The three plots use data drawn from the exponential of $\mathcal{N}(0.0, 1.0)$, $\mathcal{N}(0.0, 0.1)$, and $\mathcal{N}(\log 10, 1.0)$, respectively. Curves are shown for each of five quantiles (0.1, 0.3, 0.5, 0.7, 0.9), and the dashed lines show the true value at each quantile. Hyperparameters are as discussed in the text and used in the experiments section: $\eta_C = 0.2$, $C^0 = 0.1$, $m=100$, $\sigma_b=m/20$. After an initial phase of exponential growth, the estimate tracks the true quantile fairly closely. A smaller value of $\eta_C$ would allow more accurate tracking at the cost of slower convergence. Because the quantile value is only used as a heuristic for clipping, a small amount of noise is tolerable. The entire sequence of values estimated for each target quantile satisfies $(0.034, n^{-1.1})$-differential privacy using RDP composition across the 200 rounds. This assumes fixed-size samples of $m=100$ out of a total population of $n=10^6$ [wang2019subsampled].
  • Figure 3: Impact of clipping without noise. Performance of the unclipped baseline compared to five settings of $\gamma$, from $\gamma = 0.1$ (aggressive clipping) to $\gamma=0.9$ (mild clipping). The values shown are the evaluation metrics on the validation set averaged over the last 100 rounds. Note that the $y$-axes have been compressed to show small differences, and that for EMNIST-AE lower values are better.
  • Figure 4: Evolution of the adaptive clipping norm at five different quantiles (0.1, 0.3, 0.5, 0.7, 0.9) on each task with no noise. The norms are estimated using geometric updates with $\eta_C = 0.2$ and an initial value $C^0 = 0.1$. With the possible exception of SO-LR, the estimated quantiles appear to closely track an evolving update norm distribution. Note that each task has a unique shape to its update norm evolution, which further motivates an adaptive approach.
  • Figure 5: Evaluation metric performance of adaptive clipping with five settings of $\gamma$ for each of four effective noise multipliers $z$. Note that the $y$-axes have been compressed to show small differences, and that for EMNIST-AE lower values are better.
  • ...and 2 more figures
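The quantile-tracking experiment in the Figure 2 caption can be approximated with a short simulation. This is a sketch under stated assumptions, not the code behind the figure. `track_quantile` and `true_quantile` are hypothetical names, and each round draws $m$ fresh log-normal norms. The second parameter of $\mathcal{N}(\cdot,\cdot)$ is taken to be a standard deviation. Only the estimator itself is simulated, with no model training and no privacy accounting.

```python
import math
import random
from statistics import NormalDist
from typing import List


def true_quantile(gamma: float, mu: float, sigma: float) -> float:
    """gamma-quantile of exp(N(mu, sigma^2))."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(gamma))


def track_quantile(
    gamma: float,
    mu: float,
    sigma: float,
    rounds: int = 200,
    m: int = 100,
    eta_c: float = 0.2,
    c0: float = 0.1,
    seed: int = 0,
) -> List[float]:
    """Run the noisy geometric quantile update; return the estimate after each round."""
    rng = random.Random(seed)
    sigma_b = m / 20  # noise std on the count, as in the Figure 2 caption
    c = c0
    history = []
    for _ in range(rounds):
        # Hypothetical stand-in for client update norms: i.i.d. log-normal draws.
        norms = [rng.lognormvariate(mu, sigma) for _ in range(m)]
        count = sum(1 for x in norms if x <= c)  # number of unclipped updates
        b_bar = (count + rng.gauss(0.0, sigma_b)) / m
        c *= math.exp(-eta_c * (b_bar - gamma))
        history.append(c)
    return history


for mu in (0.0, math.log(10)):
    est = track_quantile(0.5, mu, 1.0)
    tail_mean = sum(est[-100:]) / 100
    print(f"mu={mu:.2f}  true median={true_quantile(0.5, mu, 1.0):.3f}  "
          f"mean estimate over last 100 rounds={tail_mean:.3f}")
```

While almost every norm exceeds the estimate, $\bar b \approx 0$ and each round multiplies $C$ by about $\exp(\eta_C \gamma)$. For $\gamma = 0.5$ and $\eta_C = 0.2$ that factor is $e^{0.1} \approx 1.105$. Climbing from $C^0 = 0.1$ to a median of 1 therefore takes roughly $\ln 10 / 0.1 \approx 23$ rounds, or about 46 rounds for a median of 10. Lower target quantiles grow more slowly, since the per-round factor is at most $e^{0.02}$ for $\gamma = 0.1$.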

Theorems & Definitions (3)

  • Definition 1.1: Differential Privacy
  • Theorem 1
  • Proof