Differentially Private Learning with Adaptive Clipping
Galen Andrew, Om Thakkar, H. Brendan McMahan, Swaroop Ramaswamy
TL;DR
The paper addresses the difficulty of choosing a clipping norm for user-level differential privacy in federated learning (FL). It introduces an adaptive clipping mechanism that clips each update to a chosen quantile of the update-norm distribution. The quantile is estimated online, with differential privacy, by minimizing a convex loss with a fast geometric update rule, and the mechanism integrates with DP-FedAvg. Experiments on multiple realistic FL tasks show that adaptive clipping matches, and sometimes outperforms, the best fixed clipping norm, without tuning any clipping hyperparameter and while spending only a negligible share of the privacy budget on the quantile estimate. The work offers practical guidance for deploying DP-FedAvg: it reduces the need for extensive tuning of the clipping norm and remains compatible with common FL techniques such as compression and secure aggregation.
Abstract
Existing approaches for training neural networks with user-level differential privacy (e.g., DP Federated Averaging) in federated learning (FL) settings involve bounding the contribution of each user's model update by clipping it to some constant value. However, there is no good a priori setting of the clipping norm across tasks and learning settings: the update norm distribution depends on the model architecture and loss, the amount of data on each device, the client learning rate, and possibly various other parameters. We propose a method wherein, instead of a fixed clipping norm, one clips to a value at a specified quantile of the update norm distribution, where the value at the quantile is itself estimated online, with differential privacy. The method tracks the quantile closely, uses a negligible amount of privacy budget, is compatible with other federated learning technologies such as compression and secure aggregation, and has a straightforward joint DP analysis with DP-FedAvg. Experiments demonstrate that adaptive clipping to the median update norm works well across a range of realistic federated learning tasks, sometimes outperforming even the best fixed clip chosen in hindsight, and without the need to tune any clipping hyperparameter.
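
The quantile-tracking idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one common form of geometric update, in which the clipping norm is multiplied by exp(-lr * (b - q)), where b is a noised fraction of clients whose update norm is at most the current clip and q is the target quantile. The function names, the learning rate `lr`, and the noise scale `noise_std` are hypothetical choices for illustration, and the sketch does no privacy accounting.

```python
import numpy as np


def clip_update(update, clip_norm):
    """Scale `update` so that its L2 norm is at most `clip_norm`."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / max(norm, 1e-12))


def update_clip_norm(clip_norm, update_norms, target_quantile=0.5,
                     lr=0.2, noise_std=0.0, rng=None):
    """One geometric step of the clipping norm toward the target quantile.

    Assumed update form (illustrative): C <- C * exp(-lr * (b - q)), where
    b is the noised fraction of norms <= C and q is the target quantile.
    """
    rng = np.random.default_rng() if rng is None else rng
    norms = np.asarray(update_norms, dtype=float)
    # Each client contributes one bit: is its update norm within the clip?
    below = np.sum(norms <= clip_norm)
    # Gaussian noise on the summed bits stands in for the private estimate.
    noisy_fraction = (below + rng.normal(0.0, noise_std)) / norms.size
    # Too many norms below the clip shrinks it; too few grows it.
    return clip_norm * np.exp(-lr * (noisy_fraction - target_quantile))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = 0.1  # deliberately poor initial guess
    for _ in range(300):
        # Hypothetical per-round update norms from 1000 sampled clients.
        norms = rng.lognormal(mean=0.0, sigma=1.0, size=1000)
        clip = update_clip_norm(clip, norms, noise_std=1.0, rng=rng)
    print("tracked clip norm:", clip)
```

Because the step is multiplicative, the clip can move across orders of magnitude in a few dozen rounds from a poor initial guess, then settles as the observed fraction approaches the target. Each client reports only a single bit per round, which is consistent with the abstract's claim that the estimate is compatible with secure aggregation and needs little privacy budget.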
