A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization
Ashwinee Panda, Xinyu Tang, Saeed Mahloujifar, Vikash Sehwag, Prateek Mittal
TL;DR
The paper addresses hyperparameter optimization (HPO) under differential privacy for DP-SGD, an open problem, by introducing a private adaptive HPO method built around a new linear scaling rule. The method first estimates optimal hyperparameters at small, cheap privacy budgets and then linearly scales them to larger budgets, which reduces both the privacy cost and the compute cost of HPO while maintaining or improving utility. The authors give a theoretical analysis of private gradient descent, reduce the HPO search to a single quantity, the radius r = η × T (learning rate η times number of training steps T), and implement a procedure that privately extrapolates r(ε) and decomposes it into concrete hyperparameters. Empirically, the method achieves state-of-the-art or competitive results on 22 CV and NLP tasks, including language modeling, with the privacy cost of HPO included in the accounting, and it remains robust under distribution shift. The result is hyperparameter tuning for private training that is efficient and carries across tasks and privacy levels.
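The estimate-then-scale idea summarized above can be illustrated with a small sketch. This is not the authors' implementation: it fits an ordinary (non-private) least-squares line to hypothetical (ε, r) pairs from cheap trials, extrapolates r to a larger ε, and splits r into a learning rate and a step count by fixing the step count. The function names, the sample values, and the fixed-steps decomposition are all assumptions made for illustration, and the noise that a private release of the fit would require is omitted.

```python
from typing import List, Tuple


def fit_line(eps: List[float], r: List[float]) -> Tuple[float, float]:
    """Least-squares fit of r = slope * eps + intercept.

    Illustrative only: this fit is NOT private. A private procedure would
    have to account for the privacy cost of releasing the fitted values.
    """
    n = len(eps)
    if n < 2 or n != len(r):
        raise ValueError("need at least two (eps, r) pairs of equal length")
    mean_e = sum(eps) / n
    mean_r = sum(r) / n
    var_e = sum((e - mean_e) ** 2 for e in eps)
    if var_e == 0:
        raise ValueError("eps values must not all be identical")
    cov = sum((e - mean_e) * (ri - mean_r) for e, ri in zip(eps, r))
    slope = cov / var_e
    intercept = mean_r - slope * mean_e
    return slope, intercept


def extrapolate_r(slope: float, intercept: float, target_eps: float) -> float:
    """Evaluate the fitted line at a larger privacy budget."""
    return slope * target_eps + intercept


def decompose(r: float, steps: int) -> Tuple[float, int]:
    """Split r = eta * T into (eta, T) by fixing T (an assumed choice)."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    return r / steps, steps


if __name__ == "__main__":
    # Hypothetical best radii found by cheap trials at small budgets.
    cheap_eps = [0.1, 0.2]
    cheap_r = [1.0, 2.0]
    slope, intercept = fit_line(cheap_eps, cheap_r)
    r_target = extrapolate_r(slope, intercept, target_eps=1.0)
    eta, steps = decompose(r_target, steps=50)
    print(f"r(1.0) = {r_target:.3f}, eta = {eta:.4f}, T = {steps}")
```

Fixing T and solving for η is only one way to split r; fixing η and solving for T would be equally consistent with r = η × T.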
Abstract
An open problem in differentially private deep learning is hyperparameter optimization (HPO). DP-SGD introduces new hyperparameters and complicates existing ones, forcing researchers to painstakingly tune hyperparameters over hundreds of trials, which in turn makes it impossible to account for the privacy cost of HPO without destroying utility. We propose an adaptive HPO method that uses cheap trials (in terms of privacy cost and runtime) to estimate optimal hyperparameters and then scales them up. We obtain state-of-the-art performance on 22 benchmark tasks spanning computer vision and natural language processing, pretraining and finetuning, multiple architectures, and a wide range of $\varepsilon \in [0.01,8.0]$, all while accounting for the privacy cost of HPO.
