Online Sensitivity Optimization in Differentially Private Learning
Filippo Galli, Catuscia Palamidessi, Tommaso Cucinotta
TL;DR
The paper tackles hyperparameter tuning in differentially private learning by focusing on the gradient clipping threshold $C$ in DP-SGD. It introduces OSO-DPSGD, which treats $C_t$ as a learnable parameter and updates it via gradient-informed exponential rules while preserving DP through Gaussian mechanisms. The approach provides a privacy-efficient alternative to grid search and demonstrates competitive performance across MNIST, FashionMNIST, and AG News against fixed-threshold and fixed-quantile strategies. Key contributions include deriving private updates for $C_t$, decoupling sensitivity from DP budgets, and validating the method across varying model sizes and privacy levels.
Abstract
Training differentially private machine learning models requires constraining an individual's contribution to the optimization process. This is achieved by clipping the $2$-norm of their gradient at a predetermined threshold prior to averaging and batch sanitization. The choice of threshold adversely influences optimization in two opposing ways: lower values exacerbate the bias due to excessive clipping, while higher values increase the sanitization noise. The best value depends heavily on factors such as the dataset and model architecture, and even varies within the same optimization run, demanding meticulous tuning that is usually accomplished through a grid search. To circumvent the privacy cost incurred by hyperparameter tuning, we present a novel approach to dynamically optimize the clipping threshold. We treat this threshold as an additional learnable parameter, establishing a clean relationship between the threshold and the cost function. This allows us to optimize the former with gradient descent, with minimal repercussions on the overall privacy analysis. Our method is thoroughly assessed against alternative fixed and adaptive strategies across diverse datasets, tasks, model dimensions, and privacy levels. Our results indicate that it performs comparably or better in the evaluated scenarios, given the same privacy requirements.
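The clipping and sanitization step described in the abstract can be illustrated with a minimal sketch. This is not the paper's OSO-DPSGD update, only the standard clip, sum, add-noise, and average step with a fixed threshold. The function names `clip_to_norm` and `sanitized_mean` and the `noise_multiplier` parameter are hypothetical and chosen for illustration. The Gaussian noise is assumed to have standard deviation `noise_multiplier * threshold`, a common convention in DP-SGD implementations.

```python
import math
import random


def clip_to_norm(grad, threshold):
    """Rescale grad so that its 2-norm is at most threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    # Gradients already inside the ball of radius `threshold` are left unchanged.
    scale = min(1.0, threshold / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def sanitized_mean(per_sample_grads, threshold, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average.

    Assumption: the noise standard deviation is noise_multiplier * threshold,
    so a larger threshold means less clipping bias but more noise.
    """
    clipped = [clip_to_norm(g, threshold) for g in per_sample_grads]
    dim = len(per_sample_grads[0])
    std = noise_multiplier * threshold
    noisy_sum = [
        sum(g[i] for g in clipped) + rng.gauss(0.0, std) for i in range(dim)
    ]
    return [s / len(per_sample_grads) for s in noisy_sum]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]  # 2-norms 5.0 and 0.5
    for threshold in (0.1, 1.0, 10.0):
        print(threshold, sanitized_mean(grads, threshold, 1.0, rng))
```

With a small threshold both gradients are shrunk heavily (bias), whereas with a large threshold neither is clipped but the injected noise grows in proportion to the threshold. This is the trade-off that motivates adapting the threshold during training.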
