
Towards hyperparameter-free optimization with differential privacy

Zhiqi Bu, Ruixuan Liu

TL;DR

This work tackles the challenge of hyperparameter tuning in differential privacy by introducing HyFreeDP, a hyperparameter-free DP training framework that privately and automatically updates the learning rate. It combines a privatized GeN-based learning-rate estimator with loss privatization to minimize clipping bias and employs end-to-end privacy accounting to determine noise levels, achieving DP guarantees with minimal overhead. The approach is validated across vision and language tasks, showing DP performance close to non-DP grid searches and superior stability relative to DP-tuning baselines. The practical impact is a scalable, end-to-end DP training method that reduces tuning effort and privacy risk while preserving strong performance and efficiency.
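As a rough illustration of the loss-privatization step mentioned above, the sketch below clips each per-sample loss and adds Gaussian noise to the batch sum before averaging. It assumes non-negative losses clipped by simple truncation to `[0, R_l]`, which is consistent with the noise variance $(\sigma_l R_l/B)^2$ stated in Theorem 1; the paper's exact privatized loss may differ, and the name `privatize_loss` is hypothetical.

```python
import random
from typing import Sequence


def privatize_loss(losses: Sequence[float], r_l: float, sigma_l: float,
                   rng: random.Random) -> float:
    """Noisy, clipped estimate of the mean per-sample loss (hypothetical helper).

    Assumption: each loss is truncated to [0, r_l], so adding or removing one
    sample changes the clipped sum by at most r_l. Gaussian noise with standard
    deviation sigma_l * r_l is added to the sum, so Var = (sigma_l * r_l / B)^2.
    """
    batch_size = len(losses)
    clipped_sum = sum(min(max(loss, 0.0), r_l) for loss in losses)
    noise = rng.gauss(0.0, sigma_l * r_l)
    return (clipped_sum + noise) / batch_size


rng = random.Random(0)
# Noiseless check: the loss 3.0 is clipped to 2.0, giving (1 + 2 + 0.5) / 3.
print(privatize_loss([1.0, 3.0, 0.5], r_l=2.0, sigma_l=0.0, rng=rng))
```

Losses above $R_l$ are cut down (bias), while a larger $R_l$ scales up the noise, which is the trade-off Theorem 1 quantifies.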

Abstract

Differential privacy (DP) is a privacy-preserving paradigm that protects the training data when training deep learning models. Critically, model performance is determined by the training hyperparameters, especially those of the learning rate schedule, which therefore require fine-grained tuning on the data. In practice, learning rate hyperparameters are commonly tuned by grid search, which (1) is computationally expensive because multiple runs are needed, and (2) increases the risk of data leakage because the selection of hyperparameters is data-dependent. In this work, we adapt the automatic learning rate schedule to DP optimization for any model and optimizer, so as to significantly mitigate or even eliminate the cost of hyperparameter tuning when applied together with automatic per-sample gradient clipping. Our hyperparameter-free DP optimization is almost as computationally efficient as standard non-DP optimization, and achieves state-of-the-art DP performance on various language and vision tasks.
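The abstract pairs the learning rate schedule with automatic per-sample gradient clipping. Below is a minimal, stdlib-only sketch of one private gradient estimate, assuming the normalization form $g_i/(\lVert g_i\rVert+\gamma)$ with a small stability constant $\gamma$ (0.01 here) in place of a tuned clipping threshold. The helper names `auto_clip` and `private_gradient` are hypothetical, and the paper's implementation may differ.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def auto_clip(grad: Sequence[float], gamma: float = 0.01) -> Vector:
    """Normalize one per-sample gradient as g / (||g|| + gamma).

    The result has norm below 1, so no clipping threshold needs tuning.
    """
    norm = math.sqrt(sum(x * x for x in grad))
    scale = 1.0 / (norm + gamma)
    return [x * scale for x in grad]


def private_gradient(per_sample_grads: Sequence[Sequence[float]], sigma: float,
                     rng: random.Random, gamma: float = 0.01) -> Vector:
    """Average of auto-clipped per-sample gradients plus Gaussian noise.

    Each clipped gradient has norm below 1, so adding or removing one sample
    changes the sum by a vector of norm below 1; noise with standard
    deviation sigma is added to every coordinate of the sum.
    """
    batch_size = len(per_sample_grads)
    total = [0.0] * len(per_sample_grads[0])
    for grad in per_sample_grads:
        for j, x in enumerate(auto_clip(grad, gamma)):
            total[j] += x
    return [(t + rng.gauss(0.0, sigma)) / batch_size for t in total]


rng = random.Random(0)
print(private_gradient([[3.0, 4.0], [0.0, 2.0]], sigma=1.0, rng=rng))
```

Because every normalized gradient has norm below 1, the sensitivity is fixed regardless of gradient scale, which is what removes the clipping threshold from the list of hyperparameters in this sketch.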

Paper Structure

This paper contains 22 sections, 3 theorems, 21 equations, 8 figures, 8 tables, 2 algorithms.

Key Result

Theorem 1

The per-sample clipping bias of the privatized loss in Eq. (loss priv) is […], which is monotonically decreasing in $R_l$ and converges to $[\mathbb{E}(L_i|L_i>R_l)-R_l]\cdot\mathbb{P}(L_i>R_l)$ as $B\to \infty$. In contrast, the noise variance $\textup{Var}(\tilde{L})=(\sigma_l R_l/B)^2$ is increasing in $R_l$.
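To make the trade-off in Theorem 1 concrete, the sketch below evaluates the large-batch bias limit and the noise variance for a few thresholds. It assumes, purely for illustration, per-sample losses distributed as Exponential(1), for which memorylessness gives $[\mathbb{E}(L_i|L_i>R_l)-R_l]\cdot\mathbb{P}(L_i>R_l)=e^{-R_l}$; the values of $\sigma_l$ and $B$ are arbitrary and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def bias_limit_exponential(r_l: float, rate: float = 1.0) -> float:
    """Large-batch clipping bias [E(L | L > R_l) - R_l] * P(L > R_l).

    Illustrative assumption: L ~ Exponential(rate). By memorylessness,
    E(L | L > R_l) - R_l = 1 / rate and P(L > R_l) = exp(-rate * R_l).
    """
    return math.exp(-rate * r_l) / rate


def noise_variance(r_l: float, sigma_l: float, batch_size: int) -> float:
    """Var(L~) = (sigma_l * R_l / B)^2, as stated in Theorem 1."""
    return (sigma_l * r_l / batch_size) ** 2


def tradeoff(thresholds: Sequence[float], sigma_l: float,
             batch_size: int) -> List[Tuple[float, float, float]]:
    """(R_l, bias limit, noise variance) for each candidate threshold."""
    return [(r, bias_limit_exponential(r), noise_variance(r, sigma_l, batch_size))
            for r in thresholds]


for r, bias, var in tradeoff([0.5, 1.0, 2.0, 4.0], sigma_l=1.0, batch_size=10):
    print(f"R_l={r:4.1f}  bias={bias:.4f}  noise_var={var:.4f}")
```

Raising $R_l$ shrinks the bias but inflates the noise variance, so neither a very small nor a very large threshold comes for free.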

Figures (8)

  • Figure 1: HyFreeDP overview with three types of hyperparameters in DP training. HyFreeDP saves tuning effort by automatically tuning the hyperparameters shown in red and setting the others to default constants. We illustrate curve fitting with 5 points.
  • Figure 2: Impact of loss value clipping and perturbation on curve fitting at different training iterations on CIFAR100 with ViT-Small full fine-tuning, where zero on the x-axis denotes the current $\bm{w}_t$. We use 5 points for ease of illustration and 3 points in our algorithm and experiments.
  • Figure 3: Gradient and loss noise.
  • Figure 4: Automatic learning of clipping threshold, learning rate, training loss, and testing accuracy for SVHN (top) and GTSRB (bottom). HyFreeDP schedules $R_l$ and $\eta$ during training, approaching the manually tuned baseline with end-to-end DP guarantees, and is robust to varying intervals $K$.
  • Figure 5: Training dynamics of Llama2-7B on PubMed.
  • ...and 3 more figures
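Figures 1 and 2 describe fitting a curve to a few loss values around the current iterate $\bm{w}_t$ (3 points in the algorithm). A minimal sketch, assuming the losses at distinct step sizes $\eta$ along the update direction ($\eta=0$ being $\bm{w}_t$) are fit by a quadratic $a\eta^2+b\eta+c$ and the learning rate is taken as its minimizer $-b/(2a)$, in the spirit of a GeN-style estimator. In HyFreeDP the inputs would be privatized losses; the fallback for a non-convex fit is an assumption of this sketch, and `gen_learning_rate` is a hypothetical name.

```python
from typing import Sequence


def gen_learning_rate(etas: Sequence[float], losses: Sequence[float],
                      fallback: float) -> float:
    """Fit a*eta^2 + b*eta + c through three (eta, loss) points and return
    the minimizer -b / (2a).

    If the fitted parabola is not convex (a <= 0), return `fallback`
    (an assumption of this sketch, e.g. the previous learning rate).
    """
    if len(etas) != 3 or len(losses) != 3:
        raise ValueError("expected exactly three points")
    (x0, x1, x2), (y0, y1, y2) = etas, losses
    if len({x0, x1, x2}) != 3:
        raise ValueError("step sizes must be distinct")
    # Newton divided differences: a is the second divided difference.
    d01 = (y1 - y0) / (x1 - x0)
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)
    b = d01 - a * (x0 + x1)
    if a <= 0.0:
        return fallback
    return -b / (2.0 * a)


# Losses sampled from L(eta) = 2 - 3*eta + 1.5*eta^2, whose minimizer is eta = 1.
print(gen_learning_rate([-0.5, 0.0, 0.5], [3.875, 2.0, 0.875], fallback=0.1))  # -> 1.0
```

Three points determine the quadratic exactly, so the estimate costs only two extra forward passes per update in this sketch; noisy losses (as in Figure 2) perturb the fitted curvature $a$, which is why the convexity check matters.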

Theorems & Definitions (6)

  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Proof of Theorem 1
  • Proof of Corollary 1
  • Proof of Theorem 2