Scalable Differentially Private Bayesian Optimization

Getoar Sopa, Juraj Marusic, Marco Avella-Medina, John P. Cunningham

TL;DR

This work introduces DP-GIBO, a scalable private optimization method for high-dimensional continuous hyperparameters that combines local Bayesian optimization with gradient information obtained from a Gaussian Process surrogate. By clipping gradient estimates, adaptively selecting evaluation points, and adding Gaussian privacy noise, the algorithm provides a formal Gaussian Differential Privacy guarantee while converging to a local minimum in the noiseless case or to a privacy-affected neighborhood when observations are noisy. Theoretical contributions include exponential convergence in the noiseless setting and dimension-dependent convergence bounds in the noisy case, with linear scaling in the dimension for the noiseless regime and polynomial scaling under privacy/noise. Empirically, DP-GIBO outperforms existing private hyperparameter tuning methods in high-dimensional spaces and remains competitive with non-private GIBO, demonstrating its potential for privacy-preserving tuning of large-scale models.

Abstract

In recent years, there has been much work on scaling Bayesian Optimization to high-dimensional problems, for example hyperparameter tuning in large machine learning models. These scalable methods have been successful, finding high objective values much more quickly than traditional global Bayesian Optimization or random search-based methods. At the same time, these large models often use sensitive data, but preservation of Differential Privacy has not scaled alongside these modern Bayesian Optimization procedures. Here we develop a method to privately optimize potentially high-dimensional parameter spaces using privatized Gradient Informative Bayesian Optimization. Our theoretical results show that under suitable conditions, our method converges exponentially fast to a locally optimal parameter configuration, up to a natural privacy error. Moreover, regardless of whether the assumptions are satisfied, we prove that our algorithm maintains privacy, and we empirically demonstrate superior performance over existing methods in the high-dimensional hyperparameter setting.

Paper Structure

This paper contains 31 sections, 16 theorems, 102 equations, 5 figures, 1 algorithm.

Key Result

Theorem 2.4

Let $h$ be a deterministic function with finite global sensitivity $GS(h)$. The randomized function $\tilde{h}(x) = h(x) + \frac{GS(h)}{\mu}Z$, where $Z \sim N(0,I_d)$, is $\mu$-GDP.
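The Gaussian mechanism in Theorem 2.4 can be illustrated with a minimal sketch. It assumes that $GS(h)$ is measured in the $\ell_2$ norm over neighboring datasets (the definition is not reproduced on this page) and that the caller supplies a valid bound on it. The names `gaussian_mechanism` and `private_mean` are hypothetical helpers written for this illustration, not code from the paper.

```python
import numpy as np


def gaussian_mechanism(h_value, global_sensitivity, mu, rng=None):
    """Return h(x) + (GS(h) / mu) * Z with Z ~ N(0, I_d).

    h_value: the already computed value h(x), scalar or vector.
    global_sensitivity: a bound on GS(h), supplied by the caller (assumption).
    mu: target GDP parameter; smaller mu means more noise.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    if global_sensitivity < 0:
        raise ValueError("global_sensitivity must be non-negative")
    rng = np.random.default_rng() if rng is None else rng
    h_value = np.atleast_1d(np.asarray(h_value, dtype=float))
    noise_scale = global_sensitivity / mu
    return h_value + noise_scale * rng.standard_normal(h_value.shape)


def private_mean(data, mu, rng=None):
    """Hypothetical usage: privately release the mean of values in [0, 1].

    Assumption: neighboring datasets differ by replacing one record, so the
    mean of n values clipped to [0, 1] changes by at most 1 / n.
    """
    data = np.clip(np.asarray(data, dtype=float), 0.0, 1.0)
    sensitivity = 1.0 / data.size
    return gaussian_mechanism(data.mean(), sensitivity, mu, rng)
```

Calling `private_mean(values, mu=1.0)` returns a one-element array holding the noisy mean. The noise standard deviation is `GS(h) / mu`, so halving `mu` doubles the noise.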

Figures (5)

  • Figure 1: The local private approach does not suffer from the curse of dimensionality. The performance of Global Bayesian Optimization methods, as well as random grid search methods, depends greatly on the dimension of the problem. By privatizing a Local Bayesian Optimization approach, we significantly improve performance in higher dimensions while preserving privacy. Pictured are DP-GIBO, Random Search, and UCB-BO.
  • Figure 2: Results of experiments. Left: We present the results of Example \ref{example:GP}, where we test the performance of Algorithm \ref{algorithm} on tuning the GP regression lengthscales in $d = 15$ dimensions, and we vary the level of permitted bias $\varepsilon$, privacy level $\mu$ and noise level $\sigma$. Right: In Example \ref{sec:svmtuning}, we compare our privatized method to non-private GIBO (i.e. $\mu = \infty$) and to random search.
  • Figure 3: Top: Estimates of a single coordinate of the model; Middle: The optimization path of the negative log-likelihood; Bottom: Comparison between the norm of the bias of the gradient estimate and the norm of the added noise for the $\mu = 0.5$ case.
  • Figure 4: Top: Estimates of a single coordinate of the regression parameter; Bottom: Gradient of the loss function at the current iteration, on a log scale.
  • Figure 5: Left: We used the correct variance of the noise; Middle: We overestimated the variance of the noise; Right: We underestimated the variance of the noise.

Theorems & Definitions (30)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Theorem 2.4: Gaussian Mechanism
  • Theorem 2.5: GDP Composition
  • Theorem 3.1
  • Theorem 3.7
  • Lemma 3.8
  • Theorem 3.10
  • Lemma A.1
  • ...and 20 more
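
The list above names Theorem 2.5 (GDP Composition) without reproducing its statement. The standard composition result for Gaussian Differential Privacy says that running mechanisms that are $\mu_1$-GDP, ..., $\mu_k$-GDP on the same data is $\sqrt{\mu_1^2 + \dots + \mu_k^2}$-GDP overall. The sketch below assumes the paper's theorem takes this standard form. The helper names `compose_gdp` and `per_step_mu` are hypothetical, and the even split of the budget across steps is an illustrative choice, not necessarily the schedule used by DP-GIBO.

```python
import math


def compose_gdp(mus):
    """Overall GDP parameter after composing mechanisms with parameters mus."""
    if any(m < 0 for m in mus):
        raise ValueError("GDP parameters must be non-negative")
    return math.sqrt(sum(m * m for m in mus))


def per_step_mu(total_mu, n_steps):
    """Per-step parameter so that n_steps equal steps compose to total_mu."""
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    return total_mu / math.sqrt(n_steps)
```

For an iterative optimizer that privatizes one gradient estimate per iteration, `per_step_mu(total_mu, n_steps)` gives the budget each iteration may spend, and `compose_gdp` can be used to check that the spent budgets add up to the intended total.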