Provably Efficient Bayesian Optimization with Unknown Gaussian Process Hyperparameter Estimation

Huong Ha, Vu Nguyen, Hung Tran-The, Hongyu Zhang, Xiuzhen Zhang, Anton van den Hengel

TL;DR

The paper tackles Bayesian optimization when the GP hyperparameters are unknown and must be learned from biased, non-i.i.d. data. It introduces Unknown Hyperparameter Estimation for Bayesian Optimization (UHE), which combines an EXP3-based i.i.d. sampling mechanism with a consistent loss function for GP hyperparameter estimation. Theoretical results establish high-probability sub-linear regret and convergence of the estimated hyperparameters to their true values, while empirical evaluations on synthetic and real-world tasks show that UHE outperforms existing approaches. The work makes BO more robust in practical settings where hyperparameters are not known a priori and data collection is guided by acquisition strategies.

Abstract

Gaussian process (GP) based Bayesian optimization (BO) is a powerful method for optimizing black-box functions efficiently. The practical performance and theoretical guarantees of this approach depend on having the correct GP hyperparameter values, which are usually unknown in advance and need to be estimated from the observed data. However, in practice, these estimations could be incorrect due to biased data sampling strategies used in BO. This can lead to degraded performance and break the sub-linear global convergence guarantee of BO. To address this issue, we propose a new BO method that can sub-linearly converge to the objective function's global optimum even when the true GP hyperparameters are unknown in advance and need to be estimated from the observed data. Our method uses a multi-armed bandit technique (EXP3) to add random data points to the BO process, and employs a novel training loss function for the GP hyperparameter estimation process that ensures consistent estimation. We further provide theoretical analysis of our proposed method. Finally, we demonstrate empirically that our method outperforms existing approaches on various synthetic and real-world problems.
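To make the bandit component mentioned in the abstract more concrete, the following is a minimal sketch of the standard EXP3 algorithm, used here to choose between two options at each round: querying a point proposed by an acquisition function, or querying a uniformly random point. This is not the paper's UHE or RDEXP3 procedure. The arm names, the `toy_reward` function, and the values of `gamma` and `n_rounds` are all hypothetical and chosen only for illustration; the reward signal actually used in the paper is not described in this summary.

```python
import math
import random


class Exp3:
    """Standard EXP3 for bandits with rewards in [0, 1]."""

    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.n_arms = n_arms
        self.gamma = gamma                  # exploration rate in (0, 1]
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        # Mix the weight-proportional distribution with a uniform one,
        # so every arm keeps probability at least gamma / n_arms.
        total = sum(self.weights)
        return [
            (1.0 - self.gamma) * w / total + self.gamma / self.n_arms
            for w in self.weights
        ]

    def select_arm(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs, k=1)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        # Importance-weighted reward estimate for the pulled arm only.
        estimate = reward / prob
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale to avoid overflow; this leaves the probabilities unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


ARMS = ["acquisition_point", "random_point"]   # hypothetical arm names


def toy_reward(arm):
    # Hypothetical fixed rewards, standing in for whatever signal a real
    # BO loop would feed back to the bandit.
    return 0.8 if arm == 0 else 0.2


def run_toy(n_rounds=500, gamma=0.1, seed=0):
    bandit = Exp3(n_arms=len(ARMS), gamma=gamma, seed=seed)
    for _ in range(n_rounds):
        arm, prob = bandit.select_arm()
        bandit.update(arm, prob, toy_reward(arm))
    return bandit


if __name__ == "__main__":
    final = run_toy()
    for name, p in zip(ARMS, final.probabilities()):
        print(f"{name}: {p:.3f}")
```

Because of the uniform mixing term, the lower-reward arm is never abandoned entirely: it always retains probability at least `gamma / n_arms`. In this toy setting that means random points would keep being sampled throughout the run, which is the general property that makes a bandit a plausible way to interleave random data with acquisition-driven data.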

Paper Structure

This paper contains 40 sections, 5 theorems, 27 equations, 5 figures, and 2 algorithms.

Key Result

Proposition 6.4

Suppose Assumptions assum:l-loss and assum:f-smoothness are satisfied and $M_t \geq \vert D_t \vert$. Then RDEXP3 achieves,

Figures (5)

  • Figure 1: Left: Data selected by a BO process. Middle: Estimated posterior distribution of a GP hyperparameter based on observed data from a BO process. Right: Estimated posterior distribution of the GP hyperparameter based on i.i.d. data (with the same number of data points as in the middle figure). The Middle illustrates that the observed data from a BO process are generally not i.i.d., thus, the estimated GP hyperparameter posterior distribution might not represent the true posterior. The Right shows that even when the GP hyperparameter posterior distribution is estimated from i.i.d. data, the MAP estimate may not be the true GP hyperparameter as the observed data is finite.
  • Figure 2: Results on synthetic (Top) and real-world benchmarks (Bottom). Lines and shaded areas denote mean $\pm$ 1 standard error. Experiments are repeated 20 times.
  • Figure 3: Bottom: Observed data (black crosses) and the GP mean by MAP. The GP hyperparameter estimation by MAP is incorrect due to a biased data collection process, resulting in the GP mean being different from the true function (in Left). Top: with our proposed method, consistent GP hyperparameter estimation can be obtained, and thus, the GP mean is more accurate.
  • Figure 4: MSE results on synthetic benchmarks. Lines and shaded areas are mean $\pm$ 1 standard error.
  • Figure 5: Left & Middle: The performance of our method (UHE), the method RDEXP3, and the method with i.i.d. sampling via EXP3 (Random+EXP3) on two problems. Right: The running time of our method is comparable to MAP whilst much faster than MCMC across three problems.

Theorems & Definitions (12)

  • Proposition 6.4
  • Theorem 6.5
  • Theorem 6.6
  • Proposition 6.7
  • Theorem 6.8
  • proof
  • proof
  • proof
  • proof
  • proof
  • ...and 2 more