Nonparametric Bayesian Optimization for General Rewards

Zishi Zhang, Tao Ren, Yijie Peng

TL;DR

The paper tackles Bayesian optimization (BO) when the reward distribution is uncertain and potentially nonstandard. It introduces the infinite Gaussian process (∞-GP), a Bayesian nonparametric surrogate that places a prior over reward distributions via a sequential spatial Dirichlet process mixture, yielding an infinite mixture of GP surfaces. Combined with Thompson Sampling, the ∞-GP achieves a no-regret guarantee under mild conditions (a Lipschitz-continuous objective and a broad class of measurement noise), and it outperforms classical GP surrogates empirically in heavy-tailed and non-stationary settings. Computationally, a truncated Gibbs sampler keeps the method scalable, with the number of realized surfaces growing only logarithmically, which makes the approach practical for large-scale BO tasks and real-world applications.
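
For reference, the two quantities this summary leans on can be written in their standard textbook forms. The notation below ($f$, $x^\ast$, $x_t$, $R_T$, $P$, $Q$) is assumed for illustration and may differ from the paper's own definitions, for example if the paper works with a Bayesian or expected regret.

```latex
% Cumulative regret after T rounds (maximization), with x^* a maximizer of f.
R_T = \sum_{t=1}^{T} \bigl( f(x^\ast) - f(x_t) \bigr),
\qquad
\text{no-regret:}\quad \lim_{T \to \infty} \frac{R_T}{T} = 0 .

% Total variation distance between distributions P and Q with densities p and q.
d_{\mathrm{TV}}(P, Q) = \sup_{A} \, \lvert P(A) - Q(A) \rvert
  = \frac{1}{2} \int \lvert p(y) - q(y) \rvert \, dy .
```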

Abstract

This work focuses on Bayesian optimization (BO) under reward model uncertainty. We propose the first BO algorithm that achieves a no-regret guarantee in a general reward setting, requiring only Lipschitz continuity of the objective function and accommodating a broad class of measurement noise. The core of our approach is a novel surrogate model, termed the infinite Gaussian process ($\infty$-GP). It is a Bayesian nonparametric model that places a prior on the space of reward distributions, enabling it to represent a substantially broader class of reward models than a classical Gaussian process (GP). The $\infty$-GP is used in combination with Thompson Sampling (TS) to enable effective exploration and exploitation. Correspondingly, we develop a new TS regret analysis framework for general rewards, which relates the regret to the total variation distance between the surrogate model and the true reward distribution. Furthermore, with a truncated Gibbs sampling procedure, our method is computationally scalable, incurring minimal additional memory and computational complexity compared to a classical GP. Empirical results demonstrate state-of-the-art performance, particularly in settings with non-stationary, heavy-tailed, or other ill-conditioned rewards.
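
The explore/exploit loop that Thompson Sampling performs can be sketched in a few lines. The sketch below is a deliberately simplified stand-in, not the paper's algorithm: it assumes a finite candidate grid, independent Gaussian posteriors per candidate, and a known noise standard deviation, in place of a GP or $\infty$-GP surrogate. The names `thompson_sampling`, `f`, and `grid` are hypothetical.

```python
import random


def thompson_sampling(objective, candidates, n_rounds, noise_sd=0.5,
                      prior_mean=0.0, prior_sd=2.0, seed=0):
    """Thompson Sampling over a finite candidate set.

    Assumption: each candidate has its own independent Gaussian posterior
    and the noise standard deviation is known. This replaces the GP or
    infinity-GP surrogate purely for illustration.
    """
    rng = random.Random(seed)
    # Posteriors stored in natural form: precision and precision * mean.
    precision = [1.0 / prior_sd ** 2] * len(candidates)
    prec_mean = [prior_mean / prior_sd ** 2] * len(candidates)
    chosen = []
    for _ in range(n_rounds):
        # Draw one plausible reward per candidate from its current posterior.
        draws = [rng.gauss(pm / p, p ** -0.5)
                 for pm, p in zip(prec_mean, precision)]
        # Act greedily with respect to the sampled rewards.
        i = max(range(len(candidates)), key=draws.__getitem__)
        # Observe a noisy reward at the selected candidate.
        y = objective(candidates[i]) + rng.gauss(0.0, noise_sd)
        # Conjugate Normal-Normal update for the selected candidate only.
        precision[i] += 1.0 / noise_sd ** 2
        prec_mean[i] += y / noise_sd ** 2
        chosen.append(candidates[i])
    return chosen


def f(x):
    # Hypothetical objective with its maximum at x = 0.7.
    return -(x - 0.7) ** 2


grid = [i / 10 for i in range(11)]
picks = thompson_sampling(f, grid, n_rounds=300)
regret = sum(f(0.7) - f(x) for x in picks)
print("cumulative regret:", regret, "average regret:", regret / len(picks))
```

Sampling from the posterior and then maximizing the sample is what drives exploration: candidates with wide posteriors occasionally produce the largest draw and get evaluated, while well-explored poor candidates rarely do.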

Paper Structure

This paper contains 43 sections, 11 theorems, 112 equations, 6 figures, 3 tables, and 2 algorithms.

Key Result

Proposition 1

The predictive reward distribution for any unexplored solution $x_{n+1}\in\mathcal{X}$ is given by an expression whose term A, $[y(x_{n+1}) \mid \Theta^{(1)}, \xi^{(z_{n+1})}(x_{n+1})]$, follows $\mathcal{N}(x_{n+1}^\top \beta + \xi^{(z_{n+1})}(x_{n+1}), \tau^2)$.
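
Term A is a plain Gaussian once the surface value is fixed, so drawing from it is straightforward. The sketch below assumes that $\beta$ and $\tau$ are available as known values (in the proposition they are conditioned on through $\Theta^{(1)}$) and that the value of the selected surface at $x_{n+1}$ has already been drawn. The function name `sample_term_a` and all numeric values are hypothetical.

```python
import random


def sample_term_a(x, beta, xi_value, tau, rng):
    """Draw y(x) ~ N(x^T beta + xi_value, tau^2).

    x, beta  : equal-length sequences (linear trend term x^T beta)
    xi_value : value of the selected surface at x (assumed already drawn)
    tau      : noise standard deviation, so tau ** 2 is the variance
    """
    mean = sum(xj * bj for xj, bj in zip(x, beta)) + xi_value
    return rng.gauss(mean, tau)


rng = random.Random(0)
x_new = [1.0, 2.0]    # hypothetical unexplored solution
beta = [0.5, -0.25]   # hypothetical regression coefficients
xi_value = 0.3        # hypothetical surface value at x_new
tau = 0.1             # hypothetical noise standard deviation

draws = [sample_term_a(x_new, beta, xi_value, tau, rng) for _ in range(10000)]
sample_mean = sum(draws) / len(draws)
print("sample mean:", sample_mean)  # should be close to x^T beta + xi_value
```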

Figures (6)

  • Figure 1: (a)–(b): Draw a stochastic process from the distribution space, representing model uncertainty. (b)–(c): Draw a realization from the stochastic process, representing value uncertainty. The $\infty$-GP model includes (a)–(c), whereas the GP model includes only (b)–(c).
  • Figure 2: The $\infty$-GP is theoretically an infinite sum of GP surfaces. Solid dots indicate that $\xi(x_i)$ is realized on that surface; hollow dots represent unrealized latent variables. Red dots mark the prediction of a new response at an unexplored $x_{new}$: red dots on solid lines represent new observations realized on an existing surface, while dashed lines indicate that the observation lies on a new surface drawn from the base distribution.
  • Figure 3: Illustration of the modeling-capacity limitation of GP surrogates. Even with infinite data, the best Gaussian approximation cannot match a $t$-distribution or a bimodal distribution, leaving a persistent TVD gap (blue shaded region).
  • Figure 4: Cumulative regret comparison across BO algorithms
  • Figure 5: Weights of each surface over the course of optimization on Ackley-NS.
  • ...and 1 more figure
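
The kind of TVD gap that Figure 3 illustrates can be estimated numerically. The sketch below compares a hypothetical bimodal mixture with a moment-matched Gaussian, using a midpoint rule for $\frac{1}{2}\int |p - q|$. The mixture parameters are assumptions chosen for illustration, and the moment-matched Gaussian is only one natural candidate, not necessarily the TVD-minimizing one.

```python
import math


def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bimodal_pdf(y):
    # Hypothetical target: equal mixture of N(-2, 1) and N(2, 1).
    return 0.5 * normal_pdf(y, -2.0, 1.0) + 0.5 * normal_pdf(y, 2.0, 1.0)


def tvd(p, q, lo=-15.0, hi=15.0, n=30000):
    """Approximate 0.5 * integral of |p - q| over [lo, hi] (midpoint rule)."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * h
        total += abs(p(y) - q(y))
    return 0.5 * h * total


# The mixture has mean 0 and variance 1 + 2**2 = 5, so match those moments.
matched_sd = math.sqrt(5.0)
gap = tvd(bimodal_pdf, lambda y: normal_pdf(y, 0.0, matched_sd))
print("TVD between the mixture and the moment-matched Gaussian:", gap)
```

More data cannot shrink this number for this particular Gaussian, because the gap comes from the shape mismatch rather than from estimation error.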

Theorems & Definitions (17)

  • Definition 1: RKHS Assumption
  • Proposition 1: Predictive Distribution
  • Theorem 1: Convergence of the $\infty$-GP Model
  • Remark 1: Mildness of the Assumption
  • Lemma 1: Regret Bound via TVD
  • Remark 2
  • Lemma 2: Regret Bound via Sampling Decoupling
  • Theorem 2
  • Remark 3: Mildness of Assumption
  • Proposition 2: Slow Growth in the Number of Realized Surfaces
  • ...and 7 more
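
The slow growth named in Proposition 2 mirrors a well-known property of the standard Dirichlet process: under the Chinese restaurant process, the expected number of occupied clusters after $n$ draws grows roughly like $\alpha \log(1 + n/\alpha)$. The sketch below simulates that standard process only. The paper's sequential spatial Dirichlet process may behave differently in its details, and the concentration parameter `alpha` and the function names are hypothetical.

```python
import math
import random


def crp_num_clusters(n, alpha, rng):
    """Number of occupied clusters after n draws from a standard CRP."""
    clusters = 0
    for i in range(n):
        # Draw i + 1 opens a new cluster with probability alpha / (alpha + i),
        # independently of how earlier draws were assigned.
        if rng.random() < alpha / (alpha + i):
            clusters += 1
    return clusters


def expected_clusters(n, alpha):
    """Exact expected number of clusters after n draws."""
    return sum(alpha / (alpha + i) for i in range(n))


rng = random.Random(0)
alpha = 1.0  # hypothetical concentration parameter
for n in (100, 1000, 10000):
    avg = sum(crp_num_clusters(n, alpha, rng) for _ in range(50)) / 50
    approx = alpha * math.log(1 + n / alpha)
    print(n, round(avg, 2), round(expected_clusters(n, alpha), 2), round(approx, 2))
```

Multiplying the number of observations by ten adds only a few clusters on average, which is why tracking the realized surfaces adds little memory relative to a single GP.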