Nonparametric Bayesian Optimization for General Rewards
Zishi Zhang, Tao Ren, Yijie Peng
TL;DR
The paper tackles Bayesian optimization when the reward distribution is uncertain and potentially nonstandard. It introduces the infinite Gaussian process (∞-GP), a Bayesian nonparametric surrogate that places a prior over reward distributions via a sequential spatial Dirichlet process mixture, yielding an infinite mixture of GP surfaces. Combined with Thompson Sampling, the ∞-GP achieves no-regret guarantees under mild conditions (a Lipschitz objective and general noise tails), and it outperforms classical GP surrogates in heavy-tailed and nonstationary settings. Computationally, a truncated Gibbs sampler keeps the method scalable: the number of realized surfaces grows only logarithmically, making the approach practical for large-scale BO tasks and real-world applications.
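The logarithmic growth mentioned above is a familiar property of Dirichlet process models. The sketch below is only an illustration under a simplifying assumption: it uses the standard (non-spatial) Chinese restaurant process with a hypothetical concentration parameter `alpha`, not the paper's sequential spatial Dirichlet process mixture. In that standard model, the i-th point (counting from zero) opens a new cluster with probability alpha / (alpha + i), so the expected number of occupied clusters grows roughly like alpha * log(1 + n / alpha).

```python
import math
import random


def expected_clusters(n, alpha):
    """Exact expected number of occupied clusters after n points.

    Under the standard Chinese restaurant process, point i (0-based)
    starts a new cluster with probability alpha / (alpha + i).
    """
    return sum(alpha / (alpha + i) for i in range(n))


def simulate_clusters(n, alpha, seed=0):
    """Draw one realization of the number of occupied clusters."""
    rng = random.Random(seed)
    # Only the "new cluster or not" decisions matter for the count.
    return sum(rng.random() < alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    alpha = 1.0  # hypothetical concentration parameter
    for n in (100, 1_000, 10_000):
        approx = alpha * math.log(1 + n / alpha)
        print(n, round(expected_clusters(n, alpha), 2), round(approx, 2),
              simulate_clusters(n, alpha))
```

Each tenfold increase in the number of observations adds only about alpha * ln(10) clusters in expectation, which is the kind of slow growth that makes a truncated representation plausible.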
Abstract
This work focuses on Bayesian optimization (BO) under reward model uncertainty. We propose the first BO algorithm that achieves a no-regret guarantee in a general reward setting, requiring only Lipschitz continuity of the objective function and accommodating a broad class of measurement noise. The core of our approach is a novel surrogate model, termed the infinite Gaussian process ($\infty$-GP). It is a Bayesian nonparametric model that places a prior on the space of reward distributions, enabling it to represent a substantially broader class of reward models than the classical Gaussian process (GP). The $\infty$-GP is used in combination with Thompson Sampling (TS) to balance exploration and exploitation. Correspondingly, we develop a new TS regret analysis framework for general rewards, which relates the regret to the total variation distance between the surrogate model and the true reward distribution. Furthermore, with a truncated Gibbs sampling procedure, our method is computationally scalable, incurring minimal additional memory and computational cost compared to the classical GP. Empirical results demonstrate state-of-the-art performance, particularly in settings with non-stationary, heavy-tailed, or other ill-conditioned rewards.
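For readers unfamiliar with the baseline being generalized, the following is a minimal sketch of Thompson Sampling with a classical GP surrogate on a one-dimensional grid. It is not the $\infty$-GP method: the RBF kernel, the `length_scale` value, the Gaussian observation noise, and all function names are assumptions chosen for illustration. At each round the loop draws one function from the GP posterior and queries the point where that draw is largest.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_grid, noise_var):
    """Posterior mean and covariance of a zero-mean GP on x_grid."""
    K = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_grid)
    K_ss = rbf_kernel(x_grid, x_grid)
    solved = np.linalg.solve(K, K_s)  # K^{-1} K_s
    mean = solved.T @ y_obs
    cov = K_ss - K_s.T @ solved
    return mean, cov


def sample_gaussian(mean, cov, rng):
    """Draw from N(mean, cov), clipping tiny negative eigenvalues."""
    w, V = np.linalg.eigh(0.5 * (cov + cov.T))
    w = np.clip(w, 0.0, None)
    return mean + V @ (np.sqrt(w) * rng.standard_normal(len(mean)))


def thompson_sampling(objective, x_grid, n_rounds=30, noise_std=0.1, seed=0):
    """Run GP Thompson Sampling; return queried points and noisy rewards."""
    rng = np.random.default_rng(seed)
    x_obs = np.array([x_grid[rng.integers(len(x_grid))]])
    y_obs = np.array([objective(x_obs[0]) + noise_std * rng.standard_normal()])
    for _ in range(n_rounds - 1):
        mean, cov = gp_posterior(x_obs, y_obs, x_grid, noise_std ** 2)
        draw = sample_gaussian(mean, cov, rng)   # one posterior sample
        x_next = x_grid[np.argmax(draw)]         # act greedily on the sample
        y_next = objective(x_next) + noise_std * rng.standard_normal()
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, y_next)
    return x_obs, y_obs


if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 50)
    xs, ys = thompson_sampling(lambda x: -(x - 0.3) ** 2, grid)
    print("last queried points:", xs[-5:])
```

The Gaussian noise model is exactly the assumption the abstract aims to relax; with heavy-tailed or non-stationary rewards, the single GP posterior in `gp_posterior` would be the component replaced by a richer surrogate.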
