Improving sample efficiency of high dimensional Bayesian optimization with MCMC

Zeji Yi; Yunyue Wei; Chu Xin Cheng; Kaibo He; Yanan Sui

TL;DR

This paper addresses the inefficiency of Bayesian optimization (BO) in high dimensions by introducing MCMC-BO, a framework that uses Metropolis-Hastings or Langevin dynamics to sample from an approximated Gaussian process Thompson sampling (GP-TS) posterior and concentrate the search in promising regions. By tracking only a batch of $m$ points per round, the method decouples computational cost from the discretization size and avoids storing large grids. Theoretical results establish convergence to a stationary distribution for the approximate posterior and a regret bound that scales with the dimension and the kernel information gain. Experiments on high-dimensional synthetic functions and MuJoCo tasks show better performance than state-of-the-art high-dimensional BO baselines. The approach is flexible, memory-efficient, and theoretically grounded, with potential for parallelization and integration with existing BO techniques.

Abstract

Sequential optimization methods are often confronted with the curse of dimensionality in high-dimensional spaces. Current approaches under the Gaussian process framework are still burdened by the computational complexity of tracking Gaussian process posteriors, and they either need to partition the optimization problem into small regions to ensure exploration or assume an underlying low-dimensional structure. Building on the idea of moving candidate points towards more promising positions, we propose a new method based on Markov Chain Monte Carlo to efficiently sample from an approximated posterior. We provide theoretical guarantees of its convergence in the Gaussian process Thompson sampling setting. We also show experimentally that both the Metropolis-Hastings and the Langevin dynamics versions of our algorithm outperform state-of-the-art methods in high-dimensional sequential optimization and reinforcement learning benchmarks.
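The idea of moving candidate points towards more promising positions can be illustrated with a minimal Metropolis-Hastings sketch. Everything here is an assumption made for illustration and is not taken from the paper: the RBF kernel and its lengthscale, the 2-D toy objective, the Gaussian random-walk proposal, and in particular the acceptance ratio, which uses the GP posterior probability that the proposal beats the current point. The helper names (`rbf`, `gp_posterior`, `prob_better`, `mh_transition`) are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def rbf(a, b, lengthscale=0.5):
    """Squared-exponential kernel between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, Xq, noise=1e-3):
    """Zero-mean GP posterior mean and covariance at query points Xq."""
    L = np.linalg.cholesky(rbf(X, X) + noise * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(X, Xq)
    v = np.linalg.solve(L, Ks)
    return Ks.T @ alpha, rbf(Xq, Xq) - v.T @ v

def prob_better(X, y, x_old, x_new):
    """Posterior probability that f(x_new) > f(x_old) under the GP."""
    mu, cov = gp_posterior(X, y, np.vstack([x_old, x_new]))
    # Variance of f(x_new) - f(x_old); clipped to avoid division by zero.
    var = max(cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1], 1e-12)
    z = (mu[1] - mu[0]) / sqrt(var)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def mh_transition(X, y, points, steps=50, step_size=0.1, rng=None):
    """Random-walk MH moves for a small batch of candidate points."""
    rng = np.random.default_rng(0) if rng is None else rng
    pts = points.copy()
    for _ in range(steps):
        for i in range(len(pts)):
            prop = pts[i] + step_size * rng.standard_normal(pts.shape[1])
            if np.any(np.abs(prop) > 1.0):
                continue  # reject proposals outside the domain [-1, 1]^d
            p = prob_better(X, y, pts[i], prop)
            # Assumed acceptance rule: odds that the proposal is better.
            if rng.random() < min(1.0, p / max(1.0 - p, 1e-12)):
                pts[i] = prop
    return pts

def f(X):
    """Toy objective with its maximum at the origin."""
    return -np.sum(X**2, axis=1)

# Observations on a 7 x 7 grid, candidates started far from the optimum.
grid = np.linspace(-1.0, 1.0, 7)
X_obs = np.array([[a, b] for a in grid for b in grid])
y_obs = f(X_obs)
rng = np.random.default_rng(0)
start = rng.uniform(0.6, 0.9, size=(8, 2))
moved = mh_transition(X_obs, y_obs, start, rng=rng)
print("mean f before:", f(start).mean(), "after:", f(moved).mean())
```

Only the batch of candidate points is stored and updated, so the cost of this sketch depends on the batch size and the number of observations rather than on any discretization of the domain.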

Paper Structure (15 sections, 3 theorems, 4 equations, 8 figures)

Introduction
Related Work
Background
Modeling with Gaussian processes
Gaussian process and Thompson sampling
Markov Chain Monte Carlo
Algorithm Design
Convergence Guarantee
Notation and assumption.
Overview of Proof
Experiments
High-Dimensional Synthetic Functions
Mujoco Locomotion Task
Performance on low-dimensional problems
Conclusion and Future Work

Key Result

Lemma 1

The proposed approximated posterior $P(x,\cdot)$, defined by the Metropolis-Hastings routine (Alg. MCMC-MH) and its acceptance probability equation, does not yield a reversible Markov chain, but it still has a stationary distribution $\pi(x)$.
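Reversibility (detailed balance) is sufficient for a stationary distribution but not necessary, which is the distinction the lemma relies on. The toy example below is a sketch with a hypothetical 3-state transition matrix unrelated to the paper's kernel: it checks numerically that a chain can violate detailed balance and still satisfy $\pi P = \pi$. The function names are assumptions introduced for illustration.

```python
import numpy as np

# Hypothetical 3-state chain that prefers to cycle 0 -> 1 -> 2 -> 0.
P = np.array([
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
])

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(vals - 1.0))
    pi = np.real(vecs[:, k])
    return pi / pi.sum()

def is_reversible(P, pi, tol=1e-9):
    """Detailed balance: pi[i] * P[i, j] == pi[j] * P[j, i] for all i, j."""
    flow = pi[:, None] * P
    return np.allclose(flow, flow.T, atol=tol)

pi = stationary_distribution(P)
print("stationary distribution:", pi)
print("pi P == pi:", np.allclose(pi @ P, pi))
print("reversible:", is_reversible(P, pi))
```

Because every row and every column of this matrix sums to one, the stationary distribution is uniform, while the probability flow from state 0 to state 1 differs from the flow from 1 to 0, so detailed balance fails.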

Figures (8)

Figure 1: Illustration of MCMC-BO. The contours show the 2D Rastrigin function. (Left): BO algorithms propose points to be sampled. The optimization performance is restricted by insufficient discretization. (Right): Points are adjusted by MCMC-BO, reaching regions with higher values.
Figure 2: MCMC-BO
Figure 3: [MCMC routine] with Metropolis-Hastings
Figure 4: [MCMC routine] with Langevin dynamics
Figure 5: The panels are constructed from a $50 \times 50$ discretization of $D = [-1,1]^2$. (a)(b) The stationary distributions achieved by the MH and Langevin versions of MCMC-BO, respectively, and the congregated points obtained after convergence of the transition process from the current GP information. (c) TS distribution simulated using Monte Carlo. (d) Standard deviation of the TS distribution over 10 trials of $10^6$ samples. (e) GP posterior, with surfaces $\mu$ and $\Sigma$, on which the MCMC-BO transitions are performed.
...and 3 more figures

Theorems & Definitions (3)

Lemma 1
Lemma 2
Theorem 1
