Spectral Mixture Kernels for Bayesian Optimization

Yi Zhang, Cheng Hua

TL;DR

This work addresses the surrogate modeling challenge in Bayesian Optimization by introducing spectral mixture kernels in the Fourier domain, built from mixtures of Gaussian and Cauchy spectral densities. The approach yields a flexible and efficient GP kernel with provable information gain and regret bounds, capable of approximating a wide class of stationary kernels. Empirical results across synthetic and real-world tasks show consistent improvements over conventional kernels and BO baselines, including high-dimensional problems. By leveraging Bochner’s theorem and spectral representations, the paper advances kernel design for BO, balancing expressiveness with computational tractability.

Abstract

Bayesian Optimization (BO) is a widely used approach for solving expensive black-box optimization tasks. However, selecting an appropriate probabilistic surrogate model remains an important yet challenging problem. In this work, we introduce a novel Gaussian Process (GP)-based BO method that incorporates spectral mixture kernels, derived from spectral densities formed by scale-location mixtures of Cauchy and Gaussian distributions. This method achieves a significant improvement in both efficiency and optimization performance, matching the computational speed of simpler kernels while delivering results that outperform more complex models and automatic BO methods. We provide bounds on the information gain and cumulative regret associated with obtaining the optimum. Extensive numerical experiments demonstrate that our method consistently outperforms existing baselines across a diverse range of synthetic and real-world problems, including both low- and high-dimensional settings.
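To make the kernel construction concrete, the following is a minimal one-dimensional sketch of kernels whose spectral densities are mixtures of Gaussian or Cauchy components. It is not the authors' implementation: the function names, the parameter names (`weights`, `locs`, `scales`), and the $2\pi$ Fourier convention are assumptions chosen for illustration, and the paper's exact parametrization may differ. The closed forms are the standard Fourier transforms of symmetrized Gaussian and Cauchy densities, and a Monte Carlo estimate over sampled frequencies is included as a sanity check.

```python
import numpy as np


def _as_arrays(weights, locs, scales):
    # Convert mixture parameters to float arrays of shape (Q,).
    return tuple(np.asarray(a, dtype=float) for a in (weights, locs, scales))


def gaussian_sm_kernel(tau, weights, locs, scales):
    """k(tau) = sum_q w_q * exp(-2 pi^2 sigma_q^2 tau^2) * cos(2 pi mu_q tau)."""
    tau = np.asarray(tau, dtype=float)[..., None]
    w, mu, sigma = _as_arrays(weights, locs, scales)
    envelope = np.exp(-2.0 * np.pi**2 * sigma**2 * tau**2)
    return np.sum(w * envelope * np.cos(2.0 * np.pi * mu * tau), axis=-1)


def cauchy_sm_kernel(tau, weights, locs, scales):
    """k(tau) = sum_q w_q * exp(-2 pi gamma_q |tau|) * cos(2 pi mu_q tau)."""
    tau = np.asarray(tau, dtype=float)[..., None]
    w, mu, gamma = _as_arrays(weights, locs, scales)
    envelope = np.exp(-2.0 * np.pi * gamma * np.abs(tau))
    return np.sum(w * envelope * np.cos(2.0 * np.pi * mu * tau), axis=-1)


def gram(kernel, x, **params):
    # Gram matrix of a stationary kernel on 1-D inputs x.
    x = np.asarray(x, dtype=float)
    return kernel(x[:, None] - x[None, :], **params)


def mc_kernel(tau, weights, locs, scales, heavy_tailed, n=200_000, seed=0):
    """Monte Carlo estimate of the kernel at lag tau from sampled frequencies.

    Frequencies are drawn from the (unsymmetrized) mixture; because cosine is
    even, this gives the same expectation as the symmetrized density.
    """
    rng = np.random.default_rng(seed)
    w, mu, sc = _as_arrays(weights, locs, scales)
    q = rng.choice(len(w), size=n, p=w / w.sum())
    z = rng.standard_cauchy(n) if heavy_tailed else rng.standard_normal(n)
    s = mu[q] + sc[q] * z
    return w.sum() * np.mean(np.cos(2.0 * np.pi * s * tau))


# Hypothetical two-component mixture used only for illustration.
params = dict(weights=[0.7, 0.3], locs=[0.0, 1.5], scales=[0.5, 0.2])
```

Under these assumptions, the Gaussian components give smooth, rapidly decaying envelopes, while the Cauchy components give exponential envelopes similar to a Matérn-1/2 kernel. Either Gram matrix can be passed to a standard GP regression routine as the prior covariance.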

Paper Structure

This paper contains 40 sections, 9 theorems, 58 equations, 7 figures, 4 tables, 1 algorithm.

Key Result

Theorem 3.1

A complex-valued function $k$ on $\mathbb{R}^P$ is the kernel of a weakly stationary, mean square continuous complex-valued random process on $\mathbb{R}^P$ if and only if it can be represented as

$$k(\tau) = \int_{\mathbb{R}^P} e^{2\pi i s^\top \tau} \, \psi(\mathrm{d}s),$$

where $\psi$ is a positive finite Borel measure on $\mathbb{R}^P$.
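As a worked illustration of how the theorem turns a spectral density into a kernel, consider the one-dimensional case and assume the common convention $k(\tau) = \int e^{2\pi i s \tau}\,\psi(\mathrm{d}s)$. The symbols $\mu$, $\sigma$, and $\gamma$ below are introduced here for illustration and may not match the paper's notation. Symmetrizing a density $\phi$ as $\tfrac{1}{2}[\phi(s) + \phi(-s)]$ makes the resulting kernel real-valued.

```latex
% Gaussian component: \phi(s) = \mathcal{N}(s; \mu, \sigma^2)
\int_{\mathbb{R}} e^{2\pi i s \tau}\, \mathcal{N}(s; \mu, \sigma^2)\, \mathrm{d}s
  = \exp\!\left(2\pi i \mu \tau - 2\pi^2 \sigma^2 \tau^2\right)
\;\Longrightarrow\;
k_{\mathrm{G}}(\tau) = \exp\!\left(-2\pi^2 \sigma^2 \tau^2\right) \cos(2\pi \mu \tau).

% Cauchy component: \phi(s) = \frac{1}{\pi\gamma}\left[1 + \left(\frac{s-\mu}{\gamma}\right)^2\right]^{-1}
\int_{\mathbb{R}} e^{2\pi i s \tau}\, \phi(s)\, \mathrm{d}s
  = \exp\!\left(2\pi i \mu \tau - 2\pi \gamma |\tau|\right)
\;\Longrightarrow\;
k_{\mathrm{C}}(\tau) = \exp\!\left(-2\pi \gamma |\tau|\right) \cos(2\pi \mu \tau).
```

Both results follow from the characteristic functions of the Gaussian and Cauchy distributions evaluated at $t = 2\pi\tau$. A weighted sum of such components is again a valid stationary kernel, since a non-negative combination of positive finite measures is a positive finite measure.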

Figures (7)

  • Figure 1: Comparison of predictive distributions for the objective $f(x)$ using different kernels, before and after conditioning on the sampled point. The upper subplots show the GP surrogate before and after adding a new sample point, while the lower subplots display the corresponding acquisition function values using UCB and indicate the next point to sample.
  • Figure 2: Learned correlation function in kernel approximation. The horizontal axis denotes the Euclidean distance between two points, while the vertical axis represents the corresponding covariance. The darker solid line denotes the kernel used to generate the sampled points. Spectral mixture kernels more closely approximate the true kernel.
  • Figure 3: Optimization performance of different algorithms on the test functions across 10 repetitions, using the UCB acquisition function.
  • Figure 4: Results for the average mean regret over iterations using UCB as the acquisition function.
  • Figure 5: Optimization performance of different algorithms on the test functions across 10 repetitions, using the EI acquisition function.
  • ...and 2 more figures

Theorems & Definitions (12)

  • Theorem 3.1: Bochner's Theorem (Bochner, 1960)
  • Theorem 3.2: Mercer's Theorem (König, 1986)
  • Theorem 4.1
  • Theorem 5.1
  • Proposition 5.2
  • Example 5.3
  • Theorem B.1: Srinivas et al. (2012)
  • Theorem B.3: Widom (1963)
  • Lemma B.4
  • proof
  • ...and 2 more