Omnipresent Yet Overlooked: Heat Kernels in Combinatorial Bayesian Optimization

Colin Doumont, Victor Picheny, Viacheslav Borovitskiy, Henry Moss

TL;DR

This work addresses the challenge of kernel design for combinatorial Bayesian optimization by introducing a unifying framework based on heat kernels on graphs. It proves that prominent kernels like CASMOPOLITAN and COMBO are heat kernels on a Hamming graph and connects them to the RBF kernel after one-hot encoding, while also generalizing to graph-based and additive kernels and incorporating invariances. The authors provide comprehensive empirical validation across MCBO benchmarks, showing kernel equivalences and that a simple heat-kernel pipeline can achieve state-of-the-art performance with favorable compute. The framework offers practical benefits by enabling fast, robust kernel construction and broadening the toolkit for combinatorial BO, with limited downsides when little prior structure is known.

Abstract

Bayesian Optimization (BO) has the potential to solve various combinatorial tasks, ranging from materials science to neural architecture search. However, BO requires specialized kernels to effectively model combinatorial domains. Recent efforts have introduced several combinatorial kernels, but the relationships among them are not well understood. To bridge this gap, we develop a unifying framework based on heat kernels, which we derive in a systematic way and express as simple closed-form expressions. Using this framework, we prove that many successful combinatorial kernels are either related or equivalent to heat kernels, and validate this theoretical claim in our experiments. Moreover, our analysis confirms and extends the results presented in Bounce: certain algorithms' performance decreases substantially when the unknown optima of the function do not have a certain structure. In contrast, heat kernels are not sensitive to the location of the optima. Lastly, we show that a fast and simple pipeline, relying on heat kernels, is able to achieve state-of-the-art results, matching or even outperforming certain slow or complex algorithms.

Paper Structure

This paper contains 72 sections, 7 theorems, 67 equations, 10 figures, 1 table.

Key Result

theorem 1

The kernels from CASMOPOLITAN and COMBO are equivalent to heat kernels on a Hamming graph. That is, the CASMOPOLITAN kernel (Equation eq:cas_kernel) and the COMBO kernel (Equation eq:combo_kernel) are proportional to (the ARD version of) the heat kernel in Equation eq:kondor_simple.
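
The statement can be checked numerically on a small example. The sketch below is not taken from the paper: it assumes the unnormalized graph Laplacian L = D - A, a single diffusion time t (no ARD), and a tiny Hamming graph H(n, q) with n = 3 variables of q = 3 categories each. All function and variable names are hypothetical. It compares the heat kernel exp(-tL), computed by eigendecomposition, with a closed-form product over coordinates, then checks that the normalized kernel is an exponential in the Hamming distance and that it coincides with an RBF kernel on one-hot encodings for a matching lengthscale.

```python
import itertools
import numpy as np

def hamming_graph_laplacian(n, q):
    """Unnormalized Laplacian L = D - A of the Hamming graph H(n, q)."""
    nodes = np.array(list(itertools.product(range(q), repeat=n)))
    # Pairwise Hamming distances between all q**n nodes.
    dist = (nodes[:, None, :] != nodes[None, :, :]).sum(axis=-1)
    # Two nodes are adjacent when they differ in exactly one coordinate.
    adjacency = (dist == 1).astype(float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return nodes, dist, laplacian

def heat_kernel_spectral(laplacian, t):
    """exp(-t L) via eigendecomposition of the symmetric Laplacian."""
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs * np.exp(-t * eigvals)) @ eigvecs.T

def heat_kernel_closed_form(dist, n, q, t):
    """Product over coordinates of the heat kernel on the complete graph K_q."""
    same = (1.0 + (q - 1) * np.exp(-t * q)) / q  # coordinate values agree
    diff = (1.0 - np.exp(-t * q)) / q            # coordinate values differ
    return same ** (n - dist) * diff ** dist

# Illustrative sizes and diffusion time (assumptions, not values from the paper).
n, q, t = 3, 3, 0.4
nodes, dist, laplacian = hamming_graph_laplacian(n, q)
k_spectral = heat_kernel_spectral(laplacian, t)
k_closed = heat_kernel_closed_form(dist, n, q, t)

# Normalizing by the diagonal leaves an exponential in the Hamming distance.
ratio = (1.0 - np.exp(-t * q)) / (1.0 + (q - 1) * np.exp(-t * q))
c = -np.log(ratio)
k_exp_hamming = np.exp(-c * dist)

# RBF on one-hot encodings: squared Euclidean distance = 2 * Hamming distance.
one_hot = np.eye(q)[nodes].reshape(len(nodes), n * q)
sq_dist = ((one_hot[:, None, :] - one_hot[None, :, :]) ** 2).sum(axis=-1)
lengthscale_sq = 1.0 / c
k_rbf = np.exp(-sq_dist / (2.0 * lengthscale_sq))

print(np.allclose(k_spectral, k_closed))
print(np.allclose(k_closed / k_closed[0, 0], k_exp_hamming))
print(np.allclose(k_rbf, k_exp_hamming))
```

Because the normalized kernel depends on the inputs only through the Hamming distance, any kernel that is exponential in the number of matching coordinates differs from it by a constant factor, which is the sense of "proportional" in the theorem. The exact parameterization and normalization used in the paper may differ from this sketch.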

Figures (10)

  • Figure 1: When keeping the pipeline fixed and varying only the kernel choice, our unifying framework becomes visible empirically: all heat (and Hamming) kernels achieve near-identical results.
  • Figure 2: A fast and simple pipeline, relying on heat kernels, achieves state-of-the-art results (after relocation of the optima), matching or even outperforming more complex or slow baselines.
  • Figure 3: Our simple heat-kernel pipeline achieves the third-lowest wall-clock time, arriving closely after CoCaBO and Bounce.
  • Figure 4: Extension of Figure 1 (when keeping the pipeline fixed and varying only the kernel choice, our unifying framework becomes visible empirically).
  • Figure 5: Extension of Figure 2 (a fast and simple pipeline, relying on heat kernels, achieves state-of-the-art results).
  • ...and 5 more figures

Theorems & Definitions (25)

  • definition 1
  • definition 2
  • definition 3
  • definition 4
  • theorem 1
  • proof
  • definition 5
  • proposition 1
  • proof
  • definition 6
  • ...and 15 more