Omnipresent Yet Overlooked: Heat Kernels in Combinatorial Bayesian Optimization
Colin Doumont, Victor Picheny, Viacheslav Borovitskiy, Henry Moss
TL;DR
This work addresses kernel design for combinatorial Bayesian optimization by introducing a unifying framework based on heat kernels on graphs. It proves that the kernels used in prominent methods such as CASMOPOLITAN and COMBO are heat kernels on a Hamming graph, and connects them to the RBF kernel applied to one-hot encoded inputs. The framework also generalizes to graph-based and additive kernels and can incorporate invariances. Experiments on the MCBO benchmarks confirm the kernel equivalences empirically and show that a simple heat-kernel pipeline can reach state-of-the-art performance at low computational cost. In practice, the framework enables fast, robust kernel construction and broadens the toolkit for combinatorial BO, with little downside when little is known in advance about the structure of the problem.
Abstract
Bayesian Optimization (BO) has the potential to solve various combinatorial tasks, ranging from materials science to neural architecture search. However, BO requires specialized kernels to effectively model combinatorial domains. Recent efforts have introduced several combinatorial kernels, but the relationships among them are not well understood. To bridge this gap, we develop a unifying framework based on heat kernels, which we derive in a systematic way and express as simple closed-form expressions. Using this framework, we prove that many successful combinatorial kernels are either related or equivalent to heat kernels, and validate this theoretical claim in our experiments. Moreover, our analysis confirms and extends the results presented in Bounce: certain algorithms' performance decreases substantially when the unknown optima of the function do not have a certain structure. In contrast, heat kernels are not sensitive to the location of the optima. Lastly, we show that a fast and simple pipeline, relying on heat kernels, is able to achieve state-of-the-art results, matching or even outperforming certain slow or complex algorithms.
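The closed-form claim can be illustrated with a minimal numerical sketch; it is not the authors' implementation. It assumes the unnormalized graph Laplacian, a diffusion time `t`, and a kernel rescaled to unit diagonal, and all function names are hypothetical. Under these assumptions, the heat kernel on the Hamming graph H(n, q) reduces to r^d, where d is the Hamming distance and r = (1 - e^{-tq}) / (1 + (q-1) e^{-tq}), and it coincides with an RBF kernel on one-hot encodings whose lengthscale satisfies l^2 = -1 / log r.

```python
import itertools
import numpy as np


def hamming_graph_laplacian(n, q):
    """Build the Hamming graph H(n, q) and return (vertices, distances, Laplacian).

    Vertices are all length-n strings over q symbols; two vertices are
    adjacent when they differ in exactly one position.
    """
    X = np.array(list(itertools.product(range(q), repeat=n)))
    dist = (X[:, None, :] != X[None, :, :]).sum(axis=-1)  # pairwise Hamming distance
    A = (dist == 1).astype(float)                         # adjacency matrix
    L = np.diag(A.sum(axis=1)) - A                        # unnormalized Laplacian
    return X, dist, L


def heat_kernel_from_laplacian(L, t):
    """Compute exp(-t L) via the eigendecomposition of the symmetric Laplacian."""
    evals, evecs = np.linalg.eigh(L)
    return (evecs * np.exp(-t * evals)) @ evecs.T


def heat_kernel_closed_form(dist, q, t):
    """Unit-diagonal heat kernel on H(n, q) as a function of Hamming distance."""
    decay = np.exp(-t * q)
    r = (1.0 - decay) / (1.0 + (q - 1) * decay)
    return r ** dist


def rbf_one_hot(X, q, lengthscale):
    """RBF kernel evaluated on one-hot encodings of the categorical inputs."""
    onehot = np.eye(q)[X].reshape(len(X), -1)
    sq_dist = ((onehot[:, None, :] - onehot[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * lengthscale ** 2))


if __name__ == "__main__":
    n, q, t = 3, 3, 0.3  # illustrative values only
    X, dist, L = hamming_graph_laplacian(n, q)

    K_spectral = heat_kernel_from_laplacian(L, t)
    K_spectral = K_spectral / K_spectral[0, 0]  # diagonal is constant, so rescale once

    K_closed = heat_kernel_closed_form(dist, q, t)

    # Match the RBF lengthscale to the heat kernel: exp(-d / l^2) = r^d.
    r = (1.0 - np.exp(-t * q)) / (1.0 + (q - 1) * np.exp(-t * q))
    lengthscale = np.sqrt(-1.0 / np.log(r))
    K_rbf = rbf_one_hot(X, q, lengthscale)

    print("closed form matches exp(-tL):", np.allclose(K_spectral, K_closed))
    print("one-hot RBF matches closed form:", np.allclose(K_rbf, K_closed))
```

The brute-force construction enumerates all q^n vertices and is only feasible for tiny domains. The closed form needs only Hamming distances between the evaluated points, which is what makes such a kernel cheap to use inside a Gaussian process.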
