Diagnosing and fixing common problems in Bayesian optimization for molecule design
Austin Tripp, José Miguel Hernández-Lobato
TL;DR
This work examines why Bayesian optimization (BO) often underperforms in molecule design and argues that three setup choices drive most of the gap: the prior width, the degree of smoothing, and how thoroughly the acquisition function is maximized. By diagnosing these issues and applying targeted fixes to a basic GP-BO setup with Morgan fingerprints, the authors obtain the highest overall score on the PMO benchmark (AUC Top-10 = 16.303), ahead of previously reported methods. They demonstrate that a carefully tuned, principled BO setup can outperform strong baselines, suggesting that BO merits greater attention in ML for molecules. The study also highlights limitations and motivates future work on richer surrogates, multi-task and noisy settings, and broader experimentation with acquisition functions.
Abstract
Bayesian optimization (BO) is a principled approach to molecular design tasks. In this paper we explain three pitfalls of BO that can cause poor empirical performance: an incorrect prior width, over-smoothing, and inadequate acquisition function maximization. We show that with these issues addressed, even a basic BO setup achieves the highest overall performance on the PMO benchmark for molecule design (Gao et al., 2022). These results suggest that BO may benefit from more attention in the machine learning for molecules community.
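
To make the first pitfall concrete, the following is a minimal sketch of a GP surrogate over binary fingerprints, not the authors' implementation. It rests on several assumptions that do not come from the text above: a Tanimoto kernel, a constant prior mean, an upper-confidence-bound (UCB) acquisition function, and exhaustive scoring of a small candidate pool as a stand-in for acquisition function maximization. The names `tanimoto_kernel`, `gp_posterior`, `select_next`, `amplitude`, and `beta` are hypothetical. The `amplitude` parameter plays the role of the prior width: if it is far smaller than the true spread of the objective, the posterior standard deviation collapses and the exploration term in UCB has almost no effect.

```python
import numpy as np


def tanimoto_kernel(a, b):
    """Tanimoto similarity between the rows of two binary fingerprint matrices."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    inter = a @ b.T
    union = a.sum(axis=1)[:, None] + b.sum(axis=1)[None, :] - inter
    # Assumption: two all-zero fingerprints are treated as identical (similarity 1).
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 1.0)


def gp_posterior(x_train, y_train, x_query, amplitude, noise=1e-4, mean=0.0):
    """Posterior mean and standard deviation for the kernel amplitude * Tanimoto.

    `amplitude` is the prior variance of the GP, i.e. the "prior width".
    """
    y_train = np.asarray(y_train, dtype=float)
    k_tt = amplitude * tanimoto_kernel(x_train, x_train)
    k_tt = k_tt + noise * np.eye(len(y_train))
    k_qt = amplitude * tanimoto_kernel(x_query, x_train)

    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - mean))
    mu = mean + k_qt @ alpha

    v = np.linalg.solve(chol, k_qt.T)
    # The prior variance at each query point is amplitude * k(x, x) = amplitude.
    var = amplitude - np.sum(v**2, axis=0)
    return mu, np.sqrt(np.clip(var, 0.0, None))


def select_next(mu, sigma, beta=1.0):
    """Index of the candidate with the highest UCB score, mu + beta * sigma."""
    return int(np.argmax(mu + beta * sigma))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random bit vectors standing in for Morgan fingerprints (illustrative only).
    fps = rng.integers(0, 2, size=(30, 64))
    scores = fps[:, :16].mean(axis=1)  # toy objective, not a PMO task

    for amplitude in (1.0, 1e-4):
        mu, sigma = gp_posterior(fps[:10], scores[:10], fps[10:], amplitude)
        pick = select_next(mu, sigma)
        print(f"amplitude={amplitude:g}  max sigma={sigma.max():.4f}  pick={pick}")
```

Running the script compares a wide and a very narrow prior on the same toy data. With the narrow prior, the posterior standard deviation is bounded above by the square root of `amplitude`, so the ranking of candidates is driven almost entirely by the posterior mean.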
