A study of EHVI vs fixed scalarization for molecule design
Anabel Yong, Austin Tripp, Layla Hosseini-Gerami, Brooks Paige
TL;DR
The paper asks whether Pareto-based multi-objective Bayesian optimization (MOBO) with Expected Hypervolume Improvement (EHVI) offers practical benefits over fixed-weight scalarized Expected Improvement (EI) in de novo molecular design. Using identical Gaussian Process surrogates and count-based molecular fingerprints on three GuacaMol MPO tasks, it compares the two acquisition strategies over 200 Bayesian optimization iterations with 3 random seeds. EHVI consistently achieves higher hypervolume, better Pareto-front approximation, and greater chemical diversity, with meaningful effect sizes, which suggests it is the more robust choice in low-data regimes. The findings support Pareto-aware acquisition as a reliable default for multi-objective molecular optimization and point to future work on adaptive surrogates and representation-aware strategies.
Abstract
Multi-objective Bayesian optimization (MOBO) provides a principled framework for navigating trade-offs in molecular design. However, its empirical advantages over scalarized alternatives remain underexplored. We benchmark a simple Pareto-based MOBO strategy, Expected Hypervolume Improvement (EHVI), against a fixed-weight scalarized baseline using Expected Improvement (EI), under a tightly controlled setup with identical Gaussian Process surrogates and molecular representations. Across three molecular optimization tasks, EHVI consistently outperforms scalarized EI in terms of Pareto front coverage, convergence speed, and chemical diversity. While scalarization encompasses flexible variants, including random or adaptive schemes, our results show that even strong deterministic instantiations can underperform in low-data regimes. These findings offer concrete evidence for the practical advantages of Pareto-aware acquisition in de novo molecular optimization, especially when evaluation budgets are limited and trade-offs are nontrivial.
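To make the contrast between the two acquisition criteria concrete, the following is a minimal sketch of the quantities they are built on, not the paper's implementation. It computes the hypervolume of a two-objective Pareto front and a fixed-weight scalarized score. The function names, the example points, the equal weights, and the reference point at the origin are all illustrative assumptions. Both objectives are assumed to be maximized. EHVI and EI take the expectation of these improvements under the surrogate posterior; the sketch evaluates only the deterministic improvement for a single hypothetical candidate.

```python
import numpy as np

def pareto_mask(Y):
    """Boolean mask of the non-dominated rows of Y (all objectives maximized)."""
    Y = np.asarray(Y, dtype=float)
    mask = np.ones(len(Y), dtype=bool)
    for i, y in enumerate(Y):
        # y is dominated if another point is >= in every objective and > in one.
        dominated_by = np.all(Y >= y, axis=1) & np.any(Y > y, axis=1)
        mask[i] = not dominated_by.any()
    return mask

def hypervolume_2d(Y, ref):
    """Area dominated by Y and bounded below by ref (two maximized objectives)."""
    Y = np.asarray(Y, dtype=float)
    front = Y[pareto_mask(Y)]
    front = front[np.all(front > ref, axis=1)]  # keep points that beat ref
    # Sort by objective 1 descending; along a Pareto front objective 2 then ascends.
    front = front[np.argsort(-front[:, 0])]
    hv, prev_y2 = 0.0, ref[1]
    for y1, y2 in front:
        hv += (y1 - ref[0]) * (y2 - prev_y2)  # add one horizontal slab
        prev_y2 = y2
    return float(hv)

def scalarize(Y, weights):
    """Fixed-weight linear scalarization of the objective vectors."""
    return np.asarray(Y, dtype=float) @ np.asarray(weights, dtype=float)

# Hypothetical observed objective values and one hypothetical candidate.
observed = np.array([[0.9, 0.1], [0.6, 0.6], [0.1, 0.9], [0.4, 0.4]])
candidate = np.array([0.95, 0.15])
ref = (0.0, 0.0)          # assumed reference point
weights = (0.5, 0.5)      # assumed fixed weights

# Improvement in hypervolume from adding the candidate to the observed set.
hv_gain = hypervolume_2d(np.vstack([observed, candidate]), ref) - hypervolume_2d(observed, ref)

# Improvement in the best scalarized score, clipped at zero as in EI.
scalar_gain = max(0.0, scalarize(candidate, weights) - scalarize(observed, weights).max())

print(f"hypervolume improvement: {hv_gain:.4f}")
print(f"scalarized improvement:  {scalar_gain:.4f}")
```

In this toy setting the candidate extends one end of the front, so it increases the hypervolume, yet its equally weighted score is below that of the balanced point already observed, so the scalarized improvement is zero. This is one illustration of how a fixed weighting can overlook points that improve Pareto front coverage.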
