A study of EHVI vs fixed scalarization for molecule design

Anabel Yong, Austin Tripp, Layla Hosseini-Gerami, Brooks Paige

TL;DR

The paper asks whether Pareto-based multi-objective Bayesian optimization (MOBO) with Expected Hypervolume Improvement (EHVI) offers practical benefits over fixed-weight scalarized Expected Improvement (EI) in de novo molecular design. Using identical Gaussian Process surrogates and count-based molecular fingerprints across three GuacaMol MPO tasks, it compares EHVI with scalarized EI over 200 Bayesian optimization iterations and 3 random seeds. EHVI consistently delivers higher hypervolume, better Pareto-front approximation, and greater chemical diversity, with meaningful effect sizes, suggesting robustness in low-data regimes. The findings support Pareto-aware acquisition as a reliable default for multi-objective molecular optimization and point to future work on adaptive surrogates and representation-aware strategies.

Abstract

Multi-objective Bayesian optimization (MOBO) provides a principled framework for navigating trade-offs in molecular design. However, its empirical advantages over scalarized alternatives remain underexplored. We benchmark a simple Pareto-based MOBO strategy - Expected Hypervolume Improvement (EHVI) - against a simple fixed-weight scalarized baseline using Expected Improvement (EI), under a tightly controlled setup with identical Gaussian Process surrogates and molecular representations. Across three molecular optimization tasks, EHVI consistently outperforms scalarized EI in terms of Pareto front coverage, convergence speed, and chemical diversity. While scalarization encompasses flexible variants - including random or adaptive schemes - our results show that even strong deterministic instantiations can underperform in low-data regimes. These findings offer concrete evidence for the practical advantages of Pareto-aware acquisition in de novo molecular optimization, especially when evaluation budgets are limited and trade-offs are nontrivial.
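To make the scalarized baseline concrete, the following is a minimal sketch of fixed-weight linear scalarization followed by closed-form Expected Improvement. It assumes maximization and a single Gaussian posterior over the scalarized score; the paper's exact weights and modeling choices are not given here, so the function names `scalarize` and `expected_improvement` and all numeric values are illustrative only.

```python
import math


def scalarize(objectives, weights):
    """Fixed-weight linear scalarization: sum_i w_i * f_i (weights are assumed)."""
    if len(objectives) != len(weights):
        raise ValueError("objectives and weights must have the same length")
    return sum(w * f for w, f in zip(weights, objectives))


def expected_improvement(mean, std, best):
    """Closed-form EI for maximization under a Gaussian posterior N(mean, std^2).

    `best` is the best scalarized value observed so far.
    """
    if std <= 0.0:
        # Degenerate posterior: improvement is deterministic.
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


# Hypothetical usage: two objectives with equal weights.
incumbent = scalarize([0.2, 0.8], [0.5, 0.5])
ei = expected_improvement(mean=0.6, std=0.1, best=incumbent)
```

Because the weights are fixed, every candidate is ranked along a single trade-off direction; EHVI instead scores a candidate by how much it is expected to enlarge the region dominated by the whole Pareto front.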

Paper Structure

This paper contains 18 sections, 5 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Hypervolume indicator (HVI) over 200 Bayesian optimization iterations for each MPO task. EHVI consistently achieves higher hypervolume than scalarized EI and random sampling, with faster convergence and greater final front coverage. Shaded areas represent standard deviation over 3 random seeds.
  • Figure 2: $R^2$ indicator (the R2 Pareto-front quality indicator, not the coefficient of determination) across 200 Bayesian optimization iterations for each MPO task. Lower values reflect better approximation of the true Pareto front under varying utility directions. EHVI consistently achieves lower $R^2$ values than scalarized EI and random sampling, indicating superior convergence toward the reference front. Shaded regions show standard deviation over 3 random seeds.
  • Figure 3: Structural diversity assessed using the #Circles metric across increasing Tanimoto distance thresholds. Higher values indicate broader exploration of structurally distinct regions of the chemical space. EHVI consistently maintains or exceeds the diversity of scalarized EI, particularly at stricter thresholds. Error bars denote standard deviation across 3 random seeds.
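Two of the metrics in these captions can be sketched in a few lines. The block below is an illustration under stated assumptions, not the paper's evaluation code: it computes hypervolume for two maximized objectives relative to a reference point, and a greedy lower bound on a #Circles-style count, assuming that metric is the size of the largest subset of molecules whose pairwise Tanimoto distances all exceed a threshold. Fingerprints are modeled as plain sets of feature IDs (the paper uses count-based fingerprints), and all function names are hypothetical.

```python
def pareto_front_2d(points):
    """Non-dominated points for maximization, ordered by first objective descending."""
    front = []
    best_y = float("-inf")
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if y > best_y:  # strictly better in the second objective
            front.append((x, y))
            best_y = y
    return front


def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded below by reference point `ref`."""
    rx, ry = ref
    front = pareto_front_2d([p for p in points if p[0] > rx and p[1] > ry])
    area, prev_y = 0.0, ry
    for x, y in front:  # x decreases while y increases along the front
        area += (x - rx) * (y - prev_y)
        prev_y = y
    return area


def hypervolume_improvement(points, candidate, ref):
    """Gain in hypervolume from adding one candidate; EHVI is its posterior expectation."""
    return hypervolume_2d(list(points) + [candidate], ref) - hypervolume_2d(points, ref)


def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for set-valued fingerprints (binary simplification)."""
    a, b = set(a), set(b)
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def greedy_circles(fingerprints, threshold):
    """Greedy lower bound: keep a molecule only if it is farther than
    `threshold` from every molecule kept so far."""
    centers = []
    for fp in fingerprints:
        if all(tanimoto_distance(fp, c) > threshold for c in centers):
            centers.append(fp)
    return len(centers)
```

With the toy front `[(1, 3), (2, 2), (3, 1)]` and reference point `(0, 0)`, `hypervolume_2d` sums three rectangular slabs; adding the candidate `(2, 3)` dominates two of the existing points and yields a positive improvement. The greedy count depends on input order, so it should be read as a lower bound rather than the exact metric.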