Surrogate-Based Black-Box Optimization Method for Costly Molecular Properties
Jules Leguy, Thomas Cauchy, Beatrice Duval, Benoit Da Mota
TL;DR
This work addresses the optimization of costly molecular properties by combining a Gaussian Process Regression (GPR) surrogate with an evolutionary search guided by the Expected Improvement (EI). The framework explores the space of molecular graphs by learning the surrogate from a small initial dataset, proposing candidates through EI optimization, and re-training the surrogate as expensive evaluations are performed. Two descriptors are studied: MBTR, a geometry-based representation, and a fast graph-based shingles descriptor. The resulting GPR models are compared in terms of data efficiency and runtime. Empirical results on HOMO energy optimization indicate that the surrogate-based approach can outperform a baseline evolutionary algorithm by up to several-fold, both in the number of objective evaluations and in runtime, which suggests that the method may scale to larger chemical spaces and to other property targets.
Abstract
AI-assisted molecular optimization is a very active research field, as it is expected to provide the next generation of drugs and molecular materials. An important difficulty is that the properties to be optimized rely on costly evaluations. Machine learning methods have been used successfully to predict these properties, but they generalize poorly to less explored areas of the chemical space. We propose a surrogate-based black-box optimization method that tackles the optimization and machine learning problems jointly. It consists of using an evolutionary algorithm to optimize the expected improvement of a surrogate of the molecular property. The surrogate is a Gaussian Process Regression (GPR) model, learned on an area of the search space that is relevant to the property being optimized. We show that our approach can successfully optimize a costly property of interest much faster than a purely metaheuristic approach.
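To make the expected improvement criterion concrete, the following minimal sketch computes the standard closed-form EI for a candidate whose surrogate prediction is Gaussian with mean mu and standard deviation sigma. It is an illustration only, not the authors' implementation: the maximization convention, the function name expected_improvement, and the candidate values in the usage lines are assumptions introduced here.

```python
import math


def expected_improvement(mu, sigma, best):
    """Closed-form EI for a Gaussian prediction N(mu, sigma^2).

    Assumes maximization: `best` is the highest property value
    observed so far (hypothetical convention for this sketch).
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best) * cdf + sigma * pdf


# Hypothetical candidates: (predicted mean, predicted standard deviation).
candidates = {"A": (-5.2, 0.05), "B": (-5.5, 0.60)}
best_so_far = -5.1
scores = {name: expected_improvement(mu, sigma, best_so_far)
          for name, (mu, sigma) in candidates.items()}
chosen = max(scores, key=scores.get)
```

In this made-up example, candidate B has the lower predicted mean but the larger uncertainty, and it receives the higher EI. This is the exploration behavior that lets an EI-guided search propose molecules in regions where the surrogate is still poorly trained.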
