Efficient Evolutionary Search Over Chemical Space with Large Language Models

Haorui Wang; Marta Skreta; Cher-Tian Ser; Wenhao Gao; Lingkai Kong; Felix Strieth-Kalthoff; Chenru Duan; Yuchen Zhuang; Yue Yu; Yanqiao Zhu; Yuanqi Du; Alán Aspuru-Guzik; Kirill Neklyudov; Chao Zhang

TL;DR

MolLEO addresses the inefficiency of traditional evolutionary molecular optimization by using chemistry-aware LLMs as genetic operators within an evolutionary algorithm (EA). By using models such as GPT-4, BioT5, and MoleculeSTM to guide crossover and mutation within Graph-GA, MolLEO reaches higher final objective values and converges faster across single- and multi-objective tasks, including structure-based docking. Empirical results on the PMO and TDC benchmarks show that LLM-guided edits reduce the number of expensive oracle evaluations while outperforming baselines, with open-source and commercial models offering complementary advantages. The work demonstrates the viability of LLM-based genetic operators for accelerated, data-efficient molecular discovery, and the code is released for further adoption and extension.

Abstract

Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable. Evolutionary Algorithms (EAs), often used to optimize black-box objectives in molecular discovery, traverse chemical space by performing random mutations and crossovers, leading to a large number of expensive objective evaluations. In this work, we ameliorate this shortcoming by incorporating chemistry-aware Large Language Models (LLMs) into EAs. Namely, we redesign crossover and mutation operations in EAs using LLMs trained on large corpora of chemical information. We perform extensive empirical studies on both commercial and open-source models on multiple tasks involving property optimization, molecular rediscovery, and structure-based drug design, demonstrating that the joint usage of LLMs with EAs yields superior performance over all baseline models across single- and multi-objective settings. We demonstrate that our algorithm improves both the quality of the final solution and convergence speed, thereby reducing the number of required objective evaluations. Our code is available at http://github.com/zoom-wang112358/MOLLEO
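The abstract describes replacing the random crossover and mutation steps of an EA with LLM-driven edits while keeping the number of objective evaluations low. The following is a minimal Python sketch of such a loop under stated assumptions, not the authors' implementation: the names `evolve`, `propose`, `toy_oracle`, and `toy_llm_operator` are hypothetical, uniform random mate sampling stands in for the Graph-GA selection heuristics, and the toy operator merely splices strings where a real system would prompt an LLM with the parents and a description of the objective.

```python
import random
from typing import Callable, Dict, List, Tuple

Molecule = str  # a string representation such as SMILES or SELFIES


def evolve(
    initial_pool: List[Molecule],
    oracle: Callable[[Molecule], float],
    propose: Callable[[Molecule, Molecule], Molecule],
    max_oracle_calls: int,
    population_size: int = 4,
    offspring_per_generation: int = 4,
    seed: int = 0,
) -> List[Tuple[float, Molecule]]:
    """Budgeted EA loop with a pluggable crossover/mutation operator.

    `propose` is where an LLM call would go (assumption: it takes two
    parent strings and returns one offspring string).
    """
    rng = random.Random(seed)
    scores: Dict[Molecule, float] = {}  # one oracle call per distinct molecule

    def evaluate(mol: Molecule) -> bool:
        """Score `mol` if needed; return False once the budget is spent."""
        if mol in scores:
            return True
        if len(scores) >= max_oracle_calls:
            return False
        scores[mol] = oracle(mol)
        return True

    def top_k() -> List[Molecule]:
        return sorted(scores, key=lambda m: scores[m], reverse=True)[:population_size]

    for mol in initial_pool:
        evaluate(mol)
    population = top_k()
    if len(population) < 2:
        raise ValueError("need at least two distinct scored molecules to mate")

    while len(scores) < max_oracle_calls:
        found_new = False
        for _ in range(offspring_per_generation):
            # Placeholder mate selection: uniform sampling without replacement.
            parent_a, parent_b = rng.sample(population, 2)
            child = propose(parent_a, parent_b)
            if child not in scores and evaluate(child):
                found_new = True
        population = top_k()  # keep only the best-scoring molecules
        if not found_new:
            break  # the operator produced nothing new; stop early
    return [(scores[m], m) for m in population]


def toy_oracle(mol: Molecule) -> float:
    """Hypothetical objective: count of 'N' characters (not real chemistry)."""
    return float(mol.count("N"))


def toy_llm_operator(parent_a: Molecule, parent_b: Molecule) -> Molecule:
    """Hypothetical stand-in for an LLM edit: splice two parent strings."""
    return parent_a[: len(parent_a) // 2] + parent_b[len(parent_b) // 2 :]
```

Calling `evolve(["CCNC", "NCCC", "CNNC", "CCCC"], toy_oracle, toy_llm_operator, max_oracle_calls=10)` returns the surviving population as `(score, molecule)` pairs, best first. Caching scores by string means the budget counts distinct molecules, which is one reasonable way to account for oracle calls; the source does not specify this detail.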

Paper Structure (53 sections, 5 equations, 10 figures, 18 tables, 1 algorithm)

Introduction
Related Work
Molecular Optimization
Language Models in Chemistry
The MolLEO Framework
Problem Statement
Black-box optimization.
Multi-objective black-box optimization.
Evolutionary Algorithms
Graph-GA
MolLEO (GPT-4)
MolLEO (BioT5)
MolLEO (MolSTM)
Experiments
Experimental Setup
...and 38 more sections

Figures (10)

Figure 1: Overview of MolLEO. Given an initial pool of molecules, mates are selected using default Graph-GA (Jensen, 2019) heuristics and converted to SMILES or SELFIES strings. LLMs then function as mutation or crossover operators, editing the molecule string representations based on text prompts that describe the target objective(s). The offspring molecules are then evaluated using an oracle, and the best-scoring ones are passed to the next generation. This process is repeated until the maximum number of allowed molecule evaluations is performed.
Figure 2: Population fitness over increasing number of iterations for JNK3 inhibition. In the lightest blue, we plot the fitness distribution of the initial molecule pool. We then pass the molecules through a single round of LLM edits (pink curve), or a single round of random crossover/mutation operations (yellow curve). We then show the fitnesses of the top-10 molecules after 1000-4000 oracle calls.
Figure 3: Average docking score of top-10 molecules when docked against DRD3, EGFR, or Adenosine A2A receptor proteins. Lower docking scores are better. For each model, we show the convergence point (the moment of stabilization of the population scores) with a star, if the model converges before 1000 oracle calls have been made. Here, the model is considered to have converged if the mean score of the top 100 molecules does not increase by at least 1e-3 within 5 epochs.
Figure A1: Average of top-10 molecules generated by MolLEO and Graph-GA models for three tasks over an increasing number of oracle calls. For each model, we show the convergence point with a star. The model is considered to have converged if the mean score of the top 100 molecules does not increase by at least 1e-3 within five epochs.
Figure A2: Mean fitness and percent valid molecules with a varying number of gradient descent epochs (plotted on log-scale) and learning rates in MoleculeSTM on two tasks: (a) molecular similarity to Penicillin (based on Tanimoto distance) and (b) molecule hydrophobicity (logP).
...and 5 more figures
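
The captions of Figure 3 and Figure A1 state a convergence criterion: a run has converged when the mean score of the top 100 molecules does not increase by at least 1e-3 within five epochs. One possible reading of that rule is sketched below in Python. The function name `convergence_epoch` and its parameters are hypothetical, the patience-style interpretation (compare against the last epoch that improved by at least the threshold) is an assumption, and the sketch treats higher scores as better, so a score that is minimized would need to be negated first.

```python
from typing import Optional, Sequence


def convergence_epoch(
    top_means: Sequence[float],
    min_delta: float = 1e-3,
    patience: int = 5,
) -> Optional[int]:
    """Return the first epoch index at which the run counts as converged.

    `top_means[i]` is the mean score of the top molecules after epoch i.
    The run is treated as converged once `patience` epochs have passed
    without an improvement of at least `min_delta` over the last accepted
    best value. Returns None if that never happens.
    """
    if not top_means:
        return None
    best = top_means[0]
    best_epoch = 0
    for epoch in range(1, len(top_means)):
        if top_means[epoch] - best >= min_delta:
            # A sufficiently large improvement resets the waiting window.
            best = top_means[epoch]
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return None
```

For example, a trace that improves for two epochs and then stays flat for five more is reported as converged at the fifth flat epoch, while a trace that keeps improving by more than the threshold returns `None`.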
