Guiding Evolutionary Molecular Design: Adding Reinforcement Learning for Mutation Selection

Gaelle Milon-Harnois, Chaimaa Touhami, Nicolas Gutowski, Benoit Da Mota, Thomas Cauchy

TL;DR

This work tackles the challenge of navigating an enormous chemical space for molecule generation by augmenting EvoMol with reinforcement learning that selects context-aware mutations. By encoding local molecular environments with Extended Connectivity Fingerprints (ECFP) and framing mutation selection as a sleeping bandit problem, EvoMol-RL learns which chemically plausible mutations to apply in a given context. The approach yields substantially higher pre-filtering realism, particularly with ECFP$_2$ contexts, and shows rapid policy learning with a controlled trade-off between exploration and exploitation. The findings suggest that coupling context-rich fingerprints with adversarial sleeping bandits can meaningfully improve the realism and efficiency of de novo molecular design, with potential extensions to other objectives and RL strategies.

Abstract

The efficient exploration of chemical space remains a central challenge, as many generative models still produce unstable or non-synthesizable compounds. To address these limitations, we present EvoMol-RL, a significant extension of the EvoMol evolutionary algorithm that integrates reinforcement learning to guide molecular mutations based on local structural context. By leveraging Extended Connectivity Fingerprints (ECFPs), EvoMol-RL learns context-aware mutation policies that prioritize chemically plausible transformations. This approach significantly improves the generation of valid and realistic molecules, reducing the frequency of structural artifacts and enhancing optimization performance. The results demonstrate that EvoMol-RL consistently outperforms its baseline in molecular pre-filtering realism. These results emphasize the effectiveness of combining reinforcement learning with molecular fingerprints to generate chemically relevant molecular structures.
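
To make the sleeping bandit framing concrete, the following is a minimal sketch and not the authors' implementation. It assumes each arm is a (context, mutation) pair, that only the arms valid on the current molecule are "awake" in a given round, and that the reward is 1 when the mutated molecule passes a realism filter. It uses plain epsilon-greedy selection rather than the adversarial sleeping bandit algorithm named in the TL;DR. The class name, the context labels `ctx_A` and `ctx_B`, and the toy reward are hypothetical.

```python
import random
from collections import defaultdict


class SleepingBandit:
    """Epsilon-greedy bandit in which only some arms are available each round.

    An arm is a hypothetical (context, mutation) pair, e.g. an ECFP-derived
    context label paired with a mutation such as "AddA:O" or "RmA".
    """

    def __init__(self, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # number of pulls per arm
        self.values = defaultdict(float)  # running mean reward per arm

    def select(self, available):
        """Pick one arm among those awake (valid) on the current molecule."""
        if not available:
            raise ValueError("no valid mutation available")
        if self.rng.random() < self.epsilon:
            return self.rng.choice(available)  # explore
        # Exploit: highest estimated reward; ties go to the first arm listed.
        return max(available, key=lambda arm: self.values[arm])

    def update(self, arm, reward):
        """Incrementally update the mean reward of the arm that was played."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


if __name__ == "__main__":
    bandit = SleepingBandit(epsilon=0.1, seed=0)
    arms = [("ctx_A", "AddA:F"), ("ctx_A", "AddA:O"), ("ctx_B", "RmA")]
    for _ in range(500):
        # Assumption: each arm is awake with probability 0.8 in a round.
        awake = [a for a in arms if bandit.rng.random() < 0.8] or arms
        arm = bandit.select(awake)
        # Toy reward: only one (context, mutation) pair is "realistic".
        reward = 1.0 if arm == ("ctx_A", "AddA:O") else 0.0
        bandit.update(arm, reward)
    print(dict(bandit.values))
```

Keying the value table on the (context, mutation) pair is what makes the policy context-aware in this sketch: the same mutation can earn a high estimate in one local environment and a low one in another.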

Paper Structure

This paper contains 25 sections, 9 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: ECFP generation process on the acetylsalicylic acid molecule and generation of the fixed-length fingerprint. Blue circles correspond to the central atoms.
  • Figure 2: Valid mutations starting from the acetylsalicylic acid molecule and corresponding $Idx_{ECFP_0}$ and $Idx_a$. (a) AddA valid mutations with (C, N, O, F) candidate atom set. (b) RmA valid mutations.
  • Figure 3: Percentage of realistic molecules generated across testing steps from the Power Law configuration with parameters $\alpha = 0.35$ and $\varepsilon = 0.1$. The solid line represents the mean values, with standard deviations indicated by the highlighted regions, computed using a sliding window of 10 steps. Results for EvoMol, EvoMol-RL with ECFP$_0$ contexts, and EvoMol-RL with ECFP$_2$ contexts are shown in red, blue, and green, respectively.