Guiding Evolutionary Molecular Design: Adding Reinforcement Learning for Mutation Selection
Gaelle Milon-Harnois, Chaimaa Touhami, Nicolas Gutowski, Benoit Da Mota, Thomas Cauchy
TL;DR
This work tackles the challenge of navigating an enormous chemical space in molecule generation by augmenting EvoMol with reinforcement learning that selects context-aware mutations. By encoding local molecular environments with Extended Connectivity Fingerprints (ECFPs) and framing mutation selection as a sleeping bandit problem, EvoMol-RL learns which chemically plausible mutations to apply in a given context. The approach yields substantially higher pre-filtering realism than the EvoMol baseline, particularly with $ECFP_2$ contexts, and learns its policy rapidly while keeping the trade-off between exploration and exploitation under control. The findings suggest that coupling context-rich fingerprints with adversarial sleeping bandits can meaningfully improve the realism and efficiency of de novo molecular design, with potential extensions to other objectives and RL strategies.
Abstract
The efficient exploration of chemical space remains a central challenge, as many generative models still produce unstable or non-synthesizable compounds. To address these limitations, we present EvoMol-RL, an extension of the EvoMol evolutionary algorithm that integrates reinforcement learning to guide molecular mutations based on local structural context. By leveraging Extended Connectivity Fingerprints (ECFPs), EvoMol-RL learns context-aware mutation policies that prioritize chemically plausible transformations. This approach significantly improves the generation of valid and realistic molecules, reducing the frequency of structural artifacts and enhancing optimization performance. EvoMol-RL consistently outperforms its baseline in pre-filtering realism of the generated molecules. These results highlight the effectiveness of combining reinforcement learning with molecular fingerprints to generate chemically relevant molecular structures.
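To make the idea of context-aware mutation selection as a sleeping bandit more concrete, the following minimal sketch shows one way such a selector could look. It is not the EvoMol-RL implementation: the class `SleepingBandit`, the mutation names, the context key `"ctx_A"`, the reward function, and the EXP3-style weight update are all assumptions chosen for illustration. The only property taken from the text is that, in a given context, some mutations are unavailable ("asleep") and the policy must choose among the remaining ones.

```python
import math
import random
from collections import defaultdict


class SleepingBandit:
    """Illustrative EXP3-style sampler restricted to the awake arms.

    Assumption: one weight per (context, arm) pair, where the context
    stands in for a hashed local fingerprint such as an ECFP identifier.
    """

    def __init__(self, learning_rate=0.1, seed=0):
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)
        # Log-weights; unseen (context, arm) pairs start at 0.0.
        self.log_weights = defaultdict(float)

    def probabilities(self, context, awake_arms):
        """Softmax over the log-weights of the awake arms only."""
        awake_arms = list(awake_arms)
        if not awake_arms:
            raise ValueError("at least one arm must be awake")
        logs = [self.log_weights[(context, arm)] for arm in awake_arms]
        top = max(logs)  # subtract the max for numerical stability
        exps = [math.exp(value - top) for value in logs]
        total = sum(exps)
        return {arm: e / total for arm, e in zip(awake_arms, exps)}

    def select(self, context, awake_arms):
        """Sample an awake arm; return it with its selection probability."""
        probs = self.probabilities(context, awake_arms)
        arms = list(probs)
        weights = [probs[arm] for arm in arms]
        arm = self.rng.choices(arms, weights=weights, k=1)[0]
        return arm, probs[arm]

    def update(self, context, arm, prob, reward):
        """Importance-weighted update for the arm that was played."""
        self.log_weights[(context, arm)] += self.learning_rate * reward / prob


def run_demo(rounds=200):
    """Toy loop: in the hypothetical context, only two mutations are awake."""
    bandit = SleepingBandit(learning_rate=0.1, seed=0)
    context = "ctx_A"  # hypothetical stand-in for a fingerprint hash
    awake = ["add_atom", "change_bond"]  # "remove_atom" is asleep here

    def reward_for(arm):
        # Hypothetical reward: pretend only one mutation yields a
        # realistic molecule in this context.
        return 1.0 if arm == "change_bond" else 0.0

    for _ in range(rounds):
        arm, prob = bandit.select(context, awake)
        bandit.update(context, arm, prob, reward_for(arm))
    return bandit.probabilities(context, awake)
```

Calling `run_demo()` returns the learned selection probabilities for the awake mutations in the toy context. Keeping the weights in log space avoids overflow when importance-weighted rewards accumulate, and restricting the softmax to awake arms ensures that an unavailable mutation is never proposed.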
