REINFORCE-ING Chemical Language Models for Drug Discovery
Morgan Thomas, Albert Bou, Jose Carlos Gómez-Tamayo, Gary Tresadern, Mazen Ahmad, Gianni De Fabritiis
TL;DR
The paper addresses improving sample efficiency and chemical validity when applying reinforcement learning to chemical language models for de novo drug design. It examines REINFORCE-based learning, introduces a simpler reward-shaping mechanism aligned with a pre-trained prior, and evaluates multiple extensions (baselines, experience replay, hill-climb, RND, KL regularization). On the MolOpt benchmark with a 10,000-molecule budget, ACEGEN configurations achieve state-of-the-art effectiveness and efficiency, and in a JNK3 Boltz2 case study they outperform baselines while yielding drug-like, synthesizable molecules and favorable allosteric selectivity. The work provides practical guidelines and open-source tools for applying RL to CLMs in drug discovery.
Abstract
Chemical language models, combined with reinforcement learning (RL), have shown significant promise for efficiently traversing large chemical spaces in drug discovery. However, the relative performance of different RL algorithms, and the best practices for applying them to practical drug discovery, remain unclear. Here, starting from the principles of the REINFORCE algorithm, we investigate the effect of different components from RL theory, including experience replay, hill-climbing, baselines to reduce variance, and alternative reward shaping. We propose a new regularization method that is more closely aligned with REINFORCE than current standard practices, and demonstrate how RL hyperparameters can be fine-tuned for effectiveness and efficiency. Lastly, we apply these findings to practical drug discovery, demonstrating enhanced learning efficiency on frontier binding affinity models with Boltz2 as the reward model. We share the RL models used here in the ACEGEN repository, and hope these experiments act as a guide for researchers applying RL to chemical language models for drug discovery.
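To make the components named in the abstract concrete, the following is a minimal sketch of a REINFORCE surrogate loss with a variance-reducing baseline, plus a hill-climb style filter that keeps only the best-scoring samples. It is an illustration under stated assumptions, not the ACEGEN implementation: the function names `reinforce_loss` and `hill_climb_select`, the batch-mean baseline, and the toy numbers are all hypothetical choices made for this example.

```python
import numpy as np


def reinforce_loss(log_probs, rewards, use_baseline=True):
    """Surrogate REINFORCE loss: -mean((R - b) * log pi(x)).

    log_probs: summed token log-probabilities of each sampled sequence, shape (batch,)
    rewards:   scalar reward of each sampled sequence, shape (batch,)
    """
    log_probs = np.asarray(log_probs, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    # Assumption: a batch-mean baseline, chosen here only for illustration.
    baseline = rewards.mean() if use_baseline else 0.0
    advantages = rewards - baseline
    return float(-(advantages * log_probs).mean())


def hill_climb_select(log_probs, rewards, top_k):
    """Keep the top_k highest-reward samples (hill-climb style filtering)."""
    log_probs = np.asarray(log_probs, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    # Sort indices by reward, highest first, and keep the first top_k.
    idx = np.argsort(rewards)[::-1][:top_k]
    return log_probs[idx], rewards[idx]


# Toy batch of three sampled molecules (made-up numbers).
toy_log_probs = [-1.0, -2.0, -3.0]
toy_rewards = [0.1, 0.9, 0.5]

kept_log_probs, kept_rewards = hill_climb_select(toy_log_probs, toy_rewards, top_k=2)
loss = reinforce_loss(kept_log_probs, kept_rewards)
```

In a real training loop the log-probabilities would come from the chemical language model with gradients attached, and the loss would be minimized by an optimizer. NumPy is used here only to show the arithmetic.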
