Utilizing Reinforcement Learning for de novo Drug Design
Hampus Gummesson Svensson, Christian Tyrchan, Ola Engkvist, Morteza Haghir Chehreghani
TL;DR
This work systematically compares on-policy and off-policy reinforcement learning methods for de novo SMILES-based drug design with an RNN policy, exploring a range of replay buffers and a scaffold-aware diversity filter. It shows that updating on both high- and low-scoring molecules and using replay memories can increase structural diversity and the number of molecules predicted to be active against DRD2, with trade-offs in stability and in the length of the exploration phase. Regularized MLE with full-batch learning and diversity filtering often delivers strong performance, while off-policy methods such as SAC and ACER can achieve high diversity and many predicted actives under appropriate replay schemes. The study provides actionable insights and an open-source framework for researchers to tailor RL strategies to drug design objectives.
Abstract
Deep learning-based approaches for generating novel drug molecules with specific properties have attracted considerable interest in recent years. Recent studies have demonstrated promising performance for string-based generation of novel molecules using reinforcement learning. In this paper, we develop a unified framework for reinforcement learning in de novo drug design, in which we systematically study various on- and off-policy reinforcement learning algorithms and replay buffers for learning an RNN-based policy that generates novel molecules predicted to be active against the dopamine receptor DRD2. Our findings suggest that, when structural diversity is essential, it is advantageous to update the policy with at least the top-scoring and the low-scoring molecules. Using all molecules generated in an iteration seems to enhance performance stability for on-policy algorithms. In addition, when replaying high-, intermediate-, and low-scoring molecules, off-policy algorithms show potential to improve the structural diversity and the number of active molecules generated, but possibly at the cost of a longer exploration phase. Our work provides an open-source framework enabling researchers to investigate various reinforcement learning methods for de novo drug design.
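To make the idea of replaying both top-scoring and low-scoring molecules concrete, the following is a minimal sketch of one possible replay buffer. It is not the implementation from the paper's framework: the class name TopBottomReplayBuffer, the parameter k, and the example SMILES strings and scores are all assumptions chosen for illustration.

```python
import random


class TopBottomReplayBuffer:
    """Hypothetical replay buffer that retains the k lowest- and k highest-scoring
    molecules seen so far (illustrative only, not the paper's code)."""

    def __init__(self, k):
        self.k = k
        self.memory = {}  # maps SMILES string -> score

    def add(self, batch):
        """Merge a batch of (smiles, score) pairs, then prune to both extremes."""
        for smiles, score in batch:
            self.memory[smiles] = score  # a repeated SMILES keeps its latest score
        ranked = sorted(self.memory.items(), key=lambda item: item[1])
        if len(ranked) > 2 * self.k:
            # Keep the k lowest-scoring and the k highest-scoring molecules.
            ranked = ranked[: self.k] + ranked[-self.k:]
        self.memory = dict(ranked)

    def sample(self, n, rng=random):
        """Draw up to n stored (smiles, score) pairs uniformly without replacement."""
        items = list(self.memory.items())
        return rng.sample(items, min(n, len(items)))


# Example with made-up scores standing in for predicted DRD2 activity.
buffer = TopBottomReplayBuffer(k=2)
buffer.add([
    ("CCO", 0.10),
    ("c1ccccc1", 0.35),
    ("CCN", 0.50),
    ("CC(=O)O", 0.65),
    ("CCCl", 0.80),
    ("c1ccncc1", 0.95),
])
replayed = buffer.sample(3)
```

In this sketch the intermediate-scoring molecules are discarded. A variant that also replays intermediate scores, as discussed in the abstract, would need a different pruning rule.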
