Utilizing Reinforcement Learning for de novo Drug Design

Hampus Gummesson Svensson, Christian Tyrchan, Ola Engkvist, Morteza Haghir Chehreghani

TL;DR

This work systematically compares on-policy and off-policy reinforcement learning (RL) methods for SMILES-based de novo drug design with an RNN policy, exploring several replay buffers and a scaffold-aware diversity filter. It finds that updating the policy with both high- and low-scoring molecules, and using replay memories, can increase structural diversity and the number of molecules predicted to be active against DRD2, with trade-offs in stability and exploration duration. Regularized MLE with full-batch learning and diversity filtering often delivers strong performance, while off-policy methods such as SAC and ACER can achieve high diversity and many predicted actives under appropriate replay schemes. The study provides practical insights and an open-source framework for researchers to tailor RL strategies to drug design objectives.

Abstract

Deep learning-based approaches for generating novel drug molecules with specific properties have gained a lot of interest in the last few years. Recent studies have demonstrated promising performance for string-based generation of novel molecules utilizing reinforcement learning. In this paper, we develop a unified framework for using reinforcement learning for de novo drug design, wherein we systematically study various on- and off-policy reinforcement learning algorithms and replay buffers to learn an RNN-based policy to generate novel molecules predicted to be active against the dopamine receptor DRD2. Our findings suggest that it is advantageous to use at least both top-scoring and low-scoring molecules for updating the policy when structural diversity is essential. Using all generated molecules at an iteration seems to enhance performance stability for on-policy algorithms. In addition, when replaying high, intermediate, and low-scoring molecules, off-policy algorithms display the potential of improving the structural diversity and number of active molecules generated, but possibly at the cost of a longer exploration phase. Our work provides an open-source framework enabling researchers to investigate various reinforcement learning methods for de novo drug design.
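To make the idea of updating the policy with both top-scoring and low-scoring molecules concrete, the following is a minimal sketch and not the paper's implementation. The function name `select_high_and_low`, the parameter `k`, and the toy scores are hypothetical; the sketch only assumes that each sampled SMILES string comes with a scalar score (for example, a predicted DRD2 activity) where higher is better.

```python
from typing import List, Sequence, Tuple


def select_high_and_low(
    smiles: Sequence[str], scores: Sequence[float], k: int
) -> List[Tuple[str, float]]:
    """Return the k highest- and k lowest-scoring (SMILES, score) pairs.

    Hypothetical helper for illustration only; it is not part of the
    framework described in the paper.
    """
    if len(smiles) != len(scores):
        raise ValueError("smiles and scores must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")

    # Rank the batch from highest to lowest score.
    ranked = sorted(zip(smiles, scores), key=lambda pair: pair[1], reverse=True)

    # If the two ends would overlap, fall back to the whole (ranked) batch.
    if 2 * k >= len(ranked):
        return ranked

    # Keep both ends of the ranking and drop the intermediate molecules.
    return ranked[:k] + ranked[-k:]


# Toy batch with made-up scores (assumption: higher score is better).
batch = ["CCO", "c1ccccc1", "CCN", "CC(=O)O", "CCCC"]
batch_scores = [0.9, 0.1, 0.5, 0.7, 0.3]
selected = select_high_and_low(batch, batch_scores, k=1)
```

In this toy batch, `k=1` keeps only the best and the worst molecule. A replay scheme that also retains intermediate-scoring molecules, as discussed in the abstract, would need a different selection rule.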

Paper Structure

This paper contains 51 sections, 11 equations, 22 figures, 5 tables, and 1 algorithm.

Figures (22)

  • Figure 1: Schematic illustration of the de novo drug design process using reinforcement learning (RL).
  • Figure 2: Taxonomy of the reinforcement learning (RL) algorithms explored in this work.
  • Figure 3: The structural formula and SMILES strings for an arbitrary molecule, and its corresponding molecular and topological scaffold.
  • Figure 4: Illustration of the different combinations of replay buffer, policy optimization algorithm and diversity filter that are investigated in this paper.
  • Figure 5: Box plots of the number of unique active molecules and active scaffolds for the on-policy algorithms A2C, Regularized MLE, and PPO (higher is better) when utilizing the identical molecular scaffold filter. It shows the mean and standard deviation over 11 runs for each policy and replay buffer. 128 SMILES strings are sampled in each episode, with a budget of 2000 episodes in total.
  • ...and 17 more figures