Molecular De Novo Design through Deep Reinforcement Learning

Marcus Olivecrona, Thomas Blaschke, Ola Engkvist, Hongming Chen

TL;DR

The paper tackles the challenge of navigating vast chemical space to find molecules with desirable properties. It couples a pre-trained SMILES-generating RNN (the Prior) with policy-based reinforcement learning (the Agent), which tunes generation via the augmented episodic likelihood $\log P(A)_{\mathbb{U}} = \log P(A)_{Prior} + \sigma S(A)$, where $S(A)$ is the score of a generated sequence $A$ and $\sigma$ is a scalar weight on that score. A DRD2 activity model (an SVM) guides generation toward predicted actives, while Celecoxib-analogue and sulfur-avoidance tasks demonstrate flexibility across objectives. The Agent stays close to the Prior while recovering experimentally confirmed actives that were included in neither the generative model nor the activity model, and it achieves substantial enrichment (more than 95% predicted actives in DRD2-targeted runs). Compared to REINFORCE variants, the augmented-likelihood approach reduces pathological exploitation of the reward and preserves the underlying chemical space, offering a scalable approach to multi-parameter de novo design in drug discovery.
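The augmented likelihood above can be illustrated with a minimal sketch. The function names and all numeric values below are hypothetical, and using the squared gap between the augmented and Agent log-likelihoods as a training signal is an assumption made for illustration; this summary does not spell out the loss.

```python
def augmented_log_likelihood(prior_log_likelihood: float, score: float, sigma: float) -> float:
    """log P(A)_U = log P(A)_Prior + sigma * S(A)."""
    return prior_log_likelihood + sigma * score


def squared_gap(augmented: float, agent_log_likelihood: float) -> float:
    """Assumed training signal: squared difference between the augmented
    likelihood and the Agent's own log-likelihood for the same sequence."""
    return (augmented - agent_log_likelihood) ** 2


# Hypothetical values for one sampled SMILES sequence A.
prior_ll = -30.0   # log P(A) under the Prior
agent_ll = -25.0   # log P(A) under the Agent
score = 0.5        # S(A), returned by a scoring function
sigma = 20.0       # weight on the score

target = augmented_log_likelihood(prior_ll, score, sigma)
gap = squared_gap(target, agent_ll)
print(target, gap)
```

Because the Prior term stays in the target, a sequence the Prior considers very unlikely needs a correspondingly high score before the Agent is pushed toward it, which is one way to read the claim that the Agent remains anchored to the Prior.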

Abstract

This work introduces a method to tune a sequence-based generative model for molecular de novo design that, through augmented episodic likelihood, can learn to generate structures with certain specified desirable properties. We demonstrate how this model can execute a range of tasks such as generating analogues to a query structure and generating compounds predicted to be active against a biological target. As a proof of principle, the model is first trained to generate molecules that do not contain sulphur. As a second example, the model is trained to generate analogues to the drug Celecoxib, a technique that could be used for scaffold hopping or library expansion starting from a single molecule. Finally, when tuning the model towards generating compounds predicted to be active against the dopamine receptor type 2, the model generates structures of which more than 95% are predicted to be active, including experimentally confirmed actives that have not been included in either the generative model or the activity prediction model.

Paper Structure

This paper contains 16 sections, 9 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: Learning the data. Depiction of maximum likelihood training of an RNN. $x^t$ are the target sequence tokens we are trying to learn by maximizing $P(x^t)$ for each step.
  • Figure 2: Generating sequences. Sequence generation by a trained RNN. At every timestep $t$ we sample the next token of the sequence $x^{t}$ from the probability distribution given by the RNN, which is then fed in as the next input.
  • Figure 3: Three representations of 4-(chloromethyl)-1H-imidazole. Depiction of a one-hot representation derived from the SMILES of a molecule. Here a reduced vocabulary is shown, while in practice a much larger vocabulary that covers all tokens present in the training data is used.
  • Figure 4: The Agent. Illustration of how the model is constructed. Starting from a Prior network trained on ChEMBL, the Agent is trained using the augmented likelihood of the SMILES generated.
  • Figure 5: How the model thinks while generating the molecule on the right. Conditional probability over the next token as a function of previously chosen ones according to the model. The y-axis shows the probability distribution for the character to be chosen at the current step, and the x-axis shows the character that was sampled in this instance. E = EOS.
  • ...and 8 more figures
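
The one-hot representation described in the Figure 3 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the SMILES string `ClCc1c[nH]cn1` for 4-(chloromethyl)-1H-imidazole, the regular-expression tokenizer, the `EOS` marker, and the function names are all choices made here, and the vocabulary is reduced to the tokens of this single molecule.

```python
import re

# Assumed tokenization: bracketed atoms and two-letter halogens are single tokens,
# every other character is its own token.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|.")


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens and append an end-of-sequence marker."""
    return TOKEN_PATTERN.findall(smiles) + ["EOS"]


def one_hot(tokens: list[str], vocabulary: list[str]) -> list[list[int]]:
    """Return one row per token, with a single 1 in the column of that token."""
    index = {token: i for i, token in enumerate(vocabulary)}
    return [
        [1 if index[token] == column else 0 for column in range(len(vocabulary))]
        for token in tokens
    ]


smiles = "ClCc1c[nH]cn1"  # assumed SMILES for 4-(chloromethyl)-1H-imidazole
tokens = tokenize(smiles)
vocabulary = sorted(set(tokens))  # reduced vocabulary built from this molecule only
matrix = one_hot(tokens, vocabulary)

for token, row in zip(tokens, matrix):
    print(f"{token:>5} {row}")
```

In practice the vocabulary would cover every token in the training data, as the caption notes, so each row would be much wider than in this reduced example.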