Molecular De Novo Design through Deep Reinforcement Learning

Marcus Olivecrona; Thomas Blaschke; Ola Engkvist; Hongming Chen

TL;DR

The paper addresses the problem of searching vast chemical space for molecules with desirable properties. It couples a pre-trained SMILES-generating RNN (the Prior) with a policy-based reinforcement learning Agent, which is tuned toward an augmented episodic likelihood defined as $\log P(A)_{\mathbb{U}} = \log P(A)_{\text{Prior}} + \sigma S(A)$, where $S(A)$ is the score assigned to a generated sequence $A$ and $\sigma$ is a scalar weight. A DRD2 activity model (an SVM) guides generation toward predicted actives, while the Celecoxib-analogue and sulfur-avoidance tasks show that the same method applies across objectives. The Agent stays close to the Prior while shifting its output toward the objective: in DRD2-targeted runs more than 95% of generated structures are predicted to be active, and the Agent recovers experimentally confirmed actives that were not used to train either the Prior or the activity model. Compared with REINFORCE variants, the augmented-likelihood approach reduces pathological exploitation of the reward and preserves the chemical space learned by the Prior, offering a scalable approach to multi-parameter de novo design in drug discovery.
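The augmented likelihood above can be illustrated with a few lines of arithmetic. The following is a minimal sketch, not the authors' implementation: the function names and all numeric values are hypothetical, and the squared gap between the Agent's log-likelihood and the augmented target is assumed here as the training signal, since this summary does not spell out the loss.

```python
def augmented_log_likelihood(prior_log_likelihood, score, sigma):
    """Return log P(A)_U = log P(A)_Prior + sigma * S(A)."""
    return prior_log_likelihood + sigma * score


def squared_gap(agent_log_likelihood, prior_log_likelihood, score, sigma):
    """Squared difference between the augmented target and the Agent.

    Assumed training signal for this sketch: the Agent is pushed to assign
    each sampled sequence the augmented log-likelihood.
    """
    target = augmented_log_likelihood(prior_log_likelihood, score, sigma)
    return (target - agent_log_likelihood) ** 2


# Hypothetical values for one sampled SMILES sequence A.
prior_ll = -30.0   # log P(A) under the Prior
agent_ll = -28.0   # log P(A) under the Agent
score = 1.0        # S(A), e.g. the sequence satisfies the objective
sigma = 10.0       # assumed weight on the score

target = augmented_log_likelihood(prior_ll, score, sigma)
loss = squared_gap(agent_ll, prior_ll, score, sigma)
print(target, loss)
```

With a score of zero the target reduces to the Prior's log-likelihood, which is one way to see why the Agent is anchored to the Prior rather than free to chase the reward alone.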

Abstract

This work introduces a method to tune a sequence-based generative model for molecular de novo design that, through augmented episodic likelihood, can learn to generate structures with certain specified desirable properties. We demonstrate how this model can execute a range of tasks such as generating analogues to a query structure and generating compounds predicted to be active against a biological target. As a proof of principle, the model is first trained to generate molecules that do not contain sulphur. As a second example, the model is trained to generate analogues to the drug Celecoxib, a technique that could be used for scaffold hopping or library expansion starting from a single molecule. Finally, when tuning the model towards generating compounds predicted to be active against the dopamine receptor type 2, the model generates structures of which more than 95% are predicted to be active, including experimentally confirmed actives that have not been included in either the generative model or the activity prediction model.
