Molecular De Novo Design through Deep Reinforcement Learning
Marcus Olivecrona, Thomas Blaschke, Ola Engkvist, Hongming Chen
TL;DR
The paper tackles the challenge of navigating vast chemical space to find molecules with desirable properties. It couples a pre-trained SMILES-generating RNN (the Prior) with policy-based reinforcement learning of a second RNN (the Agent). The Agent is tuned through the augmented episodic likelihood $\log P(A)_{\mathbb{U}} = \log P(A)_{\text{Prior}} + \sigma S(A)$, where $A$ is a generated sequence, $S(A)$ is its score and $\sigma$ is a scalar coefficient. A support vector machine (SVM) activity model for DRD2 guides generation toward predicted actives, while the Celecoxib-analogue and sulfur-avoidance tasks demonstrate flexibility across objectives. The Agent stays close to the Prior's distribution while raising the score. It recovers experimentally confirmed actives that are absent from the training data of both the Prior and the activity model, and it achieves substantial enrichment for predicted actives (more than 95% predicted actives in DRD2-targeted runs). Compared to REINFORCE variants, the augmented-likelihood approach reduces pathological exploitation of the reward and preserves the chemical space learned by the Prior. This offers a scalable approach to multi-parameter de novo design in drug discovery.
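The augmented likelihood and the Agent's training objective can be sketched numerically. This is a minimal sketch that assumes the Agent is trained to minimize the squared difference between the augmented log-likelihood and its own sequence log-likelihood, $L = [\log P(A)_{\mathbb{U}} - \log P(A)_{\text{Agent}}]^2$, averaged over a batch. The function names and the plain-float interface are hypothetical. In a real implementation the log-likelihoods come from the two RNNs, the Prior stays frozen, and only the Agent's log-likelihood carries gradients.

```python
import math
from typing import Sequence


def sequence_log_likelihood(token_probs: Sequence[float]) -> float:
    """Episodic log-likelihood of a sequence: sum of per-token log-probabilities."""
    return sum(math.log(p) for p in token_probs)


def augmented_log_likelihood(prior_log_p: float, score: float, sigma: float) -> float:
    """log P(A)_U = log P(A)_Prior + sigma * S(A)."""
    return prior_log_p + sigma * score


def agent_loss(agent_log_p: float, prior_log_p: float, score: float, sigma: float) -> float:
    """Squared gap between the augmented target and the Agent's log-likelihood."""
    target = augmented_log_likelihood(prior_log_p, score, sigma)
    return (target - agent_log_p) ** 2


def mean_agent_loss(agent_log_ps: Sequence[float], prior_log_ps: Sequence[float],
                    scores: Sequence[float], sigma: float) -> float:
    """Average the per-sequence loss over a batch of sampled sequences."""
    losses = [agent_loss(a, p, s, sigma)
              for a, p, s in zip(agent_log_ps, prior_log_ps, scores)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Two sampled sequences with equal Prior likelihood and score (hypothetical values).
    print(mean_agent_loss([-18.0, -19.0], [-20.0, -20.0], [0.5, 0.5], sigma=2.0))  # → 0.5
```

A score of zero leaves the target at the Prior's likelihood, which pulls the Agent back toward the Prior. A high score raises the target, so the Agent favours high-scoring sequences that the Prior also considers plausible.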
Abstract
This work introduces a method to tune a sequence-based generative model for molecular de novo design. Through augmented episodic likelihood, the model can learn to generate structures with specified desirable properties. We demonstrate how this model can execute a range of tasks, such as generating analogues of a query structure and generating compounds predicted to be active against a biological target. As a proof of principle, the model is first trained to generate molecules that do not contain sulphur. As a second example, the model is trained to generate analogues of the drug Celecoxib, a technique that could be used for scaffold hopping or library expansion starting from a single molecule. Finally, the model is tuned toward generating compounds predicted to be active against the dopamine receptor type 2. In that case, more than 95% of the generated structures are predicted to be active, including experimentally confirmed actives that were included in neither the generative model nor the activity prediction model.
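The sulphur-avoidance task needs only a scoring function $S(A)$ that maps a generated SMILES string to a number. Below is a hypothetical scoring function, not the one used in the paper. It splits a SMILES string into atom symbols: bracket atoms, the two-letter halogens Cl and Br, and the single-letter organic-subset atoms. It then returns 1.0 when no aliphatic `S` or aromatic `s` atom is present and 0.0 otherwise. It assumes well-formed input and does not check chemical validity. In practice, a cheminformatics toolkit such as RDKit would parse and validate the molecule.

```python
from typing import List

ORGANIC_SUBSET = set("BCNOPSFIbcnops")    # single-letter atoms allowed outside brackets
AROMATIC_TWO_LETTER = ("se", "as", "te")  # two-letter aromatic symbols inside brackets


def atom_symbols(smiles: str) -> List[str]:
    """Return the element symbol of each atom in a well-formed SMILES string."""
    symbols: List[str] = []
    i = 0
    while i < len(smiles):
        if smiles[i] == "[":
            end = smiles.index("]", i)
            body = smiles[i + 1:end]
            j = 0
            while j < len(body) and body[j].isdigit():  # skip isotope label, e.g. [13C]
                j += 1
            pair = body[j:j + 2]
            if len(pair) == 2 and ((pair[0].isupper() and pair[1].islower())
                                   or pair in AROMATIC_TWO_LETTER):
                symbols.append(pair)  # two-letter element such as Se, Sc or aromatic se
            else:
                symbols.append(body[j:j + 1])  # one-letter element, e.g. S in [S-] or [SH]
            i = end + 1
        elif smiles[i:i + 2] in ("Cl", "Br"):
            symbols.append(smiles[i:i + 2])
            i += 2
        elif smiles[i] in ORGANIC_SUBSET:
            symbols.append(smiles[i])
            i += 1
        else:
            i += 1  # bonds, branches, ring-closure digits and '%' labels are not atoms
    return symbols


def no_sulfur_score(smiles: str) -> float:
    """Hypothetical S(A): 1.0 if the molecule has no sulphur atom, else 0.0."""
    return 0.0 if any(sym in ("S", "s") for sym in atom_symbols(smiles)) else 1.0


if __name__ == "__main__":
    celecoxib = "Cc1ccc(cc1)-c1cc(nn1-c1ccc(cc1)S(N)(=O)=O)C(F)(F)F"
    print(no_sulfur_score(celecoxib))  # → 0.0 (sulfonamide sulphur)
    print(no_sulfur_score("CCO"))      # → 1.0
```

Selenium and scandium are parsed as their own two-letter symbols, so `[Se]` and `[Sc]` are not mistaken for sulphur.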
