MolecularRNN: Generating realistic molecular graphs with optimized properties

Mariya Popova, Mykhailo Shvets, Junier Oliva, Olexandr Isayev

TL;DR

MolecularRNN introduces a graph-based recurrent generator for molecular graphs that extends GraphRNN to handle atom and bond types. By employing valency-based rejection sampling, it achieves 100% validity during inference; a structural penalty provides informative signal during training. The method uses reinforcement learning with a critic to optimize properties such as penalized logP, QED, and melting temperature, yielding distribution shifts toward desired ranges and competitive performance against baselines on large-scale datasets. The approach enables de novo molecule design with controllable properties and scalable evaluation, advancing graph-based generative methods in drug discovery.

Abstract

Designing new molecules with a set of predefined properties is a core problem in modern drug discovery and development, and there is a growing need for de novo design methods that address it. We present MolecularRNN, a graph recurrent generative model for molecular structures. After likelihood pretraining on a large database of molecules, our model generates diverse, realistic molecular graphs. We analyze our pretrained models on large-scale generated datasets of 1 million samples. The model is then tuned with a policy gradient algorithm, given a critic that estimates the reward for the property of interest. We show a significant distribution shift toward the desired range for lipophilicity, drug-likeness, and melting point, outperforming state-of-the-art works. With rejection sampling based on valency constraints, our model yields 100% validity. Moreover, we show that invalid molecules provide a rich signal to the model through a structure penalty in our reinforcement learning pipeline.
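
The valency-based rejection sampling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the valence table `MAX_VALENCE`, the fixed probability vectors standing in for the NodeRNN and EdgeRNN outputs, and the helper names `sample_bond`, `generate`, and `respects_valency` are all assumptions made for illustration. The sketch enforces valency limits only and does not check other aspects of chemical validity, such as connectivity.

```python
import random

# Assumed maximum valences for a few atom types (illustrative, not from the paper).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}
# Bond order 0 means "no bond"; 1, 2, 3 are single, double, triple bonds.
BOND_ORDERS = [0, 1, 2, 3]


def sample_bond(probs, remaining_new, remaining_old, rng, max_tries=100):
    """Draw a bond order from `probs`, rejecting draws that would exceed valency."""
    limit = min(remaining_new, remaining_old)
    for _ in range(max_tries):
        order = rng.choices(BOND_ORDERS, weights=probs, k=1)[0]
        if order <= limit:
            return order
    return 0  # fall back to "no bond", which never violates valency


def generate(atom_probs, bond_probs, n_atoms, seed=0):
    """Generate atoms one at a time; for each, sample bonds to preceding atoms.

    `atom_probs` and `bond_probs` are fixed stand-ins for model outputs
    (hypothetical; a trained model would condition them on the partial graph).
    """
    rng = random.Random(seed)
    types = list(MAX_VALENCE)
    atoms, used, bonds = [], [], {}
    for i in range(n_atoms):
        atom = rng.choices(types, weights=atom_probs, k=1)[0]
        atoms.append(atom)
        used.append(0)
        for j in range(i):  # unroll across preceding atoms
            order = sample_bond(
                bond_probs,
                MAX_VALENCE[atom] - used[i],
                MAX_VALENCE[atoms[j]] - used[j],
                rng,
            )
            if order:
                bonds[(j, i)] = order
                used[i] += order
                used[j] += order
    return atoms, bonds


def respects_valency(atoms, bonds):
    """Check that no atom's total bond order exceeds its assumed maximum valence."""
    used = [0] * len(atoms)
    for (j, i), order in bonds.items():
        used[i] += order
        used[j] += order
    return all(u <= MAX_VALENCE[a] for u, a in zip(used, atoms))


atoms, bonds = generate([0.5, 0.2, 0.2, 0.1], [0.4, 0.3, 0.2, 0.1], n_atoms=8)
print(atoms, bonds, respects_valency(atoms, bonds))
```

Because every accepted bond is checked against the remaining valence of both atoms, each generated graph in this sketch satisfies the valency limits by construction, which is the mechanism behind a guaranteed-valid sampler of this kind.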

Paper Structure

This paper contains 14 sections, 8 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: MolecularRNN model. The model consists of a NodeRNN, which unrolls across atoms and predicts the type of the next atom in the molecular graph, and an EdgeRNN, which for every atom is initialized with the NodeRNN hidden state and unrolls across preceding atoms to predict bond types.
  • Figure 2: Top 3 molecules for MolecularRNN optimized with policy gradient.
  • Figure 3: Distribution of maximized QED for MolecularRNN and GCPN.
  • Figure 4: Melting temperature maximization.