Using GNN property predictors as molecule generators

Félix Therrien; Edward H. Sargent; Oleksandr Voznyy

TL;DR

The paper tackles inverse design by turning a pre-trained, differentiable GNN property predictor into a conditional molecule generator through gradient-based optimization of the molecular graph input. By using an explicit adjacency representation and enforcing valence constraints through the loss function and a sloped rounding scheme, the authors generate valid molecules that target specific energy gaps and logP values without any additional training on molecular structures. The approach, called DIDgen, performs comparably to or better than state-of-the-art genetic algorithms on energy-gap targets and yields the most diverse sets in logP generation. The results also highlight the importance of predictor generalizability and the potential for active-learning loops. Overall, the work demonstrates a lightweight, differentiable inversion approach that uses GNN predictors for targeted, diverse molecular generation, with practical implications for rapid materials and drug discovery.
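To make the sloped rounding idea concrete, here is a minimal sketch using the HCN molecule from Figure 1. It is not the authors' implementation: the functional form `nearest + slope * (x - nearest)`, the slope value, the names `sloped_round` and `adjacency_from_weights`, and the example weights are all assumptions chosen for illustration. The point is only that rounding to integer bond orders can keep a small non-zero slope, so that gradients still flow through the adjacency matrix.

```python
import numpy as np

def sloped_round(x, slope=0.1):
    """Round to the nearest integer but keep a small linear residual.

    Assumed form (hypothetical): nearest + slope * (x - nearest).
    Plain rounding is flat between integers (zero gradient); this variant
    has gradient `slope` almost everywhere.
    """
    nearest = np.floor(x + 0.5)
    return nearest + slope * (x - nearest)

def adjacency_from_weights(w, slope=0.1):
    """Build a symmetric, zero-diagonal, near-integer bond-order matrix.

    `w` is a hypothetical matrix of free continuous parameters; only its
    strictly upper triangle is used, so symmetry holds by construction.
    """
    upper = np.triu(w, k=1)          # drop diagonal and lower triangle
    sym = upper + upper.T            # mirror to enforce symmetry
    return sloped_round(np.maximum(sym, 0.0), slope)  # no negative bonds

# Illustrative continuous weights for H, C, N (atom order is an assumption).
w = np.array([[0.0, 0.9, 0.1],
              [0.0, 0.0, 3.2],
              [0.0, 0.0, 0.0]])

A = adjacency_from_weights(w)
bond_orders = np.round(A).astype(int)    # H-C single bond, C-N triple bond
valences = np.round(A.sum(axis=1)).astype(int)  # bonds per atom
```

In this sketch the row sums of the adjacency matrix give the number of bonds per atom, which is the quantity a valence constraint would act on.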

Abstract

Graph neural networks (GNNs) have emerged as powerful tools to accurately predict materials and molecular properties in computational discovery pipelines. In this article, we exploit the invertible nature of these neural networks to directly generate molecular structures with desired electronic properties. Starting from a random graph or an existing molecule, we perform a gradient ascent while holding the GNN weights fixed in order to optimize its input, the molecular graph, towards the target property. Valence rules are enforced strictly through a judicious graph construction. The method relies entirely on the property predictor; no additional training is required on molecular structures. We demonstrate the application of this method by generating molecules with specific DFT-verified energy gaps and octanol-water partition coefficients (logP). Our approach hits target properties with rates comparable to or better than state-of-the-art generative models while consistently generating more diverse molecules.
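The core idea of optimizing the input while holding the weights fixed can be sketched as follows. This is a toy under stated assumptions, not the paper's code: the "predictor" is a hypothetical one-layer `tanh` model standing in for a trained GNN, the input is a plain vector rather than a molecular graph, and the names `predict`, `loss_and_grad`, and `invert`, the weights, the learning rate, and the target are all made up for illustration. The update is written as descent on a squared-error loss, which moves the prediction toward the target.

```python
import numpy as np

# Frozen parameters of a hypothetical stand-in predictor (never updated).
W = np.array([0.5, -0.3, 0.8])
B = 0.1

def predict(x):
    """Toy differentiable property predictor: tanh of a linear map."""
    return np.tanh(W @ x + B)

def loss_and_grad(x, target):
    """Squared error to the target and its gradient w.r.t. the INPUT x."""
    y = predict(x)
    loss = (y - target) ** 2
    # Chain rule: dL/dx = 2 (y - t) * (1 - y^2) * W, with W held constant.
    grad = 2.0 * (y - target) * (1.0 - y ** 2) * W
    return loss, grad

def invert(target, x0, lr=0.5, steps=200):
    """Optimize the input, not the weights, toward the target property."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        _, grad = loss_and_grad(x, target)
        x -= lr * grad               # only the input changes
    return x

target = 0.5
x_opt = invert(target, x0=np.zeros(3))
final_error = abs(predict(x_opt) - target)
```

In ordinary training the same gradient machinery updates `W` and `B` for fixed inputs; here the roles are swapped, which is why no additional training on molecular structures is needed in this scheme.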

Paper Structure (17 sections, 10 equations, 7 figures, 3 tables)

Results
Rationale and workflow overview
Energy gap targeting
logP targeting
Discussion
Methods
Training of the property predictors
DFT validation of the energy gap
Loss function for the inversion procedure
Code Availability
Acknowledgements
Limiting the number of bonds
Feature vector construction
Single component graphs
Details about the energy gap predictor
...and 2 more sections

Figures (7)

Figure 1: a) Molecular representation for this work using an HCN molecule as an example. b) A visual representation of a typical training process for a neural network in comparison to an input optimization scheme.
Figure 2: DFT and proxy predictions of the HOMO-LUMO gap for generated molecules. Generated molecules are overlaid on the proxy model performance on the QM9 dataset (test + train).
Figure 3: a) Graphical representation of the feature vector equation. b) Graphical representation of the same equation after applying the sloped maximum function, showing how the one-hot encoding of the feature vector is constructed from the number of bonds ($x$) in the adjacency matrix.
Figure 4: HOMO-LUMO gap predictor performance
Figure 5: Atom class distribution
...and 2 more figures
