Using GNN property predictors as molecule generators

Félix Therrien, Edward H. Sargent, Oleksandr Voznyy

TL;DR

The paper tackles inverse design by turning a pre-trained, differentiable GNN property predictor into a conditional molecule generator: rather than training the network, it applies gradient-based optimization to the molecular graph given as input. Explicit adjacency representations, together with a carefully crafted loss and a sloped rounding scheme, enforce valence constraints, so the authors can generate valid molecules targeting specific energy gaps and logP values without any additional training on molecular structures. The DIDgen approach matches or outperforms state-of-the-art genetic algorithms on energy-gap targets and yields the most diverse molecule sets in logP generation. The authors also highlight the importance of predictor generalizability and the potential for active-learning loops. Overall, this work demonstrates a lightweight, differentiable inversion paradigm that uses GNN predictors for targeted, diverse molecular generation, with practical implications for rapid materials and drug discovery.
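
The summary mentions a sloped rounding scheme. Plain rounding has zero gradient almost everywhere, so an optimizer acting on rounded graph entries (for example, bond orders) would receive no signal. A minimal sketch of one way to keep a gradient is shown below, assuming the form round(x) + a * (x - round(x)) with a small slope a; the paper's exact definition may differ, and `sloped_round` and its `slope` parameter are hypothetical names.

```python
import math


def sloped_round(x: float, slope: float = 0.1) -> float:
    """Round x to the nearest integer while keeping a small linear slope.

    Hypothetical form: round(x) + slope * (x - round(x)).
    The returned value stays within slope / 2 of an integer, and its
    derivative with respect to x equals `slope` (instead of 0 for plain
    rounding) everywhere except at the jump points x = k + 0.5.
    """
    # floor(x + 0.5) rounds half up; Python's round() would round half to even.
    nearest = math.floor(x + 0.5)
    return nearest + slope * (x - nearest)
```

For example, `sloped_round(1.3)` returns about 1.03: the value stays close to the integer 1, yet a small change in x still changes the output, so gradients can flow back to the underlying continuous parameter.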

Abstract

Graph neural networks (GNNs) have emerged as powerful tools to accurately predict materials and molecular properties in computational discovery pipelines. In this article, we exploit the invertible nature of these neural networks to directly generate molecular structures with desired electronic properties. Starting from a random graph or an existing molecule, we perform a gradient ascent while holding the GNN weights fixed in order to optimize its input, the molecular graph, towards the target property. Valence rules are enforced strictly through a judicious graph construction. The method relies entirely on the property predictor; no additional training is required on molecular structures. We demonstrate the application of this method by generating molecules with specific DFT-verified energy gaps and octanol-water partition coefficients (logP). Our approach hits target properties with rates comparable to or better than state-of-the-art generative models while consistently generating more diverse molecules.
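
The abstract's central idea, optimizing the input while the trained weights stay frozen, can be illustrated without a GNN. In the minimal sketch below, a tiny hypothetical predictor f(x) = c + sum_i v_i * tanh((W x)_i) stands in for the pre-trained network, and x is a free real vector rather than a molecular graph. Only x is updated, by gradient descent on the squared error (f(x) - target)^2, which is equivalent to gradient ascent on its negative. All names and weight values are illustrative assumptions.

```python
import math

# Hypothetical "pre-trained" predictor f(x) = C + sum_i V[i] * tanh((W x)_i).
# These weights are frozen: only the input x is updated below.
W = [[0.5, -0.3], [0.2, 0.8], [-0.6, 0.1]]
V = [1.0, 2.0, -0.5]
C = 0.0


def predict(x):
    """Forward pass of the frozen toy predictor."""
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    return C + sum(v_i * math.tanh(z_i) for v_i, z_i in zip(V, z))


def input_gradient(x):
    """Analytic df/dx, holding W, V and C fixed."""
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    return [
        sum(V[i] * (1.0 - math.tanh(z[i]) ** 2) * W[i][j] for i in range(len(W)))
        for j in range(len(x))
    ]


def optimize_input(x0, target, lr=0.1, steps=200):
    """Gradient descent on the input x to minimize (f(x) - target)^2."""
    x = list(x0)
    for _ in range(steps):
        error = predict(x) - target
        grad = input_gradient(x)
        # d/dx (f - target)^2 = 2 * (f - target) * df/dx
        x = [x_j - lr * 2.0 * error * g_j for x_j, g_j in zip(x, grad)]
    return x
```

Starting from `[0.0, 0.0]` (prediction 0.0), `optimize_input([0.0, 0.0], target=1.5)` returns an input whose prediction is within about 1e-3 of 1.5, while `W`, `V` and `C` are never modified. In the setting described in the abstract, x would be the molecular graph representation and the hand-written gradient would typically come from automatic differentiation through the GNN.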

Paper Structure

This paper contains 17 sections, 10 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: a) Molecular representation used in this work, illustrated with an HCN molecule. b) A typical neural-network training process compared with an input-optimization scheme.
  • Figure 2: HOMO-LUMO gaps of the generated molecules: DFT values and proxy predictions. The generated molecules are overlaid on the proxy model's performance on the QM9 dataset (test + train).
  • Figure 3: a) Graphical representation of the feature-vector equation. b) Graphical representation of the same equation after applying the sloped maximum function. The panels show how the one-hot encoding of the feature vector is constructed from the number of bonds ($x$) in the adjacency matrix.
  • Figure 4: HOMO-LUMO gap predictor performance
  • Figure 5: Atom class distribution
  • ...and 2 more figures
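
Figures 1 and 3 describe a representation in which atom features are derived from the number of bonds in the adjacency matrix. The sketch below illustrates that idea on the HCN example under assumptions of our own: hydrogens are explicit nodes, bond orders sit in a symmetric integer matrix, each element is identified by its total bond order through the hypothetical table `VALENCE_TO_ELEMENT`, and a piecewise-linear "hat" encoding stands in for the paper's feature-vector equation (the sloped maximum step is not reproduced). A real mapping needs extra information, since H and F both have valence 1.

```python
# Hypothetical mapping from total bond order (valence) to element symbol.
VALENCE_TO_ELEMENT = {1: "H", 2: "O", 3: "N", 4: "C"}
VALENCES = sorted(VALENCE_TO_ELEMENT)


def bond_counts(adjacency):
    """Total bond order per atom: row sums of a symmetric bond-order matrix."""
    return [sum(row) for row in adjacency]


def valence_features(x):
    """Piecewise-linear 'hat' encoding of a (possibly fractional) bond count x.

    Exactly one-hot when x equals one of VALENCES, and linear in between,
    so the encoding still varies smoothly with x.
    """
    return [max(0.0, 1.0 - abs(x - k)) for k in VALENCES]


def decode_elements(adjacency):
    """Read element symbols off an integer bond-order matrix."""
    return [VALENCE_TO_ELEMENT[int(round(n))] for n in bond_counts(adjacency)]


# HCN with atom order H, C, N: an H-C single bond and a C#N triple bond.
HCN = [
    [0, 1, 0],
    [1, 0, 3],
    [0, 3, 0],
]
```

Here `bond_counts(HCN)` gives `[1, 4, 3]` and `decode_elements(HCN)` gives `["H", "C", "N"]`. A fractional bond count such as 3.5 yields `[0.0, 0.0, 0.5, 0.5]`, halfway between the N and C encodings, which is what lets an optimizer move an atom between element types.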