GraphXForm: Graph transformer for computer-aided molecular design

Jonathan Pirnay, Jan G. Rittig, Alexander B. Wolf, Martin Grohe, Jakob Burger, Alexander Mitsos, Dominik G. Grimm

TL;DR

GraphXForm introduces a graph-transformer architecture that directly constructs molecular graphs via iterative atom and bond additions, ensuring chemical validity through action masking. It combines pretraining on large molecular corpora with a TASAR-based self-improvement fine-tuning procedure, enabling stable optimization without reward shaping. Evaluations on GuacaMol drug-design tasks and two solvent-design benchmarks show superior performance relative to state-of-the-art string-based and graph-based baselines; the method also flexibly enforces structural constraints and can start from predefined structures. For solvent design, a Gibbs–Helmholtz GNN serves as an efficient surrogate property predictor. The work highlights the viability and practical impact of graph-based transformers for constrained, high-quality molecular design.
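The action-masking idea can be illustrated with a minimal sketch. The sketch makes several assumptions. `MolGraph` is a hypothetical container. The action set (`add_atom`, `increment_bond`, `dont_change`) is a simplified stand-in loosely modeled on Figure 1, not GraphXForm's actual hierarchical action levels. Atoms are assumed to be neutral C, N and O with fixed maximum valences and implicit hydrogens, and bond orders are capped at 3. Any action that would exceed an atom's valence is never offered. `masked_softmax` shows how such a mask assigns zero probability to disallowed entries of a fixed-size action vector.

```python
import math
from dataclasses import dataclass, field

# Assumed maximum valences for the example alphabet (C, N, O); neutral atoms, implicit hydrogens.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}
MAX_BOND_ORDER = 3  # assumption: single, double, triple bonds only


@dataclass
class MolGraph:  # hypothetical container, not GraphXForm's data structure
    atoms: list = field(default_factory=list)  # element symbol per atom index
    bonds: dict = field(default_factory=dict)  # (i, j) with i < j -> bond order
    complete: bool = False

    def free_valence(self, i):
        used = sum(order for pair, order in self.bonds.items() if i in pair)
        return MAX_VALENCE[self.atoms[i]] - used


def valid_actions(mol):
    """Enumerate only those actions that keep every atom within its maximum valence."""
    if mol.complete:
        return []
    actions = [("dont_change",)]
    for i in range(len(mol.atoms)):
        free_i = mol.free_valence(i)
        # Attach a new atom to atom i with a bond order both atoms can accommodate.
        for elem, max_v in MAX_VALENCE.items():
            for order in range(1, min(free_i, max_v, MAX_BOND_ORDER) + 1):
                actions.append(("add_atom", elem, i, order))
        # Create a new (ring-closing) bond or raise the order of an existing one by 1.
        for j in range(i + 1, len(mol.atoms)):
            if (free_i >= 1 and mol.free_valence(j) >= 1
                    and mol.bonds.get((i, j), 0) < MAX_BOND_ORDER):
                actions.append(("increment_bond", i, j))
    return actions


def apply_action(mol, action):
    if action not in valid_actions(mol):
        raise ValueError(f"masked (invalid) action: {action}")
    if action[0] == "dont_change":
        mol.complete = True  # graph unchanged; design is marked complete
    elif action[0] == "add_atom":
        _, elem, i, order = action
        mol.atoms.append(elem)
        mol.bonds[(i, len(mol.atoms) - 1)] = order
    else:  # "increment_bond"
        _, i, j = action
        mol.bonds[(i, j)] = mol.bonds.get((i, j), 0) + 1
    return mol


def masked_softmax(logits, mask):
    """Softmax over a fixed-size action vector; masked-out entries get probability 0."""
    top = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: build the heavy-atom skeleton of formamide (O=C-N), then stop.
mol = MolGraph(atoms=["C"])
for step in [("add_atom", "O", 0, 2), ("add_atom", "N", 0, 1), ("dont_change",)]:
    apply_action(mol, step)
```

Validity is enforced by construction here rather than checked after the fact. This is the property the TL;DR contrasts with string-based generation, where an invalid string can only be detected once it has been produced.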

Abstract

Generative deep learning has become pivotal in molecular design for drug discovery, materials science, and chemical engineering. A widely used paradigm is to pretrain neural networks on string representations of molecules and fine-tune them using reinforcement learning on specific objectives. However, string-based models face challenges in ensuring chemical validity and enforcing structural constraints like the presence of specific substructures. We propose to instead combine graph-based molecular representations, which can naturally ensure chemical validity, with transformer architectures, which are highly expressive and capable of modeling long-range dependencies between atoms. Our approach iteratively modifies a molecular graph by adding atoms and bonds, which ensures chemical validity and facilitates the incorporation of structural constraints. We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds and then fine-tuned using a new training algorithm that combines elements of the deep cross-entropy method and self-improvement learning. We evaluate GraphXForm on various drug design tasks, demonstrating superior objective scores compared to state-of-the-art molecular design approaches. Furthermore, we apply GraphXForm to two solvent design tasks for liquid-liquid extraction, again outperforming alternative methods while flexibly enforcing structural constraints or initiating design from existing molecular structures.
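The cross-entropy-method ingredient named in the abstract follows a simple pattern: sample candidate designs, keep the best-scoring ones, and fit the model to reproduce them. The toy below is a swapped-in simplification of that pattern, not GraphXForm's algorithm. The "policy" is a hypothetical factorized categorical distribution over fixed-length strings rather than a graph transformer. The "training" step is a closed-form maximum-likelihood fit instead of gradient descent. `objective`, `self_improve` and all hyperparameters are illustrative assumptions. Carrying the best designs over between epochs is one common self-improvement variant; the source does not state whether GraphXForm does exactly this.

```python
import random
from collections import Counter

ALPHABET = ("C", "N", "O")
LENGTH = 6  # hypothetical fixed design length for the toy


def objective(seq):
    # Toy stand-in objective: reward nitrogen, penalize adjacent O-O pairs.
    penalty = sum(1 for a, b in zip(seq, seq[1:]) if a == b == "O")
    return seq.count("N") - 2 * penalty


def sample(probs, rng):
    # Draw one design position by position from the factorized categorical "policy".
    return tuple(rng.choices(ALPHABET, weights=[p[a] for a in ALPHABET])[0] for p in probs)


def fit_to_elites(elites, smoothing=0.1):
    # Imitation step: maximum-likelihood fit of each position to the elite set,
    # mixed with a uniform distribution so no symbol's probability collapses to zero.
    probs = []
    for pos in range(LENGTH):
        counts = Counter(seq[pos] for seq in elites)
        probs.append({a: (1 - smoothing) * counts[a] / len(elites) + smoothing / len(ALPHABET)
                      for a in ALPHABET})
    return probs


def self_improve(epochs=20, batch=64, k=8, seed=0):
    rng = random.Random(seed)
    probs = [{a: 1 / len(ALPHABET) for a in ALPHABET} for _ in range(LENGTH)]
    best = []
    for _ in range(epochs):
        samples = {sample(probs, rng) for _ in range(batch)}
        # Keep the top-k designs seen so far; ties are broken by the sequence itself.
        ranked = sorted(samples | set(best), key=lambda s: (objective(s), s), reverse=True)
        best = ranked[:k]
        probs = fit_to_elites(best)
    return best, probs


best, probs = self_improve()
```

Only the ranking of sampled designs enters the update. A strictly increasing rescaling of `objective` therefore leaves this loop unchanged, which is one reason rank-based schemes of this kind can work without reward shaping.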

Paper Structure

This paper contains 24 sections, 10 equations, 20 figures, 3 tables, 1 algorithm.

Figures (20)

  • Figure 1: Example of the sequential application of actions $x^{(0)}, x^{(1)}, x^{(2)}$ to a molecule $m^{(0)}$, using the alphabet $\Sigma = (\mathrm C, \mathrm N, \mathrm O)$. We show the index of each atom, which can be chosen arbitrarily at the beginning. Light blue indicates where in the graph an action is applied. The last action is $\textrm{DontChange}$, which leaves the molecular graph unchanged but marks it as a complete design.
  • Figure 2: Flow of a molecule through the policy network of our method GraphXForm. a. We consider the alphabet $\Sigma = (\mathrm C, \mathrm N, \mathrm O)$. The molecule's underlying graph is augmented with a virtual node (indexed with 0) and embedded into the latent space $\mathbb R^d$. Learnable vectors are added to these embeddings to encode the current action level and decisions on previous levels. b. The latent sequence of atoms is passed through a stack of ReZero transformer layers, omitting positional encoding. In the multi-head attention, individual attention scores between atoms are biased with learnable scalars that depend on their bond order. These bias terms are learnable for each transformer layer and attention head individually. c. The sequence output by the transformer is projected through linear layers to generate logits for the distributions $P^{(0)}, P^{(1)}$ and $P^{(2)}$.
  • Figure 3: IBA task, unconstrained: Top three molecules (with their corresponding SMILES string and objective value) identified by each method across all runs.
  • Figure 4: TMB/DMBA task, unconstrained: Top three molecules (with their corresponding SMILES string and objective value) identified by each method across all runs.
  • Figure 5: Top three molecules designed by GraphXForm under structural constraints on specific bonding patterns and ring sizes.
  • ...and 15 more figures
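Figure 2 describes two components that are easy to state in isolation. One is attention logits biased by a learnable scalar per bond order; the other is ReZero residual connections. The sketch below illustrates both in pure Python under simplifying assumptions:

- a single attention head;
- queries, keys and values equal to the embeddings (no learned projections);
- a hypothetical integer encoding of pair categories (0 = not bonded, 1/2/3 = bond order, 4 = edge to the virtual node).

The function names and the category encoding are illustrative, not taken from GraphXForm.

```python
import math


def softmax(row):
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def bond_biased_attention(x, bond_cat, bias):
    """Single-head self-attention over node embeddings x (n rows of length d).

    bond_cat[i][j]: integer category of the pair (i, j); hypothetical encoding
    0 = not bonded, 1/2/3 = bond order, 4 = edge to the virtual node.
    bias[c]: learnable scalar added to the attention logit of every pair in category c.
    Q = K = V = x for brevity; a real layer would use learned projections per head.
    """
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        logits = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) + bias[bond_cat[i][j]]
                  for j in range(n)]
        w = softmax(logits)
        out.append([sum(w[j] * x[j][k] for j in range(n)) for k in range(d)])
    return out


def rezero_layer(x, bond_cat, bias, alpha):
    # ReZero residual x + alpha * F(x); alpha is a learnable scalar initialized to 0,
    # so every layer starts out as the identity map.
    fx = bond_biased_attention(x, bond_cat, bias)
    return [[xv + alpha * fv for xv, fv in zip(xr, fr)] for xr, fr in zip(x, fx)]


# Usage: virtual node 0 plus a C=O fragment (atoms 1 and 2) with toy 2-d embeddings.
x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bond_cat = [[0, 4, 4], [4, 0, 2], [4, 2, 0]]
bias = [0.0, 0.5, 1.0, 1.5, -1.0]  # one scalar per category (learned in practice)
y = rezero_layer(x, bond_cat, bias, alpha=0.1)
```

No positional encoding is used. Reordering the atoms, together with the rows and columns of `bond_cat`, therefore just reorders the output rows. This is consistent with the arbitrary atom indexing noted in Figure 1: graph structure reaches the attention only through the bond-order bias.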