Graph Diffusion Transformers for Multi-Conditional Molecular Generation

Gang Liu, Jiaxin Xu, Tengfei Luo, Meng Jiang

TL;DR

This work extensively validates Graph DiT for multi-conditional polymer and small molecule generation and demonstrates its superiority across nine metrics, from distribution learning to condition control for molecular properties.

Abstract

Inverse molecular design with diffusion models holds great potential for advancements in material and drug discovery. Despite success in unconditional molecular generation, integrating multiple properties such as synthetic score and gas permeability as condition constraints into diffusion models remains unexplored. We present the Graph Diffusion Transformer (Graph DiT) for multi-conditional molecular generation. Graph DiT integrates an encoder to learn numerical and categorical property representations with the Transformer-based denoiser. Unlike previous graph diffusion models that add noise separately on the atoms and bonds in the forward diffusion process, Graph DiT is trained with a novel graph-dependent noise model for accurate estimation of graph-related noise in molecules. We extensively validate Graph DiT for multi-conditional polymer and small molecule generation. Results demonstrate the superiority of Graph DiT across nine metrics from distribution learning to condition control for molecular properties. A polymer inverse design task for gas separation with feedback from domain experts further demonstrates its practical utility.

Paper Structure (50 sections, 10 equations, 11 figures, 8 tables)

Figures (11)

  • Figure 1: Multi-conditional diffusion guidance in (b) generates polymers of higher property accuracy than existing work in (a). Explanations are in \ref{sec:introduction} and details are in \ref{add:subsec:motivation-setup}.
  • Figure 2: Denoising framework and architectures for Graph DiT. Details are in \ref{subsec:denoise-model}.
  • Figure 3: Polymer Inverse Design for O$_2$/N$_2$ Gas Separation: Feedback from four domain experts includes an average Utility Score (UtS) for relative usefulness and an Agreement Score (AS) for generated polymers, both in the range [0, 1]. Polymers are generated conditioned on {SAS=3.8, SCS=4.3, O$_2$Perm=34.0, N$_2$Perm=5.2}. The top-3 polymers, highlighted, are all generated by Graph DiT.
  • Figure 4: Relative Performance of Different Model Designs: A higher bar indicates better performance. We use the performance of clustering-based encoding or $\operatorname{AdaLN}$ as the Reference Value and the current option as the Current Value. Relative performance is calculated as $\frac{\text{Current Value}}{\text{Reference Value}}$ for Similarity and Diversity metrics, and as $\frac{\text{Reference Value}}{\text{Current Value}}$ for other metrics.
  • Figure 5: Histogram of Generated Distribution for Atom and Bond Types in Different Models. Results are calculated based on \ref{tab:main1} for the polymer gas permeability tasks. The atom and bond type distributions of molecules generated by Graph DiT are closer to those of the training data than the distributions from other diffusion models, which indicates that Graph DiT has a better capacity for learning molecular distributions.
  • ...and 6 more figures
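
The relative-performance ratio described in the Figure 4 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the metric label "error", and the numbers are hypothetical, chosen only to show that the ratio is flipped for metrics other than Similarity and Diversity so that a value above 1 always means the current option beats the reference.

```python
# Metrics for which the caption uses Current / Reference.
# All other metrics use Reference / Current.
DIRECT_RATIO_METRICS = {"Similarity", "Diversity"}


def relative_performance(metric: str, current: float, reference: float) -> float:
    """Ratio from the Figure 4 caption; a taller bar means better performance."""
    if metric in DIRECT_RATIO_METRICS:
        return current / reference
    return reference / current


# Hypothetical values, not results from the paper.
print(relative_performance("Similarity", 0.75, 0.5))  # current is higher
print(relative_performance("error", 0.5, 1.0))        # current is lower
```

In both hypothetical cases the current option improves on the reference, so both ratios come out above 1, even though the raw value moved in opposite directions.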